diff --git a/.github/workflows/check-llms-files.yml b/.github/workflows/check-llms-files.yml new file mode 100644 index 00000000..fd5d798d --- /dev/null +++ b/.github/workflows/check-llms-files.yml @@ -0,0 +1,17 @@ +name: Verify llms context files + +on: + pull_request: + workflow_dispatch: + +jobs: + verify: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - name: Regenerate llms.txt and llms-full.txt + run: python3 scripts/generate-llms-files.py + + - name: Ensure committed llms files are up-to-date + run: git diff --exit-code llms.txt llms-full.txt diff --git a/AGENTS.md b/AGENTS.md index 022e2e0e..b4581fa1 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -25,6 +25,29 @@ The site is built with **Mintlify** and deployed automatically by Mintlify on pu - `.agents/skills/` — prompt extensions for agents editing this repo (legacy: `.openhands/skills/`; formerly `microagents`) - `tests/` — pytest checks for docs consistency (notably LLM pricing docs) + +## llms.txt / llms-full.txt (V1-only) + +Mintlify auto-generates `/llms.txt` and `/llms-full.txt`, but this repo **overrides** them by committing +`llms.txt` and `llms-full.txt` at the repo root. + +We do this so LLMs get **V1-only** context while legacy V0 pages remain available for humans. + +- Generator script: `scripts/generate-llms-files.py` +- Regenerate (recommended): + ```bash + make llms + ``` + Or directly: + ```bash + python3 scripts/generate-llms-files.py + ``` +- Verify they are up-to-date: + ```bash + make llms-check + ``` +- Exclusions: `openhands/usage/v0/` and any `V0*`-prefixed page files. + ## Local development ### Preview the site diff --git a/Makefile b/Makefile new file mode 100644 index 00000000..76599445 --- /dev/null +++ b/Makefile @@ -0,0 +1,12 @@ +.PHONY: llms llms-check + +# Regenerate the Mintlify llms context files (V1-only override). 
+# +# See: scripts/generate-llms-files.py +llms: + python3 scripts/generate-llms-files.py + +# Regenerate and fail if llms files changed (useful for local verification). +llms-check: + python3 scripts/generate-llms-files.py + git diff --exit-code llms.txt llms-full.txt diff --git a/llms-full.txt b/llms-full.txt new file mode 100644 index 00000000..509b3bfd --- /dev/null +++ b/llms-full.txt @@ -0,0 +1,33409 @@ +# OpenHands Docs + +> Consolidated documentation context for LLMs (V1-only). Legacy V0 docs pages are intentionally excluded. + +## OpenHands Software Agent SDK + +### Software Agent SDK +Source: https://docs.openhands.dev/sdk.md + +The OpenHands Software Agent SDK is a set of Python and REST APIs for building **agents that work with code**. + +You can use the OpenHands Software Agent SDK for: + +- One-off tasks, like building a README for your repo +- Routine maintenance tasks, like updating dependencies +- Major tasks that involve multiple agents, like refactors and rewrites + +You can even use the SDK to build new developer experiences—it’s the engine behind the [OpenHands CLI](/openhands/usage/cli/quick-start) and [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud). + +Get started with some examples or keep reading to learn more. + +## Features + + + + A unified Python API that enables you to run agents locally or in the cloud, define custom agent behaviors, and create custom tools. + + + Ready-to-use tools for executing Bash commands, editing files, browsing the web, integrating with MCP, and more. + + + A production-ready server that runs agents anywhere, including Docker and Kubernetes, while connecting seamlessly to the Python API. + + + +## Why OpenHands Software Agent SDK? + +### Emphasis on coding + +While other agent SDKs (e.g. 
[LangChain](https://python.langchain.com/docs/tutorials/agents/)) are focused on more general use cases, like delivering chat-based support or automating back-office tasks, OpenHands is purpose-built for software engineering. + +While some folks do use OpenHands to solve more general tasks (code is a powerful tool!), most of us use OpenHands to work with code. + +### State-of-the-Art Performance + +OpenHands is a top performer across a wide variety of benchmarks, including SWE-bench, SWT-bench, and multi-SWE-bench. The SDK includes a number of state-of-the-art agentic features developed by our research team, including: + +- Task planning and decomposition +- Automatic context compression +- Security analysis +- Strong agent-computer interfaces + +OpenHands has attracted researchers from a wide variety of academic institutions, and is [becoming the preferred harness](https://x.com/Alibaba_Qwen/status/1947766835023335516) for evaluating LLMs on coding tasks. + +### Free and Open Source + +OpenHands is also the leading open source framework for coding agents. It’s MIT-licensed, and can work with any LLM—including big proprietary LLMs like Claude and OpenAI, as well as open source LLMs like Qwen and Devstral. + +Other SDKs (e.g. [Claude Code](https://github.com/anthropics/claude-agent-sdk-python)) are proprietary and lock you into a particular model. Given how quickly models are evolving, it’s best to stay model-agnostic! + +## Get Started + + + + Install the SDK, run your first agent, and explore the guides. + + + +## Learn the SDK + + + + Understand the SDK's architecture: agents, tools, workspaces, and more. + + + Explore the complete SDK API and source code. + + + +## Build with Examples + + + + Build local agents with custom tools and capabilities. + + + Run agents on remote servers with Docker sandboxing. + + + Automate repository tasks with agent-powered workflows. + + + +## Community + + + + Connect with the OpenHands community on Slack. 
 + + Contribute to the SDK or report issues on GitHub. + + + +### openhands.sdk.agent +Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.agent.md + +### class Agent + +Bases: `CriticMixin`, [`AgentBase`](#class-agentbase) + +Main agent implementation for OpenHands. + +The Agent class provides the core functionality for running AI agents that can +interact with tools, process messages, and execute actions. It inherits from +AgentBase and implements the agent execution logic. Critic-related functionality +is provided by CriticMixin. + +#### Example + +```pycon +>>> from openhands.sdk import LLM, Agent, Tool +>>> from pydantic import SecretStr +>>> llm = LLM(model="claude-sonnet-4-20250514", api_key=SecretStr("key")) +>>> tools = [Tool(name="TerminalTool"), Tool(name="FileEditorTool")] +>>> agent = Agent(llm=llm, tools=tools) +``` + + +#### Properties + +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +#### Methods + +#### init_state() + +Initialize conversation state. + +Invariants enforced by this method: +- If a SystemPromptEvent is already present, it must be within the first 3 + + events (index 0 or 1 in practice; index 2 is included in the scan window + to detect a user message appearing before the system prompt). +- A user MessageEvent should not appear before the SystemPromptEvent. + +These invariants keep event ordering predictable for downstream components +(condenser, UI, etc.) and also prevent accidentally materializing the full +event history during initialization. + +#### model_post_init() + +This function is meant to behave like a BaseModel method to initialise private attributes. + +It takes context as an argument since that’s what pydantic-core passes when calling it. + +* Parameters: + * `self` – The BaseModel instance. + * `context` – The context. + +#### step() + +Taking a step in the conversation. + +Typically this involves: +1. Making a LLM call +2.
Executing the tool +3. Updating the conversation state with + + LLM calls (role=”assistant”) and tool results (role=”tool”) + +4.1 If conversation is finished, set state.execution_status to FINISHED +4.2 Otherwise, just return, Conversation will kick off the next step + +If the underlying LLM supports streaming, partial deltas are forwarded to +`on_token` before the full response is returned. + +NOTE: state will be mutated in-place. + +### class AgentBase + +Bases: `DiscriminatedUnionMixin`, `ABC` + +Abstract base class for OpenHands agents. + +Agents are stateless and should be fully defined by their configuration. +This base class provides the common interface and functionality that all +agent implementations must follow. + + +#### Properties + +- `agent_context`: AgentContext | None +- `condenser`: CondenserBase | None +- `critic`: CriticBase | None +- `dynamic_context`: str | None + Get the dynamic per-conversation context. + This returns the context that varies between conversations, such as: + - Repository information and skills + - Runtime information (hosts, working directory) + - User-specific secrets and settings + - Conversation instructions + This content should NOT be included in the cached system prompt to enable + cross-conversation cache sharing. Instead, it is sent as a second content + block (without a cache marker) inside the system message. + * Returns: + The dynamic context string, or None if no context is configured. +- `filter_tools_regex`: str | None +- `include_default_tools`: list[str] +- `llm`: LLM +- `mcp_config`: dict[str, Any] +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `name`: str + Returns the name of the Agent. +- `prompt_dir`: str + Returns the directory where this class’s module file is located. +- `security_policy_filename`: str +- `static_system_message`: str + Compute the static portion of the system message. 
+ This returns only the base system prompt template without any dynamic + per-conversation context. This static portion can be cached and reused + across conversations for better prompt caching efficiency. + * Returns: + The rendered system prompt template without dynamic context. +- `system_message`: str + Return the combined system message (static + dynamic). +- `system_prompt_filename`: str +- `system_prompt_kwargs`: dict[str, object] +- `tools`: list[Tool] +- `tools_map`: dict[str, ToolDefinition] + Get the initialized tools map. + * Raises: + `RuntimeError` – If the agent has not been initialized. + +#### Methods + +#### get_all_llms() + +Recursively yield unique base-class LLM objects reachable from self. + +- Returns actual object references (not copies). +- De-dupes by id(LLM). +- Cycle-safe via a visited set for all traversed objects. +- Only yields objects whose type is exactly LLM (no subclasses). +- Does not handle dataclasses. + +#### init_state() + +Initialize the empty conversation state to prepare the agent for user +messages. + +Typically this involves adding a system message. + +NOTE: state will be mutated in-place. + +#### model_dump_succint() + +Like model_dump, but excludes None fields by default. + +#### model_post_init() + +This function is meant to behave like a BaseModel method to initialise private attributes. + +It takes context as an argument since that’s what pydantic-core passes when calling it. + +* Parameters: + * `self` – The BaseModel instance. + * `context` – The context. + +#### abstractmethod step() + +Taking a step in the conversation. + +Typically this involves: +1. Making a LLM call +2. Executing the tool +3.
Updating the conversation state with + + LLM calls (role=”assistant”) and tool results (role=”tool”) + +4.1 If conversation is finished, set state.execution_status to FINISHED +4.2 Otherwise, just return, Conversation will kick off the next step + +If the underlying LLM supports streaming, partial deltas are forwarded to +`on_token` before the full response is returned. + +NOTE: state will be mutated in-place. + +#### Deprecated +Deprecated since version 1.11.0: Use [`static_system_message`](#class-static_system_message) for the cacheable system prompt and +[`dynamic_context`](#class-dynamic_context) for per-conversation content. This separation +enables cross-conversation prompt caching. Will be removed in 1.16.0. + +#### WARNING +Using this property DISABLES cross-conversation prompt caching because +it combines static and dynamic content into a single string. Use +[`static_system_message`](#class-static_system_message) and [`dynamic_context`](#class-dynamic_context) separately +to enable caching. + +#### Deprecated +Deprecated since version 1.11.0: This will be removed in 1.16.0. Use static_system_message for the cacheable system prompt and dynamic_context for per-conversation content. Using system_message DISABLES cross-conversation prompt caching because it combines static and dynamic content into a single string. + +#### verify() + +Verify that we can resume this agent from persisted state. + +We do not merge configuration between persisted and runtime Agent +instances. Instead, we verify compatibility requirements and then +continue with the runtime-provided Agent. + +Compatibility requirements: +- Agent class/type must match. +- Tools must match exactly (same tool names). + +Tools are part of the system prompt and cannot be changed mid-conversation. +To use different tools, start a new conversation or use conversation forking +(see [https://github.com/OpenHands/OpenHands/issues/8560](https://github.com/OpenHands/OpenHands/issues/8560)). 
+ +All other configuration (LLM, agent_context, condenser, etc.) can be +freely changed between sessions. + +* Parameters: + * `persisted` – The agent loaded from persisted state. + * `events` – Unused, kept for API compatibility. +* Returns: + This runtime agent (self) if verification passes. +* Raises: + `ValueError` – If agent class or tools don’t match. + +### openhands.sdk.conversation +Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.conversation.md + +### class BaseConversation + +Bases: `ABC` + +Abstract base class for conversation implementations. + +This class defines the interface that all conversation implementations must follow. +Conversations manage the interaction between users and agents, handling message +exchange, execution control, and state management. + + +#### Properties + +- `confirmation_policy_active`: bool +- `conversation_stats`: ConversationStats +- `id`: UUID +- `is_confirmation_mode_active`: bool + Check if confirmation mode is active. + Returns True if BOTH conditions are met: + 1. The conversation state has a security analyzer set (not None) + 2. The confirmation policy is active +- `state`: ConversationStateProtocol + +#### Methods + +#### __init__() + +Initialize the base conversation with span tracking. + +#### abstractmethod ask_agent() + +Ask the agent a simple, stateless question and get a direct LLM response. + +This bypasses the normal conversation flow and does not modify, persist, +or become part of the conversation state. The request is not remembered by +the main agent, no events are recorded, and execution status is untouched. +It is also thread-safe and may be called while conversation.run() is +executing in another thread. + +* Parameters: + `question` – A simple string question to ask the agent +* Returns: + A string response from the agent + +#### abstractmethod close() + +#### static compose_callbacks() + +Compose multiple callbacks into a single callback function. 
+ +* Parameters: + `callbacks` – An iterable of callback functions +* Returns: + A single callback function that calls all provided callbacks + +#### abstractmethod condense() + +Force condensation of the conversation history. + +This method uses the existing condensation request pattern to trigger +condensation. It adds a CondensationRequest event to the conversation +and forces the agent to take a single step to process it. + +The condensation will be applied immediately and will modify the conversation +state by adding a condensation event to the history. + +* Raises: + `ValueError` – If no condenser is configured or the condenser doesn’t + handle condensation requests. + +#### abstractmethod execute_tool() + +Execute a tool directly without going through the agent loop. + +This method allows executing tools before or outside of the normal +conversation.run() flow. It handles agent initialization automatically, +so tools can be executed before the first run() call. + +Note: This method bypasses the agent loop, including confirmation +policies and security analyzer checks. Callers are responsible for +applying any safeguards before executing potentially destructive tools. + +This is useful for: +- Pre-run setup operations (e.g., indexing repositories) +- Manual tool execution for environment setup +- Testing tool behavior outside the agent loop + +* Parameters: + * `tool_name` – The name of the tool to execute (e.g., “sleeptime_compute”) + * `action` – The action to pass to the tool executor +* Returns: + The observation returned by the tool execution +* Raises: + * `KeyError` – If the tool is not found in the agent’s tools + * `NotImplementedError` – If the tool has no executor + +#### abstractmethod generate_title() + +Generate a title for the conversation based on the first user message. + +* Parameters: + * `llm` – Optional LLM to use for title generation. If not provided, + uses the agent’s LLM. + * `max_length` – Maximum length of the generated title. 
+* Returns: + A generated title for the conversation. +* Raises: + `ValueError` – If no user messages are found in the conversation. + +#### static get_persistence_dir() + +Get the persistence directory for the conversation. + +* Parameters: + * `persistence_base_dir` – Base directory for persistence. Can be a string + path or Path object. + * `conversation_id` – Unique conversation ID. +* Returns: + String path to the conversation-specific persistence directory. + Always returns a normalized string path even if a Path was provided. + +#### abstractmethod pause() + +#### abstractmethod reject_pending_actions() + +#### abstractmethod run() + +Execute the agent to process messages and perform actions. + +This method runs the agent until it finishes processing the current +message or reaches the maximum iteration limit. + +#### abstractmethod send_message() + +Send a message to the agent. + +* Parameters: + * `message` – Either a string (which will be converted to a user message) + or a Message object + * `sender` – Optional identifier of the sender. Can be used to track + message origin in multi-agent scenarios. For example, when + one agent delegates to another, the sender can be set to + identify which agent is sending the message. + +#### abstractmethod set_confirmation_policy() + +Set the confirmation policy for the conversation. + +#### abstractmethod set_security_analyzer() + +Set the security analyzer for the conversation. + +#### abstractmethod update_secrets() + +### class Conversation + +Bases: `object` + +Factory class for creating conversation instances with OpenHands agents. + +This factory automatically creates either a LocalConversation or RemoteConversation +based on the workspace type provided. LocalConversation runs the agent locally, +while RemoteConversation connects to a remote agent server. + +* Returns: + LocalConversation if workspace is local, RemoteConversation if workspace + is remote.
 + +#### Example + +```pycon +>>> from openhands.sdk import LLM, Agent, Conversation +>>> from openhands.sdk.plugin import PluginSource +>>> from pydantic import SecretStr +>>> llm = LLM(model="claude-sonnet-4-20250514", api_key=SecretStr("key")) +>>> agent = Agent(llm=llm, tools=[]) +>>> conversation = Conversation( +... agent=agent, +... workspace="./workspace", +... plugins=[PluginSource(source="github:org/security-plugin", ref="v1.0")], +... ) +>>> conversation.send_message("Hello!") +>>> conversation.run() +``` + +### class ConversationExecutionStatus + +Bases: `str`, `Enum` + +Enum representing the current execution state of the conversation. + +#### Methods + +#### DELETING = 'deleting' + +#### ERROR = 'error' + +#### FINISHED = 'finished' + +#### IDLE = 'idle' + +#### PAUSED = 'paused' + +#### RUNNING = 'running' + +#### STUCK = 'stuck' + +#### WAITING_FOR_CONFIRMATION = 'waiting_for_confirmation' + +#### is_terminal() + +Check if this status represents a terminal state. + +Terminal states indicate the run has completed and the agent is no longer +actively processing. These are: FINISHED, ERROR, STUCK. + +Note: IDLE is NOT a terminal state - it’s the initial state of a conversation +before any run has started. Including IDLE would cause false positives when +the WebSocket delivers the initial state update during connection. + +* Returns: + True if this is a terminal status, False otherwise. + +### class ConversationState + +Bases: `OpenHandsModel` + + +#### Properties + +- `activated_knowledge_skills`: list[str] +- `agent`: AgentBase +- `agent_state`: dict[str, Any] +- `blocked_actions`: dict[str, str] +- `blocked_messages`: dict[str, str] +- `confirmation_policy`: ConfirmationPolicyBase +- `env_observation_persistence_dir`: str | None + Directory for persisting environment observation files.
+- `events`: [EventLog](#class-eventlog) +- `execution_status`: [ConversationExecutionStatus](#class-conversationexecutionstatus) +- `id`: UUID +- `max_iterations`: int +- `persistence_dir`: str | None +- `secret_registry`: [SecretRegistry](#class-secretregistry) +- `security_analyzer`: SecurityAnalyzerBase | None +- `stats`: ConversationStats +- `stuck_detection`: bool +- `workspace`: BaseWorkspace + +#### Methods + +#### acquire() + +Acquire the lock. + +* Parameters: + * `blocking` – If True, block until lock is acquired. If False, return + immediately. + * `timeout` – Maximum time to wait for lock (ignored if blocking=False). + -1 means wait indefinitely. +* Returns: + True if lock was acquired, False otherwise. + +#### block_action() + +Persistently record a hook-blocked action. + +#### block_message() + +Persistently record a hook-blocked user message. + +#### classmethod create() + +Create a new conversation state or resume from persistence. + +This factory method handles both new conversation creation and resumption +from persisted state. + +New conversation: +The provided Agent is used directly. Pydantic validation happens via the +cls() constructor. + +Restored conversation: +The provided Agent is validated against the persisted agent using +agent.load(). Tools must match (they may have been used in conversation +history), but all other configuration can be freely changed: LLM, +agent_context, condenser, system prompts, etc. + +* Parameters: + * `id` – Unique conversation identifier + * `agent` – The Agent to use (tools must match persisted on restore) + * `workspace` – Working directory for agent operations + * `persistence_dir` – Directory for persisting state and events + * `max_iterations` – Maximum iterations per run + * `stuck_detection` – Whether to enable stuck detection + * `cipher` – Optional cipher for encrypting/decrypting secrets in + persisted state. If provided, secrets are encrypted when + saving and decrypted when loading. 
If not provided, secrets + are redacted (lost) on serialization. +* Returns: + ConversationState ready for use +* Raises: + * `ValueError` – If conversation ID or tools mismatch on restore + * `ValidationError` – If agent or other fields fail Pydantic validation + +#### static get_unmatched_actions() + +Find actions in the event history that don’t have matching observations. + +This method identifies ActionEvents that don’t have corresponding +ObservationEvents or UserRejectObservations, which typically indicates +actions that are pending confirmation or execution. + +* Parameters: + `events` – List of events to search through +* Returns: + List of ActionEvent objects that don’t have corresponding observations, + in chronological order + +#### locked() + +Return True if the lock is currently held by any thread. + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +#### model_post_init() + +This function is meant to behave like a BaseModel method to initialise private attributes. + +It takes context as an argument since that’s what pydantic-core passes when calling it. + +* Parameters: + * `self` – The BaseModel instance. + * `context` – The context. + +#### owned() + +Return True if the lock is currently held by the calling thread. + +#### pop_blocked_action() + +Remove and return a hook-blocked action reason, if present. + +#### pop_blocked_message() + +Remove and return a hook-blocked message reason, if present. + +#### release() + +Release the lock. + +* Raises: + `RuntimeError` – If the current thread doesn’t own the lock. + +#### set_on_state_change() + +Set a callback to be called when state changes. + +* Parameters: + `callback` – A function that takes an Event (ConversationStateUpdateEvent) + or None to remove the callback + +### class ConversationVisualizerBase + +Bases: `ABC` + +Base class for conversation visualizers. 
 + +This abstract base class defines the interface that all conversation visualizers +must implement. Visualizers can be created before the Conversation is initialized +and will be configured with the conversation state automatically. + +The typical usage pattern: +1. Create a visualizer instance: viz = MyVisualizer() +2. Pass it to Conversation: conv = Conversation(agent, visualizer=viz) +3. Conversation automatically calls viz.initialize(state) to attach the state + +You can also pass the uninstantiated class if you don’t need extra args for +initialization, and Conversation will create it: conv = Conversation(agent, visualizer=MyVisualizer). +Conversation will then call MyVisualizer() followed by initialize(state). + + +#### Properties + +- `conversation_stats`: ConversationStats | None + Get conversation stats from the state. + +#### Methods + +#### __init__() + +Initialize the visualizer base. + +#### create_sub_visualizer() + +Create a visualizer for a sub-agent during delegation. + +Override this method to support sub-agent visualization in multi-agent +delegation scenarios. The sub-visualizer will be used to display events +from the spawned sub-agent. + +By default, returns None which means sub-agents will not have visualization. +Subclasses that support delegation (like DelegationVisualizer) should +override this method to create appropriate sub-visualizers. + +* Parameters: + `agent_id` – The identifier of the sub-agent being spawned +* Returns: + A visualizer instance for the sub-agent, or None if sub-agent + visualization is not supported + +#### final initialize() + +Initialize the visualizer with conversation state. + +This method is called by Conversation after the state is created, +allowing the visualizer to access conversation stats and other +state information. + +Subclasses should not override this method, to ensure the state is set.
+ +* Parameters: + `state` – The conversation state object + +#### abstractmethod on_event() + +Handle a conversation event. + +This method is called for each event in the conversation and should +implement the visualization logic. + +* Parameters: + `event` – The event to visualize + +### class DefaultConversationVisualizer + +Bases: [`ConversationVisualizerBase`](#class-conversationvisualizerbase) + +Handles visualization of conversation events with Rich formatting. + +Provides Rich-formatted output with semantic dividers and complete content display. + +#### Methods + +#### __init__() + +Initialize the visualizer. + +* Parameters: + * `highlight_regex` – Dictionary mapping regex patterns to Rich color styles + for highlighting keywords in the visualizer. + For example: (configuration object) + * `skip_user_messages` – If True, skip displaying user messages. Useful for + scenarios where user input is not relevant to show. + +#### on_event() + +Main event handler that displays events with Rich formatting. + +### class EventLog + +Bases: [`EventsListBase`](#class-eventslistbase) + +Persistent event log with locking for concurrent writes. + +This class provides thread-safe and process-safe event storage using +the FileStore’s locking mechanism. Events are persisted to disk and +can be accessed by index or event ID. + +#### Methods + +#### NOTE +For LocalFileStore, file locking via flock() does NOT work reliably +on NFS mounts or network filesystems. Users deploying with shared +storage should use alternative coordination mechanisms. + +#### __init__() + +#### append() + +Append an event with locking for thread/process safety. + +* Raises: + * `TimeoutError` – If the lock cannot be acquired within LOCK_TIMEOUT_SECONDS. + * `ValueError` – If an event with the same ID already exists. + +#### get_id() + +Return the event_id for a given index. + +#### get_index() + +Return the integer index for a given event_id. 
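The EventLog semantics above (locked appends, duplicate-ID rejection, and id↔index lookup via `get_id`/`get_index`) can be sketched with a minimal in-memory analogue. This is an illustration, not the SDK class: the real EventLog persists events through a FileStore and uses file locking, which this sketch replaces with a `threading.Lock`, and `MiniEventLog` is a hypothetical name.

```python
import threading

class MiniEventLog:
    """Illustrative in-memory analogue of EventLog (not the real SDK class)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._events = []          # index -> event payload
        self._ids = []             # index -> event_id
        self._index_by_id = {}     # event_id -> index

    def append(self, event_id, event):
        # Hold the lock for the whole append so concurrent writers cannot
        # interleave, mirroring EventLog's thread-safe append.
        with self._lock:
            if event_id in self._index_by_id:
                raise ValueError(f"event {event_id!r} already exists")
            self._index_by_id[event_id] = len(self._events)
            self._ids.append(event_id)
            self._events.append(event)

    def get_id(self, index):
        """Return the event_id for a given index."""
        return self._ids[index]

    def get_index(self, event_id):
        """Return the integer index for a given event_id."""
        return self._index_by_id[event_id]

log = MiniEventLog()
log.append("evt-0", {"kind": "system_prompt"})
log.append("evt-1", {"kind": "message"})
print(log.get_index("evt-1"))  # → 1
print(log.get_id(0))           # → evt-0
```

As the NOTE above warns for the real class, a plain `flock()`-style lock does not compose across NFS mounts; the threading lock here only models single-process safety.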
+ +### class EventsListBase + +Bases: `Sequence`[`Event`], `ABC` + +Abstract base class for event lists that can be appended to. + +This provides a common interface for both local EventLog and remote +RemoteEventsList implementations, avoiding circular imports in protocols. + +#### Methods + +#### abstractmethod append() + +Add a new event to the list. + +### class LocalConversation + +Bases: [`BaseConversation`](#class-baseconversation) + + +#### Properties + +- `agent`: AgentBase +- `delete_on_close`: bool = True +- `id`: UUID + Get the unique ID of the conversation. +- `llm_registry`: LLMRegistry +- `max_iteration_per_run`: int +- `resolved_plugins`: list[ResolvedPluginSource] | None + Get the resolved plugin sources after plugins are loaded. + Returns None if plugins haven’t been loaded yet, or if no plugins + were specified. Use this for persistence to ensure conversation + resume uses the exact same plugin versions. +- `state`: [ConversationState](#class-conversationstate) + Get the conversation state. + It returns a protocol that has a subset of ConversationState methods + and properties. We will have the ability to access the same properties + of ConversationState on a remote conversation object. + But we won’t be able to access methods that mutate the state. +- `stuck_detector`: [StuckDetector](#class-stuckdetector) | None + Get the stuck detector instance if enabled. +- `workspace`: LocalWorkspace + +#### Methods + +#### __init__() + +Initialize the conversation. + +* Parameters: + * `agent` – The agent to use for the conversation. + * `workspace` – Working directory for agent operations and tool execution. + Can be a string path, Path object, or LocalWorkspace instance. + * `plugins` – Optional list of plugins to load. Each plugin is specified + with a source (github:owner/repo, git URL, or local path), + optional ref (branch/tag/commit), and optional repo_path for + monorepos. 
Plugins are loaded in order with these merge + semantics: skills override by name (last wins), MCP config + override by key (last wins), hooks concatenate (all run). + * `persistence_dir` – Directory for persisting conversation state and events. + Can be a string path or Path object. + * `conversation_id` – Optional ID for the conversation. If provided, will + be used to identify the conversation. The user might want to + suffix their persistent filestore with this ID. + * `callbacks` – Optional list of callback functions to handle events + * `token_callbacks` – Optional list of callbacks invoked for streaming deltas + * `hook_config` – Optional hook configuration to auto-wire session hooks. + If plugins are loaded, their hooks are combined with this config. + * `max_iteration_per_run` – Maximum number of iterations per run + * `visualizer` – + + Visualization configuration. Can be: + - ConversationVisualizerBase subclass: Class to instantiate + > (default: ConversationVisualizer) + - ConversationVisualizerBase instance: Use custom visualizer + - None: No visualization + * `stuck_detection` – Whether to enable stuck detection + * `stuck_detection_thresholds` – Optional configuration for stuck detection + thresholds. Can be a StuckDetectionThresholds instance or + a dict with keys: ‘action_observation’, ‘action_error’, + ‘monologue’, ‘alternating_pattern’. Values are integers + representing the number of repetitions before triggering. + * `cipher` – Optional cipher for encrypting/decrypting secrets in persisted + state. If provided, secrets are encrypted when saving and + decrypted when loading. If not provided, secrets are redacted + (lost) on serialization. + +#### ask_agent() + +Ask the agent a simple, stateless question and get a direct LLM response. + +This bypasses the normal conversation flow and does not modify, persist, +or become part of the conversation state. 
The request is not remembered by +the main agent, no events are recorded, and execution status is untouched. +It is also thread-safe and may be called while conversation.run() is +executing in another thread. + +* Parameters: + `question` – A simple string question to ask the agent +* Returns: + A string response from the agent + +#### close() + +Close the conversation and clean up all tool executors. + +#### condense() + +Synchronously force condense the conversation history. + +If the agent is currently running, condense() will wait for the +ongoing step to finish before proceeding. + +Raises ValueError if no compatible condenser exists. + +#### property conversation_stats + +#### execute_tool() + +Execute a tool directly without going through the agent loop. + +This method allows executing tools before or outside of the normal +conversation.run() flow. It handles agent initialization automatically, +so tools can be executed before the first run() call. + +Note: This method bypasses the agent loop, including confirmation +policies and security analyzer checks. Callers are responsible for +applying any safeguards before executing potentially destructive tools. + +This is useful for: +- Pre-run setup operations (e.g., indexing repositories) +- Manual tool execution for environment setup +- Testing tool behavior outside the agent loop + +* Parameters: + * `tool_name` – The name of the tool to execute (e.g., “sleeptime_compute”) + * `action` – The action to pass to the tool executor +* Returns: + The observation returned by the tool execution +* Raises: + * `KeyError` – If the tool is not found in the agent’s tools + * `NotImplementedError` – If the tool has no executor + +#### generate_title() + +Generate a title for the conversation based on the first user message. + +* Parameters: + * `llm` – Optional LLM to use for title generation. If not provided, + uses self.agent.llm. + * `max_length` – Maximum length of the generated title. 
+* Returns: + A generated title for the conversation. +* Raises: + `ValueError` – If no user messages are found in the conversation. + +#### pause() + +Pause agent execution. + +This method can be called from any thread to request that the agent +pause execution. The pause will take effect at the next iteration +of the run loop (between agent steps). + +Note: If called during an LLM completion, the pause will not take +effect until the current LLM call completes. + +#### reject_pending_actions() + +Reject all pending actions from the agent. + +This is a non-invasive method to reject actions between run() calls. +Also clears the agent_waiting_for_confirmation flag. + +#### run() + +Runs the conversation until the agent finishes. + +In confirmation mode: +- First call: creates actions but doesn’t execute them, stops and waits +- Second call: executes pending actions (implicit confirmation) + +In normal mode: +- Creates and executes actions immediately + +Can be paused between steps + +#### send_message() + +Send a message to the agent. + +* Parameters: + * `message` – Either a string (which will be converted to a user message) + or a Message object + * `sender` – Optional identifier of the sender. Can be used to track + message origin in multi-agent scenarios. For example, when + one agent delegates to another, the sender can be set to + identify which agent is sending the message. + +#### set_confirmation_policy() + +Set the confirmation policy and store it in conversation state. + +#### set_security_analyzer() + +Set the security analyzer for the conversation. + +#### update_secrets() + +Add secrets to the conversation. + +* Parameters: + `secrets` – Dictionary mapping secret keys to values or no-arg callables. + SecretValue = str | Callable[[], str]. Callables are invoked lazily + when a command references the secret key. 
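
Putting the pieces above together, a minimal local run might look like the sketch
below. This is an illustrative sketch only: it assumes an `agent` (an AgentBase
instance) has already been constructed, and the import path shown for
`LocalConversation` is an assumption rather than a documented API.

```pycon
>>> # Sketch: `agent` is assumed to be a pre-configured AgentBase instance
>>> from openhands.sdk.conversation import LocalConversation  # path assumed
>>> conversation = LocalConversation(
...     agent=agent,
...     workspace="/tmp/workspace",    # str, Path, or LocalWorkspace
...     persistence_dir="/tmp/state",  # optional: persist state and events
... )
>>> conversation.update_secrets({"API_TOKEN": lambda: "s3cr3t"})  # lazy callable
>>> conversation.send_message("Summarize the repository structure")
>>> conversation.run()    # blocks until the agent finishes (or is paused)
>>> conversation.close()  # cleans up tool executors
```
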
+ +### class RemoteConversation + +Bases: [`BaseConversation`](#class-baseconversation) + + +#### Properties + +- `agent`: AgentBase +- `delete_on_close`: bool = False +- `id`: UUID +- `max_iteration_per_run`: int +- `state`: RemoteState + Access to remote conversation state. +- `workspace`: RemoteWorkspace + +#### Methods + +#### __init__() + +Remote conversation proxy that talks to an agent server. + +* Parameters: + * `agent` – Agent configuration (will be sent to the server) + * `workspace` – The working directory for agent operations and tool execution. + * `plugins` – Optional list of plugins to load on the server. Each plugin + is a PluginSource specifying source, ref, and repo_path. + * `conversation_id` – Optional existing conversation id to attach to + * `callbacks` – Optional callbacks to receive events (not yet streamed) + * `max_iteration_per_run` – Max iterations configured on server + * `stuck_detection` – Whether to enable stuck detection on server + * `stuck_detection_thresholds` – Optional configuration for stuck detection + thresholds. Can be a StuckDetectionThresholds instance or + a dict with keys: ‘action_observation’, ‘action_error’, + ‘monologue’, ‘alternating_pattern’. Values are integers + representing the number of repetitions before triggering. + * `hook_config` – Optional hook configuration for session hooks + * `visualizer` – + + Visualization configuration. Can be: + - ConversationVisualizerBase subclass: Class to instantiate + > (default: ConversationVisualizer) + - ConversationVisualizerBase instance: Use custom visualizer + - None: No visualization + * `secrets` – Optional secrets to initialize the conversation with + +#### ask_agent() + +Ask the agent a simple, stateless question and get a direct LLM response. + +This bypasses the normal conversation flow and does not modify, persist, +or become part of the conversation state. 
The request is not remembered by +the main agent, no events are recorded, and execution status is untouched. +It is also thread-safe and may be called while conversation.run() is +executing in another thread. + +* Parameters: + `question` – A simple string question to ask the agent +* Returns: + A string response from the agent + +#### close() + +Close the conversation and clean up resources. + +Note: We don’t close self._client here because it’s shared with the workspace. +The workspace owns the client and will close it during its own cleanup. +Closing it here would prevent the workspace from making cleanup API calls. + +#### condense() + +Force condensation of the conversation history. + +This method sends a condensation request to the remote agent server. +The server will use the existing condensation request pattern to trigger +condensation if a condenser is configured and handles condensation requests. + +The condensation will be applied on the server side and will modify the +conversation state by adding a condensation event to the history. + +* Raises: + `HTTPError` – If the server returns an error (e.g., no condenser configured). + +#### property conversation_stats + +#### execute_tool() + +Execute a tool directly without going through the agent loop. + +Note: This method is not yet supported for RemoteConversation. +Tool execution for remote conversations happens on the server side +during the normal agent loop. + +* Parameters: + * `tool_name` – The name of the tool to execute + * `action` – The action to pass to the tool executor +* Raises: + `NotImplementedError` – Always, as this feature is not yet supported + for remote conversations. + +#### generate_title() + +Generate a title for the conversation based on the first user message. + +* Parameters: + * `llm` – Optional LLM to use for title generation. If provided, its usage_id + will be sent to the server. If not provided, uses the agent’s LLM. + * `max_length` – Maximum length of the generated title. 
+* Returns: + A generated title for the conversation. + +#### pause() + +#### reject_pending_actions() + +#### run() + +Trigger a run on the server. + +* Parameters: + * `blocking` – If True (default), wait for the run to complete by polling + the server. If False, return immediately after triggering the run. + * `poll_interval` – Time in seconds between status polls (only used when + blocking=True). Default is 1.0 second. + * `timeout` – Maximum time in seconds to wait for the run to complete + (only used when blocking=True). Default is 3600 seconds. +* Raises: + `ConversationRunError` – If the run fails or times out. + +#### send_message() + +Send a message to the agent. + +* Parameters: + * `message` – Either a string (which will be converted to a user message) + or a Message object + * `sender` – Optional identifier of the sender. Can be used to track + message origin in multi-agent scenarios. For example, when + one agent delegates to another, the sender can be set to + identify which agent is sending the message. + +#### set_confirmation_policy() + +Set the confirmation policy for the conversation. + +#### set_security_analyzer() + +Set the security analyzer for the remote conversation. + +#### property stuck_detector + +Stuck detector for compatibility. +Not implemented for remote conversations. + +#### update_secrets() + +### class SecretRegistry + +Bases: `OpenHandsModel` + +Manages secrets and injects them into bash commands when needed. + +The secret registry stores a mapping of secret keys to SecretSources +that retrieve the actual secret values. When a bash command is about to be +executed, it scans the command for any secret keys and injects the corresponding +environment variables. + +Secret sources will redact / encrypt their sensitive values as appropriate when +serializing, depending on the content of the context. If a context is present +and contains a ‘cipher’ object, this is used for encryption. 
If it contains a +boolean ‘expose_secrets’ flag set to True, secrets are dunped in plain text. +Otherwise secrets are redacted. + +Additionally, it tracks the latest exported values to enable consistent masking +even when callable secrets fail on subsequent calls. + + +#### Properties + +- `secret_sources`: dict[str, SecretSource] + +#### Methods + +#### find_secrets_in_text() + +Find all secret keys mentioned in the given text. + +* Parameters: + `text` – The text to search for secret keys +* Returns: + Set of secret keys found in the text + +#### get_secrets_as_env_vars() + +Get secrets that should be exported as environment variables for a command. + +* Parameters: + `command` – The bash command to check for secret references +* Returns: + Dictionary of environment variables to export (key -> value) + +#### mask_secrets_in_output() + +Mask secret values in the given text. + +This method uses both the current exported values and attempts to get +fresh values from callables to ensure comprehensive masking. + +* Parameters: + `text` – The text to mask secrets in +* Returns: + Text with secret values replaced by `` + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +#### model_post_init() + +This function is meant to behave like a BaseModel method to initialise private attributes. + +It takes context as an argument since that’s what pydantic-core passes when calling it. + +* Parameters: + * `self` – The BaseModel instance. + * `context` – The context. + +#### update_secrets() + +Add or update secrets in the manager. + +* Parameters: + `secrets` – Dictionary mapping secret keys to either string values + or callable functions that return string values + +### class StuckDetector + +Bases: `object` + +Detects when an agent is stuck in repetitive or unproductive patterns. + +This detector analyzes the conversation history to identify various stuck patterns: +1. 
Repeating action-observation cycles +2. Repeating action-error cycles +3. Agent monologue (repeated messages without user input) +4. Repeating alternating action-observation patterns +5. Context window errors indicating memory issues + + +#### Properties + +- `action_error_threshold`: int +- `action_observation_threshold`: int +- `alternating_pattern_threshold`: int +- `monologue_threshold`: int +- `state`: [ConversationState](#class-conversationstate) +- `thresholds`: StuckDetectionThresholds + +#### Methods + +#### __init__() + +#### is_stuck() + +Check if the agent is currently stuck. + +Note: To avoid materializing potentially large file-backed event histories, +only the last MAX_EVENTS_TO_SCAN_FOR_STUCK_DETECTION events are analyzed. +If a user message exists within this window, only events after it are checked. +Otherwise, all events in the window are analyzed. + +#### __init__() + +### openhands.sdk.event +Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.event.md + +### class ActionEvent + +Bases: [`LLMConvertibleEvent`](#class-llmconvertibleevent) + + +#### Properties + +- `action`: Action | None +- `critic_result`: CriticResult | None +- `llm_response_id`: str +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `reasoning_content`: str | None +- `responses_reasoning_item`: ReasoningItemModel | None +- `security_risk`: SecurityRisk +- `source`: Literal['agent', 'user', 'environment'] +- `summary`: str | None +- `thinking_blocks`: list[ThinkingBlock | RedactedThinkingBlock] +- `thought`: Sequence[TextContent] +- `tool_call`: MessageToolCall +- `tool_call_id`: str +- `tool_name`: str +- `visualize`: Text + Return Rich Text representation of this action event. 
+ +#### Methods + +#### to_llm_message() + +Individual message - may be incomplete for multi-action batches + +### class AgentErrorEvent + +Bases: [`ObservationBaseEvent`](#class-observationbaseevent) + +Error triggered by the agent. + +Note: This event should not contain model “thought” or “reasoning_content”. It +represents an error produced by the agent/scaffold, not model output. + + +#### Properties + +- `error`: str +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: Literal['agent', 'user', 'environment'] +- `visualize`: Text + Return Rich Text representation of this agent error event. + +#### Methods + +#### to_llm_message() + +### class Condensation + +Bases: [`Event`](#class-event) + +This action indicates a condensation of the conversation history is happening. + + +#### Properties + +- `forgotten_event_ids`: list[[EventID](#class-eventid)] +- `has_summary_metadata`: bool + Checks if both summary and summary_offset are present. +- `llm_response_id`: [EventID](#class-eventid) +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: SourceType +- `summary`: str | None +- `summary_event`: [CondensationSummaryEvent](#class-condensationsummaryevent) + Generates a CondensationSummaryEvent. + Since summary events are not part of the main event store and are generated + dynamically, this property ensures the created event has a unique and consistent + ID based on the condensation event’s ID. + * Raises: + `ValueError` – If no summary is present. +- `summary_offset`: int | None +- `visualize`: Text + Return Rich Text representation of this event. + This is a fallback implementation for unknown event types. + Subclasses should override this method to provide specific visualization. 
+ +#### Methods + +#### apply() + +Applies the condensation to a list of events. + +This method removes events that are marked to be forgotten and returns a new +list of events. If the summary metadata is present (both summary and offset), +the corresponding CondensationSummaryEvent will be inserted at the specified +offset _after_ the forgotten events have been removed. + +### class CondensationRequest + +Bases: [`Event`](#class-event) + +This action is used to request a condensation of the conversation history. + + +#### Properties + +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: SourceType +- `visualize`: Text + Return Rich Text representation of this event. + This is a fallback implementation for unknown event types. + Subclasses should override this method to provide specific visualization. + +#### Methods + +#### action + +The action type, namely ActionType.CONDENSATION_REQUEST. + +* Type: + str + +### class CondensationSummaryEvent + +Bases: [`LLMConvertibleEvent`](#class-llmconvertibleevent) + +This event represents a summary generated by a condenser. + + +#### Properties + +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: SourceType +- `summary`: str + The summary text. + +#### Methods + +#### to_llm_message() + +### class ConversationStateUpdateEvent + +Bases: [`Event`](#class-event) + +Event that contains conversation state updates. + +This event is sent via websocket whenever the conversation state changes, +allowing remote clients to stay in sync without making REST API calls. + +All fields are serialized versions of the corresponding ConversationState fields +to ensure compatibility with websocket transmission. 
+ + +#### Properties + +- `key`: str +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: Literal['agent', 'user', 'environment'] +- `value`: Any + +#### Methods + +#### classmethod from_conversation_state() + +Create a state update event from a ConversationState object. + +This creates an event containing a snapshot of important state fields. + +* Parameters: + * `state` – The ConversationState to serialize + * `conversation_id` – The conversation ID for the event +* Returns: + A ConversationStateUpdateEvent with serialized state data + +#### classmethod validate_key() + +#### classmethod validate_value() + +### class Event + +Bases: `DiscriminatedUnionMixin`, `ABC` + +Base class for all events. + + +#### Properties + +- `id`: str +- `model_config`: ClassVar[ConfigDict] = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: Literal['agent', 'user', 'environment'] +- `timestamp`: str +- `visualize`: Text + Return Rich Text representation of this event. + This is a fallback implementation for unknown event types. + Subclasses should override this method to provide specific visualization. +### class LLMCompletionLogEvent + +Bases: [`Event`](#class-event) + +Event containing LLM completion log data. + +When an LLM is configured with log_completions=True in a remote conversation, +this event streams the completion log data back to the client through WebSocket +instead of writing it to a file inside the Docker container. + + +#### Properties + +- `filename`: str +- `log_data`: str +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. 
- `model_name`: str
- `source`: Literal['agent', 'user', 'environment']
- `usage_id`: str
### class LLMConvertibleEvent

Bases: [`Event`](#class-event), `ABC`

Base class for events that can be converted to LLM messages.

#### Properties

- `model_config`: = (configuration object)
  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

#### Methods

#### static events_to_messages()

Convert event stream to LLM message stream, handling multi-action batches

#### abstractmethod to_llm_message()

### class MessageEvent

Bases: [`LLMConvertibleEvent`](#class-llmconvertibleevent)

Message from either agent or user.

This was originally the “MessageAction”, but it is not supposed to be a tool call.

#### Properties

- `activated_skills`: list[str]
- `critic_result`: CriticResult | None
- `extended_content`: list[TextContent]
- `llm_message`: Message
- `llm_response_id`: str | None
- `model_config`: ClassVar[ConfigDict] = (configuration object)
  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- `reasoning_content`: str
- `sender`: str | None
- `source`: Literal['agent', 'user', 'environment']
- `thinking_blocks`: Sequence[ThinkingBlock | RedactedThinkingBlock]
  Return the Anthropic thinking blocks from the LLM message.
- `visualize`: Text
  Return Rich Text representation of this message event.

#### Methods

#### to_llm_message()

### class ObservationBaseEvent

Bases: [`LLMConvertibleEvent`](#class-llmconvertibleevent)

Base class for anything as a response to a tool call.

Examples include tool execution, error, user reject.

#### Properties

- `model_config`: = (configuration object)
  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
+- `source`: Literal['agent', 'user', 'environment'] +- `tool_call_id`: str +- `tool_name`: str +### class ObservationEvent + +Bases: [`ObservationBaseEvent`](#class-observationbaseevent) + + +#### Properties + +- `action_id`: str +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `observation`: Observation +- `visualize`: Text + Return Rich Text representation of this observation event. + +#### Methods + +#### to_llm_message() + +### class PauseEvent + +Bases: [`Event`](#class-event) + +Event indicating that the agent execution was paused by user request. + + +#### Properties + +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: Literal['agent', 'user', 'environment'] +- `visualize`: Text + Return Rich Text representation of this pause event. +### class SystemPromptEvent + +Bases: [`LLMConvertibleEvent`](#class-llmconvertibleevent) + +System prompt added by the agent. + +The system prompt can optionally include dynamic context that varies between +conversations. When `dynamic_context` is provided, it is included as a +second content block in the same system message. Cache markers are NOT +applied here - they are applied by `LLM._apply_prompt_caching()` when +caching is enabled, ensuring provider-specific cache control is only added +when appropriate. + + +#### Properties + +- `dynamic_context`: TextContent | None +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `source`: Literal['agent', 'user', 'environment'] +- `system_prompt`: TextContent +- `tools`: list[ToolDefinition] +- `visualize`: Text + Return Rich Text representation of this system prompt event. 
+ +#### Methods + +#### system_prompt + +The static system prompt text (cacheable across conversations) + +* Type: + openhands.sdk.llm.message.TextContent + +#### tools + +List of available tools + +* Type: + list[openhands.sdk.tool.tool.ToolDefinition] + +#### dynamic_context + +Optional per-conversation context (hosts, repo info, etc.) +Sent as a second TextContent block inside the system message. + +* Type: + openhands.sdk.llm.message.TextContent | None + +#### to_llm_message() + +Convert to a single system LLM message. + +When `dynamic_context` is present the message contains two content +blocks: the static prompt followed by the dynamic context. Cache markers +are NOT applied here - they are applied by `LLM._apply_prompt_caching()` +when caching is enabled, which marks the static block (index 0) and leaves +the dynamic block (index 1) unmarked for cross-conversation cache sharing. + +### class TokenEvent + +Bases: [`Event`](#class-event) + +Event from VLLM representing token IDs used in LLM interaction. + + +#### Properties + +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `prompt_token_ids`: list[int] +- `response_token_ids`: list[int] +- `source`: Literal['agent', 'user', 'environment'] +### class UserRejectObservation + +Bases: [`ObservationBaseEvent`](#class-observationbaseevent) + +Observation when an action is rejected by user or hook. + +This event is emitted when: +- User rejects an action during confirmation mode (rejection_source=”user”) +- A PreToolUse hook blocks an action (rejection_source=”hook”) + + +#### Properties + +- `action_id`: str +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. 
+- `rejection_reason`: str +- `rejection_source`: Literal['user', 'hook'] +- `visualize`: Text + Return Rich Text representation of this user rejection event. + +#### Methods + +#### to_llm_message() + +### openhands.sdk.llm +Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.llm.md + +### class CredentialStore + +Bases: `object` + +Store and retrieve OAuth credentials for LLM providers. + + +#### Properties + +- `credentials_dir`: Path + Get the credentials directory, creating it if necessary. + +#### Methods + +#### __init__() + +Initialize the credential store. + +* Parameters: + `credentials_dir` – Optional custom directory for storing credentials. + Defaults to ~/.local/share/openhands/auth/ + +#### delete() + +Delete stored credentials for a vendor. + +* Parameters: + `vendor` – The vendor/provider name +* Returns: + True if credentials were deleted, False if they didn’t exist + +#### get() + +Get stored credentials for a vendor. + +* Parameters: + `vendor` – The vendor/provider name (e.g., ‘openai’) +* Returns: + OAuthCredentials if found and valid, None otherwise + +#### save() + +Save credentials for a vendor. + +* Parameters: + `credentials` – The OAuth credentials to save + +#### update_tokens() + +Update tokens for an existing credential. + +* Parameters: + * `vendor` – The vendor/provider name + * `access_token` – New access token + * `refresh_token` – New refresh token (if provided) + * `expires_in` – Token expiry in seconds +* Returns: + Updated credentials, or None if no existing credentials found + +### class ImageContent + +Bases: `BaseContent` + + +#### Properties + +- `image_urls`: list[str] +- `type`: Literal['image'] + +#### Methods + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +#### to_llm_dict() + +Convert to LLM API format. 
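
As a brief illustration of the content model above (a sketch; the exact import
path is an assumption):

```pycon
>>> from openhands.sdk.llm import ImageContent  # import path assumed
>>> content = ImageContent(image_urls=["https://example.com/screenshot.png"])
>>> content.to_llm_dict()  # provider-ready image content blocks for the LLM API
```
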
+ +### class LLM + +Bases: `BaseModel`, `RetryMixin`, `NonNativeToolCallingMixin` + +Language model interface for OpenHands agents. + +The LLM class provides a unified interface for interacting with various +language models through the litellm library. It handles model configuration, +API authentication, +retry logic, and tool calling capabilities. + +#### Example + +```pycon +>>> from openhands.sdk import LLM +>>> from pydantic import SecretStr +>>> llm = LLM( +... model="claude-sonnet-4-20250514", +... api_key=SecretStr("your-api-key"), +... usage_id="my-agent" +... ) +>>> # Use with agent or conversation +``` + + +#### Properties + +- `api_key`: str | SecretStr | None +- `api_version`: str | None +- `aws_access_key_id`: str | SecretStr | None +- `aws_region_name`: str | None +- `aws_secret_access_key`: str | SecretStr | None +- `base_url`: str | None +- `caching_prompt`: bool +- `custom_tokenizer`: str | None +- `disable_stop_word`: bool | None +- `disable_vision`: bool | None +- `drop_params`: bool +- `enable_encrypted_reasoning`: bool +- `extended_thinking_budget`: int | None +- `extra_headers`: dict[str, str] | None +- `force_string_serializer`: bool | None +- `input_cost_per_token`: float | None +- `is_subscription`: bool + Check if this LLM uses subscription-based authentication. + Returns True when the LLM was created via LLM.subscription_login(), + which uses the ChatGPT subscription Codex backend rather than the + standard OpenAI API. + * Returns: + True if using subscription-based transport, False otherwise. + * Return type: + bool +- `litellm_extra_body`: dict[str, Any] +- `log_completions`: bool +- `log_completions_folder`: str +- `max_input_tokens`: int | None +- `max_message_chars`: int +- `max_output_tokens`: int | None +- `metrics`: [Metrics](#class-metrics) + Get usage metrics for this LLM instance. + * Returns: + Metrics object containing token usage, costs, and other statistics. 
- `model`: str
- `model_canonical_name`: str | None
- `model_config`: ClassVar[ConfigDict] = (configuration object)
  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- `model_info`: dict | None
  Returns the model info dictionary.
- `modify_params`: bool
- `native_tool_calling`: bool
- `num_retries`: int
- `ollama_base_url`: str | None
- `openrouter_app_name`: str
- `openrouter_site_url`: str
- `output_cost_per_token`: float | None
- `prompt_cache_retention`: str | None
- `reasoning_effort`: Literal['low', 'medium', 'high', 'xhigh', 'none'] | None
- `reasoning_summary`: Literal['auto', 'concise', 'detailed'] | None
- `retry_listener`: SkipJsonSchema[Callable[[int, int, BaseException | None], None] | None]
- `retry_max_wait`: int
- `retry_min_wait`: int
- `retry_multiplier`: float
- `safety_settings`: list[dict[str, str]] | None
- `seed`: int | None
- `stream`: bool
- `telemetry`: Telemetry
  Get telemetry handler for this LLM instance.
  * Returns:
    Telemetry object for managing logging and metrics callbacks.
- `temperature`: float | None
- `timeout`: int | None
- `top_k`: float | None
- `top_p`: float | None
- `usage_id`: str

#### Methods

#### completion()

Generate a completion from the language model.

This is the method for getting responses from the model via Completion API.
It handles message formatting, tool calling, and response processing.

* Parameters:
  * `messages` – List of conversation messages
  * `tools` – Optional list of tools available to the model
  * `_return_metrics` – Whether to return usage metrics
  * `add_security_risk_prediction` – Add security_risk field to tool schemas
  * `on_token` – Optional callback for streaming tokens
  * `**kwargs` – Additional arguments passed to the LLM API
* Returns:
  LLMResponse containing the model’s response and metadata.
+ +#### NOTE +Summary field is always added to tool schemas for transparency and +explainability of agent actions. + +* Raises: + `ValueError` – If streaming is requested (not supported). + +#### format_messages_for_llm() + +Formats Message objects for LLM consumption. + +#### format_messages_for_responses() + +Prepare (instructions, input[]) for the OpenAI Responses API. + +- Skips prompt caching flags and string serializer concerns +- Uses Message.to_responses_value to get either instructions (system) + or input items (others) +- Concatenates system instructions into a single instructions string +- For subscription mode, system prompts are prepended to user content + +#### get_token_count() + +#### is_caching_prompt_active() + +Check if prompt caching is supported and enabled for current model. + +* Returns: + True if prompt caching is supported and enabled for the given + : model. +* Return type: + boolean + +#### classmethod load_from_env() + +#### classmethod load_from_json() + +#### model_post_init() + +This function is meant to behave like a BaseModel method to initialise private attributes. + +It takes context as an argument since that’s what pydantic-core passes when calling it. + +* Parameters: + * `self` – The BaseModel instance. + * `context` – The context. + +#### reset_metrics() + +Reset metrics and telemetry to fresh instances. + +This is used by the LLMRegistry to ensure each registered LLM has +independent metrics, preventing metrics from being shared between +LLMs that were created via model_copy(). + +When an LLM is copied (e.g., to create a condenser LLM from an agent LLM), +Pydantic’s model_copy() does a shallow copy of private attributes by default, +causing the original and copied LLM to share the same Metrics object. +This method allows the registry to fix this by resetting metrics to None, +which will be lazily recreated when accessed. + +#### responses() + +Alternative invocation path using OpenAI Responses API via LiteLLM. 
+
+Maps Message[] -> (instructions, input[]) and returns LLMResponse.
+
+* Parameters:
+  * `messages` – List of conversation messages
+  * `tools` – Optional list of tools available to the model
+  * `include` – Optional list of fields to include in response
+  * `store` – Whether to store the conversation
+  * `_return_metrics` – Whether to return usage metrics
+  * `add_security_risk_prediction` – Add security_risk field to tool schemas
+  * `on_token` – Optional callback for streaming deltas
+  * `**kwargs` – Additional arguments passed to the API
+
+#### NOTE
+Summary field is always added to tool schemas for transparency and
+explainability of agent actions.
+
+#### restore_metrics()
+
+#### classmethod subscription_login()
+
+Authenticate with a subscription service and return an LLM instance.
+
+This method provides subscription-based access to LLM models that are
+available through chat subscriptions (e.g., ChatGPT Plus/Pro) rather
+than API credits. It handles credential caching, token refresh, and
+the OAuth login flow.
+
+Currently supported vendors:
+- “openai”: ChatGPT Plus/Pro subscription for Codex models
+
+Supported OpenAI models:
+- gpt-5.1-codex-max
+- gpt-5.1-codex-mini
+- gpt-5.2
+- gpt-5.2-codex
+
+* Parameters:
+  * `vendor` – The vendor/provider. Currently only “openai” is supported.
+  * `model` – The model to use. Must be supported by the vendor’s
+    subscription service.
+  * `force_login` – If True, always perform a fresh login even if valid
+    credentials exist.
+  * `open_browser` – Whether to automatically open the browser for the
+    OAuth login flow.
+  * `**llm_kwargs` – Additional arguments to pass to the LLM constructor.
+* Returns:
+  An LLM instance configured for subscription-based access.
+* Raises:
+  * `ValueError` – If the vendor or model is not supported.
+  * `RuntimeError` – If authentication fails.
+
+#### uses_responses_api()
+
+Whether this model uses the OpenAI Responses API path.
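The retry fields documented above (`num_retries`, `retry_min_wait`, `retry_max_wait`, `retry_multiplier`) describe a capped exponential backoff schedule. A minimal stdlib sketch of such a schedule (illustrative only, not the SDK's internal retry implementation):

```python
def backoff_schedule(
    num_retries: int, min_wait: float, max_wait: float, multiplier: float
) -> list[float]:
    """Compute capped exponential backoff waits, one per retry attempt."""
    waits = []
    wait = float(min_wait)
    for _ in range(num_retries):
        waits.append(min(wait, max_wait))  # never exceed the max_wait cap
        wait *= multiplier
    return waits

# e.g. 4 retries, 1s minimum wait, 10s cap, doubling each attempt
print(backoff_schedule(4, 1, 10, 2))  # [1.0, 2.0, 4.0, 8.0]
```

With a fifth retry the schedule would hit the cap: the uncapped wait of 16s is clamped to 10s.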
+ +#### vision_is_active() + +### class LLMProfileStore + +Bases: `object` + +Standalone utility for persisting LLM configurations. + +#### Methods + +#### __init__() + +Initialize the profile store. + +* Parameters: + `base_dir` – Path to the directory where the profiles are stored. + If None is provided, the default directory is used, i.e., + ~/.openhands/profiles. + +#### delete() + +Delete an existing profile. + +If the profile is not present in the profile directory, it does nothing. + +* Parameters: + `name` – Name of the profile to delete. +* Raises: + `TimeoutError` – If the lock cannot be acquired. + +#### list() + +Returns a list of all profiles stored. + +* Returns: + List of profile filenames (e.g., [“default.json”, “gpt4.json”]). + +#### load() + +Load an LLM instance from the given profile name. + +* Parameters: + `name` – Name of the profile to load. +* Returns: + An LLM instance constructed from the profile configuration. +* Raises: + * `FileNotFoundError` – If the profile name does not exist. + * `ValueError` – If the profile file is corrupted or invalid. + * `TimeoutError` – If the lock cannot be acquired. + +#### save() + +Save a profile to the profile directory. + +Note that if a profile name already exists, it will be overwritten. + +* Parameters: + * `name` – Name of the profile to save. + * `llm` – LLM instance to save + * `include_secrets` – Whether to include the profile secrets. Defaults to False. +* Raises: + `TimeoutError` – If the lock cannot be acquired. + +### class LLMRegistry + +Bases: `object` + +A minimal LLM registry for managing LLM instances by usage ID. + +This registry provides a simple way to manage multiple LLM instances, +avoiding the need to recreate LLMs with the same configuration. + +The registry also ensures that each registered LLM has independent metrics, +preventing metrics from being shared between LLMs that were created via +model_copy(). 
This is important for scenarios like creating a condenser LLM +from an agent LLM, where each should track its own usage independently. + + +#### Properties + +- `registry_id`: str +- `retry_listener`: Callable[[int, int], None] | None +- `subscriber`: Callable[[[RegistryEvent](#class-registryevent)], None] | None +- `usage_to_llm`: MappingProxyType + Access the internal usage-ID-to-LLM mapping (read-only view). + +#### Methods + +#### __init__() + +Initialize the LLM registry. + +* Parameters: + `retry_listener` – Optional callback for retry events. + +#### add() + +Add an LLM instance to the registry. + +This method ensures that the LLM has independent metrics before +registering it. If the LLM’s metrics are shared with another +registered LLM (e.g., due to model_copy()), fresh metrics will +be created automatically. + +* Parameters: + `llm` – The LLM instance to register. +* Raises: + `ValueError` – If llm.usage_id already exists in the registry. + +#### get() + +Get an LLM instance from the registry. + +* Parameters: + `usage_id` – Unique identifier for the LLM usage slot. +* Returns: + The LLM instance. +* Raises: + `KeyError` – If usage_id is not found in the registry. + +#### list_usage_ids() + +List all registered usage IDs. + +#### notify() + +Notify subscribers of registry events. + +* Parameters: + `event` – The registry event to notify about. + +#### subscribe() + +Subscribe to registry events. + +* Parameters: + `callback` – Function to call when LLMs are created or updated. + +### class LLMResponse + +Bases: `BaseModel` + +Result of an LLM completion request. + +This type provides a clean interface for LLM completion results, exposing +only OpenHands-native types to consumers while preserving access to the +raw LiteLLM response for internal use. + + +#### Properties + +- `id`: str + Get the response ID from the underlying LLM response. 
+ This property provides a clean interface to access the response ID, + supporting both completion mode (ModelResponse) and response API modes + (ResponsesAPIResponse). + * Returns: + The response ID from the LLM response +- `message`: [Message](#class-message) +- `metrics`: [MetricsSnapshot](#class-metricssnapshot) +- `model_config`: ClassVar[ConfigDict] = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `raw_response`: ModelResponse | ResponsesAPIResponse + +#### Methods + +#### message + +The completion message converted to OpenHands Message type + +* Type: + [openhands.sdk.llm.message.Message](#class-message) + +#### metrics + +Snapshot of metrics from the completion request + +* Type: + [openhands.sdk.llm.utils.metrics.MetricsSnapshot](#class-metricssnapshot) + +#### raw_response + +The original LiteLLM response (ModelResponse or +ResponsesAPIResponse) for internal use + +* Type: + litellm.types.utils.ModelResponse | litellm.types.llms.openai.ResponsesAPIResponse + +### class Message + +Bases: `BaseModel` + + +#### Properties + +- `contains_image`: bool +- `content`: Sequence[[TextContent](#class-textcontent) | [ImageContent](#class-imagecontent)] +- `model_config`: = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `name`: str | None +- `reasoning_content`: str | None +- `responses_reasoning_item`: [ReasoningItemModel](#class-reasoningitemmodel) | None +- `role`: Literal['user', 'system', 'assistant', 'tool'] +- `thinking_blocks`: Sequence[[ThinkingBlock](#class-thinkingblock) | [RedactedThinkingBlock](#class-redactedthinkingblock)] +- `tool_call_id`: str | None +- `tool_calls`: list[[MessageToolCall](#class-messagetoolcall)] | None + +#### Methods + +#### classmethod from_llm_chat_message() + +Convert a LiteLLMMessage (Chat Completions) to our Message class. 
+ +Provider-agnostic mapping for reasoning: +- Prefer message.reasoning_content if present (LiteLLM normalized field) +- Extract thinking_blocks from content array (Anthropic-specific) + +#### classmethod from_llm_responses_output() + +Convert OpenAI Responses API output items into a single assistant Message. + +Policy (non-stream): +- Collect assistant text by concatenating output_text parts from message items +- Normalize function_call items to MessageToolCall list + +#### to_chat_dict() + +Serialize message for OpenAI Chat Completions. + +* Parameters: + * `cache_enabled` – Whether prompt caching is active. + * `vision_enabled` – Whether vision/image processing is enabled. + * `function_calling_enabled` – Whether native function calling is enabled. + * `force_string_serializer` – Force string serializer instead of list format. + * `send_reasoning_content` – Whether to include reasoning_content in output. + +Chooses the appropriate content serializer and then injects threading keys: +- Assistant tool call turn: role == “assistant” and self.tool_calls +- Tool result turn: role == “tool” and self.tool_call_id (with name) + +#### to_responses_dict() + +Serialize message for OpenAI Responses (input parameter). + +Produces a list of “input” items for the Responses API: +- system: returns [], system content is expected in ‘instructions’ +- user: one ‘message’ item with content parts -> input_text / input_image +(when vision enabled) +- assistant: emits prior assistant content as input_text, +and function_call items for tool_calls +- tool: emits function_call_output items (one per TextContent) +with matching call_id + +#### to_responses_value() + +Return serialized form. + +Either an instructions string (for system) or input items (for other roles). + +### class MessageToolCall + +Bases: `BaseModel` + +Transport-agnostic tool call representation. + +One canonical id is used for linking across actions/observations and +for Responses function_call_output call_id. 
+
+
+#### Properties
+
+- `arguments`: str
+- `id`: str
+- `name`: str
+- `origin`: Literal['completion', 'responses']
+
+#### Methods
+
+#### classmethod from_chat_tool_call()
+
+Create a MessageToolCall from a Chat Completions tool call.
+
+#### classmethod from_responses_function_call()
+
+Create a MessageToolCall from a typed OpenAI Responses function_call item.
+
+Note: OpenAI Responses function_call.arguments is already a JSON string.
+
+#### model_config = (configuration object)
+
+Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
+
+#### to_chat_dict()
+
+Serialize to OpenAI Chat Completions tool_calls format.
+
+#### to_responses_dict()
+
+Serialize to OpenAI Responses ‘function_call’ input item format.
+
+### class Metrics
+
+Bases: `BaseModel`
+
+Tracks costs, response latencies, and token usage for an LLM instance.
+
+
+#### Properties
+
+- `costs`: list[Cost]
+- `response_latencies`: list[ResponseLatency]
+- `token_usages`: list[TokenUsage]
+
+#### Methods
+
+#### add_cost()
+
+#### add_response_latency()
+
+#### add_token_usage()
+
+Add a single usage record.
+
+#### deep_copy()
+
+Create a deep copy of the Metrics object.
+
+#### diff()
+
+Calculate the difference between current metrics and a baseline.
+
+This is useful for tracking metrics for specific operations like delegates.
+
+* Parameters:
+  `baseline` – A metrics object representing the baseline state
+* Returns:
+  A new Metrics object containing only the differences since the baseline
+
+#### get()
+
+Return the metrics in a dictionary.
+
+#### get_snapshot()
+
+Get a snapshot of the current metrics without the detailed lists.
+
+#### initialize_accumulated_token_usage()
+
+#### log()
+
+Log the metrics.
+
+#### merge()
+
+Merge ‘other’ metrics into this one.
+
+#### model_config = (configuration object)
+
+Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
+
+#### classmethod validate_accumulated_cost()
+
+### class MetricsSnapshot
+
+Bases: `BaseModel`
+
+A snapshot of metrics at a point in time.
+
+Does not include lists of individual costs, latencies, or token usages.
+
+
+#### Properties
+
+- `accumulated_cost`: float
+- `accumulated_token_usage`: TokenUsage | None
+- `max_budget_per_task`: float | None
+- `model_name`: str
+
+#### Methods
+
+#### model_config = (configuration object)
+
+Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
+
+### class OAuthCredentials
+
+Bases: `BaseModel`
+
+OAuth credentials for subscription-based LLM access.
+
+
+#### Properties
+
+- `access_token`: str
+- `expires_at`: int
+- `refresh_token`: str
+- `type`: Literal['oauth']
+- `vendor`: str
+
+#### Methods
+
+#### is_expired()
+
+Check if the access token is expired.
+
+#### model_config = (configuration object)
+
+Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
+
+### class OpenAISubscriptionAuth
+
+Bases: `object`
+
+Handle OAuth authentication for OpenAI ChatGPT subscription access.
+
+
+#### Properties
+
+- `vendor`: str
+  Get the vendor name.
+
+#### Methods
+
+#### __init__()
+
+Initialize the OpenAI subscription auth handler.
+
+* Parameters:
+  * `credential_store` – Optional custom credential store.
+  * `oauth_port` – Port for the local OAuth callback server.
+
+#### create_llm()
+
+Create an LLM instance configured for Codex subscription access.
+
+* Parameters:
+  * `model` – The model to use (must be in OPENAI_CODEX_MODELS).
+  * `credentials` – OAuth credentials to use. If None, uses stored credentials.
+  * `instructions` – Optional instructions for the Codex model.
+  * `**llm_kwargs` – Additional arguments to pass to the LLM constructor.
+* Returns:
+  An LLM instance configured for Codex access.
+* Raises:
+  `ValueError` – If the model is not supported or no credentials available.
+
+#### get_credentials()
+
+Get stored credentials if they exist.
+
+#### has_valid_credentials()
+
+Check if valid (non-expired) credentials exist.
+ +#### async login() + +Perform OAuth login flow. + +This starts a local HTTP server to handle the OAuth callback, +opens the browser for user authentication, and waits for the +callback with the authorization code. + +* Parameters: + `open_browser` – Whether to automatically open the browser. +* Returns: + The obtained OAuth credentials. +* Raises: + `RuntimeError` – If the OAuth flow fails or times out. + +#### logout() + +Remove stored credentials. + +* Returns: + True if credentials were removed, False if none existed. + +#### async refresh_if_needed() + +Refresh credentials if they are expired. + +* Returns: + Updated credentials, or None if no credentials exist. +* Raises: + `RuntimeError` – If token refresh fails. + +### class ReasoningItemModel + +Bases: `BaseModel` + +OpenAI Responses reasoning item (non-stream, subset we consume). + +Do not log or render encrypted_content. + + +#### Properties + +- `content`: list[str] | None +- `encrypted_content`: str | None +- `id`: str | None +- `status`: str | None +- `summary`: list[str] + +#### Methods + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +### class RedactedThinkingBlock + +Bases: `BaseModel` + +Redacted thinking block for previous responses without extended thinking. + +This is used as a placeholder for assistant messages that were generated +before extended thinking was enabled. + + +#### Properties + +- `data`: str +- `type`: Literal['redacted_thinking'] + +#### Methods + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. 
+
+### class RegistryEvent
+
+Bases: `BaseModel`
+
+
+#### Properties
+
+- `llm`: [LLM](#class-llm)
+- `model_config`: ClassVar[ConfigDict] = (configuration object)
+  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
+
+### class RouterLLM
+
+Bases: [`LLM`](#class-llm)
+
+Base class for multiple LLMs acting as a unified LLM.
+
+This class provides a foundation for implementing model routing by
+inheriting from LLM, allowing routers to work with multiple underlying
+LLM models while presenting a unified LLM interface to consumers.
+
+Key features:
+- Works with multiple LLMs configured via llms_for_routing
+- Delegates all other operations/properties to the selected LLM
+- Provides routing interface through select_llm() method
+
+
+#### Properties
+
+- `active_llm`: [LLM](#class-llm) | None
+- `llms_for_routing`: dict[str, [LLM](#class-llm)]
+- `model_config`: = (configuration object)
+  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
+- `router_name`: str
+
+#### Methods
+
+#### completion()
+
+This method intercepts completion calls and routes them to the appropriate
+underlying LLM based on the routing logic implemented in select_llm().
+
+* Parameters:
+  * `messages` – List of conversation messages
+  * `tools` – Optional list of tools available to the model
+  * `return_metrics` – Whether to return usage metrics
+  * `add_security_risk_prediction` – Add security_risk field to tool schemas
+  * `on_token` – Optional callback for streaming tokens
+  * `**kwargs` – Additional arguments passed to the LLM API
+
+#### NOTE
+Summary field is always added to tool schemas for transparency and
+explainability of agent actions.
+
+#### model_post_init()
+
+This function is meant to behave like a BaseModel method to initialise private attributes.
+
+It takes context as an argument since that’s what pydantic-core passes when calling it.
+ +* Parameters: + * `self` – The BaseModel instance. + * `context` – The context. + +#### abstractmethod select_llm() + +Select which LLM to use based on messages and events. + +This method implements the core routing logic for the RouterLLM. +Subclasses should analyze the provided messages to determine which +LLM from llms_for_routing is most appropriate for handling the request. + +* Parameters: + `messages` – List of messages in the conversation that can be used + to inform the routing decision. +* Returns: + The key/name of the LLM to use from llms_for_routing dictionary. + +#### classmethod set_placeholder_model() + +Guarantee model exists before LLM base validation runs. + +#### classmethod validate_llms_not_empty() + +### class TextContent + +Bases: `BaseContent` + + +#### Properties + +- `model_config`: ClassVar[ConfigDict] = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `text`: str +- `type`: Literal['text'] + +#### Methods + +#### to_llm_dict() + +Convert to LLM API format. + +### class ThinkingBlock + +Bases: `BaseModel` + +Anthropic thinking block for extended thinking feature. + +This represents the raw thinking blocks returned by Anthropic models +when extended thinking is enabled. These blocks must be preserved +and passed back to the API for tool use scenarios. + + +#### Properties + +- `signature`: str | None +- `thinking`: str +- `type`: Literal['thinking'] + +#### Methods + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. 
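The `RouterLLM.select_llm()` contract above (inspect the messages, return a key into `llms_for_routing`, then delegate the call to the selected LLM) can be illustrated with a self-contained stand-in. The class, message type, and routing rule below are hypothetical, not SDK code:

```python
from dataclasses import dataclass


@dataclass
class FakeMessage:
    role: str
    text: str
    has_image: bool = False


@dataclass
class KeywordRouter:
    """Toy router mirroring the RouterLLM pattern: select, then delegate."""

    llms_for_routing: dict[str, str]  # key -> model name (stand-in for LLM)

    def select_llm(self, messages: list[FakeMessage]) -> str:
        # Routing logic: vision model for images, a stronger model for
        # heavyweight requests, a cheap model otherwise.
        if any(m.has_image for m in messages):
            return "vision"
        if any("refactor" in m.text.lower() for m in messages):
            return "strong"
        return "cheap"

    def completion(self, messages: list[FakeMessage]) -> str:
        key = self.select_llm(messages)
        return self.llms_for_routing[key]  # delegate to the selected "LLM"


router = KeywordRouter(
    {"vision": "gpt-vision", "strong": "gpt-strong", "cheap": "gpt-mini"}
)
print(router.completion([FakeMessage("user", "Please refactor this module")]))
# gpt-strong
```

A real subclass would return a key from `llms_for_routing` and let the inherited `completion()` delegate to that LLM instance.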
+ +### openhands.sdk.security +Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.security.md + +### class AlwaysConfirm + +Bases: [`ConfirmationPolicyBase`](#class-confirmationpolicybase) + +#### Methods + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +#### should_confirm() + +Determine if an action with the given risk level requires confirmation. + +This method defines the core logic for determining whether user confirmation +is required before executing an action based on its security risk level. + +* Parameters: + `risk` – The security risk level of the action to be evaluated. + Defaults to SecurityRisk.UNKNOWN if not specified. +* Returns: + True if the action requires user confirmation before execution, + False if the action can proceed without confirmation. + +### class ConfirmRisky + +Bases: [`ConfirmationPolicyBase`](#class-confirmationpolicybase) + + +#### Properties + +- `confirm_unknown`: bool +- `threshold`: [SecurityRisk](#class-securityrisk) + +#### Methods + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +#### should_confirm() + +Determine if an action with the given risk level requires confirmation. + +This method defines the core logic for determining whether user confirmation +is required before executing an action based on its security risk level. + +* Parameters: + `risk` – The security risk level of the action to be evaluated. + Defaults to SecurityRisk.UNKNOWN if not specified. +* Returns: + True if the action requires user confirmation before execution, + False if the action can proceed without confirmation. 
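The threshold semantics of `ConfirmRisky` (`threshold`, `confirm_unknown`) can be sketched with an ordered enum standing in for SecurityRisk. The ordering LOW < MEDIUM < HIGH and the special handling of UNKNOWN follow the documentation above; the exact comparison (at-or-above the threshold) is an assumption for illustration:

```python
from enum import IntEnum


class Risk(IntEnum):
    # Stand-in for SecurityRisk; UNKNOWN is handled outside the ordering.
    UNKNOWN = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def should_confirm(
    risk: Risk, threshold: Risk = Risk.HIGH, confirm_unknown: bool = True
) -> bool:
    """Confirm when risk is at or above threshold; UNKNOWN uses its own flag."""
    if risk is Risk.UNKNOWN:
        return confirm_unknown
    return risk >= threshold


print(should_confirm(Risk.MEDIUM, threshold=Risk.MEDIUM))  # True
print(should_confirm(Risk.LOW, threshold=Risk.MEDIUM))     # False
```

`AlwaysConfirm` and `NeverConfirm` are the degenerate cases of this pattern: their `should_confirm()` returns a constant regardless of risk.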
+
+#### classmethod validate_threshold()
+
+### class ConfirmationPolicyBase
+
+Bases: `DiscriminatedUnionMixin`, `ABC`
+
+#### Methods
+
+#### model_config = (configuration object)
+
+Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
+
+#### abstractmethod should_confirm()
+
+Determine if an action with the given risk level requires confirmation.
+
+This method defines the core logic for determining whether user confirmation
+is required before executing an action based on its security risk level.
+
+* Parameters:
+  `risk` – The security risk level of the action to be evaluated.
+  Defaults to SecurityRisk.UNKNOWN if not specified.
+* Returns:
+  True if the action requires user confirmation before execution,
+  False if the action can proceed without confirmation.
+
+### class GraySwanAnalyzer
+
+Bases: [`SecurityAnalyzerBase`](#class-securityanalyzerbase)
+
+Security analyzer using GraySwan’s Cygnal API for AI safety monitoring.
+
+This analyzer sends conversation history and pending actions to the GraySwan
+Cygnal API for security analysis. The API returns a violation score which is
+mapped to SecurityRisk levels.
+
+Environment Variables:
+- `GRAYSWAN_API_KEY`: Required API key for GraySwan authentication
+- `GRAYSWAN_POLICY_ID`: Optional policy ID for a custom GraySwan policy
+
+#### Example
+
+```pycon
+>>> from openhands.sdk.security.grayswan import GraySwanAnalyzer
+>>> analyzer = GraySwanAnalyzer()
+>>> risk = analyzer.security_risk(action_event)
+```
+
+
+#### Properties
+
+- `api_key`: SecretStr | None
+- `api_url`: str
+- `history_limit`: int
+- `low_threshold`: float
+- `max_message_chars`: int
+- `medium_threshold`: float
+- `policy_id`: str | None
+- `timeout`: float
+
+#### Methods
+
+#### close()
+
+Clean up resources.
+
+#### model_config = (configuration object)
+
+Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
+ +#### model_post_init() + +Initialize the analyzer after model creation. + +#### security_risk() + +Analyze action for security risks using GraySwan API. + +This method converts the conversation history and the pending action +to OpenAI message format and sends them to the GraySwan Cygnal API +for security analysis. + +* Parameters: + `action` – The ActionEvent to analyze +* Returns: + SecurityRisk level based on GraySwan analysis + +#### set_events() + +Set the events for context when analyzing actions. + +* Parameters: + `events` – Sequence of events to use as context for security analysis + +#### validate_thresholds() + +Validate that thresholds are properly ordered. + +### class LLMSecurityAnalyzer + +Bases: [`SecurityAnalyzerBase`](#class-securityanalyzerbase) + +LLM-based security analyzer. + +This analyzer respects the security_risk attribute that can be set by the LLM +when generating actions, similar to OpenHands’ LLMRiskAnalyzer. + +It provides a lightweight security analysis approach that leverages the LLM’s +understanding of action context and potential risks. + +#### Methods + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +#### security_risk() + +Evaluate security risk based on LLM-provided assessment. + +This method checks if the action has a security_risk attribute set by the LLM +and returns it. The LLM may not always provide this attribute but it defaults to +UNKNOWN if not explicitly set. + +### class NeverConfirm + +Bases: [`ConfirmationPolicyBase`](#class-confirmationpolicybase) + +#### Methods + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +#### should_confirm() + +Determine if an action with the given risk level requires confirmation. 
+ +This method defines the core logic for determining whether user confirmation +is required before executing an action based on its security risk level. + +* Parameters: + `risk` – The security risk level of the action to be evaluated. + Defaults to SecurityRisk.UNKNOWN if not specified. +* Returns: + True if the action requires user confirmation before execution, + False if the action can proceed without confirmation. + +### class SecurityAnalyzerBase + +Bases: `DiscriminatedUnionMixin`, `ABC` + +Abstract base class for security analyzers. + +Security analyzers evaluate the risk of actions before they are executed +and can influence the conversation flow based on security policies. + +This is adapted from OpenHands SecurityAnalyzer but designed to work +with the agent-sdk’s conversation-based architecture. + +#### Methods + +#### analyze_event() + +Analyze an event for security risks. + +This is a convenience method that checks if the event is an action +and calls security_risk() if it is. Non-action events return None. + +* Parameters: + `event` – The event to analyze +* Returns: + ActionSecurityRisk if event is an action, None otherwise + +#### analyze_pending_actions() + +Analyze all pending actions in a conversation. + +This method gets all unmatched actions from the conversation state +and analyzes each one for security risks. + +* Parameters: + `conversation` – The conversation to analyze +* Returns: + List of tuples containing (action, risk_level) for each pending action + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +#### abstractmethod security_risk() + +Evaluate the security risk of an ActionEvent. + +This is the core method that analyzes an ActionEvent and returns its risk level. +Implementations should examine the action’s content, context, and potential +impact to determine the appropriate risk level. 
+
+* Parameters:
+  `action` – The ActionEvent to analyze for security risks
+* Returns:
+  ActionSecurityRisk enum indicating the risk level
+
+#### should_require_confirmation()
+
+Determine if an action should require user confirmation.
+
+This implements the default confirmation logic based on risk level
+and confirmation mode settings.
+
+* Parameters:
+  * `risk` – The security risk level of the action
+  * `confirmation_mode` – Whether confirmation mode is enabled
+* Returns:
+  True if confirmation is required, False otherwise
+
+### class SecurityRisk
+
+Bases: `str`, `Enum`
+
+Security risk levels for actions.
+
+Based on OpenHands security risk levels but adapted for agent-sdk.
+Integer values allow for easy comparison and ordering.
+
+
+#### Properties
+
+- `description`: str
+  Get a human-readable description of the risk level.
+- `visualize`: Text
+  Return Rich Text representation of this risk level.
+
+#### Methods
+
+#### HIGH = 'HIGH'
+
+#### LOW = 'LOW'
+
+#### MEDIUM = 'MEDIUM'
+
+#### UNKNOWN = 'UNKNOWN'
+
+#### get_color()
+
+Get the color for displaying this risk level in Rich text.
+
+#### is_riskier()
+
+Check if this risk level is riskier than another.
+
+Risk levels follow the natural ordering: LOW is less risky than MEDIUM, which is
+less risky than HIGH. UNKNOWN is not comparable to any other level.
+
+To make this act like a standard well-ordered domain, we reflexively consider
+risk levels to be riskier than themselves. That is:
+
+    for risk_level in list(SecurityRisk):
+        assert risk_level.is_riskier(risk_level)
+
+    # More concretely:
+    assert SecurityRisk.HIGH.is_riskier(SecurityRisk.HIGH)
+    assert SecurityRisk.MEDIUM.is_riskier(SecurityRisk.MEDIUM)
+    assert SecurityRisk.LOW.is_riskier(SecurityRisk.LOW)
+
+This can be disabled by setting the reflexive parameter to False.
+
+* Parameters:
+  * `other` (*[SecurityRisk](#class-securityrisk)*) – The other risk level to compare against.
+  * `reflexive` (*bool*) – Whether the relationship is reflexive.
+
+* Raises:
+  `ValueError` – If either risk level is UNKNOWN.
+
+### openhands.sdk.tool
+Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.tool.md
+
+### class Action
+
+Bases: `Schema`, `ABC`
+
+Base schema for input action.
+
+
+#### Properties
+
+- `model_config`: = (configuration object)
+  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
+- `visualize`: Text
+  Return Rich Text representation of this action.
+  This method can be overridden by subclasses to customize visualization.
+  The base implementation displays all action fields systematically.
+
+### class ExecutableTool
+
+Bases: `Protocol`
+
+Protocol for tools that are guaranteed to have a non-None executor.
+
+This eliminates the need for runtime None checks and type narrowing
+when working with tools that are known to be executable.
+
+
+#### Properties
+
+- `executor`: [ToolExecutor](#class-toolexecutor)[Any, Any]
+- `name`: str
+
+#### Methods
+
+#### __init__()
+
+### class FinishTool
+
+Bases: `ToolDefinition[FinishAction, FinishObservation]`
+
+Tool for signaling the completion of a task or conversation.
+
+
+#### Properties
+
+- `model_config`: = (configuration object)
+  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
+
+#### Methods
+
+#### classmethod create()
+
+Create FinishTool instance.
+
+* Parameters:
+  * `conv_state` – Optional conversation state (not used by FinishTool).
+  * `**params` – Additional parameters (none supported).
+* Returns:
+  A sequence containing a single FinishTool instance.
+* Raises:
+  `ValueError` – If any parameters are provided.
+
+#### name = 'finish'
+
+### class Observation
+
+Bases: `Schema`, `ABC`
+
+Base schema for output observation.
+
+
+#### Properties
+
+- `ERROR_MESSAGE_HEADER`: ClassVar[str] = '[An error occurred during execution.]\n'
+- `content`: list[TextContent | ImageContent]
+- `is_error`: bool
+- `model_config`: = (configuration object)
+  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
+- `text`: str
+  Extract all text content from the observation.
+  * Returns:
+    Concatenated text from all TextContent items in content.
+- `to_llm_content`: Sequence[TextContent | ImageContent]
+  Default content formatting for converting observation to LLM-readable content.
+  Subclasses can override to provide richer content (e.g., images, diffs).
+- `visualize`: Text
+  Return Rich Text representation of this observation.
+  Subclasses can override for custom visualization; by default we show the
+  same text that would be sent to the LLM.
+
+#### Methods
+
+#### classmethod from_text()
+
+Utility to create an Observation from a simple text string.
+
+* Parameters:
+  * `text` – The text content to include in the observation.
+  * `is_error` – Whether this observation represents an error.
+  * `**kwargs` – Additional fields for the observation subclass.
+* Returns:
+  An Observation instance with the text wrapped in a TextContent.
+
+### class ThinkTool
+
+Bases: `ToolDefinition[ThinkAction, ThinkObservation]`
+
+Tool for logging thoughts without making changes.
+
+
+#### Properties
+
+- `model_config`: = (configuration object)
+  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
+
+#### Methods
+
+#### classmethod create()
+
+Create ThinkTool instance.
+
+* Parameters:
+  * `conv_state` – Optional conversation state (not used by ThinkTool).
+  * `**params` – Additional parameters (none supported).
+* Returns:
+  A sequence containing a single ThinkTool instance.
+* Raises:
+  `ValueError` – If any parameters are provided.
+ +#### name = 'think' + +### class Tool + +Bases: `BaseModel` + +Defines a tool to be initialized for the agent. + +This is only used in agent-sdk for type schema for server use. + + +#### Properties + +- `name`: str +- `params`: dict[str, Any] + +#### Methods + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +#### classmethod validate_name() + +Validate that name is not empty. + +#### classmethod validate_params() + +Convert None params to empty dict. + +### class ToolAnnotations + +Bases: `BaseModel` + +Annotations to provide hints about the tool’s behavior. + +Based on Model Context Protocol (MCP) spec: +[https://github.com/modelcontextprotocol/modelcontextprotocol/blob/caf3424488b10b4a7b1f8cb634244a450a1f4400/schema/2025-06-18/schema.ts#L838](https://github.com/modelcontextprotocol/modelcontextprotocol/blob/caf3424488b10b4a7b1f8cb634244a450a1f4400/schema/2025-06-18/schema.ts#L838) + + +#### Properties + +- `destructiveHint`: bool +- `idempotentHint`: bool +- `model_config`: ClassVar[ConfigDict] = (configuration object) + Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. +- `openWorldHint`: bool +- `readOnlyHint`: bool +- `title`: str | None +### class ToolDefinition + +Bases: `DiscriminatedUnionMixin`, `ABC`, `Generic` + +Base class for all tool implementations. + +This class serves as a base for the discriminated union of all tool types. +All tools must inherit from this class and implement the .create() method for +proper initialization with executors and parameters. + +Features: +- Normalize input/output schemas (class or dict) into both model+schema. +- Validate inputs before execute. +- Coerce outputs only if an output model is defined; else return vanilla JSON. +- Export MCP tool description. 
+
+#### Examples
+
+Simple tool with no parameters:
+
+```python
+class FinishTool(ToolDefinition[FinishAction, FinishObservation]):
+    @classmethod
+    def create(cls, conv_state=None, **params):
+        return [cls(name="finish", ..., executor=FinishExecutor())]
+```
+
+Complex tool with initialization parameters:
+
+```python
+class TerminalTool(ToolDefinition[TerminalAction, TerminalObservation]):
+    @classmethod
+    def create(cls, conv_state, **params):
+        executor = TerminalExecutor(
+            working_dir=conv_state.workspace.working_dir,
+            **params,
+        )
+        return [cls(name="terminal", ..., executor=executor)]
+```
+
+
+#### Properties
+
+- `action_type`: type[[Action](#class-action)]
+- `annotations`: [ToolAnnotations](#class-toolannotations) | None
+- `description`: str
+- `executor`: Annotated[[ToolExecutor](#class-toolexecutor) | None, SkipJsonSchema()]
+- `meta`: dict[str, Any] | None
+- `model_config`: ClassVar[ConfigDict] = (configuration object)
+  Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
+- `name`: ClassVar[str] = ''
+- `observation_type`: type[[Observation](#class-observation)] | None
+- `title`: str
+
+#### Methods
+
+#### action_from_arguments()
+
+Create an action from parsed arguments.
+
+This method can be overridden by subclasses to provide custom logic
+for creating actions from arguments (e.g., for MCP tools).
+
+* Parameters:
+  `arguments` – The parsed arguments from the tool call.
+* Returns:
+  The action instance created from the arguments.
+
+#### as_executable()
+
+Return this tool as an ExecutableTool, ensuring it has an executor.
+
+This method eliminates the need for runtime None checks by guaranteeing
+that the returned tool has a non-None executor.
+
+* Returns:
+  This tool instance, typed as ExecutableTool.
+* Raises:
+  `NotImplementedError` – If the tool has no executor.
+
+#### abstractmethod classmethod create()
+
+Create a sequence of Tool instances.
+
+This method must be implemented by all subclasses to provide custom
+initialization logic, typically initializing the executor with parameters
+from conv_state and other optional parameters.
+
+* Parameters:
+  * `*args` – Variable positional arguments (typically conv_state as first arg).
+  * `**kwargs` – Optional parameters for tool initialization.
+* Returns:
+  A sequence of Tool instances. Even single tools are returned as a sequence
+  to provide a consistent interface and eliminate union return types.
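+
+The `as_executable()` guarantee described above can be sketched without the SDK (the `Sketch*` names are hypothetical stand-ins, not SDK classes): the executor check happens once, so callers never need a runtime `None` test afterwards.
+
+```python
+from typing import Optional
+
+
+class SketchExecutor:
+    """Stand-in for a ToolExecutor: a callable that runs an action."""
+
+    def __call__(self, action: str) -> str:
+        return f"ran {action}"
+
+
+class SketchTool:
+    """Stand-in for a ToolDefinition with an optional executor."""
+
+    def __init__(self, executor: Optional[SketchExecutor] = None):
+        self.executor = executor
+
+    def as_executable(self) -> "SketchTool":
+        # Fail fast if there is no executor, so downstream code can
+        # assume executor is non-None.
+        if self.executor is None:
+            raise NotImplementedError("tool has no executor")
+        return self
+
+
+tool = SketchTool(executor=SketchExecutor())
+result = tool.as_executable().executor("ls")
+```
+
+In the SDK the same call additionally narrows the static type to `ExecutableTool`.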
+ +#### classmethod resolve_kind() + +Resolve a kind string to its corresponding tool class. + +* Parameters: + `kind` – The name of the tool class to resolve +* Returns: + The tool class corresponding to the kind +* Raises: + `ValueError` – If the kind is unknown + +#### set_executor() + +Create a new Tool instance with the given executor. + +#### to_mcp_tool() + +Convert a Tool to an MCP tool definition. + +Allow overriding input/output schemas (usually by subclasses). + +* Parameters: + * `input_schema` – Optionally override the input schema. + * `output_schema` – Optionally override the output schema. + +#### to_openai_tool() + +Convert a Tool to an OpenAI tool. + +* Parameters: + * `add_security_risk_prediction` – Whether to add a security_risk field + to the action schema for LLM to predict. This is useful for + tools that may have safety risks, so the LLM can reason about + the risk level before calling the tool. + * `action_type` – Optionally override the action_type to use for the schema. + This is useful for MCPTool to use a dynamically created action type + based on the tool’s input schema. + +#### NOTE +Summary field is always added to the schema for transparency and +explainability of agent actions. + +#### to_responses_tool() + +Convert a Tool to a Responses API function tool (LiteLLM typed). + +For Responses API, function tools expect top-level keys: +(JSON configuration object) + +* Parameters: + * `add_security_risk_prediction` – Whether to add a security_risk field + * `action_type` – Optional override for the action type + +#### NOTE +Summary field is always added to the schema for transparency and +explainability of agent actions. + +### class ToolExecutor + +Bases: `ABC`, `Generic` + +Executor function type for a Tool. + +#### Methods + +#### close() + +Close the executor and clean up resources. + +Default implementation does nothing. 
Subclasses should override
+this method to perform cleanup (e.g., closing connections,
+terminating processes, etc.).
+
+### openhands.sdk.utils
+Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.utils.md
+
+Utility functions for the OpenHands SDK.
+
+### deprecated()
+
+Return a decorator that deprecates a callable with explicit metadata.
+
+Use this helper when you can annotate a function, method, or property with
+@deprecated(…). It transparently forwards to `deprecation.deprecated()`
+while filling in the SDK’s current version metadata unless custom values are
+supplied.
+
+### maybe_truncate()
+
+Truncate the middle of content if it exceeds the specified length.
+
+Keeps the head and tail of the content to preserve context at both ends.
+Optionally saves the full content to a file for later investigation.
+
+* Parameters:
+  * `content` – The text content to potentially truncate
+  * `truncate_after` – Maximum length before truncation. If None, no truncation occurs
+  * `truncate_notice` – Notice to insert in the middle when content is truncated
+  * `save_dir` – Working directory to save full content file in
+  * `tool_prefix` – Prefix for the saved file (e.g., “bash”, “browser”, “editor”)
+* Returns:
+  Original content if under limit, or truncated content with head and tail
+  preserved and reference to saved file if applicable
+
+### sanitize_openhands_mentions()
+
+Sanitize @OpenHands mentions in text to prevent self-mention loops.
+
+This function inserts a zero-width joiner (ZWJ) after the @ symbol in
+@OpenHands mentions, making them non-clickable in GitHub comments while
+preserving readability. The original case of the mention is preserved.
+
+* Parameters:
+  `text` – The text to sanitize
+* Returns:
+  Text with sanitized @OpenHands mentions (e.g., “@OpenHands” -> “@\u200dOpenHands”)
+
+### Examples
+
+```pycon
+>>> sanitize_openhands_mentions("Thanks @OpenHands for the help!")
+'Thanks @\u200dOpenHands for the help!'
+>>> sanitize_openhands_mentions("Check @openhands and @OPENHANDS")
+'Check @\u200dopenhands and @\u200dOPENHANDS'
+>>> sanitize_openhands_mentions("No mention here")
+'No mention here'
+```
+
+### sanitized_env()
+
+Return a copy of env with sanitized values.
+
+PyInstaller-based binaries rewrite `LD_LIBRARY_PATH` so their vendored
+libraries win. This function restores the original value so that subprocess
+will not use them.
+
+### warn_deprecated()
+
+Emit a deprecation warning for dynamic access to a legacy feature.
+
+Prefer this helper when a decorator is not practical—e.g. attribute accessors,
+data migrations, or other runtime paths that must conditionally warn. Provide
+explicit version metadata so the SDK reports consistent messages and upgrades
+to `deprecation.UnsupportedWarning` after the removal threshold.
+
+### openhands.sdk.workspace
+Source: https://docs.openhands.dev/sdk/api-reference/openhands.sdk.workspace.md
+
+### class BaseWorkspace
+
+Bases: `DiscriminatedUnionMixin`, `ABC`
+
+Abstract base class for workspace implementations.
+
+Workspaces provide a sandboxed environment where agents can execute commands,
+read/write files, and perform other operations. All workspace implementations
+support the context manager protocol for safe resource management.
+
+#### Example
+
+```pycon
+>>> with workspace:
+... result = workspace.execute_command("echo 'hello'")
+... content = workspace.read_file("example.txt")
+```
+
+
+#### Properties
+
+- `working_dir`: Annotated[str, BeforeValidator(func=_convert_path_to_str, json_schema_input_type=PydanticUndefined), FieldInfo(annotation=NoneType, required=True, description='The working directory for agent operations and tool execution. Accepts both string paths and Path objects. Path objects are automatically converted to strings.')]
+
+#### Methods
+
+#### abstractmethod execute_command()
+
+Execute a bash command on the system.
+ +* Parameters: + * `command` – The bash command to execute + * `cwd` – Working directory for the command (optional) + * `timeout` – Timeout in seconds (defaults to 30.0) +* Returns: + Result containing stdout, stderr, exit_code, and other + : metadata +* Return type: + [CommandResult](#class-commandresult) +* Raises: + `Exception` – If command execution fails + +#### abstractmethod file_download() + +Download a file from the system. + +* Parameters: + * `source_path` – Path to the source file on the system + * `destination_path` – Path where the file should be downloaded +* Returns: + Result containing success status and metadata +* Return type: + [FileOperationResult](#class-fileoperationresult) +* Raises: + `Exception` – If file download fails + +#### abstractmethod file_upload() + +Upload a file to the system. + +* Parameters: + * `source_path` – Path to the source file + * `destination_path` – Path where the file should be uploaded +* Returns: + Result containing success status and metadata +* Return type: + [FileOperationResult](#class-fileoperationresult) +* Raises: + `Exception` – If file upload fails + +#### abstractmethod git_changes() + +Get the git changes for the repository at the path given. + +* Parameters: + `path` – Path to the git repository +* Returns: + List of changes +* Return type: + list[GitChange] +* Raises: + `Exception` – If path is not a git repository or getting changes failed + +#### abstractmethod git_diff() + +Get the git diff for the file at the path given. + +* Parameters: + `path` – Path to the file +* Returns: + Git diff +* Return type: + GitDiff +* Raises: + `Exception` – If path is not a git repository or getting diff failed + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +#### pause() + +Pause the workspace to conserve resources. + +For local workspaces, this is a no-op. 
+For container-based workspaces, this pauses the container. + +* Raises: + `NotImplementedError` – If the workspace type does not support pausing. + +#### resume() + +Resume a paused workspace. + +For local workspaces, this is a no-op. +For container-based workspaces, this resumes the container. + +* Raises: + `NotImplementedError` – If the workspace type does not support resuming. + +### class CommandResult + +Bases: `BaseModel` + +Result of executing a command in the workspace. + + +#### Properties + +- `command`: str +- `exit_code`: int +- `stderr`: str +- `stdout`: str +- `timeout_occurred`: bool + +#### Methods + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +### class FileOperationResult + +Bases: `BaseModel` + +Result of a file upload or download operation. + + +#### Properties + +- `destination_path`: str +- `error`: str | None +- `file_size`: int | None +- `source_path`: str +- `success`: bool + +#### Methods + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +### class LocalWorkspace + +Bases: [`BaseWorkspace`](#class-baseworkspace) + +Local workspace implementation that operates on the host filesystem. + +LocalWorkspace provides direct access to the local filesystem and command execution +environment. It’s suitable for development and testing scenarios where the agent +should operate directly on the host system. + +#### Example + +```pycon +>>> workspace = LocalWorkspace(working_dir="/path/to/project") +>>> with workspace: +... result = workspace.execute_command("ls -la") +... content = workspace.read_file("README.md") +``` + +#### Methods + +#### __init__() + +Create a new model by parsing and validating input data from keyword arguments. 
+ +Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be +validated to form a valid model. + +self is explicitly positional-only to allow self as a field name. + +#### execute_command() + +Execute a bash command locally. + +Uses the shared shell execution utility to run commands with proper +timeout handling, output streaming, and error management. + +* Parameters: + * `command` – The bash command to execute + * `cwd` – Working directory (optional) + * `timeout` – Timeout in seconds +* Returns: + Result with stdout, stderr, exit_code, command, and + : timeout_occurred +* Return type: + [CommandResult](#class-commandresult) + +#### file_download() + +Download (copy) a file locally. + +For local systems, file download is implemented as a file copy operation +using shutil.copy2 to preserve metadata. + +* Parameters: + * `source_path` – Path to the source file + * `destination_path` – Path where the file should be copied +* Returns: + Result with success status and file information +* Return type: + [FileOperationResult](#class-fileoperationresult) + +#### file_upload() + +Upload (copy) a file locally. + +For local systems, file upload is implemented as a file copy operation +using shutil.copy2 to preserve metadata. + +* Parameters: + * `source_path` – Path to the source file + * `destination_path` – Path where the file should be copied +* Returns: + Result with success status and file information +* Return type: + [FileOperationResult](#class-fileoperationresult) + +#### git_changes() + +Get the git changes for the repository at the path given. + +* Parameters: + `path` – Path to the git repository +* Returns: + List of changes +* Return type: + list[GitChange] +* Raises: + `Exception` – If path is not a git repository or getting changes failed + +#### git_diff() + +Get the git diff for the file at the path given. 
+ +* Parameters: + `path` – Path to the file +* Returns: + Git diff +* Return type: + GitDiff +* Raises: + `Exception` – If path is not a git repository or getting diff failed + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +#### pause() + +Pause the workspace (no-op for local workspaces). + +Local workspaces have nothing to pause since they operate directly +on the host filesystem. + +#### resume() + +Resume the workspace (no-op for local workspaces). + +Local workspaces have nothing to resume since they operate directly +on the host filesystem. + +### class RemoteWorkspace + +Bases: `RemoteWorkspaceMixin`, [`BaseWorkspace`](#class-baseworkspace) + +Remote workspace implementation that connects to an OpenHands agent server. + +RemoteWorkspace provides access to a sandboxed environment running on a remote +OpenHands agent server. This is the recommended approach for production deployments +as it provides better isolation and security. + +#### Example + +```pycon +>>> workspace = RemoteWorkspace( +... host="https://agent-server.example.com", +... working_dir="/workspace" +... ) +>>> with workspace: +... result = workspace.execute_command("ls -la") +... content = workspace.read_file("README.md") +``` + + +#### Properties + +- `alive`: bool + Check if the remote workspace is alive by querying the health endpoint. + * Returns: + True if the health endpoint returns a successful response, False otherwise. +- `client`: Client + +#### Methods + +#### execute_command() + +Execute a bash command on the remote system. + +This method starts a bash command via the remote agent server API, +then polls for the output until the command completes. 
+ +* Parameters: + * `command` – The bash command to execute + * `cwd` – Working directory (optional) + * `timeout` – Timeout in seconds +* Returns: + Result with stdout, stderr, exit_code, and other metadata +* Return type: + [CommandResult](#class-commandresult) + +#### file_download() + +Download a file from the remote system. + +Requests the file from the remote system via HTTP API and saves it locally. + +* Parameters: + * `source_path` – Path to the source file on remote system + * `destination_path` – Path where the file should be saved locally +* Returns: + Result with success status and metadata +* Return type: + [FileOperationResult](#class-fileoperationresult) + +#### file_upload() + +Upload a file to the remote system. + +Reads the local file and sends it to the remote system via HTTP API. + +* Parameters: + * `source_path` – Path to the local source file + * `destination_path` – Path where the file should be uploaded on remote system +* Returns: + Result with success status and metadata +* Return type: + [FileOperationResult](#class-fileoperationresult) + +#### git_changes() + +Get the git changes for the repository at the path given. + +* Parameters: + `path` – Path to the git repository +* Returns: + List of changes +* Return type: + list[GitChange] +* Raises: + `Exception` – If path is not a git repository or getting changes failed + +#### git_diff() + +Get the git diff for the file at the path given. + +* Parameters: + `path` – Path to the file +* Returns: + Git diff +* Return type: + GitDiff +* Raises: + `Exception` – If path is not a git repository or getting diff failed + +#### model_config = (configuration object) + +Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. + +#### model_post_init() + +Override this method to perform additional initialization after __init__ and model_construct. +This is useful if you want to do some validation that requires the entire model to be initialized. 
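+
+The role of `model_post_init()` (validation that needs the fully initialized model) can be sketched with a plain dataclass analogue, where `__post_init__` plays the same part; `SketchRemoteWorkspace` is a hypothetical stand-in, not the SDK class.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class SketchRemoteWorkspace:
+    """Illustrative analogue of post-init validation on a workspace model."""
+
+    host: str
+    working_dir: str
+
+    def __post_init__(self) -> None:
+        # Cross-field validation needs the whole object, which is why it
+        # runs after __init__ rather than per field.
+        if not self.host.startswith(("http://", "https://")):
+            raise ValueError("host must be an http(s) URL")
+
+
+ws = SketchRemoteWorkspace(host="https://agent-server.example.com", working_dir="/workspace")
+```
+
+Pydantic's hook differs in signature (it receives a `context` argument) but serves the same purpose.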
+
+#### reset_client()
+
+Reset the HTTP client to force re-initialization.
+
+This is useful when connection parameters (host, api_key) have changed
+and the client needs to be recreated with new values.
+
+### class Workspace
+
+Bases: `object`
+
+Factory entrypoint that returns a LocalWorkspace or RemoteWorkspace.
+
+Usage:
+
+- `Workspace(working_dir=…)` -> LocalWorkspace
+- `Workspace(working_dir=…, host="http://…")` -> RemoteWorkspace
+
+### Agent
+Source: https://docs.openhands.dev/sdk/arch/agent.md
+
+The **Agent** component implements the core reasoning-action loop that drives autonomous task execution. It orchestrates LLM queries, tool execution, and context management through a stateless, event-driven architecture.
+
+**Source:** [`openhands-sdk/openhands/sdk/agent/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/agent)
+
+## Core Responsibilities
+
+The Agent system has four primary responsibilities:
+
+1. **Reasoning-Action Loop** - Query LLM to generate next actions based on conversation history
+2. **Tool Orchestration** - Select and execute tools, handle results and errors
+3. **Context Management** - Apply [skills](/sdk/guides/skill), manage conversation history via [condensers](/sdk/guides/context-condenser)
+4. **Security Validation** - Analyze proposed actions for safety before execution via [security analyzer](/sdk/guides/security)
+
+## Architecture
+
+```mermaid
+%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 20, "rankSpacing": 50}} }%%
+flowchart TB
+    subgraph Input[" "]
+        Events["Event History"]
+        Context["Agent Context
Skills + Prompts"] + end + + subgraph Core["Agent Core"] + Condense["Condenser
History compression"] + Reason["LLM Query
Generate actions"] + Security["Security Analyzer
Risk assessment"] + end + + subgraph Execution[" "] + Tools["Tool Executor
Action → Observation"] + Results["Observation Events"] + end + + Events --> Condense + Context -.->|Skills| Reason + Condense --> Reason + Reason --> Security + Security --> Tools + Tools --> Results + Results -.->|Feedback| Events + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class Reason primary + class Condense,Security secondary + class Tools tertiary +``` + +### Key Components + +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`Agent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/agent/agent.py)** | Main implementation | Stateless reasoning-action loop executor | +| **[`AgentBase`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/agent/base.py)** | Abstract base class | Defines agent interface and initialization | +| **[`AgentContext`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/agent_context.py)** | Context container | Manages skills, prompts, and metadata | +| **[`Condenser`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/)** | History compression | Reduces context when token limits approached | +| **[`SecurityAnalyzer`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/)** | Safety validation | Evaluates action risk before execution | + +## Reasoning-Action Loop + +The agent operates through a **single-step execution model** where each `step()` call processes one reasoning cycle: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 10, "rankSpacing": 10}} }%% +flowchart TB + Start["step() called"] + Pending{"Pending
actions?"} + ExecutePending["Execute pending actions"] + + HasCondenser{"Has
condenser?"} + Condense["Call condenser.condense()"] + CondenseResult{"Result
type?"} + EmitCondensation["Emit Condensation event"] + UseView["Use View events"] + UseRaw["Use raw events"] + + Query["Query LLM with messages"] + ContextExceeded{"Context
window
exceeded?"} + EmitRequest["Emit CondensationRequest"] + + Parse{"Response
type?"} + CreateActions["Create ActionEvents"] + CreateMessage["Create MessageEvent"] + + Confirmation{"Need
confirmation?"} + SetWaiting["Set WAITING_FOR_CONFIRMATION"] + + Execute["Execute actions"] + Observe["Create ObservationEvents"] + + Return["Return"] + + Start --> Pending + Pending -->|Yes| ExecutePending --> Return + Pending -->|No| HasCondenser + + HasCondenser -->|Yes| Condense + HasCondenser -->|No| UseRaw + Condense --> CondenseResult + CondenseResult -->|Condensation| EmitCondensation --> Return + CondenseResult -->|View| UseView --> Query + UseRaw --> Query + + Query --> ContextExceeded + ContextExceeded -->|Yes| EmitRequest --> Return + ContextExceeded -->|No| Parse + + Parse -->|Tool calls| CreateActions + Parse -->|Message| CreateMessage --> Return + + CreateActions --> Confirmation + Confirmation -->|Yes| SetWaiting --> Return + Confirmation -->|No| Execute + + Execute --> Observe + Observe --> Return + + style Query fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Condense fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Confirmation fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Step Execution Flow:** + +1. **Pending Actions:** If actions awaiting confirmation exist, execute them and return +2. **Condensation:** If condenser exists: + - Call `condenser.condense()` with current event view + - If returns `View`: use condensed events for LLM query (continue in same step) + - If returns `Condensation`: emit event and return (will be processed next step) +3. **LLM Query:** Query LLM with messages from event history + - If context window exceeded: emit `CondensationRequest` and return +4. **Response Parsing:** Parse LLM response into events + - Tool calls → create `ActionEvent`(s) + - Text message → create `MessageEvent` and return +5. **Confirmation Check:** If actions need user approval: + - Set conversation status to `WAITING_FOR_CONFIRMATION` and return +6. 
**Action Execution:** Execute tools and create `ObservationEvent`(s) + +**Key Characteristics:** +- **Stateless:** Agent holds no mutable state between steps +- **Event-Driven:** Reads from event history, writes new events +- **Interruptible:** Each step is atomic and can be paused/resumed + +## Agent Context + +The agent applies `AgentContext` which includes **skills** and **prompts** to shape LLM behavior: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Context["AgentContext"] + + subgraph Skills["Skills"] + Repo["repo
Always active"] + Knowledge["knowledge
Trigger-based"] + end + SystemAug["System prompt prefix/suffix
Per-conversation"] + System["Prompt template
Per-conversation"] + + subgraph Application["Applied to LLM"] + SysPrompt["System Prompt"] + UserMsg["User Messages"] + end + + Context --> Skills + Context --> SystemAug + Repo --> SysPrompt + Knowledge -.->|When triggered| UserMsg + System --> SysPrompt + SystemAug --> SysPrompt + + style Context fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Repo fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Knowledge fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +| Skill Type | Activation | Use Case | +|------------|------------|----------| +| **repo** | Always included | Project-specific context, conventions | +| **knowledge** | Trigger words/patterns | Domain knowledge, special behaviors | + +Review [this guide](/sdk/guides/skill) for details on creating and applying agent context and skills. + + +## Tool Execution + +Tools follow a **strict action-observation pattern**: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + LLM["LLM generates tool_call"] + Convert["Convert to ActionEvent"] + + Decision{"Confirmation
mode?"} + Defer["Store as pending"] + + Execute["Execute tool"] + Success{"Success?"} + + Obs["ObservationEvent
with result"] + Error["ObservationEvent
with error"] + + LLM --> Convert + Convert --> Decision + + Decision -->|Yes| Defer + Decision -->|No| Execute + + Execute --> Success + Success -->|Yes| Obs + Success -->|No| Error + + style Convert fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Execute fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Decision fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Execution Modes:** + +| Mode | Behavior | Use Case | +|------|----------|----------| +| **Direct** | Execute immediately | Development, trusted environments | +| **Confirmation** | Store as pending, wait for user approval | High-risk actions, production | + +**Security Integration:** + +Before execution, the security analyzer evaluates each action: +- **Low Risk:** Execute immediately +- **Medium Risk:** Log warning, execute with monitoring +- **High Risk:** Block execution, request user confirmation + +## Component Relationships + +### How Agent Interacts + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Agent["Agent"] + Conv["Conversation"] + LLM["LLM"] + Tools["Tools"] + Context["AgentContext"] + + Conv -->|.step calls| Agent + Agent -->|Reads events| Conv + Agent -->|Query| LLM + Agent -->|Execute| Tools + Context -.->|Skills and Context| Agent + Agent -.->|New events| Conv + + style Agent fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Conv fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style LLM fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Relationship Characteristics:** +- **Conversation → Agent**: Orchestrates step execution, provides event history +- **Agent → LLM**: Queries for next actions, receives tool calls or messages +- **Agent → Tools**: Executes actions, receives observations +- **AgentContext → Agent**: Injects skills and prompts into LLM queries + + +## See Also + +- **[Conversation Architecture](/sdk/arch/conversation)** - Agent orchestration and lifecycle +- **[Tool System](/sdk/arch/tool-system)** - Tool 
definition and execution patterns +- **[Events](/sdk/arch/events)** - Event types and structures +- **[Skills](/sdk/arch/skill)** - Prompt engineering and skill patterns +- **[LLM](/sdk/arch/llm)** - Language model abstraction + +### Agent Server Package +Source: https://docs.openhands.dev/sdk/arch/agent-server.md + +The Agent Server package (`openhands.agent_server`) provides an HTTP API server for remote agent execution. It enables building multi-user systems, SaaS products, and distributed agent platforms. + +**Source**: [`openhands/agent_server/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-agent-server/openhands/agent_server) + +## Purpose + +The Agent Server enables: +- **Remote execution**: Clients interact with agents via HTTP API +- **Multi-user isolation**: Each user gets isolated workspace +- **Container orchestration**: Manages Docker containers for workspaces +- **Centralized management**: Monitor and control all agents +- **Scalability**: Horizontal scaling with multiple servers + +## Architecture Overview + +```mermaid +graph TB + Client[Web/Mobile Client] -->|HTTPS| API[FastAPI Server] + + API --> Auth[Authentication] + API --> Router[API Router] + + Router --> WS[Workspace Manager] + Router --> Conv[Conversation Handler] + + WS --> Docker[Docker Manager] + Docker --> C1[Container 1
User A] + Docker --> C2[Container 2
User B] + Docker --> C3[Container 3
User C] + + Conv --> Agent[Software Agent SDK] + Agent --> C1 + Agent --> C2 + Agent --> C3 + + style Client fill:#e1f5fe + style API fill:#fff3e0 + style WS fill:#e8f5e8 + style Docker fill:#f3e5f5 + style Agent fill:#fce4ec +``` + +### Key Components + +**1. FastAPI Server** +- HTTP REST API endpoints +- Authentication and authorization +- Request validation +- WebSocket support for streaming + +**2. Workspace Manager** +- Creates and manages Docker containers +- Isolates workspaces per user +- Handles container lifecycle +- Manages resource limits + +**3. Conversation Handler** +- Routes requests to appropriate workspace +- Manages conversation state +- Handles concurrent requests +- Supports streaming responses + +**4. Docker Manager** +- Interfaces with Docker daemon +- Builds and pulls images +- Creates and destroys containers +- Monitors container health + +## Design Decisions + +### Why HTTP API? + +Alternative approaches considered: +- **gRPC**: More efficient but harder for web clients +- **WebSockets only**: Good for streaming but not RESTful +- **HTTP + WebSockets**: Best of both worlds + +**Decision**: HTTP REST for operations, WebSockets for streaming +- ✅ Works from any client (web, mobile, CLI) +- ✅ Easy to debug (curl, Postman) +- ✅ Standard authentication (API keys, OAuth) +- ✅ Streaming where needed + +### Why Container Per User? + +Alternative approaches: +- **Shared container**: Multiple users in one container +- **Container per session**: New container each conversation +- **Container per user**: One container per user (chosen) + +**Decision**: Container per user +- ✅ Strong isolation between users +- ✅ Persistent workspace across sessions +- ✅ Better resource management +- ⚠️ More containers, but worth it for isolation + +### Why FastAPI? 
+ +Alternative frameworks: +- **Flask**: Simpler but less type-safe +- **Django**: Too heavyweight +- **FastAPI**: Modern, fast, type-safe (chosen) + +**Decision**: FastAPI +- ✅ Automatic API documentation (OpenAPI) +- ✅ Type validation with Pydantic +- ✅ Async support for performance +- ✅ WebSocket support built-in + +## API Design + +### Key Endpoints + +**Workspace Management** +``` +POST /workspaces Create new workspace +GET /workspaces/{id} Get workspace info +DELETE /workspaces/{id} Delete workspace +POST /workspaces/{id}/execute Execute command +``` + +**Conversation Management** +``` +POST /conversations Create conversation +GET /conversations/{id} Get conversation +POST /conversations/{id}/messages Send message +GET /conversations/{id}/stream Stream responses (WebSocket) +``` + +**Health & Monitoring** +``` +GET /health Server health check +GET /metrics Prometheus metrics +``` + +### Authentication + +**API Key Authentication** +```bash +curl -H "Authorization: Bearer YOUR_API_KEY" \ + https://agent-server.example.com/conversations +``` + +**Per-user workspace isolation** +- API key → user ID mapping +- Each user gets separate workspace +- Users can't access each other's workspaces + +### Streaming Responses + +**WebSocket for real-time updates** +```python +async with websocket_connect(url) as ws: + # Send message + await ws.send_json({"message": "Hello"}) + + # Receive events + async for event in ws: + if event["type"] == "message": + print(event["content"]) +``` + +**Why streaming?** +- Real-time feedback to users +- Show agent thinking process +- Better UX for long-running tasks + +## Deployment Models + +### 1. Local Development + +Run server locally for testing: +```bash +# Start server +openhands-agent-server --port 8000 + +# Or with Docker +docker run -p 8000:8000 \ + -v /var/run/docker.sock:/var/run/docker.sock \ + ghcr.io/all-hands-ai/agent-server:latest +``` + +**Use case**: Development and testing + +### 2. 
Single-Server Deployment + +Deploy on one server (VPS, EC2, etc.): +```bash +# Install +pip install openhands-agent-server + +# Run with systemd/supervisor +openhands-agent-server \ + --host 0.0.0.0 \ + --port 8000 \ + --workers 4 +``` + +**Use case**: Small deployments, prototypes, MVPs + +### 3. Multi-Server Deployment + +Scale horizontally with load balancer: +``` + Load Balancer + | + +-------------+-------------+ + | | | + Server 1 Server 2 Server 3 + (Agents) (Agents) (Agents) + | | | + +-------------+-------------+ + | + Shared State Store + (Database, Redis, etc.) +``` + +**Use case**: Production SaaS, high traffic, need redundancy + +### 4. Kubernetes Deployment + +Container orchestration with Kubernetes: +```yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: agent-server +spec: + replicas: 3 + template: + spec: + containers: + - name: agent-server + image: ghcr.io/all-hands-ai/agent-server:latest + ports: + - containerPort: 8000 +``` + +**Use case**: Enterprise deployments, auto-scaling, high availability + +## Resource Management + +### Container Limits + +Set per-workspace resource limits: +```python +# In server configuration +WORKSPACE_CONFIG = { + "resource_limits": { + "memory": "2g", # 2GB RAM + "cpus": "2", # 2 CPU cores + "disk": "10g" # 10GB disk + }, + "timeout": 300, # 5 min timeout +} +``` + +**Why limit resources?** +- Prevent one user from consuming all resources +- Fair usage across users +- Protect server from runaway processes +- Cost control + +### Cleanup & Garbage Collection + +**Container lifecycle**: +- Containers created on first use +- Kept alive between requests (warm) +- Cleaned up after inactivity timeout +- Force cleanup on server shutdown + +**Storage management**: +- Old workspaces deleted automatically +- Disk usage monitored +- Alerts when approaching limits + +## Security Considerations + +### Multi-Tenant Isolation + +**Container isolation**: +- Each user gets separate container +- Containers can't 
communicate +- Network isolation (optional) +- File system isolation + +**API isolation**: +- API keys mapped to users +- Users can only access their workspaces +- Server validates all permissions + +### Input Validation + +**Server validates**: +- API request schemas +- Command injection attempts +- Path traversal attempts +- File size limits + +**Defense in depth**: +- API validation +- Container validation +- Docker security features +- OS-level security + +### Network Security + +**Best practices**: +- HTTPS only (TLS certificates) +- Firewall rules (only port 443/8000) +- Rate limiting +- DDoS protection + +**Container networking**: +```python +# Disable network for workspace +WORKSPACE_CONFIG = { + "network_mode": "none" # No network access +} + +# Or allow specific hosts +WORKSPACE_CONFIG = { + "allowed_hosts": ["api.example.com"] +} +``` + +## Monitoring & Observability + +### Health Checks + +```bash +# Simple health check +curl https://agent-server.example.com/health + +# Response +{ + "status": "healthy", + "docker": "connected", + "workspaces": 15, + "uptime": 86400 +} +``` + +### Metrics + +**Prometheus metrics**: +- Request count and latency +- Active workspaces +- Container resource usage +- Error rates + +**Logging**: +- Structured JSON logs +- Per-request tracing +- Workspace events +- Error tracking + +### Alerting + +**Alert on**: +- Server down +- High error rate +- Resource exhaustion +- Container failures + +## Client SDK + +Python SDK for interacting with Agent Server: + +```python +from openhands.client import AgentServerClient + +client = AgentServerClient( + url="https://agent-server.example.com", + api_key="your-api-key" +) + +# Create conversation +conversation = client.create_conversation() + +# Send message +response = client.send_message( + conversation_id=conversation.id, + message="Hello, agent!" 
+) + +# Stream responses +for event in client.stream_conversation(conversation.id): + if event.type == "message": + print(event.content) +``` + +**Client handles**: +- Authentication +- Request/response serialization +- Error handling +- Streaming +- Retries + +## Cost Considerations + +### Server Costs + +**Compute**: CPU and memory for containers +- Each active workspace = 1 container +- Typically 1-2 GB RAM per workspace +- 0.5-1 CPU core per workspace + +**Storage**: Workspace files and conversation state +- ~1-10 GB per workspace (depends on usage) +- Conversation history in database + +**Network**: API requests and responses +- Minimal (mostly text) +- Streaming adds bandwidth + +### Cost Optimization + +**1. Idle timeout**: Shutdown containers after inactivity +```python +WORKSPACE_CONFIG = { + "idle_timeout": 3600 # 1 hour +} +``` + +**2. Resource limits**: Don't over-provision +```python +WORKSPACE_CONFIG = { + "resource_limits": { + "memory": "1g", # Smaller limit + "cpus": "0.5" # Fractional CPU + } +} +``` + +**3. Shared resources**: Use single server for multiple low-traffic apps + +**4. 
Auto-scaling**: Scale servers based on demand + +## When to Use Agent Server + +### Use Agent Server When: + +✅ **Multi-user system**: Web app with many users +✅ **Remote clients**: Mobile app, web frontend +✅ **Centralized management**: Need to monitor all agents +✅ **Workspace isolation**: Users shouldn't interfere +✅ **SaaS product**: Building agent-as-a-service +✅ **Scaling**: Need to handle concurrent users + +**Examples**: +- Chatbot platforms +- Code assistant web apps +- Agent marketplaces +- Enterprise agent deployments + +### Use Standalone SDK When: + +✅ **Single-user**: Personal tool or script +✅ **Local execution**: Running on your machine +✅ **Full control**: Need programmatic access +✅ **Simpler deployment**: No server management +✅ **Lower latency**: No network overhead + +**Examples**: +- CLI tools +- Automation scripts +- Local development +- Desktop applications + +### Hybrid Approach + +Use SDK locally but RemoteAPIWorkspace for execution: +- Agent logic in your Python code +- Execution happens on remote server +- Best of both worlds + +## Building Custom Agent Server + +The server is extensible for custom needs: + +**Custom authentication**: +```python +from openhands.agent_server import AgentServer + +class CustomAgentServer(AgentServer): + async def authenticate(self, request): + # Custom auth logic + return await oauth_verify(request) +``` + +**Custom workspace configuration**: +```python +server = AgentServer( + workspace_factory=lambda user: DockerWorkspace( + image=f"custom-image-{user.tier}", + resource_limits=user.resource_limits + ) +) +``` + +**Custom middleware**: +```python +@server.middleware +async def logging_middleware(request, call_next): + # Custom logging + response = await call_next(request) + return response +``` + +## Next Steps + +### For Usage Examples + +- [Local Agent Server](/sdk/guides/agent-server/local-server) - Run locally +- [Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandbox) - Docker setup +- [API 
Sandboxed Server](/sdk/guides/agent-server/api-sandbox) - Remote API +- [Remote Agent Server Overview](/sdk/guides/agent-server/overview) - All options + +### For Related Architecture + +- [Workspace Architecture](/sdk/arch/workspace) - RemoteAPIWorkspace details +- [SDK Architecture](/sdk/arch/sdk) - Core framework +- [Architecture Overview](/sdk/arch/overview) - System design + +### For Implementation Details + +- [`openhands/agent_server/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-agent-server/openhands/agent_server) - Server source +- [`examples/`](https://github.com/OpenHands/software-agent-sdk/tree/main/examples) - Working examples + +### Condenser +Source: https://docs.openhands.dev/sdk/arch/condenser.md + +The **Condenser** system manages conversation history compression to keep agent context within LLM token limits. It reduces long event histories into condensed summaries while preserving critical information for reasoning. For more details, read the [blog here](https://openhands.dev/blog/openhands-context-condensensation-for-more-efficient-ai-agents). + +**Source:** [`openhands-sdk/openhands/sdk/context/condenser/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/context/condenser) + +## Core Responsibilities + +The Condenser system has four primary responsibilities: + +1. **History Compression** - Reduce event lists to fit within context windows +2. **Threshold Detection** - Determine when condensation should trigger +3. **Summary Generation** - Create meaningful summaries via LLM or heuristics +4. **View Management** - Transform event history into LLM-ready views + +## Architecture + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 50}} }%% +flowchart TB + subgraph Interface["Abstract Interface"] + Base["CondenserBase
Abstract base"] + end + + subgraph Implementations["Concrete Implementations"] + NoOp["NoOpCondenser<br/>No compression"] + LLM["LLMSummarizingCondenser<br/>LLM-based"] + Pipeline["PipelineCondenser<br/>Multi-stage"] + end + + subgraph Process["Condensation Process"] + View["View<br/>Event history"] + Check["should_condense()?"] + Condense["get_condensation()"] + Result["View | Condensation"] + end + + subgraph Output["Condensation Output"] + CondEvent["Condensation Event<br/>Summary metadata"] + NewView["Condensed View
Reduced tokens"] + end + + Base --> NoOp + Base --> LLM + Base --> Pipeline + + View --> Check + Check -->|Yes| Condense + Check -->|No| Result + Condense --> CondEvent + CondEvent --> NewView + NewView --> Result + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class Base primary + class LLM,Pipeline secondary + class Check,Condense tertiary +``` + +### Key Components + +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`CondenserBase`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/base.py)** | Abstract interface | Defines `condense()` contract | +| **[`RollingCondenser`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/base.py)** | Rolling window base | Implements threshold-based triggering | +| **[`LLMSummarizingCondenser`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/llm_summarizing_condenser.py)** | LLM summarization | Uses LLM to generate summaries | +| **[`NoOpCondenser`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/no_op_condenser.py)** | No-op implementation | Returns view unchanged | +| **[`PipelineCondenser`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/pipeline_condenser.py)** | Multi-stage pipeline | Chains multiple condensers | +| **[`View`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/view.py)** | Event view | Represents history for LLM | +| **[`Condensation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/condenser.py)** | Condensation event | Metadata about compression | + +## Condenser Types + +### 
NoOpCondenser + +Pass-through condenser that performs no compression: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + View["View"] + NoOp["NoOpCondenser"] + Same["Same View"] + + View --> NoOp --> Same + + style NoOp fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px +``` + +### LLMSummarizingCondenser + +Uses an LLM to generate summaries of conversation history: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart LR + View["Long View
120+ events"] + Check["Threshold
exceeded?"] + Summarize["LLM Summarization"] + Summary["Summary Text"] + Metadata["Condensation Event"] + AddToHistory["Add to History"] + NextStep["Next Step: View.from_events()"] + NewView["Condensed View"] + + View --> Check + Check -->|Yes| Summarize + Summarize --> Summary + Summary --> Metadata + Metadata --> AddToHistory + AddToHistory --> NextStep + NextStep --> NewView + + style Check fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Summarize fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style NewView fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Process:** +1. **Check Threshold:** Compare view size to configured limit (e.g., event count > `max_size`) +2. **Select Events:** Identify events to keep (first N + last M) and events to summarize (middle) +3. **LLM Call:** Generate summary of middle events using dedicated LLM +4. **Create Event:** Wrap summary in `Condensation` event with `forgotten_event_ids` +5. **Add to History:** Agent adds `Condensation` to event log and returns early +6. 
**Next Step:** `View.from_events()` filters forgotten events and inserts summary + +**Configuration:** +- **`max_size`:** Event count threshold before condensation triggers (default: 120) +- **`keep_first`:** Number of initial events to preserve verbatim (default: 4) +- **`llm`:** LLM instance for summarization (often cheaper model than reasoning LLM) + +### PipelineCondenser + +Chains multiple condensers in sequence: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + View["Original View"] + C1["Condenser 1"] + C2["Condenser 2"] + C3["Condenser 3"] + Final["Final View"] + + View --> C1 --> C2 --> C3 --> Final + + style C1 fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style C2 fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style C3 fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Use Case:** Multi-stage compression (e.g., remove old events, then summarize, then truncate) + +## Condensation Flow + +### Trigger Mechanisms + +Condensers can be triggered in two ways: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + subgraph Automatic["Automatic Trigger"] + Agent1["Agent Step"] + Build1["View.from_events()"] + Check1["condenser.condense(view)"] + Trigger1["should_condense()?"] + end + + Agent1 --> Build1 --> Check1 --> Trigger1 + + style Check1 fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px +``` + +**Automatic Trigger:** +- **When:** Threshold exceeded (e.g., event count > `max_size`) +- **Who:** Agent calls `condenser.condense()` each step +- **Purpose:** Proactively keep context within limits + + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + subgraph Manual["Manual Trigger"] + Error["LLM Context Error"] + Request["CondensationRequest Event"] + NextStep["Next Agent Step"] + Trigger2["condense() detects request"] + end + + Error --> Request --> NextStep --> Trigger2 + + style 
Request fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` +**Manual Trigger:** +- **When:** `CondensationRequest` event added to history (via `view.unhandled_condensation_request`) +- **Who:** Agent (on LLM context window error) or application code +- **Purpose:** Force compression when context limit exceeded + +### Condensation Workflow + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Start["Agent calls condense(view)"] + + Decision{"should_condense?"} + + ReturnView["Return View
Agent proceeds"] + + Extract["Select Events to Keep/Forget"] + Generate["LLM Generates Summary"] + Create["Create Condensation Event"] + ReturnCond["Return Condensation"] + AddHistory["Agent adds to history"] + NextStep["Next Step: View.from_events()"] + FilterEvents["Filter forgotten events"] + InsertSummary["Insert summary at offset"] + NewView["New condensed view"] + + Start --> Decision + Decision -->|No| ReturnView + Decision -->|Yes| Extract + Extract --> Generate + Generate --> Create + Create --> ReturnCond + ReturnCond --> AddHistory + AddHistory --> NextStep + NextStep --> FilterEvents + FilterEvents --> InsertSummary + InsertSummary --> NewView + + style Decision fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Generate fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Create fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Key Steps:** + +1. **Threshold Check:** `should_condense()` determines if condensation needed +2. **Event Selection:** Identify events to keep (head + tail) vs forget (middle) +3. **Summary Generation:** LLM creates compressed representation of forgotten events +4. **Condensation Creation:** Create `Condensation` event with `forgotten_event_ids` and summary +5. **Return to Agent:** Condenser returns `Condensation` (not `View`) +6. **History Update:** Agent adds `Condensation` to event log and exits step +7. **Next Step:** `View.from_events()` ([source](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/view.py)) processes Condensation to filter events and insert summary + +## View and Condensation + +### View Structure + +A `View` represents the conversation history as it will be sent to the LLM: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Events["Full Event List
+ Condensation events"] + FromEvents["View.from_events()"] + Filter["Filter forgotten events"] + Insert["Insert summary"] + View["View
LLMConvertibleEvents"] + Convert["events_to_messages()"] + LLM["LLM Input"] + + Events --> FromEvents + FromEvents --> Filter + Filter --> Insert + Insert --> View + View --> Convert + Convert --> LLM + + style View fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style FromEvents fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` + +**View Components:** +- **`events`:** List of `LLMConvertibleEvent` objects (filtered by Condensation) +- **`unhandled_condensation_request`:** Flag for pending manual condensation +- **`condensations`:** List of all Condensation events processed +- **Methods:** `from_events()` creates view from raw events, handling Condensation semantics + +### Condensation Event + +When condensation occurs, a `Condensation` event is created: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Old["Middle Events
~60 events"] + Summary["Summary Text<br/>LLM-generated"] + Event["Condensation Event<br/>forgotten_event_ids"] + Applied["View.from_events()"] + New["New View
~60 events + summary"] + + Old -.->|Summarized| Summary + Summary --> Event + Event --> Applied + Applied --> New + + style Event fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Summary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` + +**Condensation Fields:** +- **`forgotten_event_ids`:** List of event IDs to filter out +- **`summary`:** Compressed text representation of forgotten events +- **`summary_offset`:** Index where summary event should be inserted +- Inherits from `Event`: `id`, `timestamp`, `source` + +## Rolling Window Pattern + +`RollingCondenser` implements a common pattern for threshold-based condensation: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + View["Current View
120+ events"] + Check["Count Events"] + + Compare{"Count ><br/>max_size?"} + + Keep["Keep All Events"] + + Split["Split Events"] + Head["Head<br/>First 4 events"] + Middle["Middle<br/>~56 events"] + Tail["Tail<br/>~56 events"] + Summarize["LLM Summarizes Middle"] + Result["Head + Summary + Tail
~60 events total"] + + View --> Check + Check --> Compare + + Compare -->|Under| Keep + Compare -->|Over| Split + + Split --> Head + Split --> Middle + Split --> Tail + + Middle --> Summarize + Head --> Result + Summarize --> Result + Tail --> Result + + style Compare fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Split fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Summarize fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Rolling Window Strategy:** +1. **Keep Head:** Preserve first `keep_first` events (default: 4) - usually system prompts +2. **Keep Tail:** Preserve last `target_size - keep_first - 1` events - recent context +3. **Summarize Middle:** Compress events between head and tail into summary +4. **Target Size:** After condensation, view has `max_size // 2` events (default: 60) + +## Component Relationships + +### How Condenser Integrates + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Agent["Agent"] + Condenser["Condenser"] + State["Conversation State"] + Events["Event Log"] + + Agent -->|"View.from_events()"| State + State -->|View| Agent + Agent -->|"condense(view)"| Condenser + Condenser -->|"View | Condensation"| Agent + Agent -->|Adds Condensation| Events + + style Condenser fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Events fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Relationship Characteristics:** +- **Agent → State**: Calls `View.from_events()` to get current view +- **Agent → Condenser**: Calls `condense(view)` each step if condenser registered +- **Condenser → Agent**: Returns `View` (proceed) or `Condensation` (defer) +- **Agent → Events**: Adds `Condensation` event to log when returned + +## See Also + +- **[Agent Architecture](/sdk/arch/agent)** - How agents use condensers during reasoning +- **[Conversation Architecture](/sdk/arch/conversation)** - View generation and event management +- 
**[Events](/sdk/arch/events)** - Condensation event type and append-only log +- **[Context Condenser Guide](/sdk/guides/context-condenser)** - Configuring and using condensers + +### Conversation +Source: https://docs.openhands.dev/sdk/arch/conversation.md + +The **Conversation** component orchestrates agent execution through structured message flows and state management. It serves as the primary interface for interacting with agents, managing their lifecycle from initialization to completion. + +**Source:** [`openhands-sdk/openhands/sdk/conversation/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/conversation) + +## Core Responsibilities + +The Conversation system has four primary responsibilities: + +1. **Agent Lifecycle Management** - Initialize, run, pause, and terminate agents +2. **State Orchestration** - Maintain conversation history, events, and execution status +3. **Workspace Coordination** - Bridge agent operations with execution environments +4. **Runtime Services** - Provide persistence, monitoring, security, and visualization + +## Architecture + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 35}} }%% +flowchart LR + User["User Code"] + + subgraph Factory[" "] + Entry["Conversation()"] + end + + subgraph Implementations[" "] + Local["LocalConversation
Direct execution"] + Remote["RemoteConversation<br/>Via agent-server API"] + end + + subgraph Core[" "] + State["ConversationState<br/>• agent • workspace • stats • ..."] + EventLog["ConversationState.events
Event storage"] + end + + User --> Entry + Entry -.->|LocalWorkspace| Local + Entry -.->|RemoteWorkspace| Remote + + Local --> State + Remote --> State + + State --> EventLog + + classDef factory fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef impl fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef core fill:#fff4df,stroke:#b7791f,stroke-width:2px + classDef service fill:#e9f9ef,stroke:#2f855a,stroke-width:1.5px + + class Entry factory + class Local,Remote impl + class State,EventLog core + class Persist,Stuck,Viz,Secrets service +``` + +### Key Components + +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`Conversation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/conversation.py)** | Unified entrypoint | Returns correct implementation based on workspace type | +| **[`LocalConversation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py)** | Local execution | Runs agent directly in process | +| **[`RemoteConversation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/impl/remote_conversation.py)** | Remote execution | Delegates to agent-server via HTTP/WebSocket | +| **[`ConversationState`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/state.py)** | State container | Pydantic model with validation and serialization | +| **[`EventLog`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/event_store.py)** | Event storage | Immutable append-only store with efficient queries | + +## Factory Pattern + +The [`Conversation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/conversation.py) class automatically selects the correct implementation based on workspace type: + +```mermaid +%%{init: {"theme": 
"default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Input["Conversation(agent, workspace)"] + Check{Workspace Type?} + Local["LocalConversation<br/>Agent runs in-process"] + Remote["RemoteConversation
Agent runs via API"] + + Input --> Check + Check -->|str or LocalWorkspace| Local + Check -->|RemoteWorkspace| Remote + + style Input fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Local fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Remote fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` + +**Dispatch Logic:** +- **Local:** String paths or `LocalWorkspace` → in-process execution +- **Remote:** `RemoteWorkspace` → agent-server via HTTP/WebSocket + +This abstraction enables switching deployment modes without code changes—just swap the workspace type. + +## State Management + +State updates follow a **two-path pattern** depending on the type of change: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Start["State Update Request"] + Lock["Acquire FIFO Lock"] + Decision{New Event?} + + StateOnly["Update State Fields
stats, status, metadata"] + EventPath["Append to Event Log
messages, actions, observations"] + + Callback["Trigger Callbacks"] + Release["Release Lock"] + + Start --> Lock + Lock --> Decision + Decision -->|No| StateOnly + Decision -->|Yes| EventPath + StateOnly --> Callback + EventPath --> Callback + Callback --> Release + + style Decision fill:#fff4df,stroke:#b7791f,stroke-width:2px + style EventPath fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style StateOnly fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px +``` + +**Two Update Patterns:** + +1. **State-Only Updates** - Modify fields without appending events (e.g., status changes, stat increments) +2. **Event-Based Updates** - Append to event log when new messages, actions, or observations occur + +**Thread Safety:** +- FIFO Lock ensures ordered, atomic updates +- Callbacks fire after successful commit +- Read operations never block writes + +## Execution Models + +The conversation system supports two execution models with identical APIs: + +### Local vs Remote Execution + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + subgraph Local["LocalConversation"] + L1["User sends message"] + L2["Agent executes in-process"] + L3["Direct tool calls"] + L4["Events via callbacks"] + L1 --> L2 --> L3 --> L4 + end + style Local fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + subgraph Remote["RemoteConversation"] + R1["User sends message"] + R2["HTTP → Agent Server"] + R3["Isolated container execution"] + R4["WebSocket event stream"] + R1 --> R2 --> R3 --> R4 + end + style Remote fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +| Aspect | LocalConversation | RemoteConversation | +|--------|-------------------|-------------------| +| **Execution** | In-process | Remote container/server | +| **Communication** | Direct function calls | HTTP + WebSocket | +| **State Sync** | Immediate | Network 
serialized | +| **Use Case** | Development, CLI tools | Production, web apps | +| **Isolation** | Process-level | Container-level | + +**Key Insight:** Same API surface means switching between local and remote requires only changing workspace type—no code changes. + +## Auxiliary Services + +The conversation system provides pluggable services that operate independently on the event stream: + +| Service | Purpose | Architecture Pattern | +|---------|---------|---------------------| +| **[Event Log](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/event_store.py)** | Append-only immutable storage | Event sourcing with indexing | +| **[Persistence](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/state.py)** | Auto-save & resume | Debounced writes, incremental events | +| **[Stuck Detection](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/stuck_detector.py)** | Loop prevention | Sliding window pattern matching | +| **[Visualization](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/visualizer/)** | Execution diagrams | Event stream → visual representation | +| **[Secret Registry](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/secret_registry.py)** | Secure value storage | Memory-only with masked logging | + +**Design Principle:** Services read from the event log but never mutate state directly. 
This enables: +- Services can be enabled/disabled independently +- Easy to add new services without changing core orchestration +- Event stream acts as the integration point + +## Component Relationships + +### How Conversation Interacts + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Conv["Conversation"] + Agent["Agent"] + WS["Workspace"] + Tools["Tools"] + LLM["LLM"] + + Conv -->|Delegates to| Agent + Conv -->|Configures| WS + Agent -.->|Updates| Conv + Agent -->|Uses| Tools + Agent -->|Queries| LLM + + style Conv fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style WS fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Relationship Characteristics:** +- **Conversation → Agent**: One-way orchestration, agent reports back via state updates +- **Conversation → Workspace**: Configuration only, workspace doesn't know about conversation +- **Agent → Conversation**: Indirect via state events + +## See Also + +- **[Agent Architecture](/sdk/arch/agent)** - Agent reasoning loop design +- **[Workspace Architecture](/sdk/arch/workspace)** - Execution environment design +- **[Event System](/sdk/arch/events)** - Event types and flow +- **[Conversation Usage Guide](/sdk/guides/convo-persistence)** - Practical examples + +### Design Principles +Source: https://docs.openhands.dev/sdk/arch/design.md + +The **OpenHands Software Agent SDK** is part of the [OpenHands V1](https://openhands.dev/blog/the-path-to-openhands-v1) effort — a complete architectural rework based on lessons from **OpenHands V0**, one of the most widely adopted open-source coding agents. + +[Over the last eighteen months](https://openhands.dev/blog/one-year-of-openhands-a-journey-of-open-source-ai-development), OpenHands V0 evolved from a scrappy prototype into a widely used open-source coding agent. 
The project grew to tens of thousands of GitHub stars, hundreds of contributors, and multiple production deployments. That growth exposed architectural tensions — tight coupling between research and production, mandatory sandboxing, mutable state, and configuration sprawl — which informed the design principles of agent-sdk in V1.

## Optional Isolation over Mandatory Sandboxing


**V0 Challenge:**
Every tool call in V0 executed in a sandboxed Docker container by default. While this guaranteed reproducibility and security, it also created friction — the agent and sandbox ran as separate processes, states diverged easily, and multi-tenant workloads could crash each other.
Moreover, with the rise of the Model Context Protocol (MCP), which assumes local execution and direct access to user environments, V0's rigid isolation model became incompatible with it.


**V1 Principle:**
**Sandboxing should be opt-in, not universal.**
V1 unifies agent and tool execution within a single process by default, aligning with MCP's local-execution model.
When isolation is needed, the same stack can be transparently containerized, maintaining flexibility without complexity.

## Stateless by Default, One Source of Truth for State


**V0 Challenge:**
V0 relied on mutable Python objects and dynamic typing, which led to silent inconsistencies — failed session restores, version drift, and non-deterministic behavior. Each subsystem tracked its own transient state, making debugging and recovery painful.


**V1 Principle:**
**Keep everything stateless, with exactly one mutable state.**
All components (agents, tools, LLMs, and configurations) are immutable Pydantic models validated at construction.
The only mutable entity is the [conversation state](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/conversation_state.py), a single source of truth that enables deterministic replay and robust persistence across sessions or distributed systems.

## Clear Boundaries between Agent and Applications


**V0 Challenge:**
The same codebase powered the CLI, web interface, and integrations (e.g., GitHub, GitLab, etc.). Over time, application-specific conditionals and prompts polluted the agent core, making it brittle.
Heavy research dependencies and benchmark integrations further bloated production builds.


**V1 Principle:**
**Maintain strict separation of concerns.**
V1 divides the system into stable, isolated layers: the [SDK (agent core)](/sdk/arch/overview#1-sdk-%E2%80%93-openhands-sdk), [tools (set of tools)](/sdk/arch/overview#2-tools-%E2%80%93-openhands-tools), [workspace (sandbox)](/sdk/arch/overview#3-workspace-%E2%80%93-openhands-workspace), and [agent server (server that runs inside sandbox)](/sdk/arch/overview#4-agent-server-%E2%80%93-openhands-agent-server).
Applications communicate with the agent via APIs rather than embedding it directly, ensuring research and production can evolve independently.


## Composable Components for Extensibility


**V0 Challenge:**
Because agent logic was hard-coded into the core application, extending behavior (e.g., adding new tools or entry points) required branching logic for each entry point. This rigidity limited experimentation and discouraged contributions.


**V1 Principle:**
**Everything should be composable and safe to extend.**
Agents are defined as graphs of interchangeable components—tools, prompts, LLMs, and contexts—each described declaratively with strong typing.
+Developers can reconfigure capabilities (e.g., swap toolsets, override prompts, add delegation logic) without modifying core code, preserving stability while fostering rapid innovation. + +### Events +Source: https://docs.openhands.dev/sdk/arch/events.md + +The **Event System** provides an immutable, type-safe event framework that drives agent execution and state management. Events form an append-only log that serves as both the agent's memory and the integration point for auxiliary services. + +**Source:** [`openhands-sdk/openhands/sdk/event/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/event) + +## Core Responsibilities + +The Event System has four primary responsibilities: + +1. **Type Safety** - Enforce event schemas through Pydantic models +2. **LLM Integration** - Convert events to/from LLM message formats +3. **Append-Only Log** - Maintain immutable event history +4. **Service Integration** - Enable observers to react to event streams + +## Architecture + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 80}} }%% +flowchart TB + Base["Event
Base class"] + LLMBase["LLMConvertibleEvent
Abstract base"] + + subgraph LLMTypes["LLM-Convertible Events
Visible to the LLM"] + Message["MessageEvent
User/assistant text"] + Action["ActionEvent
Tool calls"] + System["SystemPromptEvent
Initial system prompt"] + CondSummary["CondensationSummaryEvent
Condenser summary"] + + ObsBase["ObservationBaseEvent
Base for tool responses"] + Observation["ObservationEvent
Tool results"] + UserReject["UserRejectObservation
User rejected action"] + AgentError["AgentErrorEvent
Agent error"] + end + + subgraph Internals["Internal Events
NOT visible to the LLM"] + ConvState["ConversationStateUpdateEvent
State updates"] + CondReq["CondensationRequest
Request compression"] + Cond["Condensation
Compression result"] + Pause["PauseEvent
User pause"]
    end

    Base --> LLMBase
    Base --> Internals
    LLMBase --> LLMTypes
    ObsBase --> Observation
    ObsBase --> UserReject
    ObsBase --> AgentError

    classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
    classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px
    classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px

    class Base,LLMBase,Message,Action,System primary
    class ObsBase,Observation,UserReject,AgentError secondary
    class ConvState,CondReq,Cond,Pause tertiary
```

### Key Components

| Component | Purpose | Design |
|-----------|---------|--------|
| **[`Event`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/base.py)** | Base event class | Immutable Pydantic model with ID, timestamp, source |
| **[`LLMConvertibleEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/base.py)** | LLM-compatible events | Abstract class with `to_llm_message()` method |
| **[`MessageEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/message.py)** | Text messages | User or assistant conversational messages with skills |
| **[`ActionEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/action.py)** | Tool calls | Agent tool invocations with thought, reasoning, security risk |
| **[`ObservationBaseEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/observation.py)** | Tool response base | Base for all tool call responses |
| **[`ObservationEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/observation.py)** | Tool results | Successful tool execution outcomes |
| 
**[`UserRejectObservation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/observation.py)** | User rejection | User rejected action in confirmation mode | +| **[`AgentErrorEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/observation.py)** | Agent errors | Errors from agent/scaffold (not model output) | +| **[`SystemPromptEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/system.py)** | System context | System prompt with tool schemas | +| **[`CondensationSummaryEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/condenser.py)** | Condenser summary | LLM-convertible summary of forgotten events | +| **[`ConversationStateUpdateEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/conversation_state.py)** | State updates | Key-value conversation state changes | +| **[`Condensation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/condenser.py)** | Condensation result | Events being forgotten with optional summary | +| **[`CondensationRequest`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/condenser.py)** | Request compression | Trigger for conversation history compression | +| **[`PauseEvent`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/user_action.py)** | User pause | User requested pause of agent execution | + +## Event Types + +### LLM-Convertible Events + +Events that participate in agent reasoning and can be converted to LLM messages: + + +| Event Type | Source | Content | LLM Role | +|------------|--------|---------|----------| +| **MessageEvent (user)** | user | Text, images | `user` | +| **MessageEvent (agent)** | agent | Text reasoning, 
skills | `assistant` | +| **ActionEvent** | agent | Tool call with thought, reasoning, security risk | `assistant` with `tool_calls` | +| **ObservationEvent** | environment | Tool execution result | `tool` | +| **UserRejectObservation** | environment | Rejection reason | `tool` | +| **AgentErrorEvent** | agent | Error details | `tool` | +| **SystemPromptEvent** | agent | System prompt with tool schemas | `system` | +| **CondensationSummaryEvent** | environment | Summary of forgotten events | `user` | + +The event system bridges agent events to LLM messages: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Events["Event List"] + Filter["Filter LLMConvertibleEvent"] + Group["Group ActionEvents
by llm_response_id"]
    Convert["Convert to Messages"]
    LLM["LLM Input"]

    Events --> Filter
    Filter --> Group
    Group --> Convert
    Convert --> LLM

    style Filter fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
    style Group fill:#fff4df,stroke:#b7791f,stroke-width:2px
    style Convert fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px
```

**Special Handling - Parallel Function Calling:**

When multiple `ActionEvent`s share the same `llm_response_id` (parallel function calling):
1. Group all ActionEvents by `llm_response_id`
2. Combine into a single Message with multiple `tool_calls`
3. Only the first event's `thought`, `reasoning_content`, and `thinking_blocks` are included
4. All subsequent events in the batch have empty thought fields

**Example:**
```
ActionEvent(llm_response_id="abc123", thought="Let me check...", tool_call=tool1)
ActionEvent(llm_response_id="abc123", thought=[], tool_call=tool2)
→ Combined into single Message(role="assistant", content="Let me check...", tool_calls=[tool1, tool2])
```


### Internal Events

Events for metadata, control flow, and user actions (not sent to LLM):

| Event Type | Source | Purpose | Key Fields |
|------------|--------|---------|------------|
| **ConversationStateUpdateEvent** | environment | State synchronization | `key` (field name), `value` (serialized data) |
| **CondensationRequest** | environment | Trigger history compression | Signal to condenser when context window exceeded |
| **Condensation** | environment | Compression result | `forgotten_event_ids`, `summary`, `summary_offset` |
| **PauseEvent** | user | User pause action | Indicates agent execution was paused by user |

**Source Types:**
- **user**: Event originated from user input
- **agent**: Event generated by agent logic
- **environment**: Event from system/framework/tools

## `source` vs LLM `role`

Events often carry **two different concepts** that are easy to confuse:

- **`Event.source`**: where the event *originated* (`user`, `agent`, or `environment`). This is about attribution.
- **LLM `role`** (e.g. `Message.role` / `MessageEvent.llm_message.role`): how the event should be represented to the LLM (`system`, `user`, `assistant`, `tool`). This is about LLM formatting.

These fields are **intentionally independent**.

Common examples include:

- **Observations**: tool results are typically `source="environment"` and represented to the LLM with `role="tool"`.
- **Synthetic framework messages**: the SDK may inject feedback or control messages (e.g. from hooks) as `source="environment"` while still using an LLM `role="user"` so the agent reads it as a user-facing instruction.

**Do not infer event origin from LLM role.** If you need to distinguish real user input from synthetic/framework messages, rely on `Event.source` (and any explicit metadata fields on the event), not the LLM role.

## Component Relationships

### How Events Integrate

```mermaid
%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%%
flowchart LR
    Events["Event System"]
    Agent["Agent"]
    Conversation["Conversation"]
    Tools["Tools"]
    Services["Auxiliary Services"]

    Agent -->|Reads| Events
    Agent -->|Writes| Events
    Conversation -->|Manages| Events
    Tools -->|Creates| Events
    Events -.->|Stream| Services

    style Events fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
    style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px
    style Conversation fill:#fff4df,stroke:#b7791f,stroke-width:2px
```

**Relationship Characteristics:**
- **Agent → Events**: Reads history for context, writes actions/messages
- **Conversation → Events**: Owns and persists event log
- **Tools → Events**: Create ObservationEvents after execution
- **Services → Events**: Read-only observers for monitoring, visualization

## Error Events: Agent vs Conversation

Two distinct error events exist in the SDK, with different purposes and visibility:

- AgentErrorEvent
  - Type: 
ObservationBaseEvent (LLM-convertible) + - Scope: Error for a specific tool call (has tool_name and tool_call_id) + - Source: "agent" + - LLM visibility: Sent as a tool message so the model can react/recover + - Effect: Conversation continues; not a terminal state + - Code: https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/observation.py + +- ConversationErrorEvent + - Type: Event (not LLM-convertible) + - Scope: Conversation-level runtime failure (no tool_name/tool_call_id) + - Source: typically "environment" + - LLM visibility: Not sent to the model + - Effect: Run loop transitions to ERROR and run() raises ConversationRunError; surface top-level error to client applications + - Code: https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/conversation_error.py + +## See Also + +- **[Agent Architecture](/sdk/arch/agent)** - How agents read and write events +- **[Conversation Architecture](/sdk/arch/conversation)** - Event log management +- **[Tool System](/sdk/arch/tool-system)** - ActionEvent and ObservationEvent generation +- **[Condenser](/sdk/arch/condenser)** - Event history compression + +### LLM +Source: https://docs.openhands.dev/sdk/arch/llm.md + +The **LLM** system provides a unified interface to language model providers through LiteLLM. It handles model configuration, request orchestration, retry logic, telemetry, and cost tracking across all providers. + +**Source:** [`openhands-sdk/openhands/sdk/llm/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/llm) + +## Core Responsibilities + +The LLM system has five primary responsibilities: + +1. **Provider Abstraction** - Uniform interface to OpenAI, Anthropic, Google, and 100+ providers +2. **Request Pipeline** - Dual API support: Chat Completions (`completion()`) and Responses API (`responses()`) +3. 
**Configuration Management** - Load from environment, JSON, or programmatic configuration +4. **Telemetry & Cost** - Track usage, latency, and costs across providers +5. **Enhanced Reasoning** - Support for OpenAI Responses API with encrypted thinking and reasoning summaries + +## Architecture + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 70}} }%% +flowchart TB + subgraph Configuration["Configuration Sources"] + Env["Environment Variables
LLM_MODEL, LLM_API_KEY"] + JSON["JSON Files
config/llm.json"] + Code["Programmatic
LLM(...)"] + end + + subgraph Core["Core LLM"] + Model["LLM Model
Pydantic configuration"] + Pipeline["Request Pipeline
Retry, timeout, telemetry"] + end + + subgraph Backend["LiteLLM Backend"] + Providers["100+ Providers
OpenAI, Anthropic, etc."]
    end

    subgraph Output["Telemetry"]
        Usage["Token Usage"]
        Cost["Cost Tracking"]
        Latency["Latency Metrics"]
    end

    Env --> Model
    JSON --> Model
    Code --> Model

    Model --> Pipeline
    Pipeline --> Providers

    Pipeline --> Usage
    Pipeline --> Cost
    Pipeline --> Latency

    classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px
    classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px
    classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px

    class Model primary
    class Pipeline secondary
    class Providers tertiary
```

### Key Components

| Component | Purpose | Design |
|-----------|---------|--------|
| **[`LLM`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/llm.py)** | Configuration model | Pydantic model with provider settings |
| **[`completion()`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/llm.py)** | Chat Completions API | Handles retries, timeouts, streaming |
| **[`responses()`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/llm.py)** | Responses API | Enhanced reasoning with encrypted thinking |
| **[`LiteLLM`](https://github.com/BerriAI/litellm)** | Provider adapter | Unified API for 100+ providers |
| **Configuration Loaders** | Config hydration | `load_from_env()`, `load_from_json()` |
| **Telemetry** | Usage tracking | Token counts, costs, latency |

## Configuration

See the [`LLM` source](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/llm.py) for the complete list of supported fields. 
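The environment-variable convention described in this section (strip the `LLM_` prefix, lowercase the field name, auto-cast the value) can be illustrated with a small standalone sketch. This is a hypothetical re-implementation for clarity, not the SDK's actual loader:

```python
import json


def env_to_llm_fields(environ: dict[str, str]) -> dict[str, object]:
    """Illustrate the documented LLM_* mapping: LLM_FIELD -> field,
    with values auto-cast to int, float, bool, or JSON."""
    fields: dict[str, object] = {}
    for key, raw in environ.items():
        if not key.startswith("LLM_"):
            continue  # only LLM_-prefixed variables participate
        name = key[len("LLM_"):].lower()  # e.g. LLM_NUM_RETRIES -> num_retries
        for cast in (int, float):
            try:
                fields[name] = cast(raw)
                break
            except ValueError:
                pass
        else:
            if raw.lower() in ("true", "false"):
                fields[name] = raw.lower() == "true"
            else:
                try:
                    fields[name] = json.loads(raw)  # lists/dicts given as JSON
                except json.JSONDecodeError:
                    fields[name] = raw  # plain string (SecretStr in the SDK)
    return fields


print(env_to_llm_fields({
    "LLM_MODEL": "anthropic/claude-sonnet-4.1",
    "LLM_TIMEOUT": "120",
    "LLM_NUM_RETRIES": "5",
    "PATH": "/usr/bin",  # ignored: no LLM_ prefix
}))
# {'model': 'anthropic/claude-sonnet-4.1', 'timeout': 120, 'num_retries': 5}
```

In the SDK itself, this hydration is performed by the configuration loaders (`load_from_env()`); the sketch only shows the naming and casting rules.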
+ +### Programmatic Configuration + +Create LLM instances directly in code: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Code["Python Code"] + LLM["LLM(model=...)"] + Agent["Agent"] + + Code --> LLM + LLM --> Agent + + style LLM fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px +``` + +**Example:** +```python +from pydantic import SecretStr +from openhands.sdk import LLM + +llm = LLM( + model="anthropic/claude-sonnet-4.1", + api_key=SecretStr("sk-ant-123"), + temperature=0.1, + timeout=120, +) +``` + +### Environment Variable Configuration + +Load from environment using naming convention: + +**Environment Variable Pattern:** +- **Prefix:** All variables start with `LLM_` +- **Mapping:** `LLM_FIELD` → `field` (lowercased) +- **Types:** Auto-cast to int, float, bool, JSON, or SecretStr + +**Common Variables:** +```bash +export LLM_MODEL="anthropic/claude-sonnet-4.1" +export LLM_API_KEY="sk-ant-123" +export LLM_USAGE_ID="primary" +export LLM_TIMEOUT="120" +export LLM_NUM_RETRIES="5" +``` + +### JSON Configuration + +Serialize and load from JSON files: + +**Example:** +```python +# Save +llm.model_dump_json(exclude_none=True, indent=2) + +# Load +llm = LLM.load_from_json("config/llm.json") +``` + +**Security:** Secrets are redacted in serialized JSON (combine with environment variables for sensitive data). +If you need to include secrets in JSON, use `llm.model_dump_json(exclude_none=True, context={"expose_secrets": True})`. + + +## Request Pipeline + +### Completion Flow + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 20}} }%% +flowchart TB + Request["completion() or responses() call"] + Validate["Validate Config"] + + Attempt["LiteLLM Request"] + Success{"Success?"} + + Retry{"Retries
remaining?"} + Wait["Exponential Backoff"] + + Telemetry["Record Telemetry"] + Response["Return Response"] + Error["Raise Error"] + + Request --> Validate + Validate --> Attempt + Attempt --> Success + + Success -->|Yes| Telemetry + Success -->|No| Retry + + Retry -->|Yes| Wait + Retry -->|No| Error + + Wait --> Attempt + Telemetry --> Response + + style Attempt fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Retry fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Telemetry fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Pipeline Stages:** + +1. **Validation:** Check required fields (model, messages) +2. **Request:** Call LiteLLM with provider-specific formatting +3. **Retry Logic:** Exponential backoff on failures (configurable) +4. **Telemetry:** Record tokens, cost, latency +5. **Response:** Return completion or raise error + +### Responses API Support + +In addition to the standard chat completion API, the LLM system supports [OpenAI's Responses API](https://platform.openai.com/docs/api-reference/responses) as an alternative invocation path for models that benefit from this newer interface (e.g., GPT-5-Codex only supports Responses API). The Responses API provides enhanced reasoning capabilities with encrypted thinking and detailed reasoning summaries. + +#### Architecture + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Check{"Model supports
Responses API?"} + + subgraph Standard["Standard Path"] + ChatFormat["Format as
Chat Messages"] + ChatCall["litellm.completion()"] + end + + subgraph ResponsesPath["Responses Path"] + RespFormat["Format as
instructions + input[]"] + RespCall["litellm.responses()"] + end + + ChatResponse["ModelResponse"] + RespResponse["ResponsesAPIResponse"] + + Parse["Parse to Message"] + Return["LLMResponse"] + + Check -->|No| ChatFormat + Check -->|Yes| RespFormat + + ChatFormat --> ChatCall + RespFormat --> RespCall + + ChatCall --> ChatResponse + RespCall --> RespResponse + + ChatResponse --> Parse + RespResponse --> Parse + + Parse --> Return + + style RespFormat fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style RespCall fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` + +#### Supported Models + +Models that automatically use the Responses API path: + +| Pattern | Examples | Documentation | +|---------|----------|---------------| +| **gpt-5*** | `gpt-5`, `gpt-5-mini`, `gpt-5-codex` | OpenAI GPT-5 family | + +**Detection:** The SDK automatically detects if a model supports the Responses API using pattern matching in [`model_features.py`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/utils/model_features.py). + + +## Provider Integration + +### LiteLLM Abstraction + +Software Agent SDK uses LiteLLM for provider abstraction: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart TB + SDK["Software Agent SDK"] + LiteLLM["LiteLLM"] + + subgraph Providers["100+ Providers"] + OpenAI["OpenAI"] + Anthropic["Anthropic"] + Google["Google"] + Azure["Azure"] + Others["..."] + end + + SDK --> LiteLLM + LiteLLM --> OpenAI + LiteLLM --> Anthropic + LiteLLM --> Google + LiteLLM --> Azure + LiteLLM --> Others + + style LiteLLM fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style SDK fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` + +**Benefits:** +- **100+ Providers:** OpenAI, Anthropic, Google, Azure, AWS Bedrock, local models, etc. 
+- **Unified API:** Same interface regardless of provider +- **Format Translation:** Provider-specific request/response formatting +- **Error Handling:** Normalized error codes and messages + +### LLM Providers + +Provider integrations remain shared between the Software Agent SDK and the OpenHands Application. +The pages linked below live under the OpenHands app section but apply +verbatim to SDK applications because both layers wrap the same +`openhands.sdk.llm.LLM` interface. + +| Provider / scenario | Documentation | +| --- | --- | +| OpenHands hosted models | [/openhands/usage/llms/openhands-llms](/openhands/usage/llms/openhands-llms) | +| OpenAI | [/openhands/usage/llms/openai-llms](/openhands/usage/llms/openai-llms) | +| Azure OpenAI | [/openhands/usage/llms/azure-llms](/openhands/usage/llms/azure-llms) | +| Google Gemini / Vertex | [/openhands/usage/llms/google-llms](/openhands/usage/llms/google-llms) | +| Groq | [/openhands/usage/llms/groq](/openhands/usage/llms/groq) | +| OpenRouter | [/openhands/usage/llms/openrouter](/openhands/usage/llms/openrouter) | +| Moonshot | [/openhands/usage/llms/moonshot](/openhands/usage/llms/moonshot) | +| LiteLLM proxy | [/openhands/usage/llms/litellm-proxy](/openhands/usage/llms/litellm-proxy) | +| Local LLMs (Ollama, SGLang, vLLM, LM Studio) | [/openhands/usage/llms/local-llms](/openhands/usage/llms/local-llms) | +| Custom LLM configurations | [/openhands/usage/llms/custom-llm-configs](/openhands/usage/llms/custom-llm-configs) | + +When you follow any of those guides while building with the SDK, create an +`LLM` object using the documented parameters (for example, API keys, base URLs, +or custom headers) and pass it into your agent or registry. The OpenHands UI +surfacing is simply a convenience layer on top of the same configuration model. 
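For example, following the local-LLMs guide, you can point the SDK at an OpenAI-compatible local server by constructing the `LLM` directly. This is a configuration sketch: the endpoint and model name are hypothetical, and the `base_url` field name is taken from the provider guides, so verify it against the `LLM` source:

```python
from pydantic import SecretStr

from openhands.sdk import LLM

# Hypothetical local endpoint (e.g. vLLM or Ollama serving an
# OpenAI-compatible API); adjust model and URL to your setup.
llm = LLM(
    model="openai/my-local-model",        # provider-prefixed model name
    base_url="http://localhost:8000/v1",  # assumed field, per the local-LLMs guide
    api_key=SecretStr("dummy-key"),       # local servers often accept any key
)
# Pass `llm` into your agent or registry exactly as with a hosted provider.
```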
+ + +## Telemetry and Cost Tracking + +### Telemetry Collection + +LLM requests automatically collect metrics: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Request["LLM Request"] + + subgraph Metrics + Tokens["Token Counts
Input/Output"] + Cost["Cost
USD"] + Latency["Latency
ms"] + end + + Events["Event Log"] + + Request --> Tokens + Request --> Cost + Request --> Latency + + Tokens --> Events + Cost --> Events + Latency --> Events + + style Metrics fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Events fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Tracked Metrics:** +- **Token Usage:** Input tokens, output tokens, total +- **Cost:** Per-request cost using configured rates +- **Latency:** Request duration in milliseconds +- **Errors:** Failure types and retry counts + +### Cost Configuration + +Configure per-token costs for custom models: + +```python +llm = LLM( + model="custom/my-model", + input_cost_per_token=0.00001, # $0.01 per 1K tokens + output_cost_per_token=0.00003, # $0.03 per 1K tokens +) +``` + +**Built-in Costs:** LiteLLM includes costs for major providers (updated regularly, [link](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json)) + +**Custom Costs:** Override for: +- Internal models +- Custom pricing agreements +- Cost estimation for budgeting + +## Component Relationships + +### How LLM Integrates + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + LLM["LLM"] + Agent["Agent"] + Conversation["Conversation"] + Events["Events"] + Security["Security Analyzer"] + Condenser["Context Condenser"] + + Agent -->|Uses| LLM + LLM -->|Records| Events + Security -.->|Optional| LLM + Condenser -.->|Optional| LLM + Conversation -->|Provides context| Agent + + style LLM fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Events fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Relationship Characteristics:** +- **Agent → LLM**: Agent uses LLM for reasoning and tool calls +- **LLM → Events**: LLM requests/responses recorded as events +- **Security → LLM**: Optional security analyzer can use separate LLM +- **Condenser → LLM**: Optional context condenser can use separate LLM +- 
**Configuration**: LLM configured independently, passed to agent +- **Telemetry**: LLM metrics flow through event system to UI/logging + +## See Also + +- **[Agent Architecture](/sdk/arch/agent)** - How agents use LLMs for reasoning and perform actions +- **[Events](/sdk/arch/events)** - LLM request/response event types +- **[Security](/sdk/arch/security)** - Optional LLM-based security analysis +- **[Provider Setup Guides](/openhands/usage/llms/openai-llms)** - Provider-specific configuration + +### MCP Integration +Source: https://docs.openhands.dev/sdk/arch/mcp.md + +The **MCP Integration** system enables agents to use external tools via the Model Context Protocol (MCP). It provides a bridge between MCP servers and the Software Agent SDK's tool system, supporting both synchronous and asynchronous execution. + +**Source:** [`openhands/sdk/mcp/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/mcp) + +## Core Responsibilities + +The MCP Integration system has four primary responsibilities: + +1. **MCP Client Management** - Connect to and communicate with MCP servers +2. **Tool Discovery** - Enumerate available tools from MCP servers +3. **Schema Adaptation** - Convert MCP tool schemas to SDK tool definitions +4. **Execution Bridge** - Execute MCP tool calls from agent actions + +## Architecture + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 35}} }%% +flowchart TB + subgraph Client["MCP Client"] + Sync["MCPClient
Sync/Async bridge"] + Async["AsyncMCPClient
FastMCP base"] + end + + subgraph Bridge["Tool Bridge"] + Def["MCPToolDefinition
Schema conversion"] + Exec["MCPToolExecutor
Execution handler"] + end + + subgraph Integration["Agent Integration"] + Action["MCPToolAction
Dynamic model"] + Obs["MCPToolObservation
Result wrapper"] + end + + subgraph External["External"] + Server["MCP Server
stdio/HTTP"] + Tools["External Tools"] + end + + Sync --> Async + Async --> Server + + Server --> Def + Def --> Exec + + Exec --> Action + Action --> Server + Server --> Obs + + Server -.->|Spawns| Tools + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class Sync,Async primary + class Def,Exec secondary + class Action,Obs tertiary +``` + +### Key Components + +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`MCPClient`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/client.py)** | Client wrapper | Extends FastMCP with sync/async bridge | +| **[`MCPToolDefinition`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/definition.py)** | Tool metadata | Converts MCP schemas to SDK format | +| **[`MCPToolExecutor`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/tool.py)** | Execution handler | Bridges agent actions to MCP calls | +| **[`MCPToolAction`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/definition.py)** | Dynamic action model | Runtime-generated Pydantic model | +| **[`MCPToolObservation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/definition.py)** | Result wrapper | Wraps MCP tool results | + +## MCP Client + +### Sync/Async Bridge + +The SDK's `MCPClient` extends FastMCP's async client with synchronous wrappers: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Sync["Sync Code
Agent execution"] + Bridge["call_async_from_sync()"] + Executor["AsyncExecutor
Background loop"] + Async["Async MCP Call"] + Server["MCP Server"] + Result["Result"] + + Sync --> Bridge + Bridge --> Executor + Executor --> Async + Async --> Server + Server --> Result + Result --> Sync + + style Bridge fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Executor fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Async fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Bridge Pattern:** +- **Problem:** MCP protocol is async, but agent tools run synchronously +- **Solution:** Background event loop that executes async code from sync contexts +- **Benefit:** Agents use MCP tools without async/await in tool definitions + +**Client Features:** +- **Lifecycle Management:** `__enter__`/`__exit__` for context manager +- **Timeout Support:** Configurable timeouts for MCP operations +- **Error Handling:** Wraps MCP errors in observations +- **Connection Pooling:** Reuses connections across tool calls + +### MCP Server Configuration + +MCP servers are configured using the FastMCP format: + +```python +mcp_config = { + "mcpServers": { + "fetch": { + "command": "uvx", + "args": ["mcp-server-fetch"] + }, + "filesystem": { + "command": "npx", + "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path"] + } + } +} +``` + +**Configuration Fields:** +- **command:** Executable to spawn (e.g., `uvx`, `npx`, `node`) +- **args:** Arguments to pass to command +- **env:** Environment variables (optional) + +## Tool Discovery and Conversion + +### Discovery Flow + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Config["MCP Config"] + Spawn["Spawn Server"] + List["List Tools"] + + subgraph Convert["Convert Each Tool"] + Schema["MCP Schema"] + Action["Generate Action Model"] + Def["Create ToolDefinition"] + end + + Register["Register in ToolRegistry"] + + Config --> Spawn + Spawn --> List + List --> Schema + + Schema --> Action + Action --> Def + Def --> Register + + style Spawn 
fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Action fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Register fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Discovery Steps:** + +1. **Spawn Server:** Launch MCP server via stdio +2. **List Tools:** Call `tools/list` MCP endpoint +3. **Parse Schemas:** Extract tool names, descriptions, parameters +4. **Generate Models:** Dynamically create Pydantic models for actions +5. **Create Definitions:** Wrap in `ToolDefinition` objects +6. **Register:** Add to agent's tool registry + +### Schema Conversion + +MCP tool schemas are converted to SDK tool definitions: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + MCP["MCP Tool Schema
JSON Schema"] + Parse["Parse Parameters"] + Model["Dynamic Pydantic Model
MCPToolAction"] + Def["ToolDefinition
SDK format"] + + MCP --> Parse + Parse --> Model + Model --> Def + + style Parse fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Model fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` + +**Conversion Rules:** + +| MCP Schema | SDK Action Model | +|------------|------------------| +| **name** | Class name (camelCase) | +| **description** | Docstring | +| **inputSchema** | Pydantic fields | +| **required** | Field(required=True) | +| **type** | Python type hints | + +**Example:** + +```python +# MCP Schema +{ + "name": "fetch_url", + "description": "Fetch content from URL", + "inputSchema": { + "type": "object", + "properties": { + "url": {"type": "string"}, + "timeout": {"type": "number"} + }, + "required": ["url"] + } +} + +# Generated Action Model +class FetchUrl(MCPToolAction): + """Fetch content from URL""" + url: str + timeout: float | None = None +``` + +## Tool Execution + +### Execution Flow + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Agent["Agent generates action"] + Action["MCPToolAction"] + Executor["MCPToolExecutor"] + + Convert["Convert to MCP format"] + Call["MCP call_tool"] + Server["MCP Server"] + + Result["MCP Result"] + Obs["MCPToolObservation"] + Return["Return to Agent"] + + Agent --> Action + Action --> Executor + Executor --> Convert + Convert --> Call + Call --> Server + Server --> Result + Result --> Obs + Obs --> Return + + style Executor fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Call fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Obs fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Execution Steps:** + +1. **Action Creation:** LLM generates tool call, parsed into `MCPToolAction` +2. **Executor Lookup:** Find `MCPToolExecutor` for tool name +3. **Format Conversion:** Convert action fields to MCP arguments +4. **MCP Call:** Execute `call_tool` via MCP client +5. **Result Parsing:** Parse MCP result (text, images, resources) +6. 
**Observation Creation:** Wrap in `MCPToolObservation` +7. **Error Handling:** Catch exceptions, return error observations + +### MCPToolExecutor + +Executors bridge SDK actions to MCP calls: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Executor["MCPToolExecutor"] + Client["MCP Client"] + Name["tool_name"] + + Executor -->|Uses| Client + Executor -->|Knows| Name + + style Executor fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Client fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` + +**Executor Responsibilities:** +- **Client Management:** Hold reference to MCP client +- **Tool Identification:** Know which MCP tool to call +- **Argument Conversion:** Transform action fields to MCP format +- **Result Handling:** Parse MCP responses +- **Error Recovery:** Handle connection errors, timeouts, server failures + +## MCP Tool Lifecycle + +### From Configuration to Execution + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Load["Load MCP Config"] + Start["Start Conversation"] + Spawn["Spawn MCP Servers"] + Discover["Discover Tools"] + Register["Register Tools"] + + Ready["Agent Ready"] + + Step["Agent Step"] + LLM["LLM Tool Call"] + Execute["Execute MCP Tool"] + Result["Return Observation"] + + End["End Conversation"] + Cleanup["Close MCP Clients"] + + Load --> Start + Start --> Spawn + Spawn --> Discover + Discover --> Register + Register --> Ready + + Ready --> Step + Step --> LLM + LLM --> Execute + Execute --> Result + Result --> Step + + Step --> End + End --> Cleanup + + style Spawn fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Execute fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Cleanup fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Lifecycle Phases:** + +| Phase | Operations | Components | +|-------|-----------|------------| +| **Initialization** | Spawn servers, discover tools | MCPClient, ToolRegistry | +| 
**Registration** | Create definitions, executors | MCPToolDefinition, MCPToolExecutor | +| **Execution** | Handle tool calls | Agent, MCPToolAction | +| **Cleanup** | Close connections, shutdown servers | MCPClient.sync_close() | + +## MCP Annotations + +MCP tools can include metadata hints for agents: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Tool["MCP Tool"] + + subgraph Annotations + ReadOnly["readOnlyHint"] + Destructive["destructiveHint"] + Progress["progressEnabled"] + end + + Security["Security Analysis"] + + Tool --> ReadOnly + Tool --> Destructive + Tool --> Progress + + ReadOnly --> Security + Destructive --> Security + + style Destructive fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Security fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Annotation Types:** + +| Annotation | Meaning | Use Case | +|------------|---------|----------| +| **readOnlyHint** | Tool doesn't modify state | Lower security risk | +| **destructiveHint** | Tool modifies/deletes data | Require confirmation | +| **progressEnabled** | Tool reports progress | Show progress UI | + +These annotations feed into the security analyzer for risk assessment. 
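As a toy illustration of such a policy (this is not the SDK's actual security analyzer; the function and logic below are made up for this page), a confirmation check driven by these hints might look like:

```python
def requires_confirmation(annotations: dict) -> bool:
    """Toy policy: read-only tools skip confirmation; destructive tools need it."""
    if annotations.get("readOnlyHint"):
        return False  # tool declares it does not modify state
    return bool(annotations.get("destructiveHint"))


# A tool flagged destructive (e.g. one that deletes data) triggers confirmation:
requires_confirmation({"destructiveHint": True})  # True
requires_confirmation({"readOnlyHint": True})     # False
```

A real analyzer would combine these hints with other signals (tool source, arguments, session policy), but the shape of the decision is the same.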
+ +## Component Relationships + +### How MCP Integrates + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + MCP["MCP System"] + Skills["Skills"] + Tools["Tool Registry"] + Agent["Agent"] + Security["Security"] + + Skills -->|Configures| MCP + MCP -->|Registers| Tools + Agent -->|Uses| Tools + MCP -->|Provides hints| Security + + style MCP fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Skills fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Agent fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Relationship Characteristics:** +- **Skills → MCP**: Repository skills can embed MCP configurations +- **MCP → Tools**: MCP tools registered alongside native tools +- **Agent → Tools**: Agents use MCP tools like any other tool +- **MCP → Security**: Annotations inform security risk assessment +- **Transparent Integration**: Agent doesn't distinguish MCP from native tools + +## Design Rationale + +**Async Bridge Pattern:** MCP protocol requires async, but synchronous tool execution simplifies agent implementation. Background event loop bridges the gap without exposing async complexity to tool users. + +**Dynamic Model Generation:** Creating Pydantic models at runtime from MCP schemas enables type-safe tool calls without manual model definitions. This supports arbitrary MCP servers without SDK code changes. + +**Unified Tool Interface:** Wrapping MCP tools in `ToolDefinition` makes them indistinguishable from native tools. Agents use the same interface regardless of tool source. + +**FastMCP Foundation:** Building on FastMCP (MCP SDK for Python) provides battle-tested client implementation, protocol compliance, and ongoing updates as MCP evolves. + +**Annotation Support:** Exposing MCP hints (readOnly, destructive) enables intelligent security analysis and user confirmation flows based on tool characteristics. 
+ +**Lifecycle Management:** Automatic spawn/cleanup of MCP servers in conversation lifecycle ensures resources are properly managed without manual bookkeeping. + +## See Also + +- **[Tool System](/sdk/arch/tool-system)** - How MCP tools integrate with tool framework +- **[Skill Architecture](/sdk/arch/skill)** - Embedding MCP configs in repository skills +- **[Security](/sdk/arch/security)** - How MCP annotations inform risk assessment +- **[MCP Guide](/sdk/guides/mcp)** - Using MCP tools in applications +- **[FastMCP Documentation](https://gofastmcp.com/)** - Underlying MCP client library + +### Overview +Source: https://docs.openhands.dev/sdk/arch/overview.md + +The **OpenHands Software Agent SDK** provides a unified, type-safe framework for building and deploying AI agents—from local experiments to full production systems, focused on **statelessness**, **composability**, and **clear boundaries** between research and deployment. + +Check [this document](/sdk/arch/design) for the core design principles that guided its architecture. + +## Relationship with OpenHands Applications + +The Software Agent SDK serves as the **source of truth for agents** in OpenHands. The [OpenHands repository](https://github.com/OpenHands/OpenHands) provides interfaces—web app, CLI, and cloud—that consume the SDK APIs. This architecture ensures consistency and enables flexible integration patterns. +- **Software Agent SDK = foundation.** The SDK defines all core components: agents, LLMs, conversations, tools, workspaces, events, and security policies. +- **Interfaces reuse SDK objects.** The OpenHands GUI or CLI hydrate SDK components from persisted settings and orchestrate execution through SDK APIs. +- **Consistent configuration.** Whether you launch an agent programmatically or via the OpenHands GUI, the supported parameters and defaults come from the SDK. 
+ +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 50}} }%% +graph TB + subgraph Interfaces["OpenHands Interfaces"] + UI[OpenHands GUI
React frontend] + CLI[OpenHands CLI
Command-line interface] + Custom[Your Custom Client
Automations & workflows] + end + + SDK[Software Agent SDK
openhands.sdk + tools + workspace] + + subgraph External["External Services"] + LLM[LLM Providers
OpenAI, Anthropic, etc.] + Runtime[Runtime Services
Docker, Remote API, etc.] + end + + UI --> SDK + CLI --> SDK + Custom --> SDK + + SDK --> LLM + SDK --> Runtime + + classDef interface fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef sdk fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef external fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class UI,CLI,Custom interface + class SDK sdk + class LLM,Runtime external +``` + + +## Four-Package Architecture + +The agent-sdk is organized into four distinct Python packages: + +| Package | What It Does | When You Need It | +|---------|-------------|------------------| +| **openhands.sdk** | Core agent framework + base workspace classes | Always (required) | +| **openhands.tools** | Pre-built tools (bash, file editing, etc.) | Optional - provides common tools | +| **openhands.workspace** | Extended workspace implementations (Docker, remote) | Optional - extends SDK's base classes | +| **openhands.agent_server** | Multi-user API server | Optional - used by workspace implementations | + +### Two Deployment Modes + +The SDK supports two deployment architectures depending on your needs: + +#### Mode 1: Local Development + +**Installation:** Just install `openhands-sdk` + `openhands-tools` + +```bash +pip install openhands-sdk openhands-tools +``` + +**Architecture:** + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart LR + SDK["openhands.sdk
Agent · LLM · Conversation
+ LocalWorkspace"]:::sdk + Tools["openhands.tools
BashTool · FileEditor · GrepTool · …"]:::tools + + SDK -->|uses| Tools + + classDef sdk fill:#e8f3ff,stroke:#2b6cb0,color:#0f2a45,stroke-width:2px,rx:8,ry:8 + classDef tools fill:#e9f9ef,stroke:#2f855a,color:#14532d,stroke-width:2px,rx:8,ry:8 +``` + +- `LocalWorkspace` included in SDK (no extra install) +- Everything runs in one process +- Perfect for prototyping and simple use cases +- Quick setup, no Docker required + +#### Mode 2: Production / Sandboxed + +**Installation:** Install all 4 packages + +```bash +pip install openhands-sdk openhands-tools openhands-workspace openhands-agent-server +``` + +**Architecture:** + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 20, "rankSpacing": 30}} }%% +flowchart LR + + WSBase["openhands.sdk
Base Classes:
Workspace · Local · Remote"]:::sdk + + subgraph WS[" "] + direction LR + Docker["openhands.workspace DockerWorkspace
extends RemoteWorkspace"]:::ws + Remote["openhands.workspace RemoteAPIWorkspace
extends RemoteWorkspace"]:::ws + end + + Server["openhands.agent_server
FastAPI + WebSocket"]:::server + Agent["openhands.sdk
Agent · LLM · Conversation"]:::sdk + Tools["openhands.tools
BashTool · FileEditor · …"]:::tools + + WSBase -.->|extended by| Docker + WSBase -.->|extended by| Remote + Docker -->|spawns container with| Server + Remote -->|connects via HTTP to| Server + Server -->|runs| Agent + Agent -->|uses| Tools + + classDef sdk fill:#e8f3ff,stroke:#2b6cb0,color:#0f2a45,stroke-width:1.1px,rx:8,ry:8 + classDef ws fill:#fff4df,stroke:#b7791f,color:#5b3410,stroke-width:1.1px,rx:8,ry:8 + classDef server fill:#f3e8ff,stroke:#7c3aed,color:#3b2370,stroke-width:1.1px,rx:8,ry:8 + classDef tools fill:#e9f9ef,stroke:#2f855a,color:#14532d,stroke-width:1.1px,rx:8,ry:8 + + style WS stroke:#b7791f,stroke-width:1.5px,stroke-dasharray: 4 3,rx:8,ry:8,fill:none +``` + +- `RemoteWorkspace` auto-spawns agent-server in containers +- Sandboxed execution for security +- Multi-user deployments +- Distributed systems (e.g., Kubernetes) support + + +**Key Point:** Same agent code works in both modes—just swap the workspace type (`LocalWorkspace` → `DockerWorkspace` → `RemoteAPIWorkspace`). + + +### SDK Package (`openhands.sdk`) + +**Purpose:** Core components and base classes for OpenHands agent. + +**Key Components:** +- **[Agent](/sdk/arch/agent):** Implements the reasoning-action loop +- **[Conversation](/sdk/arch/conversation):** Manages conversation state and lifecycle +- **[LLM](/sdk/arch/llm):** Provider-agnostic language model interface with retry and telemetry +- **[Tool System](/sdk/arch/tool-system):** Typed base class definitions for action, observation, tool, and executor; includes MCP integration +- **[Events](/sdk/arch/events):** Typed event framework (e.g., action, observation, user messages, state update, etc.) 
+- **[Workspace](/sdk/arch/workspace):** Base classes (`Workspace`, `LocalWorkspace`, `RemoteWorkspace`) +- **[Skill](/sdk/arch/skill):** Reusable user-defined prompts with trigger-based activation +- **[Condenser](/sdk/arch/condenser):** Conversation history compression for token management +- **[Security](/sdk/arch/security):** Action risk assessment and validation before execution + +**Design:** Stateless, immutable components with type-safe Pydantic models. + +**Self-Contained:** Build and run agents with just `openhands-sdk` using `LocalWorkspace`. + +**Source:** [`openhands-sdk/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk) + +### Tools Package (`openhands.tools`) + + + +**Tool Independence:** Tools run alongside the agent in whatever environment workspace configures (local/container/remote). They don't run "through" workspace APIs. + + +**Purpose:** Pre-built tools following consistent patterns. + +**Design:** All tools follow Action/Observation/Executor pattern with built-in validation, error handling, and security. + + +For full list of tools, see the [source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools) as the source of truth. + + + +### Workspace Package (`openhands.workspace`) + +**Purpose:** Workspace implementations extending SDK base classes. + +**Key Components:** Docker Workspace, Remote API Workspace, and more. + +**Design:** All workspace implementations extend `RemoteWorkspace` from SDK, adding container lifecycle or API client functionality. + +**Use Cases:** Sandboxed execution, multi-user deployments, production environments. + + +For full list of implemented workspaces, see the [source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-workspace). + + +### Agent Server Package (`openhands.agent_server`) + +**Purpose:** FastAPI-based HTTP/WebSocket server for remote agent execution. 
+ +**Features:** +- REST API & WebSocket endpoints for conversations, bash, files, events, desktop, and VSCode +- Service management with isolated per-user sessions +- API key authentication and health checking + +**Deployment:** Runs inside containers (via `DockerWorkspace`) or as standalone process (connected via `RemoteWorkspace`). + +**Use Cases:** Multi-user web apps, SaaS products, distributed systems. + + +For implementation details, see the [source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-agent-server). + + +## How Components Work Together + +### Basic Execution Flow (Local) + +When you send a message to an agent, here's what happens: + +```mermaid +sequenceDiagram + participant You + participant Conversation + participant Agent + participant LLM + participant Tool + + You->>Conversation: "Create hello.txt" + Conversation->>Agent: Process message + Agent->>LLM: What should I do? + LLM-->>Agent: Use BashTool("touch hello.txt") + Agent->>Tool: Execute action + Note over Tool: Runs in same environment
as Agent (local/container/remote) + Tool-->>Agent: Observation + Agent->>LLM: Got result, continue? + LLM-->>Agent: Done + Agent-->>Conversation: Update state + Conversation-->>You: "File created!" +``` + +**Key takeaway:** The agent orchestrates the reasoning-action loop—calling the LLM for decisions and executing tools to perform actions. + +### Deployment Flexibility + +The same agent code runs in different environments by swapping workspace configuration: + +```mermaid +graph TB + subgraph "Your Code (Unchanged)" + Code["Agent + Tools + LLM"] + end + + subgraph "Deployment Options" + Local["Local
Direct execution"] + Docker["Docker
Containerized"] + Remote["Remote
Multi-user server"] + end + + Code -->|LocalWorkspace| Local + Code -->|DockerWorkspace| Docker + Code -->|RemoteAPIWorkspace| Remote + + style Code fill:#e1f5fe + style Local fill:#e8f5e8 + style Docker fill:#e8f5e8 + style Remote fill:#e8f5e8 +``` + +## Next Steps + +### Get Started +- [Getting Started](/sdk/getting-started) – Build your first agent +- [Hello World](/sdk/guides/hello-world) – Minimal example + +### Explore Components + +**SDK Package:** +- [Agent](/sdk/arch/agent) – Core reasoning-action loop +- [Conversation](/sdk/arch/conversation) – State management and lifecycle +- [LLM](/sdk/arch/llm) – Language model integration +- [Tool System](/sdk/arch/tool-system) – Action/Observation/Executor pattern +- [Events](/sdk/arch/events) – Typed event framework +- [Workspace](/sdk/arch/workspace) – Base workspace architecture + +**Tools Package:** +- See [`openhands-tools/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools) source code for implementation details + +**Workspace Package:** +- See [`openhands-workspace/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-workspace) source code for implementation details + +**Agent Server:** +- See [`openhands-agent-server/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-agent-server) source code for implementation details + +### Deploy +- [Remote Server](/sdk/guides/agent-server/overview) – Deploy remotely +- [Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandbox) – Container setup +- [API Sandboxed Server](/sdk/guides/agent-server/api-sandbox) – Hosted runtime service +- [Local Agent Server](/sdk/guides/agent-server/local-server) – In-process server + +### Source Code +- [`openhands/sdk/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk) – Core framework +- [`openhands/tools/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools) – Pre-built tools +- 
[`openhands/workspace/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-workspace/openhands/workspace) – Workspaces +- [`openhands/agent_server/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-agent-server/openhands/agent_server) – HTTP server +- [`examples/`](https://github.com/OpenHands/software-agent-sdk/tree/main/examples) – Working examples + +### SDK Package +Source: https://docs.openhands.dev/sdk/arch/sdk.md + +The SDK package (`openhands.sdk`) is the heart of the OpenHands Software Agent SDK. It provides the core framework for building agents locally or embedding them in applications. + +**Source**: [`sdk/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk) + +## Purpose + +The SDK package handles: +- **Agent reasoning loop**: How agents process messages and make decisions +- **State management**: Conversation lifecycle and persistence +- **LLM integration**: Provider-agnostic language model access +- **Tool system**: Typed actions and observations +- **Workspace abstraction**: Where code executes +- **Extensibility**: Skills, condensers, MCP, security + +## Core Components + +```mermaid +graph TB + Conv[Conversation
Lifecycle Manager] --> Agent[Agent
Reasoning Loop] + + Agent --> LLM[LLM
Language Model] + Agent --> Tools[Tool System
Capabilities] + Agent --> Micro[Skills
Behavior Modules] + Agent --> Cond[Condenser
Memory Manager] + + Tools --> Workspace[Workspace
Execution] + + Conv --> Events[Events
Communication] + Tools --> MCP[MCP
External Tools] + Workspace --> Security[Security
Validation] + + style Conv fill:#e1f5fe + style Agent fill:#f3e5f5 + style LLM fill:#e8f5e8 + style Tools fill:#fff3e0 + style Workspace fill:#fce4ec +``` + +### 1. Conversation - State & Lifecycle + +**What it does**: Manages the entire conversation lifecycle and state. + +**Key responsibilities**: +- Maintains conversation state (immutable) +- Handles message flow between user and agent +- Manages turn-taking and async execution +- Persists and restores conversation state +- Emits events for monitoring + +**Design decisions**: +- **Immutable state**: Each operation returns a new Conversation instance +- **Serializable**: Can be saved to disk or database and restored +- **Async-first**: Built for streaming and concurrent execution + +**When to use directly**: When you need fine-grained control over conversation state, want to implement custom persistence, or need to pause/resume conversations. + +**Example use cases**: +- Saving conversation to database after each turn +- Implementing undo/redo functionality +- Building multi-session chatbots +- Time-travel debugging + +**Learn more**: +- Guide: [Conversation Persistence](/sdk/guides/convo-persistence) +- Guide: [Pause and Resume](/sdk/guides/convo-pause-and-resume) +- Source: [`conversation/`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation) + +--- + +### 2. Agent - The Reasoning Loop + +**What it does**: The core reasoning engine that processes messages and decides what to do. 
+ +**Key responsibilities**: +- Receives messages and current state +- Consults LLM to reason about next action +- Validates and executes tool calls +- Processes observations and loops until completion +- Integrates with skills for specialized behavior + +**Design decisions**: +- **Stateless**: Agent doesn't hold state, operates on Conversation +- **Extensible**: Behavior can be modified via skills +- **Provider-agnostic**: Works with any LLM through unified interface + +**The reasoning loop**: +1. Receive message from Conversation +2. Add message to context +3. Consult LLM with full conversation history +4. If LLM returns tool call → validate and execute tool +5. If tool returns observation → add to context, go to step 3 +6. If LLM returns response → done, return to user + +**When to customize**: When you need specialized reasoning strategies, want to implement custom agent behaviors, or need to control the execution flow. + +**Example use cases**: +- Planning agents that break tasks into steps +- Code review agents with specific checks +- Agents with domain-specific reasoning patterns + +**Learn more**: +- Guide: [Custom Agents](/sdk/guides/agent-custom) +- Guide: [Agent Stuck Detector](/sdk/guides/agent-stuck-detector) +- Source: [`agent/`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/agent) + +--- + +### 3. LLM - Language Model Integration + +**What it does**: Provides a provider-agnostic interface to language models. + +**Key responsibilities**: +- Abstracts different LLM providers (OpenAI, Anthropic, etc.) 
+- Handles message formatting and conversion +- Manages streaming responses +- Supports tool calling and reasoning modes +- Handles retries and error recovery + +**Design decisions**: +- **Provider-agnostic**: Same API works with any provider +- **Streaming-first**: Built for real-time responses +- **Type-safe**: Pydantic models for all messages +- **Extensible**: Easy to add new providers + +**Why provider-agnostic?** You can switch between OpenAI, Anthropic, local models, etc. without changing your agent code. This is crucial for: +- Cost optimization (switch to cheaper models) +- Testing with different models +- Avoiding vendor lock-in +- Supporting customer choice + +**When to customize**: When you need to add a new LLM provider, implement custom retries, or modify message formatting. + +**Example use cases**: +- Routing requests to different models based on complexity +- Implementing custom caching strategies +- Adding observability hooks + +**Learn more**: +- Guide: [LLM Registry](/sdk/guides/llm-registry) +- Guide: [LLM Routing](/sdk/guides/llm-routing) +- Guide: [Reasoning and Tool Use](/sdk/guides/llm-reasoning) +- Source: [`llm/`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm) + +--- + +### 4. Tool System - Typed Capabilities + +**What it does**: Defines what agents can do through a typed action/observation pattern. + +**Key responsibilities**: +- Defines tool schemas (inputs and outputs) +- Validates actions before execution +- Executes tools and returns typed observations +- Generates JSON schemas for LLM tool calling +- Registers tools with the agent + +**Design decisions**: +- **Action/Observation pattern**: Tools are defined as type-safe input/output pairs +- **Schema generation**: Pydantic models auto-generate JSON schemas +- **Executor pattern**: Separation of tool definition and execution +- **Composable**: Tools can call other tools + +**The three components**: +1. 
**Action**: Input schema (what the tool accepts) +2. **Observation**: Output schema (what the tool returns) +3. **ToolExecutor**: Logic that transforms Action → Observation + +**Why this pattern?** +- Type safety catches errors early +- LLMs get accurate schemas for tool calling +- Tools are testable in isolation +- Easy to compose tools + +**When to customize**: When you need domain-specific capabilities not covered by built-in tools. + +**Example use cases**: +- Database query tools +- API integration tools +- Custom file format parsers +- Domain-specific calculators + +**Learn more**: +- Guide: [Custom Tools](/sdk/guides/custom-tools) +- Source: [`tools/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools) + +--- + +### 5. Workspace - Execution Abstraction + +**What it does**: Abstracts *where* code executes (local, Docker, remote). + +**Key responsibilities**: +- Provides unified interface for code execution +- Handles file operations across environments +- Manages working directories +- Supports different isolation levels + +**Design decisions**: +- **Abstract interface**: LocalWorkspace in SDK, advanced types in workspace package +- **Environment-agnostic**: Code works the same locally or remotely +- **Lazy initialization**: Workspace setup happens on first use + +**Why abstract?** You can develop locally with LocalWorkspace, then deploy with DockerWorkspace or RemoteAPIWorkspace without changing agent code. + +**When to use directly**: Rarely - usually configured when creating an agent. Use advanced workspaces for production. + +**Learn more**: +- Architecture: [Workspace Architecture](/sdk/arch/workspace) +- Guides: [Remote Agent Server](/sdk/guides/agent-server/overview) +- Source: [`workspace/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/workspace) + +--- + +### 6. 
Events - Component Communication + +**What it does**: Enables observability and debugging through event emissions. + +**Key responsibilities**: +- Defines event types (messages, actions, observations, errors) +- Emitted by Conversation, Agent, Tools +- Enables logging, debugging, and monitoring +- Supports custom event handlers + +**Design decisions**: +- **Immutable**: Events are snapshots, not mutable objects +- **Serializable**: Can be logged, stored, replayed +- **Type-safe**: Pydantic models for all events + +**Why events?** They provide a timeline of what happened during agent execution. Essential for: +- Debugging agent behavior +- Understanding decision-making +- Building observability dashboards +- Implementing custom logging + +**When to use**: When building monitoring systems, debugging tools, or need to track agent behavior. + +**Learn more**: +- Guide: [Metrics and Observability](/sdk/guides/metrics) +- Source: [`event/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/event) + +--- + +### 7. Condenser - Memory Management + +**What it does**: Compresses conversation history when it gets too long. + +**Key responsibilities**: +- Monitors conversation length +- Summarizes older messages +- Preserves important context +- Keeps conversation within token limits + +**Design decisions**: +- **Pluggable**: Different condensing strategies +- **Automatic**: Triggered when context gets large +- **Preserves semantics**: Important information retained + +**Why needed?** LLMs have token limits. Long conversations would eventually exceed context windows. Condensers keep conversations running indefinitely while staying within limits. + +**When to customize**: When you need domain-specific summarization strategies or want to control what gets preserved. 
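A toy "keep the last N turns" strategy shows the idea; this sketch does not use the SDK's actual Condenser interface:

```python
def condense(messages: list[str], keep_last: int) -> list[str]:
    """Toy condenser: pin the first (system) message, keep the last N turns."""
    if len(messages) <= keep_last + 1:
        return messages  # already short enough, nothing to drop
    return [messages[0]] + messages[-keep_last:]


history = ["system prompt", "turn 1", "turn 2", "turn 3", "turn 4"]
condense(history, keep_last=2)  # ['system prompt', 'turn 3', 'turn 4']
```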
+ +**Example strategies**: +- Summarize old messages +- Keep only last N turns +- Preserve task-related messages + +**Learn more**: +- Guide: [Context Condenser](/sdk/guides/context-condenser) +- Source: [`condenser/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/context/condenser) + +--- + +### 8. MCP - Model Context Protocol + +**What it does**: Integrates external tool servers via Model Context Protocol. + +**Key responsibilities**: +- Connects to MCP-compatible tool servers +- Translates MCP tools to SDK tool format +- Manages server lifecycle +- Handles server communication + +**Design decisions**: +- **Standard protocol**: Uses MCP specification +- **Transparent integration**: MCP tools look like regular tools to agents +- **Process management**: Handles server startup/shutdown + +**Why MCP?** It lets you use external tools without writing custom SDK integrations. Many tools (databases, APIs, services) provide MCP servers. + +**When to use**: When you need tools that: +- Already have MCP servers (fetch, filesystem, etc.) +- Are too complex to rewrite as SDK tools +- Need to run in separate processes +- Are provided by third parties + +**Learn more**: +- Guide: [MCP Integration](/sdk/guides/mcp) +- Spec: [Model Context Protocol](https://modelcontextprotocol.io/) +- Source: [`mcp/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/mcp) + +--- + +### 9. Skills (formerly Microagents) - Behavior Modules + +**What it does**: Specialized modules that modify agent behavior for specific tasks. 
+ +**Key responsibilities**: +- Provide domain-specific instructions +- Modify system prompts +- Guide agent decision-making +- Compose to create specialized agents + +**Design decisions**: +- **Composable**: Multiple skills can work together +- **Declarative**: Defined as configuration, not code +- **Reusable**: Share skills across agents + +**Why skills?** Instead of hard-coding behaviors, skills let you compose agent personalities and capabilities. Like "plugins" for agent behavior. + +**Example skills**: +- GitHub operations (issue creation, PRs) +- Code review guidelines +- Documentation style enforcement +- Project-specific conventions + +**When to use**: When you need agents with specialized knowledge or behavior patterns that apply to specific domains or tasks. + +**Learn more**: +- Guide: [Agent Skills & Context](/sdk/guides/skill) +- Source: [`skills/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/context/skills) + +--- + +### 10. Security - Validation & Sandboxing + +**What it does**: Validates inputs and enforces security constraints. + +**Key responsibilities**: +- Input validation +- Command sanitization +- Path traversal prevention +- Resource limits + +**Design decisions**: +- **Defense in depth**: Multiple validation layers +- **Fail-safe**: Rejects suspicious inputs by default +- **Configurable**: Adjust security levels as needed + +**Why needed?** Agents execute arbitrary code and file operations. Security prevents: +- Malicious prompts escaping sandboxes +- Path traversal attacks +- Resource exhaustion +- Unintended system access + +**When to customize**: When you need domain-specific validation rules or want to adjust security policies. 
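As an illustration of the path traversal prevention mentioned above, a minimal check might look like this (a hypothetical sketch, not the SDK's actual security code):

```python
from pathlib import Path


def is_path_allowed(workspace_root: str, requested: str) -> bool:
    """Toy validator: reject any path that resolves outside the workspace."""
    root = Path(workspace_root).resolve()
    # Resolving normalizes ".." segments and symlinks before comparison.
    target = (root / requested).resolve()
    return target == root or root in target.parents


ok = is_path_allowed("/tmp/workspace", "src/main.py")
escaped = is_path_allowed("/tmp/workspace", "../../etc/passwd")
```

The key design point is resolving both paths before comparing, so `..` tricks and symlinks cannot slip past a naive string prefix check.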
+ +**Learn more**: +- Guide: [Security and Secrets](/sdk/guides/security) +- Source: [`security/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/security) + +--- + +## How Components Work Together + +### Example: User asks agent to create a file + +``` +1. User → Conversation: "Create a file called hello.txt with 'Hello World'" + +2. Conversation → Agent: New message event + +3. Agent → LLM: Full conversation history + available tools + +4. LLM → Agent: Tool call for FileEditorTool.create() + +5. Agent → Tool System: Validate FileEditorAction + +6. Tool System → Tool Executor: Execute action + +7. Tool Executor → Workspace: Create file (local/docker/remote) + +8. Workspace → Tool Executor: Success + +9. Tool Executor → Tool System: FileEditorObservation (success=true) + +10. Tool System → Agent: Observation + +11. Agent → LLM: Updated history with observation + +12. LLM → Agent: "File created successfully" + +13. Agent → Conversation: Done, final response + +14. Conversation → User: "File created successfully" +``` + +Throughout this flow: +- **Events** are emitted for observability +- **Condenser** may trigger if history gets long +- **Skills** influence LLM's decision-making +- **Security** validates file paths and operations +- **MCP** could provide additional tools if configured + +## Design Patterns + +### Immutability + +All core objects are immutable. Operations return new instances: + +```python +conversation = Conversation(...) +new_conversation = conversation.add_message(message) +# conversation is unchanged, new_conversation has the message +``` + +**Why?** Makes debugging easier, enables time-travel, ensures serializability. + +### Composition Over Inheritance + +Agents are composed from: +- LLM provider +- Tool list +- Skill list +- Condenser strategy +- Security policy + +You don't subclass Agent - you configure it. + +**Why?** More flexible, easier to test, enables runtime configuration. 
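A hypothetical sketch of this configuration style (the names and fields here are illustrative, not the SDK's real API):

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AgentConfig:
    """Stand-in for an agent assembled from parts rather than subclassed."""
    llm_model: str
    tools: tuple[str, ...] = ()
    skills: tuple[str, ...] = ()


reviewer = AgentConfig(
    llm_model="example-model",
    tools=("execute_bash", "file_editor"),
    skills=("code_review_guidelines",),
)
# Specializing means composing a new configuration, not defining a subclass:
doc_reviewer = replace(reviewer, skills=("documentation_style",))
```

Because specialization is just data, variants can be derived at runtime instead of through a class hierarchy.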

### Type Safety

Everything uses Pydantic models:
- Messages, actions, observations are typed
- Validation happens automatically
- Schemas generate from types

**Why?** Catches errors early, provides IDE support, self-documenting.

## Next Steps

### For Usage Examples

- [Getting Started](/sdk/getting-started) - Build your first agent
- [Custom Tools](/sdk/guides/custom-tools) - Extend capabilities
- [LLM Configuration](/sdk/guides/llm-registry) - Configure providers
- [Conversation Management](/sdk/guides/convo-persistence) - State handling

### For Related Architecture

- [Tool System](/sdk/arch/tool-system) - Built-in tool implementations
- [Workspace Architecture](/sdk/arch/workspace) - Execution environments
- [Agent Server Architecture](/sdk/arch/agent-server) - Remote execution

### For Implementation Details

- [`openhands-sdk/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk) - SDK source code
- [`openhands-tools/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools) - Tools source code
- [`openhands-workspace/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-workspace) - Workspace source code
- [`examples/`](https://github.com/OpenHands/software-agent-sdk/tree/main/examples) - Working examples

### Security
Source: https://docs.openhands.dev/sdk/arch/security.md

The **Security** system evaluates agent actions for potential risks before execution. It provides pluggable security analyzers that assess action risk levels and enforce confirmation policies based on security characteristics.

**Source:** [`openhands-sdk/openhands/sdk/security/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/security)

## Core Responsibilities

The Security system has four primary responsibilities:

1. **Risk Assessment** - Capture and validate LLM-provided risk levels for actions
2. 
**Confirmation Policy** - Determine when user approval is required based on risk +3. **Action Validation** - Enforce security policies before execution +4. **Audit Trail** - Record security decisions in event history + +## Architecture + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 50}} }%% +flowchart TB + subgraph Interface["Abstract Interface"] + Base["SecurityAnalyzerBase
Abstract analyzer"] + end + + subgraph Implementations["Concrete Analyzers"] + LLM["LLMSecurityAnalyzer
Inline risk prediction"] + NoOp["NoOpSecurityAnalyzer
No analysis"] + end + + subgraph Risk["Risk Levels"] + Low["LOW
Safe operations"] + Medium["MEDIUM
Moderate risk"] + High["HIGH
Dangerous ops"] + Unknown["UNKNOWN
Unanalyzed"] + end + + subgraph Policy["Confirmation Policy"] + Check["should_require_confirmation()"] + Mode["Confirmation Mode"] + Decision["Require / Allow"] + end + + Base --> LLM + Base --> NoOp + + Implementations --> Low + Implementations --> Medium + Implementations --> High + Implementations --> Unknown + + Low --> Check + Medium --> Check + High --> Check + Unknown --> Check + + Check --> Mode + Mode --> Decision + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + classDef danger fill:#ffe8e8,stroke:#dc2626,stroke-width:2px + + class Base primary + class LLM secondary + class High danger + class Check tertiary +``` + +### Key Components + +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`SecurityAnalyzerBase`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/analyzer.py)** | Abstract interface | Defines `security_risk()` contract | +| **[`LLMSecurityAnalyzer`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/llm_analyzer.py)** | Inline risk assessment | Returns LLM-provided risk from action arguments | +| **[`NoOpSecurityAnalyzer`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/analyzer.py)** | Passthrough analyzer | Always returns UNKNOWN | +| **[`SecurityRisk`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/risk.py)** | Risk enum | LOW, MEDIUM, HIGH, UNKNOWN | +| **[`ConfirmationPolicy`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/confirmation_policy.py)** | Decision logic | Maps risk levels to confirmation requirements | + +## Risk Levels + +Security analyzers return one of four risk levels: + +```mermaid +%%{init: {"theme": "default", "flowchart": 
{"nodeSpacing": 30}} }%% +flowchart TB + Action["ActionEvent"] + Analyze["Security Analyzer"] + + subgraph Levels["Risk Levels"] + Low["LOW
Read-only, safe"] + Medium["MEDIUM
Modify files"] + High["HIGH
Delete, execute"] + Unknown["UNKNOWN
Not analyzed"] + end + + Action --> Analyze + Analyze --> Low + Analyze --> Medium + Analyze --> High + Analyze --> Unknown + + style Low fill:#d1fae5,stroke:#10b981,stroke-width:2px + style Medium fill:#fef3c7,stroke:#f59e0b,stroke-width:2px + style High fill:#ffe8e8,stroke:#dc2626,stroke-width:2px + style Unknown fill:#f3f4f6,stroke:#6b7280,stroke-width:2px +``` + +### Risk Level Definitions + +| Level | Characteristics | Examples | +|-------|----------------|----------| +| **LOW** | Read-only, no state changes | File reading, directory listing, search | +| **MEDIUM** | Modifies user data | File editing, creating files, API calls | +| **HIGH** | Dangerous operations | File deletion, system commands, privilege escalation | +| **UNKNOWN** | Not analyzed or indeterminate | Complex commands, ambiguous operations | + +## Security Analyzers + +### LLMSecurityAnalyzer + +Leverages the LLM's inline risk assessment during action generation: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Schema["Tool Schema
+ security_risk param"] + LLM["LLM generates action
with security_risk"] + ToolCall["Tool Call Arguments
{command: 'rm -rf', security_risk: 'HIGH'}"] + Extract["Extract security_risk
from arguments"] + ActionEvent["ActionEvent
with security_risk set"] + Analyzer["LLMSecurityAnalyzer
returns security_risk"] + + Schema --> LLM + LLM --> ToolCall + ToolCall --> Extract + Extract --> ActionEvent + ActionEvent --> Analyzer + + style Schema fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Extract fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Analyzer fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Analysis Process:** + +1. **Schema Enhancement:** A required `security_risk` parameter is added to each tool's schema +2. **LLM Generation:** The LLM generates tool calls with `security_risk` as part of the arguments +3. **Risk Extraction:** The agent extracts the `security_risk` value from the tool call arguments +4. **ActionEvent Creation:** The security risk is stored on the `ActionEvent` +5. **Analyzer Query:** `LLMSecurityAnalyzer.security_risk()` returns the pre-assigned risk level +6. **No Additional LLM Calls:** Risk assessment happens inline—no separate analysis step + +**Example Tool Call:** +```json +{ + "name": "execute_bash", + "arguments": { + "command": "rm -rf /tmp/cache", + "security_risk": "HIGH" + } +} +``` + +The LLM reasons about risk in context when generating the action, eliminating the need for a separate security analysis call. 
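The extraction step can be sketched as follows (a simplified illustration; in the sketch, unrecognized or missing values fall back to UNKNOWN by assumption):

```python
import json
from enum import Enum


class SecurityRisk(Enum):
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"
    UNKNOWN = "UNKNOWN"


def extract_risk(raw_arguments: str) -> tuple[SecurityRisk, dict]:
    """Pop the LLM-provided security_risk out of the tool-call arguments."""
    args = json.loads(raw_arguments)
    raw = args.pop("security_risk", None)
    try:
        risk = SecurityRisk(raw)
    except ValueError:
        # Missing or malformed values are treated as unanalyzed.
        risk = SecurityRisk.UNKNOWN
    return risk, args


risk, args = extract_risk('{"command": "rm -rf /tmp/cache", "security_risk": "HIGH"}')
```

Note that the `security_risk` field is removed before the remaining arguments are validated against the tool's action schema, since it belongs to the analyzer rather than the tool.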
+ +**Configuration:** +- **Enabled When:** A `LLMSecurityAnalyzer` is configured for the agent +- **Schema Modification:** Automatically adds `security_risk` field to non-read-only tools +- **Zero Overhead:** No additional LLM calls or latency beyond normal action generation + +### NoOpSecurityAnalyzer + +Passthrough analyzer that skips analysis: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Action["ActionEvent"] + NoOp["NoOpSecurityAnalyzer"] + Unknown["SecurityRisk.UNKNOWN"] + + Action --> NoOp --> Unknown + + style NoOp fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px +``` + +**Use Case:** Development, trusted environments, or when confirmation mode handles all actions + +## Confirmation Policy + +The confirmation policy determines when user approval is required. There are three policy implementations: + +**Source:** [`confirmation_policy.py`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/confirmation_policy.py) + +### Policy Types + +| Policy | Behavior | Use Case | +|--------|----------|----------| +| **[`AlwaysConfirm`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/confirmation_policy.py#L27-L32)** | Requires confirmation for **all** actions | Maximum safety, interactive workflows | +| **[`NeverConfirm`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/confirmation_policy.py#L35-L40)** | Never requires confirmation | Fully autonomous agents, trusted environments | +| **[`ConfirmRisky`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/confirmation_policy.py#L43-L62)** | Configurable risk-based policy | Balanced approach, production use | + +### ConfirmRisky (Default Policy) + +The most flexible policy with configurable thresholds: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 
40}} }%% +flowchart TB + Risk["SecurityRisk"] + CheckUnknown{"Risk ==
UNKNOWN?"} + UseConfirmUnknown{"confirm_unknown
setting?"} + CheckThreshold{"risk.is_riskier
(threshold)?"} + + Confirm["Require Confirmation"] + Allow["Allow Execution"] + + Risk --> CheckUnknown + CheckUnknown -->|Yes| UseConfirmUnknown + CheckUnknown -->|No| CheckThreshold + + UseConfirmUnknown -->|True| Confirm + UseConfirmUnknown -->|False| Allow + + CheckThreshold -->|Yes| Confirm + CheckThreshold -->|No| Allow + + style CheckUnknown fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Confirm fill:#ffe8e8,stroke:#dc2626,stroke-width:2px + style Allow fill:#d1fae5,stroke:#10b981,stroke-width:2px +``` + +**Configuration:** +- **`threshold`** (default: `HIGH`) - Risk level at or above which confirmation is required + - Cannot be set to `UNKNOWN` + - Uses reflexive comparison: `risk.is_riskier(threshold)` returns `True` if `risk >= threshold` +- **`confirm_unknown`** (default: `True`) - Whether `UNKNOWN` risk requires confirmation + +### Confirmation Rules by Policy + +#### ConfirmRisky with threshold=HIGH (Default) + +| Risk Level | `confirm_unknown=True` (default) | `confirm_unknown=False` | +|------------|----------------------------------|-------------------------| +| **LOW** | ✅ Allow | ✅ Allow | +| **MEDIUM** | ✅ Allow | ✅ Allow | +| **HIGH** | 🔒 Require confirmation | 🔒 Require confirmation | +| **UNKNOWN** | 🔒 Require confirmation | ✅ Allow | + +#### ConfirmRisky with threshold=MEDIUM + +| Risk Level | `confirm_unknown=True` | `confirm_unknown=False` | +|------------|------------------------|-------------------------| +| **LOW** | ✅ Allow | ✅ Allow | +| **MEDIUM** | 🔒 Require confirmation | 🔒 Require confirmation | +| **HIGH** | 🔒 Require confirmation | 🔒 Require confirmation | +| **UNKNOWN** | 🔒 Require confirmation | ✅ Allow | + +#### ConfirmRisky with threshold=LOW + +| Risk Level | `confirm_unknown=True` | `confirm_unknown=False` | +|------------|------------------------|-------------------------| +| **LOW** | 🔒 Require confirmation | 🔒 Require confirmation | +| **MEDIUM** | 🔒 Require confirmation | 🔒 Require confirmation | +| **HIGH** | 🔒 
Require confirmation | 🔒 Require confirmation | +| **UNKNOWN** | 🔒 Require confirmation | ✅ Allow | + +**Key Rules:** +- **Risk comparison** is **reflexive**: `HIGH.is_riskier(HIGH)` returns `True` +- **UNKNOWN handling** is configurable via `confirm_unknown` flag +- **Threshold cannot be UNKNOWN** - validated at policy creation time + + +## Component Relationships + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Security["Security Analyzer"] + Agent["Agent"] + Conversation["Conversation"] + Tools["Tools"] + MCP["MCP Tools"] + + Agent -->|Validates actions| Security + Security -->|Checks| Tools + Security -->|Uses hints| MCP + Conversation -->|Pauses for confirmation| Agent + + style Security fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Conversation fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Relationship Characteristics:** +- **Agent → Security**: Validates actions before execution +- **Security → Tools**: Examines tool characteristics (annotations) +- **Security → MCP**: Uses MCP hints for risk assessment +- **Conversation → Agent**: Pauses for user confirmation when required +- **Optional Component**: Security analyzer can be disabled for trusted environments + +## See Also + +- **[Agent Architecture](/sdk/arch/agent)** - How agents use security analyzers +- **[Tool System](/sdk/arch/tool-system)** - Tool annotations and metadata; includes MCP tool hints +- **[Security Guide](/sdk/guides/security)** - Configuring security policies + +### Skill +Source: https://docs.openhands.dev/sdk/arch/skill.md + +The **Skill** system provides a mechanism for injecting reusable, specialized knowledge into agent context. Skills use trigger-based activation to determine when they should be included in the agent's prompt. 
+ +**Source:** [`openhands/sdk/context/skills/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/context/skills) + +## Core Responsibilities + +The Skill system has four primary responsibilities: + +1. **Context Injection** - Add specialized prompts to agent context based on triggers +2. **Trigger Evaluation** - Determine when skills should activate (always, keyword, task) +3. **MCP Integration** - Load MCP tools associated with repository skills +4. **Third-Party Support** - Parse `.cursorrules`, `agents.md`, and other skill formats + +## Architecture + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 35}} }%% +flowchart TB + subgraph Types["Skill Types"] + Repo["Repository Skill
trigger: None"] + Knowledge["Knowledge Skill
trigger: KeywordTrigger"] + Task["Task Skill
trigger: TaskTrigger"] + end + + subgraph Triggers["Trigger Evaluation"] + Always["Always Active
Repository guidelines"] + Keyword["Keyword Match
String matching on user messages"] + TaskMatch["Keyword Match + Inputs
Same as KeywordTrigger + user inputs"] + end + + subgraph Content["Skill Content"] + Markdown["Markdown with Frontmatter"] + MCPTools["MCP Tools Config
Repo skills only"] + Inputs["Input Metadata
Task skills only"] + end + + subgraph Integration["Agent Integration"] + Context["Agent Context"] + Prompt["System Prompt"] + end + + Repo --> Always + Knowledge --> Keyword + Task --> TaskMatch + + Always --> Markdown + Keyword --> Markdown + TaskMatch --> Markdown + + Repo -.->|Optional| MCPTools + Task -.->|Requires| Inputs + + Markdown --> Context + MCPTools --> Context + Context --> Prompt + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class Repo,Knowledge,Task primary + class Always,Keyword,TaskMatch secondary + class Context tertiary +``` + +### Key Components + +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`Skill`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/skills/skill.py)** | Core skill model | Pydantic model with name, content, trigger | +| **[`KeywordTrigger`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/skills/trigger.py)** | Keyword-based activation | String matching on user messages | +| **[`TaskTrigger`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/skills/trigger.py)** | Task-based activation | Special type of KeywordTrigger for skills with user inputs | +| **[`InputMetadata`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/skills/types.py)** | Task input parameters | Defines user inputs for task skills | +| **Skill Loader** | File parsing | Reads markdown with frontmatter, validates schema | + +## Skill Types + +### Repository Skills + +Always-active, repository-specific guidelines. + +**Recommended:** put these permanent instructions in `AGENTS.md` (and optionally `GEMINI.md` / `CLAUDE.md`) at the repo root. 
+ +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart LR + File["AGENTS.md"] + Parse["Parse Frontmatter"] + Skill["Skill(trigger=None)"] + Context["Always in Context"] + + File --> Parse + Parse --> Skill + Skill --> Context + + style Skill fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Context fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Characteristics:** +- **Trigger:** `None` (always active) +- **Purpose:** Project conventions, coding standards, architecture rules +- **MCP Tools:** Can include MCP tool configuration +- **Location:** `AGENTS.md` (recommended) and/or `.agents/skills/*.md` (supported) + +**Example Files (permanent context):** +- `AGENTS.md` - General agent instructions +- `GEMINI.md` - Gemini-specific instructions +- `CLAUDE.md` - Claude-specific instructions + +**Other supported formats:** +- `.cursorrules` - Cursor IDE guidelines +- `agents.md` / `agent.md` - General agent instructions + +### Knowledge Skills + +Keyword-triggered skills for specialized domains: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + User["User Message"] + Check["Check Keywords"] + Match{"Match?"} + Activate["Activate Skill"] + Skip["Skip Skill"] + Context["Add to Context"] + + User --> Check + Check --> Match + Match -->|Yes| Activate + Match -->|No| Skip + Activate --> Context + + style Check fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Activate fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` + +**Characteristics:** +- **Trigger:** `KeywordTrigger` with regex patterns +- **Purpose:** Domain-specific knowledge (e.g., "kubernetes", "machine learning") +- **Activation:** Keywords detected in user messages +- **Location:** System or user-defined knowledge base + +**Trigger Example:** +```yaml +--- +name: kubernetes +trigger: + type: keyword + keywords: ["kubernetes", "k8s", "kubectl"] +--- +``` + +### Task Skills + 
+Keyword-triggered skills with structured inputs for guided workflows: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + User["User Message"] + Match{"Keyword
Match?"} + Inputs["Collect User Inputs"] + Template["Apply Template"] + Context["Add to Context"] + Skip["Skip Skill"] + + User --> Match + Match -->|Yes| Inputs + Match -->|No| Skip + Inputs --> Template + Template --> Context + + style Match fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Template fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` + +**Characteristics:** +- **Trigger:** `TaskTrigger` (a special type of KeywordTrigger for skills with user inputs) +- **Activation:** Keywords/triggers detected in user messages (same matching logic as KeywordTrigger) +- **Purpose:** Guided workflows (e.g., bug fixing, feature implementation) +- **Inputs:** User-provided parameters (e.g., bug description, acceptance criteria) +- **Location:** System-defined or custom task templates + +**Trigger Example:** +```yaml +--- +name: bug_fix +triggers: ["/bug_fix", "fix bug", "bug report"] +inputs: + - name: bug_description + description: "Describe the bug" + required: true +--- +``` + +**Note:** TaskTrigger uses the same keyword matching mechanism as KeywordTrigger. The distinction is semantic - TaskTrigger is used for skills that require structured user inputs, while KeywordTrigger is for knowledge-based skills. + +## Trigger Evaluation + +Skills are evaluated at different points in the agent lifecycle: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Start["Agent Step Start"] + + Repo["Check Repository Skills
trigger: None"] + AddRepo["Always Add to Context"] + + Message["Check User Message"] + Keyword["Match Keyword Triggers"] + AddKeyword["Add Matched Skills"] + + TaskType["Check Task Type"] + TaskMatch["Match Task Triggers"] + AddTask["Add Task Skill"] + + Build["Build Agent Context"] + + Start --> Repo + Repo --> AddRepo + + Start --> Message + Message --> Keyword + Keyword --> AddKeyword + + Start --> TaskType + TaskType --> TaskMatch + TaskMatch --> AddTask + + AddRepo --> Build + AddKeyword --> Build + AddTask --> Build + + style Repo fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Keyword fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style TaskMatch fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Evaluation Rules:** + +| Trigger Type | Evaluation Point | Activation Condition | +|--------------|------------------|----------------------| +| **None** | Every step | Always active | +| **KeywordTrigger** | On user message | Keyword/string match in message | +| **TaskTrigger** | On user message | Keyword/string match in message (same as KeywordTrigger) | + +**Note:** Both KeywordTrigger and TaskTrigger use identical string matching logic. TaskTrigger is simply a semantic variant used for skills that include user input parameters. 
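A minimal sketch of this kind of string matching (the SDK's exact matching rules may differ):

```python
def keyword_match(keywords: list[str], message: str) -> bool:
    """Toy trigger check: case-insensitive substring match on the user message."""
    lowered = message.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


activated = keyword_match(["kubernetes", "k8s", "kubectl"], "How do I debug a k8s pod?")
```

The same predicate serves both trigger types; for a task skill, a match would additionally prompt the user for the declared inputs before the template is applied.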
+ +## MCP Tool Integration + +Repository skills can include MCP tool configurations: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Skill["Repository Skill"] + MCPConfig["mcp_tools Config"] + Client["MCP Client"] + Tools["Tool Registry"] + + Skill -->|Contains| MCPConfig + MCPConfig -->|Spawns| Client + Client -->|Registers| Tools + + style Skill fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style MCPConfig fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Tools fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**MCP Configuration Format:** + +Skills can embed MCP server configuration following the [FastMCP format](https://gofastmcp.com/clients/client#configuration-format): + +```yaml +--- +name: repo_skill +mcp_tools: + mcpServers: + filesystem: + command: "npx" + args: ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/project"] +--- +``` + +**Workflow:** +1. **Load Skill:** Parse markdown file with frontmatter +2. **Extract MCP Config:** Read `mcp_tools` field +3. **Spawn MCP Servers:** Create MCP clients for each server +4. **Register Tools:** Add MCP tools to agent's tool registry +5. **Inject Context:** Add skill content to agent prompt + +## Skill File Format + +Skills are defined in markdown files with YAML frontmatter: + +```markdown +--- +name: skill_name +trigger: + type: keyword + keywords: ["pattern1", "pattern2"] +--- + +# Skill Content + +This is the instruction text that will be added to the agent's context. 
+``` + +**Frontmatter Fields:** + +| Field | Required | Description | +|-------|----------|-------------| +| **name** | Yes | Unique skill identifier | +| **trigger** | Yes* | Activation trigger (`null` for always active) | +| **mcp_tools** | No | MCP server configuration (repo skills only) | +| **inputs** | No | User input metadata (task skills only) | + +*Repository skills use `trigger: null` (or omit trigger field) + +## Component Relationships + +### How Skills Integrate + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Skills["Skill System"] + Context["Agent Context"] + Agent["Agent"] + MCP["MCP Client"] + + Skills -->|Injects content| Context + Skills -.->|Spawns tools| MCP + Context -->|System prompt| Agent + MCP -->|Tool| Agent + + style Skills fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Context fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Agent fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Relationship Characteristics:** +- **Skills → Agent Context**: Active skills contribute their content to system prompt +- **Skills → MCP**: Repository skills can spawn MCP servers and register tools +- **Context → Agent**: Combined skill content becomes part of agent's instructions +- **Skills Lifecycle**: Loaded at conversation start, evaluated each step + +## See Also + +- **[Agent Architecture](/sdk/arch/agent)** - How agents use skills for context +- **[Tool System](/sdk/arch/tool-system#mcp-integration)** - MCP tool spawning and client management +- **[Context Management Guide](/sdk/guides/skill)** - Using skills in applications + +### Tool System & MCP +Source: https://docs.openhands.dev/sdk/arch/tool-system.md + +The **Tool System** provides a type-safe, extensible framework for defining agent capabilities. It standardizes how agents interact with external systems through a structured Action-Observation pattern with automatic validation and schema generation. 
+ +**Source:** [`openhands-sdk/openhands/sdk/tool/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/tool) + +## Core Responsibilities + +The Tool System has four primary responsibilities: + +1. **Type Safety** - Enforce action/observation schemas via Pydantic models +2. **Schema Generation** - Auto-generate LLM-compatible tool descriptions from Pydantic schemas +3. **Execution Lifecycle** - Validate inputs, execute logic, wrap outputs +4. **Tool Registry** - Discover and resolve tools by name or pattern + +## Tool System + +### Architecture Overview + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 50}} }%% +flowchart TB + subgraph Definition["Tool Definition"] + Action["Action
Input schema"] + Observation["Observation
Output schema"] + Executor["Executor
Business logic"] + end + + subgraph Framework["Tool Framework"] + Base["ToolBase
Abstract base"] + Impl["Tool Implementation
Concrete tool"] + Registry["Tool Registry
Spec → Tool"] + end + + Agent["Agent"] + LLM["LLM"] + ToolSpec["Tool Spec
name + params"] + + Base -.->|Extends| Impl + + ToolSpec -->|resolve_tool| Registry + Registry -->|Create instances| Impl + Impl -->|Available in| Agent + Impl -->|Generate schema| LLM + LLM -->|Generate tool call| Agent + Agent -->|Parse & validate| Action + Agent -->|Execute via Tool.\_\_call\_\_| Executor + Executor -->|Return| Observation + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class Base primary + class Action,Observation,Executor secondary + class Registry tertiary +``` + +### Key Components + +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`ToolBase`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/tool.py)** | Abstract base class | Generic over Action and Observation types, defines abstract `create()` | +| **[`ToolDefinition`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/tool.py)** | Concrete tool class | Can be instantiated directly or subclassed for factory pattern | +| **[`Action`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/schema.py)** | Input model | Pydantic model with `visualize` property | +| **[`Observation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/schema.py)** | Output model | Pydantic model with `to_llm_content` property | +| **[`ToolExecutor`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/tool.py)** | Execution interface | ABC with `__call__()` method, optional `close()` | +| **[`ToolAnnotations`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/tool.py)** | Behavioral hints | MCP-spec hints (readOnly, destructive, idempotent, openWorld) | +| **[`Tool` 
(spec)](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/spec.py)** | Tool specification | Configuration object with name and params | +| **[`ToolRegistry`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/registry.py)** | Tool discovery | Resolves Tool specs to ToolDefinition instances | + +### Action-Observation Pattern + +The tool system follows a **strict input-output contract**: `Action → Observation`. The Agent layer wraps these in events for conversation management. + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + subgraph Agent["Agent Layer"] + ToolCall["MessageToolCall
from LLM"] + ParseJSON["Parse JSON
arguments"] + CreateAction["tool.action_from_arguments()
Pydantic validation"] + WrapAction["ActionEvent
wraps Action"] + WrapObs["ObservationEvent
wraps Observation"] + Error["AgentErrorEvent"] + end + + subgraph ToolSystem["Tool System"] + ActionType["Action
Pydantic model"] + ToolCall2["tool.\_\_call\_\_(action)
type-safe execution"] + Execute["ToolExecutor
business logic"] + ObsType["Observation
Pydantic model"] + end + + ToolCall --> ParseJSON + ParseJSON -->|Valid JSON| CreateAction + ParseJSON -->|Invalid JSON| Error + CreateAction -->|Valid| ActionType + CreateAction -->|Invalid| Error + ActionType --> WrapAction + ActionType --> ToolCall2 + ToolCall2 --> Execute + Execute --> ObsType + ObsType --> WrapObs + + style ToolSystem fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style ActionType fill:#ddd6fe,stroke:#7c3aed,stroke-width:2px + style ObsType fill:#ddd6fe,stroke:#7c3aed,stroke-width:2px +``` + +**Tool System Boundary:** +- **Input**: `dict[str, Any]` (JSON arguments) → validated `Action` instance +- **Output**: `Observation` instance with structured result +- **No knowledge of**: Events, LLM messages, conversation state + +### Tool Definition + +Tools are defined using two patterns depending on complexity: + +#### Pattern 1: Direct Instantiation (Simple Tools) + +For stateless tools that don't need runtime configuration (e.g., `finish`, `think`): + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 20}} }%% +flowchart LR + Action["Define Action
with visualize"] + Obs["Define Observation
with to_llm_content"] + Exec["Define Executor
stateless logic"] + Tool["ToolDefinition(...,
executor=Executor())"] + + Action --> Tool + Obs --> Tool + Exec --> Tool + + style Tool fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px +``` + +**Components:** +1. **Action** - Pydantic model with `visualize` property for display +2. **Observation** - Pydantic model with `to_llm_content` property for LLM +3. **ToolExecutor** - Stateless executor with `__call__(action) → observation` +4. **ToolDefinition** - Direct instantiation with executor instance + +#### Pattern 2: Subclass with Factory (Stateful Tools) + +For tools requiring runtime configuration or persistent state (e.g., `execute_bash`, `file_editor`, `glob`): + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 20}} }%% +flowchart LR + Action["Define Action
with visualize"] + Obs["Define Observation
with to_llm_content"] + Exec["Define Executor
with \_\_init\_\_ and state"] + Subclass["class MyTool(ToolDefinition)
with create() method"] + Instance["Return [MyTool(...,
executor=instance)]"] + + Action --> Subclass + Obs --> Subclass + Exec --> Subclass + Subclass --> Instance + + style Instance fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` + +**Components:** +1. **Action/Observation** - Same as Pattern 1 +2. **ToolExecutor** - Stateful executor with `__init__()` for configuration and optional `close()` for cleanup +3. **MyTool(ToolDefinition)** - Subclass with `@classmethod create(conv_state, ...)` factory method +4. **Factory Method** - Returns sequence of configured tool instances + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart TB + subgraph Pattern1["Pattern 1: Direct Instantiation"] + P1A["Define Action/Observation
with visualize/to_llm_content"] + P1E["Define ToolExecutor
with \_\_call\_\_()"] + P1T["ToolDefinition(...,
executor=Executor())"] + end + + subgraph Pattern2["Pattern 2: Subclass with Factory"] + P2A["Define Action/Observation
with visualize/to_llm_content"] + P2E["Define Stateful ToolExecutor
with \_\_init\_\_() and \_\_call\_\_()"] + P2C["class MyTool(ToolDefinition)
@classmethod create()"] + P2I["Return [MyTool(...,
executor=instance)]"] + end + + P1A --> P1E + P1E --> P1T + + P2A --> P2E + P2E --> P2C + P2C --> P2I + + style P1T fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style P2I fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` + +**Key Design Elements:** + +| Component | Purpose | Requirements | +|-----------|---------|--------------| +| **Action** | Defines LLM-provided parameters | Extends `Action`, includes `visualize` property returning Rich Text | +| **Observation** | Defines structured output | Extends `Observation`, includes `to_llm_content` property returning content list | +| **ToolExecutor** | Implements business logic | Extends `ToolExecutor[ActionT, ObservationT]`, implements `__call__()` method | +| **ToolDefinition** | Ties everything together | Either instantiate directly (Pattern 1) or subclass with `create()` method (Pattern 2) | + +**When to Use Each Pattern:** + +| Pattern | Use Case | Examples | +|---------|----------|----------| +| **Direct Instantiation** | Stateless tools with no configuration needs | `finish`, `think`, simple utilities | +| **Subclass with Factory** | Tools requiring runtime state or configuration | `execute_bash`, `file_editor`, `glob`, `grep` | + +### Tool Annotations + +Tools include optional `ToolAnnotations` based on the [Model Context Protocol (MCP) spec](https://github.com/modelcontextprotocol/modelcontextprotocol) that provide behavioral hints to LLMs: + +| Field | Meaning | Examples | +|-------|---------|----------| +| `readOnlyHint` | Tool doesn't modify state | `glob` (True), `execute_bash` (False) | +| `destructiveHint` | May delete/overwrite data | `file_editor` (True), `task_tracker` (False) | +| `idempotentHint` | Repeated calls are safe | `glob` (True), `execute_bash` (False) | +| `openWorldHint` | Interacts beyond closed domain | `execute_bash` (True), `task_tracker` (False) | + +**Key Behaviors:** +- [LLM-based Security risk prediction](/sdk/guides/security) automatically added for tools with 
`readOnlyHint=False` +- Annotations help LLMs reason about tool safety and side effects + +### Tool Registry + +The registry enables **dynamic tool discovery** and instantiation from tool specifications: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + ToolSpec["Tool Spec
name + params"] + + subgraph Registry["Tool Registry"] + Resolver["Resolver
name → factory"] + Factory["Factory
create(params)"] + end + + Instance["Tool Instance
with executor"] + Agent["Agent"] + + ToolSpec -->|"resolve_tool(spec)"| Resolver + Resolver -->|Lookup factory| Factory + Factory -->|"create(**params)"| Instance + Instance -->|Used by| Agent + + style Registry fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Factory fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` + +**Resolution Workflow:** + +1. **[Tool (Spec)](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/tool/spec.py)** - Configuration object with `name` (e.g., "BashTool") and `params` (e.g., `{"working_dir": "/workspace"}`) +2. **Resolver Lookup** - Registry finds the registered resolver for the tool name +3. **Factory Invocation** - Resolver calls the tool's `.create()` method with params and conversation state +4. **Instance Creation** - Tool instance(s) are created with configured executors +5. **Agent Usage** - Instances are added to the agent's tools_map for execution + +**Registration Types:** + +| Type | Registration | Resolver Behavior | +|------|-------------|-------------------| +| **Tool Instance** | `register_tool(name, instance)` | Returns the fixed instance (params not allowed) | +| **Tool Subclass** | `register_tool(name, ToolClass)` | Calls `ToolClass.create(**params, conv_state=state)` | +| **Factory Function** | `register_tool(name, factory)` | Calls `factory(**params, conv_state=state)` | + +### File Organization + +Tools follow a consistent file structure for maintainability: + +``` +openhands-tools/openhands/tools/my_tool/ +├── __init__.py # Export MyTool +├── definition.py # Action, Observation, MyTool(ToolDefinition) +├── impl.py # MyExecutor(ToolExecutor) +└── [other modules] # Tool-specific utilities +``` + +**File Responsibilities:** + +| File | Contains | Purpose | +|------|----------|---------| +| `definition.py` | Action, Observation, ToolDefinition subclass | Public API, schema definitions, factory method | +| `impl.py` | ToolExecutor implementation | Business logic, state management, 
execution |
| `__init__.py` | Tool exports | Package interface |

**Benefits:**
- **Separation of Concerns** - Public API separate from implementation
- **Avoid Circular Imports** - Import `impl` only inside `create()` method
- **Consistency** - All tools follow the same structure for discoverability

**Example Reference:** See [`terminal/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools/terminal) for a complete implementation


## MCP Integration

The tool system supports external tools via the [Model Context Protocol (MCP)](https://modelcontextprotocol.io/). MCP tools are **configured separately from the tool registry** via the `mcp_config` field on the `Agent` class and are automatically discovered from MCP servers during agent initialization.

**Source:** [`openhands-sdk/openhands/sdk/mcp/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/mcp)

### Architecture Overview

```mermaid
%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 50}} }%%
flowchart TB
    subgraph External["External MCP Server"]
        Server["MCP Server
stdio/HTTP"] + ExtTools["External Tools"] + end + + subgraph Bridge["MCP Integration Layer"] + MCPClient["MCPClient
Sync/Async bridge"] + Convert["Schema Conversion
MCP → MCPToolDefinition"] + MCPExec["MCPToolExecutor
Bridges to MCP calls"] + end + + subgraph Agent["Agent System"] + ToolsMap["tools_map
str -> ToolDefinition"] + AgentLogic["Agent Execution"] + end + + Server -.->|Spawns| ExtTools + MCPClient --> Server + Server --> Convert + Convert -->|create_mcp_tools| MCPExec + MCPExec -->|Added during
agent.initialize| ToolsMap + ToolsMap --> AgentLogic + AgentLogic -->|Tool call| MCPExec + MCPExec --> MCPClient + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef external fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class MCPClient primary + class Convert,MCPExec secondary + class Server,ExtTools external +``` + +### Key Components + +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`MCPClient`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/client.py)** | MCP server connection | Extends FastMCP with sync/async bridge | +| **[`MCPToolDefinition`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/tool.py)** | Tool wrapper | Wraps MCP tools as SDK `ToolDefinition` with dynamic validation | +| **[`MCPToolExecutor`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/tool.py)** | Execution handler | Bridges agent actions to MCP tool calls via MCPClient | +| **[`MCPToolAction`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/definition.py)** | Generic action wrapper | Simple `dict[str, Any]` wrapper for MCP tool arguments | +| **[`MCPToolObservation`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/definition.py)** | Result wrapper | Wraps MCP tool results as observations with content blocks | +| **[`_create_mcp_action_type()`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/tool.py)** | Dynamic schema | Runtime Pydantic model generated from MCP `inputSchema` for validation | + +### Sync/Async Bridge + +MCP protocol is asynchronous, but SDK tools execute synchronously. 
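The core idea — run async MCP calls on a background event loop and block the synchronous caller on the result — can be sketched with the standard library. This is an illustrative stand-in only (the `call_async_from_sync` and `fake_mcp_call` names here are invented for the sketch), not the SDK's actual implementation:

```python
import asyncio
import threading

# Background event loop running in a daemon thread (simplified sketch).
_loop = asyncio.new_event_loop()
threading.Thread(target=_loop.run_forever, daemon=True).start()


def call_async_from_sync(coro, timeout: float = 30.0):
    """Schedule a coroutine on the background loop and block for its result."""
    future = asyncio.run_coroutine_threadsafe(coro, _loop)
    # future.result() re-raises exceptions from the coroutine and enforces the timeout
    return future.result(timeout=timeout)


async def fake_mcp_call(name: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for an async MCP request over stdio/HTTP
    return f"result from {name}"


print(call_async_from_sync(fake_mcp_call("fetch")))  # → result from fetch
```

The same background-loop technique lets synchronous tool executors reuse one long-lived connection across calls instead of spinning up a fresh event loop per call.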
The bridge pattern in [client.py](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/client.py) solves this: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Sync["Sync Tool Execution"] + Bridge["call_async_from_sync()"] + Loop["Background Event Loop"] + Async["Async MCP Call"] + Result["Return Result"] + + Sync --> Bridge + Bridge --> Loop + Loop --> Async + Async --> Result + Result --> Sync + + style Bridge fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Loop fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` + +**Bridge Features:** +- **Background Event Loop** - Executes async code from sync contexts +- **Timeout Support** - Configurable timeouts for MCP operations +- **Error Handling** - Wraps MCP errors in observations +- **Connection Pooling** - Reuses connections across tool calls + +### Tool Discovery Flow + +**Source:** [`create_mcp_tools()`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/mcp/utils.py) | [`agent._initialize()`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/agent/base.py) + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart TB + Config["MCP Server Config
command + args"] + Spawn["Spawn Server Process
MCPClient"] + List["List Available Tools
client.list_tools()"] + + subgraph Convert["For Each MCP Tool"] + Store["Store MCP metadata
name, description, inputSchema"] + CreateExec["Create MCPToolExecutor
bound to tool + client"] + Def["Create MCPToolDefinition
generic MCPToolAction type"] + end + + Register["Add to Agent's tools_map
bypasses ToolRegistry"] + Ready["Tools Available
Dynamic models created on-demand"] + + Config --> Spawn + Spawn --> List + List --> Store + Store --> CreateExec + CreateExec --> Def + Def --> Register + Register --> Ready + + style Spawn fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Def fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Register fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Discovery Steps:** +1. **Spawn Server** - Launch MCP server via stdio protocol (using `MCPClient`) +2. **List Tools** - Call MCP `tools/list` endpoint to retrieve available tools +3. **Parse Schemas** - Extract tool names, descriptions, and `inputSchema` from MCP response +4. **Create Definitions** - For each tool, call `MCPToolDefinition.create()` which: + - Creates an `MCPToolExecutor` instance bound to the tool name and client + - Wraps the MCP tool metadata in `MCPToolDefinition` + - Uses generic `MCPToolAction` as the action type (NOT dynamic models yet) +5. **Add to Agent** - All `MCPToolDefinition` instances are added to agent's `tools_map` during `initialize()` (bypasses ToolRegistry) +6. **Lazy Validation** - Dynamic Pydantic models are generated lazily when: + - `action_from_arguments()` is called (argument validation) + - `to_openai_tool()` is called (schema export to LLM) + +**Schema Handling:** + +| MCP Schema | SDK Integration | When Used | +|------------|----------------|-----------| +| `name` | Tool name (stored in `MCPToolDefinition`) | Discovery, execution | +| `description` | Tool description for LLM | Discovery, LLM prompt | +| `inputSchema` | Stored in `mcp_tool.inputSchema` | Lazy model generation | +| `inputSchema` fields | Converted to Pydantic fields via `Schema.from_mcp_schema()` | Validation, schema export | +| `annotations` | Mapped to `ToolAnnotations` | Security analysis, LLM hints | + +### MCP Server Configuration + +MCP servers are configured via the `mcp_config` field on the `Agent` class. 
Configuration follows [FastMCP config format](https://gofastmcp.com/clients/client#configuration-format): + +```python +from openhands.sdk import Agent + +agent = Agent( + mcp_config={ + "mcpServers": { + "fetch": { + "command": "uvx", + "args": ["mcp-server-fetch"] + }, + "filesystem": { + "command": "npx", + "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path"] + } + } + } +) +``` + +## Component Relationships + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart TB + subgraph Sources["Tool Sources"] + Native["Native Tools"] + MCP["MCP Tools"] + end + + Registry["Tool Registry
resolve_tool"] + ToolsMap["Agent.tools_map
Merged tool dict"] + + subgraph AgentSystem["Agent System"] + Agent["Agent Logic"] + LLM["LLM"] + end + + Security["Security Analyzer"] + Conversation["Conversation State"] + + Native -->|register_tool| Registry + Registry --> ToolsMap + MCP -->|create_mcp_tools| ToolsMap + ToolsMap -->|Provide schemas| LLM + Agent -->|Execute tools| ToolsMap + ToolsMap -.->|Action risk| Security + ToolsMap -.->|Read state| Conversation + + style ToolsMap fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Agent fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style Security fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Relationship Characteristics:** +- **Native → Registry → tools_map**: Native tools resolved via `ToolRegistry` +- **MCP → tools_map**: MCP tools bypass registry, added directly during `initialize()` +- **tools_map → LLM**: Generate schemas describing all available capabilities +- **Agent → tools_map**: Execute actions, receive observations +- **tools_map → Conversation**: Read state for context-aware execution +- **tools_map → Security**: Tool annotations inform risk assessment + +## See Also + +- **[Agent Architecture](/sdk/arch/agent)** - How agents select and execute tools +- **[Events](/sdk/arch/events)** - ActionEvent and ObservationEvent structures +- **[Security Analyzer](/sdk/arch/security)** - Action risk assessment +- **[Skill Architecture](/sdk/arch/skill)** - Embedding MCP configs in repository skills +- **[Custom Tools Guide](/sdk/guides/custom-tools)** - Building your own tools +- **[FastMCP Documentation](https://gofastmcp.com/)** - Underlying MCP client library + +### Workspace +Source: https://docs.openhands.dev/sdk/arch/workspace.md + +The **Workspace** component abstracts execution environments for agent operations. It provides a unified interface for command execution and file operations across local processes, containers, and remote servers. 
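
To make the execution contract concrete, here is a miniature, standard-library stand-in (the `MiniLocalWorkspace` class and its helper are invented for this sketch — not the SDK's `LocalWorkspace`): a subprocess-backed `execute_command` that returns the structured result fields described in this section.

```python
import subprocess
import time
from dataclasses import dataclass


@dataclass
class CommandResult:
    """Structured execution result (mirrors the fields described in this section)."""
    stdout: str
    stderr: str
    exit_code: int
    timeout: bool
    duration: float


class MiniLocalWorkspace:
    """Toy subprocess-backed workspace illustrating the unified execution interface."""

    def __init__(self, working_dir: str = "."):
        self.working_dir = working_dir

    def execute_command(self, command: str, timeout: float = 30.0) -> CommandResult:
        start = time.monotonic()
        try:
            proc = subprocess.run(
                command, shell=True, cwd=self.working_dir,
                capture_output=True, text=True, timeout=timeout,
            )
            return CommandResult(proc.stdout, proc.stderr, proc.returncode,
                                 False, time.monotonic() - start)
        except subprocess.TimeoutExpired as exc:
            # Report a timeout instead of raising, keeping the result structured
            out = exc.stdout.decode(errors="replace") if isinstance(exc.stdout, bytes) else (exc.stdout or "")
            err = exc.stderr.decode(errors="replace") if isinstance(exc.stderr, bytes) else (exc.stderr or "")
            return CommandResult(out, err, -1, True, time.monotonic() - start)


result = MiniLocalWorkspace().execute_command("echo hello")
print(result.exit_code, result.stdout.strip())  # → 0 hello
```

A remote workspace exposes the same method signature but forwards the command over HTTP to an agent-server, which is what lets tools stay agnostic about where they execute.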
+ +**Source:** [`openhands/sdk/workspace/`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/workspace) + +## Core Responsibilities + +The Workspace system has four primary responsibilities: + +1. **Execution Abstraction** - Unified interface for command execution across environments +2. **File Operations** - Upload, download, and manipulate files in workspace +3. **Resource Management** - Context manager protocol for setup/teardown +4. **Environment Isolation** - Separate agent execution from host system + +## Architecture + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 25, "rankSpacing": 60}} }%% +flowchart TB + subgraph Interface["Abstract Interface"] + Base["BaseWorkspace
Abstract base class"] + end + + subgraph Implementations["Concrete Implementations"] + Local["LocalWorkspace
Direct subprocess"] + Remote["RemoteWorkspace
HTTP API calls"] + end + + subgraph Operations["Core Operations"] + Command["execute_command()"] + Upload["file_upload()"] + Download["file_download()"] + Context["__enter__ / __exit__"] + end + + subgraph Targets["Execution Targets"] + Process["Local Process"] + Container["Docker Container"] + Server["Remote Server"] + end + + Base --> Local + Base --> Remote + + Base -.->|Defines| Operations + + Local --> Process + Remote --> Container + Remote --> Server + + classDef primary fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + classDef secondary fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + classDef tertiary fill:#fff4df,stroke:#b7791f,stroke-width:2px + + class Base primary + class Local,Remote secondary + class Command,Upload tertiary +``` + +### Key Components + +| Component | Purpose | Design | +|-----------|---------|--------| +| **[`BaseWorkspace`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/base.py)** | Abstract interface | Defines execution and file operation contracts | +| **[`LocalWorkspace`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/local.py)** | Local execution | Subprocess-based command execution | +| **[`RemoteWorkspace`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/remote/base.py)** | Remote execution | HTTP API-based execution via agent-server | +| **[`CommandResult`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/models.py)** | Execution output | Structured result with stdout, stderr, exit_code | +| **[`FileOperationResult`](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/models.py)** | File op outcome | Success status and metadata | + +## Workspace Types + +### Local vs Remote Execution + + +| Aspect | LocalWorkspace | RemoteWorkspace | +|--------|----------------|-----------------| +| **Execution** 
| Direct subprocess | HTTP → agent-server | +| **Isolation** | Process-level | Container/VM-level | +| **Performance** | Fast (no network) | Network overhead | +| **Security** | Host system access | Sandboxed | +| **Use Case** | Development, CLI | Production, web apps | + +## Core Operations + +### Command Execution + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30, "rankSpacing": 40}} }%% +flowchart LR + Tool["Tool invokes
execute_command()"] + + Decision{"Workspace
type?"} + + LocalExec["subprocess.run()
Direct execution"] + RemoteExec["POST /command
HTTP API"] + + Result["CommandResult
stdout, stderr, exit_code"] + + Tool --> Decision + Decision -->|Local| LocalExec + Decision -->|Remote| RemoteExec + + LocalExec --> Result + RemoteExec --> Result + + style Decision fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style LocalExec fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style RemoteExec fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Command Result Structure:** + +| Field | Type | Description | +|-------|------|-------------| +| **stdout** | str | Standard output stream | +| **stderr** | str | Standard error stream | +| **exit_code** | int | Process exit code (0 = success) | +| **timeout** | bool | Whether command timed out | +| **duration** | float | Execution time in seconds | + +### File Operations + +| Operation | Local Implementation | Remote Implementation | +|-----------|---------------------|----------------------| +| **Upload** | `shutil.copy()` | `POST /file/upload` with multipart | +| **Download** | `shutil.copy()` | `GET /file/download` stream | +| **Result** | `FileOperationResult` | `FileOperationResult` | + +## Resource Management + +Workspaces use **context manager** for safe resource handling: + +**Lifecycle Hooks:** + +| Phase | LocalWorkspace | RemoteWorkspace | +|-------|----------------|-----------------| +| **Enter** | Create working directory | Connect to agent-server, verify | +| **Use** | Execute commands | Proxy commands via HTTP | +| **Exit** | No cleanup (persistent) | Disconnect, optionally stop container | + +## Remote Workspace Extensions + +The SDK provides remote workspace implementations in `openhands-workspace` package: + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 50}} }%% +flowchart TB + Base["RemoteWorkspace
SDK base class"] + + Docker["DockerWorkspace
Auto-spawn containers"] + API["RemoteAPIWorkspace
Connect to existing server"] + + Base -.->|Extended by| Docker + Base -.->|Extended by| API + + Docker -->|Creates| Container["Docker Container
with agent-server"] + API -->|Connects| Server["Remote Agent Server"] + + style Base fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Docker fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px + style API fill:#fff4df,stroke:#b7791f,stroke-width:2px +``` + +**Implementation Comparison:** + +| Type | Setup | Isolation | Use Case | +|------|-------|-----------|----------| +| **LocalWorkspace** | Immediate | Process | Development, trusted code | +| **DockerWorkspace** | Spawn container | Container | Multi-user, untrusted code | +| **RemoteAPIWorkspace** | Connect to URL | Remote server | Distributed systems, cloud | + +**Source:** +- **DockerWorkspace**: [`openhands-workspace/openhands/workspace/docker`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-workspace/openhands/workspace/docker) +- **RemoteAPIWorkspace**: [`openhands-workspace/openhands/workspace/remote_api`](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-workspace/openhands/workspace/remote_api) + +## Component Relationships + +### How Workspace Integrates + +```mermaid +%%{init: {"theme": "default", "flowchart": {"nodeSpacing": 30}} }%% +flowchart LR + Workspace["Workspace"] + Conversation["Conversation"] + AgentServer["Agent Server"] + + Conversation -->|Configures| Workspace + Workspace -.->|Remote type| AgentServer + + style Workspace fill:#f3e8ff,stroke:#7c3aed,stroke-width:2px + style Conversation fill:#e8f3ff,stroke:#2b6cb0,stroke-width:2px +``` + +**Relationship Characteristics:** +- **Conversation → Workspace**: Conversation factory uses workspace type to select LocalConversation or RemoteConversation +- **Workspace → Agent Server**: RemoteWorkspace delegates operations to agent-server API +- **Tools Independence**: Tools run in the same environment as workspace + +## See Also + +- **[Conversation Architecture](/sdk/arch/conversation)** - How workspace type determines conversation implementation +- **[Agent Server](/sdk/arch/agent-server)** - Remote execution 
API +- **[Tool System](/sdk/arch/tool-system)** - Tools that use workspace for execution + +### FAQ +Source: https://docs.openhands.dev/sdk/faq.md + +## How do I use AWS Bedrock with the SDK? + +**Yes, the OpenHands SDK supports AWS Bedrock through LiteLLM.** + +Since LiteLLM requires `boto3` for Bedrock requests, you need to install it alongside the SDK. + + + +### Step 1: Install boto3 + +Install the SDK with boto3: + +```bash +# Using pip +pip install openhands-sdk boto3 + +# Using uv +uv pip install openhands-sdk boto3 + +# Or when installing as a CLI tool +uv tool install openhands --with boto3 +``` + +### Step 2: Configure Authentication + +You have two authentication options: + +**Option A: API Key Authentication (Recommended)** + +Use the `AWS_BEARER_TOKEN_BEDROCK` environment variable: + +```bash +export AWS_BEARER_TOKEN_BEDROCK="your-bedrock-api-key" +``` + +**Option B: AWS Credentials** + +Use traditional AWS credentials: + +```bash +export AWS_ACCESS_KEY_ID="your-access-key" +export AWS_SECRET_ACCESS_KEY="your-secret-key" +export AWS_REGION_NAME="us-west-2" +``` + +### Step 3: Configure the Model + +Use the `bedrock/` prefix for your model name: + +```python +from openhands.sdk import LLM, Agent + +llm = LLM( + model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0", + # api_key is read from AWS_BEARER_TOKEN_BEDROCK automatically +) +``` + +For cross-region inference profiles, include the region prefix: + +```python +llm = LLM( + model="bedrock/us.anthropic.claude-3-5-sonnet-20240620-v1:0", # US region + # or + model="bedrock/apac.anthropic.claude-sonnet-4-20250514-v1:0", # APAC region +) +``` + + + +For more details on Bedrock configuration options, see the [LiteLLM Bedrock documentation](https://docs.litellm.ai/docs/providers/bedrock). + +## Does the agent SDK support parallel tool calling? 
**Yes, the OpenHands SDK supports parallel tool calling by default.**

The SDK automatically handles parallel tool calls when the underlying LLM (like Claude or GPT-4) returns multiple tool calls in a single response. This allows agents to execute multiple independent actions before the next LLM call.


When the LLM generates multiple tool calls in parallel, the SDK groups them using a shared `llm_response_id`:

```python
ActionEvent(llm_response_id="abc123", thought="Let me check...", tool_call=tool1)
ActionEvent(llm_response_id="abc123", thought=[], tool_call=tool2)
# Combined into: Message(role="assistant", content="Let me check...", tool_calls=[tool1, tool2])
```

Multiple `ActionEvent`s with the same `llm_response_id` are grouped together and combined into a single LLM message with multiple `tool_calls`. Only the first event's thought/reasoning is included. The key pieces of the implementation are:

- [Events Architecture](/sdk/arch/events#event-types) - detailed explanation of how parallel function calling works
- [`prepare_llm_messages` in utils.py](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/agent/utils.py) - groups `ActionEvent`s by `llm_response_id` when converting events to LLM messages
- [agent step method](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/agent/agent.py#L200-L300) - creates actions with a shared `llm_response_id`
- [`ActionEvent` class](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/event/llm_convertible/action.py) - includes the `llm_response_id` field

For more details, see the **[Events Architecture](/sdk/arch/events)** for a deep dive into the event system and parallel function calling, the **[Tool System](/sdk/arch/tool-system)** for understanding how tools work with the agent, and the **[Agent Architecture](/sdk/arch/agent)** for how agents process and execute actions.


## Does the agent SDK support image content?

**Yes, the OpenHands SDK fully supports image content for vision-capable LLMs.**

The SDK supports both HTTP/HTTPS URLs and base64-encoded images through the `ImageContent` class.


### Check Vision Support

Before sending images, verify your LLM supports vision:

```python
from openhands.sdk import LLM
from pydantic import SecretStr

llm = LLM(
    model="anthropic/claude-sonnet-4-5-20250929",
    api_key=SecretStr("your-api-key"),
    usage_id="my-agent"
)

# Check if vision is active
assert llm.vision_is_active(), "Model does not support vision"
```

### Using HTTP URLs

```python
from openhands.sdk import ImageContent, Message, TextContent

message = Message(
    role="user",
    content=[
        TextContent(text="What do you see in this image?"),
        ImageContent(image_urls=["https://example.com/image.png"]),
    ],
)
```

### Using Base64 Images

Base64 images are supported using data URLs:

```python
import base64
from openhands.sdk import ImageContent, Message, TextContent

# Read and encode an image file
with open("my_image.png", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode("utf-8")

# Create message with base64 image
message = Message(
    role="user",
    content=[
        TextContent(text="Describe this image"),
        ImageContent(image_urls=[f"data:image/png;base64,{image_base64}"]),
    ],
)
```

### Supported Image Formats

The data URL format is: `data:<mime-type>;base64,<data>`

Supported MIME types:
- `image/png`
- `image/jpeg`
- `image/gif`
- `image/webp`
- `image/bmp`

### Built-in Image Support

Several SDK tools automatically handle images:

- 
**FileEditorTool**: When viewing image files (`.png`, `.jpg`, `.jpeg`, `.gif`, `.webp`, `.bmp`), they're automatically converted to base64 and sent to the LLM +- **BrowserUseTool**: Screenshots are captured and sent as base64 images +- **MCP Tools**: Image content from MCP tool results is automatically converted to base64 data URLs + +### Disabling Vision + +To disable vision for cost reduction (even on vision-capable models): + +```python +llm = LLM( + model="anthropic/claude-sonnet-4-5-20250929", + api_key=SecretStr("your-api-key"), + usage_id="my-agent", + disable_vision=True, # Images will be filtered out +) +``` + + + +For a complete example, see the [image input example](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/17_image_input.py) in the SDK repository. + +## How do I handle MessageEvent in one-off tasks? + +**The SDK provides utilities to automatically respond to agent messages when running tasks end-to-end.** + +When running one-off tasks, some models may send a `MessageEvent` (proposing an action or asking for confirmation) instead of directly using tools. This causes `conversation.run()` to return, even though the agent hasn't finished the task. + + + +When an agent sends a message (via `MessageEvent`) instead of using the `finish` tool, the conversation ends because it's waiting for user input. In automated pipelines, there's no human to respond, so the task appears incomplete. + +**Key event types:** +- `ActionEvent`: Agent uses a tool (terminal, file editor, etc.) +- `MessageEvent`: Agent sends a text message (waiting for user response) +- `FinishAction`: Agent explicitly signals task completion + +The solution is to automatically send a "fake user response" when the agent sends a message, prompting it to continue. 
+ + + + + +The [`run_conversation_with_fake_user_response`](https://github.com/OpenHands/benchmarks/blob/main/benchmarks/utils/fake_user_response.py) function wraps your conversation and automatically handles agent messages: + +```python +from openhands.sdk.conversation.state import ConversationExecutionStatus +from openhands.sdk.event import ActionEvent, MessageEvent +from openhands.sdk.tool.builtins.finish import FinishAction + +def run_conversation_with_fake_user_response(conversation, max_responses: int = 10): + """Run conversation, auto-responding to agent messages until finish or limit.""" + for _ in range(max_responses): + conversation.run() + if conversation.state.execution_status != ConversationExecutionStatus.FINISHED: + break + events = list(conversation.state.events) + # Check if agent used finish tool + if any(isinstance(e, ActionEvent) and isinstance(e.action, FinishAction) for e in reversed(events)): + break + # Check if agent sent a message (needs response) + if not any(isinstance(e, MessageEvent) and e.source == "agent" for e in reversed(events)): + break + # Send continuation prompt + conversation.send_message( + "Please continue. Use the finish tool when done. DO NOT ask for human help." + ) +``` + + + + + +```python +from openhands.sdk import Agent, Conversation, LLM +from openhands.workspace import DockerWorkspace +from openhands.tools.preset.default import get_default_tools + +llm = LLM(model="anthropic/claude-sonnet-4-20250514", api_key="...") +agent = Agent(llm=llm, tools=get_default_tools()) +workspace = DockerWorkspace() +conversation = Conversation(agent=agent, workspace=workspace, max_iteration_per_run=100) + +conversation.send_message("Fix the bug in src/utils.py") +run_conversation_with_fake_user_response(conversation, max_responses=10) +# Results available in conversation.state.events +``` + + + + +**Pro tip:** Add a hint to your task prompt: +> "If you're 100% done with the task, use the finish action. 
Otherwise, keep going until you're finished." + +This encourages the agent to use the finish tool rather than asking for confirmation. + + +For the full implementation used in OpenHands benchmarks, see the [fake_user_response.py](https://github.com/OpenHands/benchmarks/blob/main/benchmarks/utils/fake_user_response.py) module. + +## More questions? + +If you have additional questions: + +- **[Join our Slack Community](https://openhands.dev/joinslack)** - Ask questions and get help from the community +- **[GitHub Issues](https://github.com/OpenHands/software-agent-sdk/issues)** - Report bugs, request features, or start a discussion + +### Getting Started +Source: https://docs.openhands.dev/sdk/getting-started.md + +The OpenHands SDK is a modular framework for building AI agents that interact with code, files, and system commands. Agents can execute bash commands, edit files, browse the web, and more. + +## Prerequisites + +Install the **[uv package manager](https://docs.astral.sh/uv/)** (version 0.8.13+): + +```bash +curl -LsSf https://astral.sh/uv/install.sh | sh +``` + +## Installation + +### Step 1: Acquire an LLM API Key + +The SDK requires an LLM API key from any [LiteLLM-supported provider](https://docs.litellm.ai/docs/providers). See our [recommended models](/openhands/usage/llms/llms) for best results. + + + + Bring your own API key from providers like: + - [Anthropic](https://console.anthropic.com/) + - [OpenAI](https://platform.openai.com/) + - [Other LiteLLM-supported providers](https://docs.litellm.ai/docs/providers) + + Example: + ```bash + export LLM_API_KEY="your-api-key" + uv run python examples/01_standalone_sdk/01_hello_world.py + ``` + + + + Sign up for [OpenHands Cloud](https://app.all-hands.dev) and get an LLM API key from the [API keys page](https://app.all-hands.dev/settings/api-keys). This gives you access to models verified to work well with OpenHands, with no markup. 


  Example:
  ```bash
  export LLM_MODEL="openhands/claude-sonnet-4-5-20250929"
  uv run python examples/01_standalone_sdk/01_hello_world.py
  ```

  [Learn more →](/openhands/usage/llms/openhands-llms)



  If you have a ChatGPT Plus or Pro subscription, you can use `LLM.subscription_login()` to authenticate with your ChatGPT account and access Codex models without consuming API credits.

  ```python
  from openhands.sdk import LLM

  llm = LLM.subscription_login(vendor="openai", model="gpt-5.2-codex")
  ```

  [Learn more →](/sdk/guides/llm-subscriptions)



> Tip: Model name prefixes depend on your provider
>
> - If you bring your own provider key (Anthropic/OpenAI/etc.), use that provider's model name, e.g. `anthropic/claude-sonnet-4-5-20250929`. OpenHands supports [dozens of models](https://docs.openhands.dev/sdk/arch/llm#llm-providers), so pick whichever model you want to try.
> - If you use OpenHands Cloud, use `openhands/`-prefixed models, e.g. `openhands/claude-sonnet-4-5-20250929`
>
> Many examples in the docs read the model from the `LLM_MODEL` environment variable. 
You can set it like: +> +> ```bash +> export LLM_MODEL="openhands/claude-sonnet-4-5-20250929" # for OpenHands Provider +> ``` + +**Set Your API Key:** + +```bash +export LLM_API_KEY=your-api-key-here +``` + +### Step 2: Install the SDK + + + + ```bash + pip install openhands-sdk # Core SDK (openhands.sdk) + pip install openhands-tools # Built-in tools (openhands.tools) + # Optional: required for sandboxed workspaces in Docker or remote servers + pip install openhands-workspace # Workspace backends (openhands.workspace) + pip install openhands-agent-server # Remote agent server (openhands.agent_server) + ``` + + + + ```bash + # Clone the repository + git clone https://github.com/OpenHands/software-agent-sdk.git + cd software-agent-sdk + + # Install dependencies and setup development environment + make build + ``` + + + + +### Step 3: Run Your First Agent + +Here's a complete example that creates an agent and asks it to perform a simple task: + +```python icon="python" expandable examples/01_standalone_sdk/01_hello_world.py +import os + +from openhands.sdk import LLM, Agent, Conversation, Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.task_tracker import TaskTrackerTool +from openhands.tools.terminal import TerminalTool + + +llm = LLM( + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + api_key=os.getenv("LLM_API_KEY"), + base_url=os.getenv("LLM_BASE_URL", None), +) + +agent = Agent( + llm=llm, + tools=[ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), + Tool(name=TaskTrackerTool.name), + ], +) + +cwd = os.getcwd() +conversation = Conversation(agent=agent, workspace=cwd) + +conversation.send_message("Write 3 facts about the current project into FACTS.txt.") +conversation.run() +print("All done!") +``` + +Run the example: + +```bash +# Using a direct provider key (Anthropic/OpenAI/etc.) 
+uv run python examples/01_standalone_sdk/01_hello_world.py +``` + +```bash +# Using OpenHands Cloud +export LLM_MODEL="openhands/claude-sonnet-4-5-20250929" +uv run python examples/01_standalone_sdk/01_hello_world.py +``` + +You should see the agent understand your request, explore the project, and create a file with facts about it. + +## Core Concepts + +**Agent**: An AI-powered entity that can reason, plan, and execute actions using tools. + +**Tools**: Capabilities like executing bash commands, editing files, or browsing the web. + +**Workspace**: The execution environment where agents operate (local, Docker, or remote). + +**Conversation**: Manages the interaction lifecycle between you and the agent. + +## Basic Workflow + +1. **Configure LLM**: Choose model and provide API key +2. **Create Agent**: Use preset or custom configuration +3. **Add Tools**: Enable capabilities (bash, file editing, etc.) +4. **Start Conversation**: Create conversation context +5. **Send Message**: Provide task description +6. **Run Agent**: Agent executes until task completes or stops +7. 
**Get Result**: Review agent's output and actions + + +## Try More Examples + +The repository includes 24+ examples demonstrating various capabilities: + +```bash +# Simple hello world +uv run python examples/01_standalone_sdk/01_hello_world.py + +# Custom tools +uv run python examples/01_standalone_sdk/02_custom_tools.py + +# With skills +uv run python examples/01_standalone_sdk/03_activate_microagent.py + +# See all examples +ls examples/01_standalone_sdk/ +``` + + +## Next Steps + +### Explore Documentation + +- **[SDK Architecture](/sdk/arch/sdk)** - Deep dive into components +- **[Tool System](/sdk/arch/tool-system)** - Available tools +- **[Workspace Architecture](/sdk/arch/workspace)** - Execution environments +- **[LLM Configuration](/sdk/arch/llm)** - Deep dive into language model configuration + +### Build Custom Solutions + +- **[Custom Tools](/sdk/guides/custom-tools)** - Create custom tools to expand agent capabilities +- **[MCP Integration](/sdk/guides/mcp)** - Connect to external tools via Model Context Protocol +- **[Docker Workspaces](/sdk/guides/agent-server/docker-sandbox)** - Sandbox agent execution in containers + +### Get Help + +- **[Slack Community](https://openhands.dev/joinslack)** - Ask questions and share projects +- **[GitHub Issues](https://github.com/OpenHands/software-agent-sdk/issues)** - Report bugs or request features +- **[Example Directory](https://github.com/OpenHands/software-agent-sdk/tree/main/examples)** - Browse working code samples + +### Browser Use +Source: https://docs.openhands.dev/sdk/guides/agent-browser-use.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +The BrowserToolSet integration enables your agent to interact with web pages through automated browser control. 
Built +on top of [browser-use](https://github.com/browser-use/browser-use), it provides capabilities for navigating websites, clicking elements, filling forms, +and extracting content - all through natural language instructions. + +## How It Works + +The [ready-to-run example](#ready-to-run-example) demonstrates combining multiple tools to create a capable web research agent: + +1. **BrowserToolSet**: Provides automated browser control for web interaction +2. **FileEditorTool**: Allows the agent to read and write files if needed +3. **BashTool**: Enables command-line operations for additional functionality + +The agent uses these tools to: +- Navigate to specified URLs +- Interact with web page elements (clicking, scrolling, etc.) +- Extract and analyze content from web pages +- Summarize information from multiple sources + +In this example, the agent visits the openhands.dev blog, finds the latest blog post, and provides a summary of its main points. + +## Customization + +For advanced use cases requiring only a subset of browser tools or custom configurations, you can manually +register individual browser tools. Refer to the [BrowserToolSet definition](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-tools/openhands/tools/browser_use/definition.py) to see the available individual +tools and create a `BrowserToolExecutor` with customized tool configurations before constructing the Agent. +This gives you fine-grained control over which browser capabilities are exposed to the agent. 
+ +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/15_browser_use.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/15_browser_use.py) + + +```python icon="python" expandable examples/01_standalone_sdk/15_browser_use.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.tool import Tool +from openhands.tools.browser_use import BrowserToolSet +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +cwd = os.getcwd() +tools = [ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), + Tool(name=BrowserToolSet.name), +] + +# If you need fine-grained browser control, you can manually register individual browser +# tools by creating a BrowserToolExecutor and providing factories that return customized +# Tool instances before constructing the Agent. + +# Agent +agent = Agent(llm=llm, tools=tools) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=cwd +) + +conversation.send_message( + "Could you go to https://openhands.dev/ blog page and summarize main " + "points of the latest blog?" +) +conversation.run() + +print("=" * 100) +print("Conversation finished. 
Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") +``` + + + +## Next Steps + +- **[Custom Tools](/sdk/guides/custom-tools)** - Create specialized tools +- **[MCP Integration](/sdk/guides/mcp)** - Connect external services + +### Creating Custom Agent +Source: https://docs.openhands.dev/sdk/guides/agent-custom.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +This guide demonstrates how to create custom agents tailored for specific use cases. Using the planning agent as a concrete example, you'll learn how to design specialized agents with custom tool sets, system prompts, and configurations that optimize performance for particular workflows. + + +This example is available on GitHub: [examples/01_standalone_sdk/24_planning_agent_workflow.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/24_planning_agent_workflow.py) + + + +The example showcases a two-phase workflow where a custom planning agent (with read-only tools) analyzes tasks and creates structured plans, followed by an execution agent that implements those plans with full editing capabilities. + +```python icon="python" expandable examples/01_standalone_sdk/24_planning_agent_workflow.py +#!/usr/bin/env python3 +""" +Planning Agent Workflow Example + +This example demonstrates a two-stage workflow: +1. Planning Agent: Analyzes the task and creates a detailed implementation plan +2. Execution Agent: Implements the plan with full editing capabilities + +The task: Create a Python web scraper that extracts article titles and URLs +from a news website, handles rate limiting, and saves results to JSON. 
+""" + +import os +import tempfile +from pathlib import Path + +from pydantic import SecretStr + +from openhands.sdk import LLM, Conversation +from openhands.sdk.llm import content_to_str +from openhands.tools.preset.default import get_default_agent +from openhands.tools.preset.planning import get_planning_agent + + +def get_event_content(event): + """Extract content from an event.""" + if hasattr(event, "llm_message"): + return "".join(content_to_str(event.llm_message.content)) + return str(event) + + +"""Run the planning agent workflow example.""" + +# Create a temporary workspace +workspace_dir = Path(tempfile.mkdtemp()) +print(f"Working in: {workspace_dir}") + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + model=model, + base_url=base_url, + api_key=SecretStr(api_key), + usage_id="agent", +) + +# Task description +task = """ +Create a Python web scraper with the following requirements: +- Scrape article titles and URLs from a news website +- Handle HTTP errors gracefully with retry logic +- Save results to a JSON file with timestamp +- Use requests and BeautifulSoup for scraping + +Do NOT ask for any clarifying questions. Directly create your implementation plan. 
+""" + +print("=" * 80) +print("PHASE 1: PLANNING") +print("=" * 80) + +# Create Planning Agent with read-only tools +planning_agent = get_planning_agent(llm=llm) + +# Create conversation for planning +planning_conversation = Conversation( + agent=planning_agent, + workspace=str(workspace_dir), +) + +# Run planning phase +print("Planning Agent is analyzing the task and creating implementation plan...") +planning_conversation.send_message( + f"Please analyze this web scraping task and create a detailed " + f"implementation plan:\n\n{task}" +) +planning_conversation.run() + +print("\n" + "=" * 80) +print("PLANNING COMPLETE") +print("=" * 80) +print(f"Implementation plan saved to: {workspace_dir}/PLAN.md") + +print("\n" + "=" * 80) +print("PHASE 2: EXECUTION") +print("=" * 80) + +# Create Execution Agent with full editing capabilities +execution_agent = get_default_agent(llm=llm, cli_mode=True) + +# Create conversation for execution +execution_conversation = Conversation( + agent=execution_agent, + workspace=str(workspace_dir), +) + +# Prepare execution prompt with reference to the plan file +execution_prompt = f""" +Please implement the web scraping project according to the implementation plan. + +The detailed implementation plan has been created and saved at: {workspace_dir}/PLAN.md + +Please read the plan from PLAN.md and implement all components according to it. + +Create all necessary files, implement the functionality, and ensure everything +works together properly. 
+""" + +print("Execution Agent is implementing the plan...") +execution_conversation.send_message(execution_prompt) +execution_conversation.run() + +# Get the last message from the conversation +execution_result = execution_conversation.state.events[-1] + +print("\n" + "=" * 80) +print("EXECUTION RESULT:") +print("=" * 80) +print(get_event_content(execution_result)) + +print("\n" + "=" * 80) +print("WORKFLOW COMPLETE") +print("=" * 80) +print(f"Project files created in: {workspace_dir}") + +# List created files +print("\nCreated files:") +for file_path in workspace_dir.rglob("*"): + if file_path.is_file(): + print(f" - {file_path.relative_to(workspace_dir)}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +## Anatomy of a Custom Agent + +The planning agent demonstrates the two key components for creating specialized agent: + +### 1. Custom Tool Selection + +Choose tools that match your agent's specific role. Here's how the planning agent defines its tools: + +```python icon="python" + +def register_planning_tools() -> None: + """Register the planning agent tools.""" + from openhands.tools.glob import GlobTool + from openhands.tools.grep import GrepTool + from openhands.tools.planning_file_editor import PlanningFileEditorTool + + register_tool("GlobTool", GlobTool) + logger.debug("Tool: GlobTool registered.") + register_tool("GrepTool", GrepTool) + logger.debug("Tool: GrepTool registered.") + register_tool("PlanningFileEditorTool", PlanningFileEditorTool) + logger.debug("Tool: PlanningFileEditorTool registered.") + + +def get_planning_tools() -> list[Tool]: + """Get the planning agent tool specifications. + + Returns: + List of tools optimized for planning and analysis tasks, including + file viewing and PLAN.md editing capabilities for advanced + code discovery and navigation. 
+ """ + register_planning_tools() + + return [ + Tool(name="GlobTool"), + Tool(name="GrepTool"), + Tool(name="PlanningFileEditorTool"), + ] +``` + +The planning agent uses: +- **GlobTool**: For discovering files and directories matching patterns +- **GrepTool**: For searching specific content across files +- **PlanningFileEditorTool**: For writing structured plans to `PLAN.md` only + +This read-only approach (except for `PLAN.md`) keeps the agent focused on analysis without implementation distractions. + +### 2. System Prompt Customization + +Custom agents can use specialized system prompts to guide behavior. The planning agent uses `system_prompt_planning.j2` with injected plan structure that enforces: +1. **Objective**: Clear goal statement +2. **Context Summary**: Relevant system components and constraints +3. **Approach Overview**: High-level strategy and rationale +4. **Implementation Steps**: Detailed step-by-step execution plan +5. **Testing and Validation**: Verification methods and success criteria + +### Complete Implementation Reference + +For a complete implementation example showing all these components working together, refer to the [planning agent preset source code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-tools/openhands/tools/preset/planning.py). + +## Next Steps + +- **[Custom Tools](/sdk/guides/custom-tools)** - Create specialized tools for your use case +- **[Context Condenser](/sdk/guides/context-condenser)** - Optimize context management +- **[MCP Integration](/sdk/guides/mcp)** - Add MCP + +### Sub-Agent Delegation +Source: https://docs.openhands.dev/sdk/guides/agent-delegation.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +## Overview + +Agent delegation allows a main agent to spawn multiple sub-agents and delegate tasks to them for parallel processing. 
Each sub-agent runs independently with its own conversation context and returns results that the main agent can consolidate and process further.

This pattern is useful when:
- Breaking down complex problems into independent subtasks
- Processing multiple related tasks in parallel
- Separating concerns between different specialized sub-agents
- Improving throughput for parallelizable work

## How It Works

The delegation system consists of two main operations:

### 1. Spawning Sub-Agents

Before delegating work, the agent must first spawn sub-agents with meaningful identifiers:

```python icon="python" wrap
# Agent uses the delegate tool to spawn sub-agents
{
    "command": "spawn",
    "ids": ["lodging", "activities"]
}
```

Each spawned sub-agent:
- Gets a unique identifier that the agent specifies (e.g., "lodging", "activities")
- Inherits the same LLM configuration as the parent agent
- Operates in the same workspace as the main agent
- Maintains its own independent conversation context

### 2. 
Delegating Tasks + +Once sub-agents are spawned, the agent can delegate tasks to them: + +```python icon="python" wrap +# Agent uses the delegate tool to assign tasks +{ + "command": "delegate", + "tasks": { + "lodging": "Find the best budget-friendly areas to stay in London", + "activities": "List top 5 must-see attractions and hidden gems in London" + } +} +``` + +The delegate operation: +- Runs all sub-agent tasks in parallel using threads +- Blocks until all sub-agents complete their work +- Returns a single consolidated observation with all results +- Handles errors gracefully and reports them per sub-agent + +## Setting Up the DelegateTool + + + + ### Register the Tool + + ```python icon="python" wrap + from openhands.sdk.tool import register_tool + from openhands.tools.delegate import DelegateTool + + register_tool("DelegateTool", DelegateTool) + ``` + + + ### Add to Agent Tools + + ```python icon="python" wrap + from openhands.sdk import Tool + from openhands.tools.preset.default import get_default_tools + + tools = get_default_tools(enable_browser=False) + tools.append(Tool(name="DelegateTool")) + + agent = Agent(llm=llm, tools=tools) + ``` + + + ### Configure Maximum Sub-Agents (Optional) + + The user can limit the maximum number of concurrent sub-agents: + + ```python icon="python" wrap + from openhands.tools.delegate import DelegateTool + + class CustomDelegateTool(DelegateTool): + @classmethod + def create(cls, conv_state, max_children: int = 3): + # Only allow up to 3 sub-agents + return super().create(conv_state, max_children=max_children) + + register_tool("DelegateTool", CustomDelegateTool) + ``` + + + + +## Tool Commands + +### spawn + +Initialize sub-agents with meaningful identifiers. + +**Parameters:** +- `command`: `"spawn"` +- `ids`: List of string identifiers (e.g., `["research", "implementation", "testing"]`) + +**Returns:** +A message indicating the sub-agents were successfully spawned. 
+ +**Example:** +```python icon="python" wrap +{ + "command": "spawn", + "ids": ["research", "implementation", "testing"] +} +``` + +### delegate + +Send tasks to specific sub-agents and wait for results. + +**Parameters:** +- `command`: `"delegate"` +- `tasks`: Dictionary mapping sub-agent IDs to task descriptions + +**Returns:** +A consolidated message containing all results from the sub-agents. + +**Example:** +```python icon="python" wrap +{ + "command": "delegate", + "tasks": { + "research": "Find best practices for async code", + "implementation": "Refactor the MyClass class", + "testing": "Write unit tests for the refactored code" + } +} +``` + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/25_agent_delegation.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/25_agent_delegation.py) + + +```python icon="python" expandable examples/01_standalone_sdk/25_agent_delegation.py +""" +Agent Delegation Example + +This example demonstrates the agent delegation feature where a main agent +delegates tasks to sub-agents for parallel processing. +Each sub-agent runs independently and returns its results to the main agent, +which then merges both analyses into a single consolidated report. +""" + +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + AgentContext, + Conversation, + Tool, + get_logger, +) +from openhands.sdk.context import Skill +from openhands.sdk.tool import register_tool +from openhands.tools.delegate import ( + DelegateTool, + DelegationVisualizer, + register_agent, +) +from openhands.tools.preset.default import get_default_tools + + +ONLY_RUN_SIMPLE_DELEGATION = False + +logger = get_logger(__name__) + +# Configure LLM and agent +# You can get an API key from https://app.all-hands.dev/settings/api-keys +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +llm = LLM( + model=model, + api_key=SecretStr(api_key), + base_url=os.environ.get("LLM_BASE_URL", None), + usage_id="agent", +) + +cwd = os.getcwd() + +register_tool("DelegateTool", DelegateTool) +tools = get_default_tools(enable_browser=False) +tools.append(Tool(name="DelegateTool")) + +main_agent = Agent( + llm=llm, + tools=tools, +) +conversation = Conversation( + agent=main_agent, + workspace=cwd, + visualizer=DelegationVisualizer(name="Delegator"), +) + +task_message = ( + "Forget about coding. Let's switch to travel planning. " + "Let's plan a trip to London. I have two issues I need to solve: " + "Lodging: what are the best areas to stay at while keeping budget in mind? " + "Activities: what are the top 5 must-see attractions and hidden gems? " + "Please use the delegation tools to handle these two tasks in parallel. " + "Make sure the sub-agents use their own knowledge " + "and dont rely on internet access. " + "They should keep it short. After getting the results, merge both analyses " + "into a single consolidated report.\n\n" +) +conversation.send_message(task_message) +conversation.run() + +conversation.send_message( + "Ask the lodging sub-agent what it thinks about Covent Garden." +) +conversation.run() + +# Report cost for simple delegation example +cost_1 = conversation.conversation_stats.get_combined_metrics().accumulated_cost +print(f"EXAMPLE_COST (simple delegation): {cost_1}") + +print("Simple delegation example done!", "\n" * 20) + + +# -------- Agent Delegation Second Part: User-Defined Agent Types -------- + +if ONLY_RUN_SIMPLE_DELEGATION: + exit(0) + + +def create_lodging_planner(llm: LLM) -> Agent: + """Create a lodging planner focused on London stays.""" + skills = [ + Skill( + name="lodging_planning", + content=( + "You specialize in finding great places to stay in London. 
" + "Provide 3-4 hotel recommendations with neighborhoods, quick " + "pros/cons, " + "and notes on transit convenience. Keep options varied by budget." + ), + trigger=None, + ) + ] + return Agent( + llm=llm, + tools=[], + agent_context=AgentContext( + skills=skills, + system_message_suffix="Focus only on London lodging recommendations.", + ), + ) + + +def create_activities_planner(llm: LLM) -> Agent: + """Create an activities planner focused on London itineraries.""" + skills = [ + Skill( + name="activities_planning", + content=( + "You design concise London itineraries. Suggest 2-3 daily " + "highlights, grouped by proximity to minimize travel time. " + "Include food/coffee stops " + "and note required tickets/reservations." + ), + trigger=None, + ) + ] + return Agent( + llm=llm, + tools=[], + agent_context=AgentContext( + skills=skills, + system_message_suffix="Plan practical, time-efficient days in London.", + ), + ) + + +# Register user-defined agent types (default agent type is always available) +register_agent( + name="lodging_planner", + factory_func=create_lodging_planner, + description="Finds London lodging options with transit-friendly picks.", +) +register_agent( + name="activities_planner", + factory_func=create_activities_planner, + description="Creates time-efficient London activity itineraries.", +) + +# Make the delegation tool available to the main agent +register_tool("DelegateTool", DelegateTool) + +main_agent = Agent( + llm=llm, + tools=[Tool(name="DelegateTool")], +) +conversation = Conversation( + agent=main_agent, + workspace=cwd, + visualizer=DelegationVisualizer(name="Delegator"), +) + +task_message = ( + "Plan a 3-day London trip. " + "1) Spawn two sub-agents: lodging_planner (hotel options) and " + "activities_planner (itinerary). " + "2) Ask lodging_planner for 3-4 central London hotel recommendations with " + "neighborhoods, quick pros/cons, and transit notes by budget. 
" + "3) Ask activities_planner for a concise 3-day itinerary with nearby stops, " + " food/coffee suggestions, and any ticket/reservation notes. " + "4) Share both sub-agent results and propose a combined plan." +) + +print("=" * 100) +print("Demonstrating London trip delegation (lodging + activities)...") +print("=" * 100) + +conversation.send_message(task_message) +conversation.run() + +conversation.send_message( + "Ask the lodging sub-agent what it thinks about Covent Garden." +) +conversation.run() + +# Report cost for user-defined agent types example +cost_2 = conversation.conversation_stats.get_combined_metrics().accumulated_cost +print(f"EXAMPLE_COST (user-defined agents): {cost_2}") + +print("All done!") + +# Full example cost report for CI workflow +print(f"EXAMPLE_COST: {cost_1 + cost_2}") +``` + + + +### Interactive Terminal +Source: https://docs.openhands.dev/sdk/guides/agent-interactive-terminal.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +The `BashTool` provides agents with the ability to interact with terminal applications that require back-and-forth communication, such as Python's interactive mode, ipython, database CLIs, and other REPL environments. This enables agents to execute commands within these interactive sessions, receive output, and send follow-up commands based on the results. + + +## How It Works + +```python icon="python" focus={4-7} +cwd = os.getcwd() +register_tool("BashTool", BashTool) +tools = [ + Tool( + name="BashTool", + params={"no_change_timeout_seconds": 3}, + ) +] +``` + + +The `BashTool` is configured with a `no_change_timeout_seconds` parameter that determines how long to wait for terminal updates before sending the output back to the agent. + +In the example above, the agent should: +1. Enters Python's interactive mode by running `python3` +2. Executes Python code to get the current time +3. 
Exits the Python interpreter + +The `BashTool` maintains the session state throughout these interactions, allowing the agent to send multiple commands within the same terminal session. Review the [BashTool](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-tools/openhands/tools/terminal/definition.py) and [terminal source code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-tools/openhands/tools/terminal/terminal/terminal_session.py) to better understand how the interactive session is configured and managed. + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/06_interactive_terminal_w_reasoning.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/06_interactive_terminal_w_reasoning.py) + + + +```python icon="python" expandable examples/01_standalone_sdk/06_interactive_terminal_w_reasoning.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.tool import Tool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +cwd = os.getcwd() +tools = [ + Tool( + name=TerminalTool.name, + params={"no_change_timeout_seconds": 3}, + ) +] + +# Agent +agent = Agent(llm=llm, tools=tools) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=cwd +) + +conversation.send_message( + "Enter python interactive mode by directly running `python3`, then tell me " + "the current time, and exit python interactive mode." +) +conversation.run() + +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") +``` + + + +## Next Steps + +- **[Custom Tools](/sdk/guides/custom-tools)** - Create your own tools for specific use cases + +### API-based Sandbox +Source: https://docs.openhands.dev/sdk/guides/agent-server/api-sandbox.md + +> A ready-to-run example is available [here](#ready-to-run-example)! + + +The API-sandboxed agent server demonstrates how to use `APIRemoteWorkspace` to connect to an [OpenHands runtime API service](https://runtime.all-hands.dev/). This eliminates the need to manage your own infrastructure, providing automatic scaling, monitoring, and secure sandboxed execution. 
+ +## Key Concepts + +### APIRemoteWorkspace + +The `APIRemoteWorkspace` connects to a hosted runtime API service: + +```python icon="python" +with APIRemoteWorkspace( + runtime_api_url="https://runtime.eval.all-hands.dev", + runtime_api_key=runtime_api_key, + server_image="ghcr.io/openhands/agent-server:main-python", +) as workspace: +``` + +This workspace type: +- Connects to a remote runtime API service +- Automatically provisions sandboxed environments +- Manages container lifecycle through the API +- Handles all infrastructure concerns + +### Runtime API Authentication + +The example requires a runtime API key for authentication: + +```python icon="python" +runtime_api_key = os.getenv("RUNTIME_API_KEY") +if not runtime_api_key: + logger.error("RUNTIME_API_KEY required") + exit(1) +``` + +This key authenticates your requests to the hosted runtime service. + +### Pre-built Image Selection + +You can specify which pre-built agent server image to use: + +```python icon="python" focus={4} +APIRemoteWorkspace( + runtime_api_url="https://runtime.eval.all-hands.dev", + runtime_api_key=runtime_api_key, + server_image="ghcr.io/openhands/agent-server:main-python", +) +``` + +The runtime API will pull and run the specified image in a sandboxed environment. + +### Workspace Testing + +Just like with `DockerWorkspace`, you can test the workspace before running the agent: + +```python icon="python" focus={1-3} +result = workspace.execute_command( + "echo 'Hello from sandboxed environment!' && pwd" +) +logger.info(f"Command completed: {result.exit_code}, {result.stdout}") +``` + +This verifies connectivity to the remote runtime and ensures the environment is ready. 
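If the remote sandbox is still starting up, this first check can fail transiently. A small generic retry helper is one way to make the readiness probe robust — a minimal sketch, not part of the SDK; `check` stands in for any probe, such as the `execute_command` call above:

```python
import time


def wait_until_ready(check, attempts=5, delay=2.0):
    """Call `check` until it returns True; give up after `attempts` tries."""
    for _ in range(attempts):
        if check():
            return True
        time.sleep(delay)
    return False
```

For instance, `wait_until_ready(lambda: workspace.execute_command("true").exit_code == 0)` would poll the sandbox until shell commands succeed.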
+ +### Automatic RemoteConversation + +The conversation uses WebSocket communication with the remote server: + +```python icon="python" focus={1, 7} +conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + visualize=True +) +assert isinstance(conversation, RemoteConversation) +``` + +All agent execution happens on the remote runtime infrastructure. + +## Ready-to-run Example + + +This example is available on GitHub: [examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py) + + +This example shows how to connect to a hosted runtime API for fully managed agent execution: + +```python icon="python" expandable examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py +"""Example: APIRemoteWorkspace with Dynamic Build. + +This example demonstrates building an agent-server image on-the-fly from the SDK +codebase and launching it in a remote sandboxed environment via Runtime API. + +Usage: + uv run examples/24_remote_convo_with_api_sandboxed_server.py + +Requirements: + - LLM_API_KEY: API key for LLM access + - RUNTIME_API_KEY: API key for runtime API access +""" + +import os +import time + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Conversation, + RemoteConversation, + get_logger, +) +from openhands.tools.preset.default import get_default_agent +from openhands.workspace import APIRemoteWorkspace + + +logger = get_logger(__name__) + + +api_key = os.getenv("LLM_API_KEY") +assert api_key, "LLM_API_KEY required" + +llm = LLM( + usage_id="agent", + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) + +runtime_api_key = os.getenv("RUNTIME_API_KEY") +if not runtime_api_key: + logger.error("RUNTIME_API_KEY required") + exit(1) + + +# If GITHUB_SHA is set (e.g. 
running in CI of a PR), use that to ensure consistency +# Otherwise, use the latest image from main +server_image_sha = os.getenv("GITHUB_SHA") or "main" +server_image = f"ghcr.io/openhands/agent-server:{server_image_sha[:7]}-python-amd64" +logger.info(f"Using server image: {server_image}") + +with APIRemoteWorkspace( + runtime_api_url=os.getenv("RUNTIME_API_URL", "https://runtime.eval.all-hands.dev"), + runtime_api_key=runtime_api_key, + server_image=server_image, + image_pull_policy="Always", +) as workspace: + agent = get_default_agent(llm=llm, cli_mode=True) + received_events: list = [] + last_event_time = {"ts": time.time()} + + def event_callback(event) -> None: + received_events.append(event) + last_event_time["ts"] = time.time() + + result = workspace.execute_command( + "echo 'Hello from sandboxed environment!' && pwd" + ) + logger.info(f"Command completed: {result.exit_code}, {result.stdout}") + + conversation = Conversation( + agent=agent, workspace=workspace, callbacks=[event_callback] + ) + assert isinstance(conversation, RemoteConversation) + + try: + conversation.send_message( + "Read the current repo and write 3 facts about the project into FACTS.txt." + ) + conversation.run() + + while time.time() - last_event_time["ts"] < 2.0: + time.sleep(0.1) + + conversation.send_message("Great! Now delete that file.") + conversation.run() + cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost + print(f"EXAMPLE_COST: {cost}") + finally: + conversation.close() +``` + +You can run the example code as-is. 
+ +```bash Running the Example +export LLM_API_KEY="your-api-key" +# If using the OpenHands LLM proxy, set its base URL: +export LLM_BASE_URL="https://llm-proxy.eval.all-hands.dev" +export RUNTIME_API_KEY="your-runtime-api-key" +# Set the runtime API URL for the remote sandbox +export RUNTIME_API_URL="https://runtime.eval.all-hands.dev" +cd agent-sdk +uv run python examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py +``` + +## Next Steps + +- **[Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandbox)** +- **[Local Agent Server](/sdk/guides/agent-server/local-server)** +- **[Agent Server Overview](/sdk/guides/agent-server/overview)** - Architecture and implementation details +- **[Agent Server Package Architecture](/sdk/arch/agent-server)** - Remote execution architecture + +### Apptainer Sandbox +Source: https://docs.openhands.dev/sdk/guides/agent-server/apptainer-sandbox.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#basic-apptainer-sandbox-example)! + +The Apptainer sandboxed agent server demonstrates how to run agents in isolated Apptainer containers using ApptainerWorkspace. + +Apptainer (formerly Singularity) is a container runtime designed for HPC environments that doesn't require root access, making it ideal for shared computing environments, university clusters, and systems where Docker is not available. 
+ +## When to Use Apptainer + +Use Apptainer instead of Docker when: +- Running on HPC clusters or shared computing environments +- Root access is not available +- Docker daemon cannot be installed +- Working in academic or research computing environments +- Security policies restrict Docker usage + +## Prerequisites + +Before running this example, ensure you have: +- Apptainer installed ([Installation Guide](https://apptainer.org/docs/user/main/quick_start.html)) +- LLM API key set in environment + +## Basic Apptainer Sandbox Example + + +This example is available on GitHub: [examples/02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py) + + +This example shows how to create an `ApptainerWorkspace` that automatically manages Apptainer containers for agent execution: + +```python icon="python" expandable examples/02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py +import os +import platform +import time + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Conversation, + RemoteConversation, + get_logger, +) +from openhands.tools.preset.default import get_default_agent +from openhands.workspace import ApptainerWorkspace + + +logger = get_logger(__name__) + +# 1) Ensure we have LLM API key +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+ +llm = LLM( + usage_id="agent", + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) + + +def detect_platform(): + """Detects the correct platform string.""" + machine = platform.machine().lower() + if "arm" in machine or "aarch64" in machine: + return "linux/arm64" + return "linux/amd64" + + +def get_server_image(): + """Get the server image tag, using PR-specific image in CI.""" + platform_str = detect_platform() + arch = "arm64" if "arm64" in platform_str else "amd64" + # If GITHUB_SHA is set (e.g. running in CI of a PR), use that to ensure consistency + # Otherwise, use the latest image from main + github_sha = os.getenv("GITHUB_SHA") + if github_sha: + return f"ghcr.io/openhands/agent-server:{github_sha[:7]}-python-{arch}" + return "ghcr.io/openhands/agent-server:latest-python" + + +# 2) Create an Apptainer-based remote workspace that will set up and manage +# the Apptainer container automatically. Use `ApptainerWorkspace` with a +# pre-built agent server image. +# Apptainer (formerly Singularity) doesn't require root access, making it +# ideal for HPC and shared computing environments. +server_image = get_server_image() +logger.info(f"Using server image: {server_image}") +with ApptainerWorkspace( + # use pre-built image for faster startup + server_image=server_image, + host_port=8010, + platform=detect_platform(), +) as workspace: + # 3) Create agent + agent = get_default_agent( + llm=llm, + cli_mode=True, + ) + + # 4) Set up callback collection + received_events: list = [] + last_event_time = {"ts": time.time()} + + def event_callback(event) -> None: + event_type = type(event).__name__ + logger.info(f"🔔 Callback received event: {event_type}\n{event}") + received_events.append(event) + last_event_time["ts"] = time.time() + + # 5) Test the workspace with a simple command + result = workspace.execute_command( + "echo 'Hello from sandboxed environment!' 
&& pwd" + ) + logger.info( + f"Command '{result.command}' completed with exit code {result.exit_code}" + ) + logger.info(f"Output: {result.stdout}") + conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + ) + assert isinstance(conversation, RemoteConversation) + + try: + logger.info(f"\n📋 Conversation ID: {conversation.state.id}") + + logger.info("📝 Sending first message...") + conversation.send_message( + "Read the current repo and write 3 facts about the project into FACTS.txt." + ) + logger.info("🚀 Running conversation...") + conversation.run() + logger.info("✅ First task completed!") + logger.info(f"Agent status: {conversation.state.execution_status}") + + # Wait for events to settle (no events for 2 seconds) + logger.info("⏳ Waiting for events to stop...") + while time.time() - last_event_time["ts"] < 2.0: + time.sleep(0.1) + logger.info("✅ Events have stopped") + + logger.info("🚀 Running conversation again...") + conversation.send_message("Great! 
Now delete that file.") + conversation.run() + logger.info("✅ Second task completed!") + + # Report cost (must be before conversation.close()) + cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost + print(f"EXAMPLE_COST: {cost}") + finally: + print("\n🧹 Cleaning up conversation...") + conversation.close() +``` + + + +## Configuration Options + +The `ApptainerWorkspace` supports several configuration options: + +### Option 1: Pre-built Image (Recommended) + +Use a pre-built agent server image for fastest startup: + +```python icon="python" focus={2} +with ApptainerWorkspace( + server_image="ghcr.io/openhands/agent-server:main-python", + host_port=8010, +) as workspace: + # Your code here +``` + +### Option 2: Build from Base Image + +Build from a base image when you need custom dependencies: + +```python icon="python" focus={2} +with ApptainerWorkspace( + base_image="nikolaik/python-nodejs:python3.12-nodejs22", + host_port=8010, +) as workspace: + # Your code here +``` + + +Building from a base image requires internet access and may take several minutes on first run. The built image is cached for subsequent runs. 
+ + +### Option 3: Use Existing SIF File + +If you have a pre-built Apptainer SIF file: + +```python icon="python" focus={2} +with ApptainerWorkspace( + sif_file="/path/to/your/agent-server.sif", + host_port=8010, +) as workspace: + # Your code here +``` + +## Key Features + +### Rootless Container Execution + +Apptainer runs completely without root privileges: +- No daemon process required +- User namespace isolation +- Compatible with most HPC security policies + +### Image Caching + +Apptainer automatically caches container images: +- First run builds/pulls the image +- Subsequent runs reuse cached SIF files +- Cache location: `~/.cache/apptainer/` + +### Port Mapping + +The workspace exposes ports for agent services: +```python icon="python" focus={1, 3} +with ApptainerWorkspace( + server_image="ghcr.io/openhands/agent-server:main-python", + host_port=8010, # Maps to container port 8010 +) as workspace: + # Access agent server at http://localhost:8010 +``` + +## Differences from Docker + +While the API is similar to DockerWorkspace, there are some differences: + +| Feature | Docker | Apptainer | +|---------|--------|-----------| +| Root access required | Yes (daemon) | No | +| Installation | Requires Docker Engine | Single binary | +| Image format | OCI/Docker | SIF | +| Build speed | Fast (layers) | Slower (monolithic) | +| HPC compatibility | Limited | Excellent | +| Networking | Bridge/overlay | Host networking | + +## Troubleshooting + +### Apptainer Not Found + +If you see `apptainer: command not found`: +1. Install Apptainer following the [official guide](https://apptainer.org/docs/user/main/quick_start.html) +2. Ensure it's in your PATH: `which apptainer` + +### Permission Errors + +Apptainer should work without root. 
If you see permission errors: +- Check that your user has access to `/tmp` +- Verify Apptainer is properly installed: `apptainer version` +- Ensure the cache directory is writable: `ls -la ~/.cache/apptainer/` + +## Next Steps + +- **[Docker Sandbox](/sdk/guides/agent-server/docker-sandbox)** - Alternative container runtime +- **[API Sandbox](/sdk/guides/agent-server/api-sandbox)** - Remote API-based sandboxing +- **[Local Server](/sdk/guides/agent-server/local-server)** - Non-sandboxed local execution + +### OpenHands Cloud Workspace +Source: https://docs.openhands.dev/sdk/guides/agent-server/cloud-workspace.md + +> A ready-to-run example is available [here](#ready-to-run-example)! + +This guide demonstrates how to use `OpenHandsCloudWorkspace` with the [OpenHands Cloud](https://app.all-hands.dev) to provision and manage sandboxed environments for agent execution. This provides a seamless experience with automatic sandbox provisioning, monitoring, and secure execution without managing your own infrastructure. + +## Key Concepts + +### OpenHandsCloudWorkspace + +The `OpenHandsCloudWorkspace` connects to OpenHands Cloud to provision sandboxes: + +```python icon="python" focus={1-2} +with OpenHandsCloudWorkspace( + cloud_api_url="https://app.all-hands.dev", + cloud_api_key=cloud_api_key, +) as workspace: +``` + +This workspace type: +- Connects to OpenHands Cloud API +- Automatically provisions sandboxed environments +- Manages sandbox lifecycle (create, poll status, delete) +- Handles all infrastructure concerns + +### Getting Your API Key + +To use OpenHands Cloud, you need an API key: + +1. Go to [app.all-hands.dev](https://app.all-hands.dev) +2. Sign in to your account +3. Navigate to Settings → API Keys +4. Create a new API key + +Store this key securely and use it as the `OPENHANDS_CLOUD_API_KEY` environment variable. 
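The steps above reduce to reading the key from the environment before constructing the workspace. A minimal sketch — the helper name and error wording here are illustrative, not part of the SDK:

```python
import os


def load_cloud_api_key() -> str:
    """Read the OpenHands Cloud API key from the environment, failing loudly."""
    key = os.getenv("OPENHANDS_CLOUD_API_KEY")
    if not key:
        raise RuntimeError(
            "OPENHANDS_CLOUD_API_KEY is not set; "
            "create one under Settings -> API Keys at app.all-hands.dev"
        )
    return key
```

The ready-to-run example further down follows the same pattern inline with `os.getenv` and an explicit error on a missing key.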
+ + +### Configuration Options + +The `OpenHandsCloudWorkspace` supports several configuration options: + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `cloud_api_url` | `str` | Required | OpenHands Cloud API URL | +| `cloud_api_key` | `str` | Required | API key for authentication | +| `sandbox_spec_id` | `str \| None` | `None` | Custom sandbox specification ID | +| `init_timeout` | `float` | `300.0` | Timeout for sandbox initialization (seconds) | +| `api_timeout` | `float` | `60.0` | Timeout for API requests (seconds) | +| `keep_alive` | `bool` | `False` | Keep sandbox running after cleanup | + +### Keep Alive Mode + +By default, the sandbox is deleted when the workspace is closed. To keep it running: + +```python icon="python" focus={4} +workspace = OpenHandsCloudWorkspace( + cloud_api_url="https://app.all-hands.dev", + cloud_api_key=cloud_api_key, + keep_alive=True, +) +``` + +This is useful for debugging or when you want to inspect the sandbox state after execution. + +### Workspace Testing + +You can test the workspace before running the agent: + +```python icon="python" focus={1-3} +result = workspace.execute_command( + "echo 'Hello from OpenHands Cloud sandbox!' && pwd" +) +logger.info(f"Command completed: {result.exit_code}, {result.stdout}") +``` + +This verifies connectivity to the cloud sandbox and ensures the environment is ready. 
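The ready-to-run example below sends its follow-up message only after the event stream has been quiet for two seconds. That waiting logic can be factored into a small helper — a sketch, not an SDK API; `last_event_time` is the same mutable timestamp dict that the example's event callback updates:

```python
import time


def wait_for_quiet(last_event_time: dict, quiet_period: float = 2.0, poll: float = 0.1) -> None:
    """Block until no event has been recorded for `quiet_period` seconds."""
    while time.time() - last_event_time["ts"] < quiet_period:
        time.sleep(poll)
```

Waiting for quiescence before the next `send_message` keeps callback output from the two tasks from interleaving.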
+ +## Comparison with Other Workspace Types + +| Feature | OpenHandsCloudWorkspace | APIRemoteWorkspace | DockerWorkspace | +|---------|------------------------|-------------------|-----------------| +| Infrastructure | OpenHands Cloud | Runtime API | Local Docker | +| Authentication | API Key | API Key | None | +| Setup Required | None | Runtime API access | Docker installed | +| Custom Images | Via sandbox specs | Direct image specification | Direct image specification | +| Best For | Production use | Custom runtime environments | Local development | + +## Ready-to-run Example + + +This example is available on GitHub: [examples/02_remote_agent_server/07_convo_with_cloud_workspace.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/02_remote_agent_server/07_convo_with_cloud_workspace.py) + + +This example shows how to connect to OpenHands Cloud for fully managed agent execution: + +```python icon="python" expandable examples/02_remote_agent_server/07_convo_with_cloud_workspace.py +"""Example: OpenHandsCloudWorkspace for OpenHands Cloud API. + +This example demonstrates using OpenHandsCloudWorkspace to provision a sandbox +via OpenHands Cloud (app.all-hands.dev) and run an agent conversation. + +Usage: + uv run examples/02_remote_agent_server/06_convo_with_cloud_workspace.py + +Requirements: + - LLM_API_KEY: API key for direct LLM provider access (e.g., Anthropic API key) + - OPENHANDS_CLOUD_API_KEY: API key for OpenHands Cloud access + +Note: + The LLM configuration is sent to the cloud sandbox, so you need an API key + that works directly with the LLM provider (not a local proxy). If using + Anthropic, set LLM_API_KEY to your Anthropic API key. 
+""" + +import os +import time + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Conversation, + RemoteConversation, + get_logger, +) +from openhands.tools.preset.default import get_default_agent +from openhands.workspace import OpenHandsCloudWorkspace + + +logger = get_logger(__name__) + + +api_key = os.getenv("LLM_API_KEY") +assert api_key, "LLM_API_KEY required" + +# Note: Don't use a local proxy URL here - the cloud sandbox needs direct access +# to the LLM provider. Use None for base_url to let LiteLLM use the default +# provider endpoint, or specify the provider's direct URL. +llm = LLM( + usage_id="agent", + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + base_url=os.getenv("LLM_BASE_URL") or None, + api_key=SecretStr(api_key), +) + +cloud_api_key = os.getenv("OPENHANDS_CLOUD_API_KEY") +if not cloud_api_key: + logger.error("OPENHANDS_CLOUD_API_KEY required") + exit(1) + +cloud_api_url = os.getenv("OPENHANDS_CLOUD_API_URL", "https://app.all-hands.dev") +logger.info(f"Using OpenHands Cloud API: {cloud_api_url}") + +with OpenHandsCloudWorkspace( + cloud_api_url=cloud_api_url, + cloud_api_key=cloud_api_key, +) as workspace: + agent = get_default_agent(llm=llm, cli_mode=True) + received_events: list = [] + last_event_time = {"ts": time.time()} + + def event_callback(event) -> None: + received_events.append(event) + last_event_time["ts"] = time.time() + + result = workspace.execute_command( + "echo 'Hello from OpenHands Cloud sandbox!' && pwd" + ) + logger.info(f"Command completed: {result.exit_code}, {result.stdout}") + + conversation = Conversation( + agent=agent, workspace=workspace, callbacks=[event_callback] + ) + assert isinstance(conversation, RemoteConversation) + + try: + conversation.send_message( + "Read the current repo and write 3 facts about the project into FACTS.txt." + ) + conversation.run() + + while time.time() - last_event_time["ts"] < 2.0: + time.sleep(0.1) + + conversation.send_message("Great! 
Now delete that file.") + conversation.run() + cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost + print(f"EXAMPLE_COST: {cost}") + finally: + conversation.close() + + logger.info("✅ Conversation completed successfully.") + logger.info(f"Total {len(received_events)} events received during conversation.") +``` + + +```bash Running the Example +export LLM_API_KEY="your-llm-api-key" +export OPENHANDS_CLOUD_API_KEY="your-cloud-api-key" +# Optional: specify a custom sandbox spec +# export OPENHANDS_SANDBOX_SPEC_ID="your-sandbox-spec-id" +cd agent-sdk +uv run python examples/02_remote_agent_server/07_convo_with_cloud_workspace.py +``` + +## Next Steps + +- **[API-based Sandbox](/sdk/guides/agent-server/api-sandbox)** - Connect to Runtime API service +- **[Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandbox)** - Run locally with Docker +- **[Local Agent Server](/sdk/guides/agent-server/local-server)** - Development without containers +- **[Agent Server Overview](/sdk/guides/agent-server/overview)** - Architecture and implementation details + +### Custom Tools with Remote Agent Server +Source: https://docs.openhands.dev/sdk/guides/agent-server/custom-tools.md + +> A ready-to-run example is available [here](#ready-to-run-example)! + + +When using a [remote agent server](/sdk/guides/agent-server/overview), custom tools must be available in the server's Python environment. This guide shows how to build a custom base image with your tools and use `DockerDevWorkspace` to automatically build the agent server on top of it. + + +For standalone custom tools (without remote agent server), see the [Custom Tools guide](/sdk/guides/custom-tools). + + +## How It Works + +1. **Define custom tool** with `register_tool()` at module level +2. **Create Dockerfile** that copies tools and sets `PYTHONPATH` +3. **Build custom base image** with your tools +4. **Use `DockerDevWorkspace`** with `base_image` parameter - it builds the agent server on top +5. 
**Import tool module** in client before creating conversation +6. **Server imports modules** dynamically, triggering registration + +## Key Files + +### Custom Tool (`custom_tools/log_data.py`) + +```python icon="python" expandable examples/02_remote_agent_server/06_custom_tool/custom_tools/log_data.py +"""Log Data Tool - Example custom tool for logging structured data to JSON. + +This tool demonstrates how to create a custom tool that logs structured data +to a local JSON file during agent execution. The data can be retrieved and +verified after the agent completes. +""" + +import json +from collections.abc import Sequence +from datetime import UTC, datetime +from enum import Enum +from pathlib import Path +from typing import Any + +from pydantic import Field + +from openhands.sdk import ( + Action, + ImageContent, + Observation, + TextContent, + ToolDefinition, +) +from openhands.sdk.tool import ToolExecutor, register_tool + + +# --- Enums and Models --- + + +class LogLevel(str, Enum): + """Log level for entries.""" + + DEBUG = "debug" + INFO = "info" + WARNING = "warning" + ERROR = "error" + + +class LogDataAction(Action): + """Action to log structured data to a JSON file.""" + + message: str = Field(description="The log message") + level: LogLevel = Field( + default=LogLevel.INFO, + description="Log level (debug, info, warning, error)", + ) + data: dict[str, Any] = Field( + default_factory=dict, + description="Additional structured data to include in the log entry", + ) + + +class LogDataObservation(Observation): + """Observation returned after logging data.""" + + success: bool = Field(description="Whether the data was successfully logged") + log_file: str = Field(description="Path to the log file") + entry_count: int = Field(description="Total number of entries in the log file") + + @property + def to_llm_content(self) -> Sequence[TextContent | ImageContent]: + """Convert observation to LLM content.""" + if self.success: + return [ + TextContent( + text=( + 
f"✅ Data logged successfully to {self.log_file}\n" + f"Total entries: {self.entry_count}" + ) + ) + ] + return [TextContent(text="❌ Failed to log data")] + + +# --- Executor --- + +# Default log file path +DEFAULT_LOG_FILE = "/tmp/agent_data.json" + + +class LogDataExecutor(ToolExecutor[LogDataAction, LogDataObservation]): + """Executor that logs structured data to a JSON file.""" + + def __init__(self, log_file: str = DEFAULT_LOG_FILE): + """Initialize the log data executor. + + Args: + log_file: Path to the JSON log file + """ + self.log_file = Path(log_file) + + def __call__( + self, + action: LogDataAction, + conversation=None, # noqa: ARG002 + ) -> LogDataObservation: + """Execute the log data action. + + Args: + action: The log data action + conversation: Optional conversation context (not used) + + Returns: + LogDataObservation with the result + """ + # Load existing entries or start fresh + entries: list[dict[str, Any]] = [] + if self.log_file.exists(): + try: + with open(self.log_file) as f: + entries = json.load(f) + except (json.JSONDecodeError, OSError): + entries = [] + + # Create new entry with timestamp + entry = { + "timestamp": datetime.now(UTC).isoformat(), + "level": action.level.value, + "message": action.message, + "data": action.data, + } + entries.append(entry) + + # Write back to file + self.log_file.parent.mkdir(parents=True, exist_ok=True) + with open(self.log_file, "w") as f: + json.dump(entries, f, indent=2) + + return LogDataObservation( + success=True, + log_file=str(self.log_file), + entry_count=len(entries), + ) + + +# --- Tool Definition --- + +_LOG_DATA_DESCRIPTION = """Log structured data to a JSON file. + +Use this tool to record information, findings, or events during your work. +Each log entry includes a timestamp and can contain arbitrary structured data. 
+ +Parameters: +* message: A descriptive message for the log entry +* level: Log level - one of 'debug', 'info', 'warning', 'error' (default: info) +* data: Optional dictionary of additional structured data to include + +Example usage: +- Log a finding: message="Found potential issue", level="warning", data={"file": "app.py", "line": 42} +- Log progress: message="Completed analysis", level="info", data={"files_checked": 10} +""" # noqa: E501 + + +class LogDataTool(ToolDefinition[LogDataAction, LogDataObservation]): + """Tool for logging structured data to a JSON file.""" + + @classmethod + def create(cls, conv_state, **params) -> Sequence[ToolDefinition]: # noqa: ARG003 + """Create LogDataTool instance. + + Args: + conv_state: Conversation state (not used in this example) + **params: Additional parameters: + - log_file: Path to the JSON log file (default: /tmp/agent_data.json) + + Returns: + A sequence containing a single LogDataTool instance + """ + log_file = params.get("log_file", DEFAULT_LOG_FILE) + executor = LogDataExecutor(log_file=log_file) + + return [ + cls( + description=_LOG_DATA_DESCRIPTION, + action_type=LogDataAction, + observation_type=LogDataObservation, + executor=executor, + ) + ] + + +# Auto-register the tool when this module is imported +# This is what enables dynamic tool registration in the remote agent server +register_tool("LogDataTool", LogDataTool) +``` + +### Dockerfile + +```dockerfile icon="docker" +FROM nikolaik/python-nodejs:python3.12-nodejs22 + +COPY custom_tools /app/custom_tools +ENV PYTHONPATH="/app:${PYTHONPATH}" +``` + +## Troubleshooting + +| Issue | Solution | +|-------|----------| +| Tool not found | Ensure `register_tool()` is called at module level, import tool before creating conversation | +| Import errors on server | Check `PYTHONPATH` in Dockerfile, verify all dependencies installed | +| Build failures | Verify file paths in `COPY` commands, ensure Python 3.12+ | + + +**Binary Mode Limitation**: Custom tools only work 
with **source mode** deployments. When using `DockerDevWorkspace`, set `target="source"` (the default). See [GitHub issue #1531](https://github.com/OpenHands/software-agent-sdk/issues/1531) for details. + + +## Ready-to-run Example + + +This example is available on GitHub: [examples/02_remote_agent_server/06_custom_tool/](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/02_remote_agent_server/06_custom_tool) + + +```python icon="python" expandable examples/02_remote_agent_server/06_custom_tool/custom_tool_example.py +"""Example: Using custom tools with remote agent server. + +This example demonstrates how to use custom tools with a remote agent server +by building a custom base image that includes the tool implementation. + +Prerequisites: + 1. Build the custom base image first: + cd examples/02_remote_agent_server/05_custom_tool + ./build_custom_image.sh + + 2. Set LLM_API_KEY environment variable + +The workflow is: +1. Define a custom tool (LogDataTool for logging structured data to JSON) +2. Create a simple Dockerfile that copies the tool into the base image +3. Build the custom base image +4. Use DockerDevWorkspace with base_image pointing to the custom image +5. DockerDevWorkspace builds the agent server on top of the custom base image +6. The server dynamically registers tools when the client creates a conversation +7. The agent can use the custom tool during execution +8. 
Verify the logged data by reading the JSON file from the workspace + +This pattern is useful for: +- Collecting structured data during agent runs (logs, metrics, events) +- Implementing custom integrations with external systems +- Adding domain-specific operations to the agent +""" + +import os +import platform +import subprocess +import sys +import time +from pathlib import Path + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Conversation, + RemoteConversation, + Tool, + get_logger, +) +from openhands.workspace import DockerDevWorkspace + + +logger = get_logger(__name__) + +# 1) Ensure we have LLM API key +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." + +llm = LLM( + usage_id="agent", + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) + + +def detect_platform(): + """Detects the correct Docker platform string.""" + machine = platform.machine().lower() + if "arm" in machine or "aarch64" in machine: + return "linux/arm64" + return "linux/amd64" + + +# Get the directory containing this script +example_dir = Path(__file__).parent.absolute() + +# Custom base image tag (contains custom tools, agent server built on top) +CUSTOM_BASE_IMAGE_TAG = "custom-base-image:latest" + +# 2) Check if custom base image exists, build if not +logger.info(f"🔍 Checking for custom base image: {CUSTOM_BASE_IMAGE_TAG}") +result = subprocess.run( + ["docker", "images", "-q", CUSTOM_BASE_IMAGE_TAG], + capture_output=True, + text=True, + check=False, +) + +if not result.stdout.strip(): + logger.info("⚠️ Custom base image not found. 
Building...") + logger.info("📦 Building custom base image with custom tools...") + build_script = example_dir / "build_custom_image.sh" + try: + subprocess.run( + [str(build_script), CUSTOM_BASE_IMAGE_TAG], + cwd=str(example_dir), + check=True, + ) + logger.info("✅ Custom base image built successfully!") + except subprocess.CalledProcessError as e: + logger.error(f"❌ Failed to build custom base image: {e}") + logger.error("Please run ./build_custom_image.sh manually and fix any errors.") + sys.exit(1) +else: + logger.info(f"✅ Custom base image found: {CUSTOM_BASE_IMAGE_TAG}") + +# 3) Create a DockerDevWorkspace with the custom base image +# DockerDevWorkspace will build the agent server on top of this base image +logger.info("🚀 Building and starting agent server with custom tools...") +logger.info("📦 This may take a few minutes on first run...") + +with DockerDevWorkspace( + base_image=CUSTOM_BASE_IMAGE_TAG, + host_port=8011, + platform=detect_platform(), + target="source", # NOTE: "binary" target does not work with custom tools +) as workspace: + logger.info("✅ Custom agent server started!") + + # 4) Import custom tools to register them in the client's registry + # This allows the client to send the module qualname to the server + # The server will then import the same module and execute the tool + import custom_tools.log_data # noqa: F401 + + # 5) Create agent with custom tools + # Note: We specify the tool here, but it's actually executed on the server + # Get default tools and add our custom tool + from openhands.sdk import Agent + from openhands.tools.preset.default import get_default_condenser, get_default_tools + + tools = get_default_tools(enable_browser=False) + # Add our custom tool! 
+ tools.append(Tool(name="LogDataTool")) + + agent = Agent( + llm=llm, + tools=tools, + system_prompt_kwargs={"cli_mode": True}, + condenser=get_default_condenser( + llm=llm.model_copy(update={"usage_id": "condenser"}) + ), + ) + + # 6) Set up callback collection + received_events: list = [] + last_event_time = {"ts": time.time()} + + def event_callback(event) -> None: + event_type = type(event).__name__ + logger.info(f"🔔 Callback received event: {event_type}\n{event}") + received_events.append(event) + last_event_time["ts"] = time.time() + + # 7) Test the workspace with a simple command + result = workspace.execute_command( + "echo 'Custom agent server ready!' && python --version" + ) + logger.info( + f"Command '{result.command}' completed with exit code {result.exit_code}" + ) + logger.info(f"Output: {result.stdout}") + + # 8) Create conversation with the custom agent + conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + ) + assert isinstance(conversation, RemoteConversation) + + try: + logger.info(f"\n📋 Conversation ID: {conversation.state.id}") + + logger.info("📝 Sending task to analyze files and log findings...") + conversation.send_message( + "Please analyze the Python files in the current directory. " + "Use the LogDataTool to log your findings as you work. " + "For example:\n" + "- Log when you start analyzing a file (level: info)\n" + "- Log any interesting patterns you find (level: info)\n" + "- Log any potential issues (level: warning)\n" + "- Include relevant data like file names, line numbers, etc.\n\n" + "Make at least 3 log entries using the LogDataTool." 
+ ) + logger.info("🚀 Running conversation...") + conversation.run() + logger.info("✅ Task completed!") + logger.info(f"Agent status: {conversation.state.execution_status}") + + # Wait for events to settle (no events for 2 seconds) + logger.info("⏳ Waiting for events to stop...") + while time.time() - last_event_time["ts"] < 2.0: + time.sleep(0.1) + logger.info("✅ Events have stopped") + + # 9) Read the logged data from the JSON file using file_download API + logger.info("\n📊 Logged Data Summary:") + logger.info("=" * 80) + + # Download the log file from the workspace using the file download API + import json + import tempfile + + with tempfile.NamedTemporaryFile( + mode="w", suffix=".json", delete=False + ) as tmp_file: + local_path = tmp_file.name + + download_result = workspace.file_download( + source_path="/tmp/agent_data.json", + destination_path=local_path, + ) + + if download_result.success: + try: + with open(local_path) as f: + log_entries = json.load(f) + logger.info(f"Found {len(log_entries)} log entries:\n") + for i, entry in enumerate(log_entries, 1): + logger.info(f"Entry {i}:") + logger.info(f" Timestamp: {entry.get('timestamp', 'N/A')}") + logger.info(f" Level: {entry.get('level', 'N/A')}") + logger.info(f" Message: {entry.get('message', 'N/A')}") + if entry.get("data"): + logger.info(f" Data: {json.dumps(entry['data'], indent=4)}") + logger.info("") + except json.JSONDecodeError: + logger.info("Log file exists but couldn't parse JSON") + with open(local_path) as f: + logger.info(f"Raw content: {f.read()}") + finally: + # Clean up the temporary file + Path(local_path).unlink(missing_ok=True) + else: + logger.info("No log file found (agent may not have used the tool)") + if download_result.error: + logger.debug(f"Download error: {download_result.error}") + + logger.info("=" * 80) + + cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost + print(f"\nEXAMPLE_COST: {cost}") + + finally: + logger.info("\n🧹 Cleaning up 
conversation...") + conversation.close() + +logger.info("\n✅ Example completed successfully!") +logger.info("\nThis example demonstrated how to:") +logger.info("1. Create a custom tool that logs structured data to JSON") +logger.info("2. Build a simple base image with the custom tool") +logger.info("3. Use DockerDevWorkspace with base_image to build agent server on top") +logger.info("4. Enable dynamic tool registration on the server") +logger.info("5. Use the custom tool during agent execution") +logger.info("6. Read the logged data back from the workspace") +``` + +```bash Running the Example +# Build the custom base image first +cd examples/02_remote_agent_server/06_custom_tool +./build_custom_image.sh + +# Run the example +export LLM_API_KEY="your-api-key" +uv run python custom_tool_example.py +``` + + +## Next Steps + +- **[Custom Tools (Standalone)](/sdk/guides/custom-tools)** - For local execution without remote server +- **[Agent Server Overview](/sdk/guides/agent-server/overview)** - Understanding remote agent servers + +### Docker Sandbox +Source: https://docs.openhands.dev/sdk/guides/agent-server/docker-sandbox.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +The docker sandboxed agent server demonstrates how to run agents in isolated Docker containers using `DockerWorkspace`. + +This provides complete isolation from the host system, making it ideal for production deployments, testing, and executing untrusted code safely. + +Use `DockerWorkspace` with a pre-built agent server image for the fastest startup. When you need to build your own image from a base image, switch to `DockerDevWorkspace`. 
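Both workspace classes assume a reachable Docker daemon. As a preflight step, a small check like the one below (a plain-Python sketch, not part of the SDK) fails fast with a clearer signal than a mid-startup exception:

```python
import shutil
import subprocess


def docker_available() -> bool:
    """Return True if a `docker` CLI is on PATH and the daemon answers `docker info`."""
    if shutil.which("docker") is None:
        return False
    result = subprocess.run(
        ["docker", "info"],
        capture_output=True,  # suppress daemon output; only the exit code matters
        check=False,
    )
    return result.returncode == 0
```

Call it before constructing a workspace and surface a friendly message when it returns `False`.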
+
+The Docker sandbox image ships with features configured in the [Dockerfile](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-agent-server/openhands/agent_server/docker/Dockerfile) (e.g., secure defaults and services like VSCode and VNC exposed behind well-defined ports), which are not available in the local (non-Docker) agent server.
+
+## 1) Basic Docker Sandbox
+
+> A ready-to-run example is available [here](#ready-to-run-example-docker-sandbox)!
+
+### Key Concepts
+
+#### DockerWorkspace Context Manager
+
+The `DockerWorkspace` uses a context manager to automatically handle container lifecycle:
+
+```python icon="python"
+with DockerWorkspace(
+    # use pre-built image for faster startup (recommended)
+    server_image="ghcr.io/openhands/agent-server:latest-python",
+    host_port=8010,
+    platform=detect_platform(),
+) as workspace:
+    # Container is running here
+    # Work with the workspace
+    pass
+# Container is automatically stopped and cleaned up here
+```
+
+The workspace automatically:
+- Pulls or builds the Docker image
+- Starts the container with an agent server
+- Waits for the server to be ready
+- Cleans up the container when done
+
+#### Platform Detection
+
+The example includes platform detection to ensure the correct Docker image is built and used:
+
+```python icon="python"
+def detect_platform():
+    """Detects the correct Docker platform string."""
+    machine = platform.machine().lower()
+    if "arm" in machine or "aarch64" in machine:
+        return "linux/arm64"
+    return "linux/amd64"
+```
+
+This ensures compatibility across different CPU architectures (Intel/AMD vs ARM/Apple Silicon).
+
+
+#### Testing the Workspace
+
+Before creating a conversation, the example tests the workspace connection:
+
+```python icon="python"
+result = workspace.execute_command(
+    "echo 'Hello from sandboxed environment!' 
&& pwd"
+)
+logger.info(
+    f"Command '{result.command}' completed "
+    f"with exit code {result.exit_code}"
+)
+logger.info(f"Output: {result.stdout}")
+```
+
+This verifies the workspace is properly initialized and can execute commands.
+
+#### Automatic RemoteConversation
+
+When you use a DockerWorkspace, the Conversation automatically becomes a RemoteConversation:
+
+```python icon="python" focus={1, 3, 7}
+conversation = Conversation(
+    agent=agent,
+    workspace=workspace,
+    callbacks=[event_callback],
+    visualize=True,
+)
+assert isinstance(conversation, RemoteConversation)
+```
+
+The SDK detects the remote workspace and uses WebSocket communication for real-time event streaming.
+
+
+#### DockerWorkspace vs DockerDevWorkspace
+
+Use `DockerWorkspace` when you can rely on the official pre-built images for the agent server. Switch to `DockerDevWorkspace` when you need to build or customize the image on-demand (slower startup, requires the SDK source tree and Docker build support).
+
+```python icon="python"
+# ✅ Fast: Use pre-built image (recommended)
+DockerWorkspace(
+    server_image="ghcr.io/openhands/agent-server:latest-python",
+    host_port=8010,
+)
+
+# 🛠️ Custom: Build on the fly (requires SDK tooling)
+DockerDevWorkspace(
+    base_image="nikolaik/python-nodejs:python3.12-nodejs22",
+    host_port=8010,
+    target="source",
+)
+```
+
+### Ready-to-run Example Docker Sandbox
+
+This example is available on GitHub: [examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py)
+
+
+This example shows how to create a DockerWorkspace that automatically manages Docker containers for agent execution:
+
+```python icon="python" expandable examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py
+import os
+import platform
+import time
+
+from pydantic import SecretStr
+
+from openhands.sdk import (
+    LLM,
+
Conversation, + RemoteConversation, + get_logger, +) +from openhands.tools.preset.default import get_default_agent +from openhands.workspace import DockerWorkspace + + +logger = get_logger(__name__) + +# 1) Ensure we have LLM API key +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." + +llm = LLM( + usage_id="agent", + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) + + +def detect_platform(): + """Detects the correct Docker platform string.""" + machine = platform.machine().lower() + if "arm" in machine or "aarch64" in machine: + return "linux/arm64" + return "linux/amd64" + + +def get_server_image(): + """Get the server image tag, using PR-specific image in CI.""" + platform_str = detect_platform() + arch = "arm64" if "arm64" in platform_str else "amd64" + # If GITHUB_SHA is set (e.g. running in CI of a PR), use that to ensure consistency + # Otherwise, use the latest image from main + github_sha = os.getenv("GITHUB_SHA") + if github_sha: + return f"ghcr.io/openhands/agent-server:{github_sha[:7]}-python-{arch}" + return "ghcr.io/openhands/agent-server:latest-python" + + +# 2) Create a Docker-based remote workspace that will set up and manage +# the Docker container automatically. Use `DockerWorkspace` with a pre-built +# image or `DockerDevWorkspace` to automatically build the image on-demand. 
+# with DockerDevWorkspace( +# # dynamically build agent-server image +# base_image="nikolaik/python-nodejs:python3.13-nodejs22", +# host_port=8010, +# platform=detect_platform(), +# ) as workspace: +server_image = get_server_image() +logger.info(f"Using server image: {server_image}") +with DockerWorkspace( + # use pre-built image for faster startup + server_image=server_image, + host_port=8010, + platform=detect_platform(), +) as workspace: + # 3) Create agent + agent = get_default_agent( + llm=llm, + cli_mode=True, + ) + + # 4) Set up callback collection + received_events: list = [] + last_event_time = {"ts": time.time()} + + def event_callback(event) -> None: + event_type = type(event).__name__ + logger.info(f"🔔 Callback received event: {event_type}\n{event}") + received_events.append(event) + last_event_time["ts"] = time.time() + + # 5) Test the workspace with a simple command + result = workspace.execute_command( + "echo 'Hello from sandboxed environment!' && pwd" + ) + logger.info( + f"Command '{result.command}' completed with exit code {result.exit_code}" + ) + logger.info(f"Output: {result.stdout}") + conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + ) + assert isinstance(conversation, RemoteConversation) + + try: + logger.info(f"\n📋 Conversation ID: {conversation.state.id}") + + logger.info("📝 Sending first message...") + conversation.send_message( + "Read the current repo and write 3 facts about the project into FACTS.txt." + ) + logger.info("🚀 Running conversation...") + conversation.run() + logger.info("✅ First task completed!") + logger.info(f"Agent status: {conversation.state.execution_status}") + + # Wait for events to settle (no events for 2 seconds) + logger.info("⏳ Waiting for events to stop...") + while time.time() - last_event_time["ts"] < 2.0: + time.sleep(0.1) + logger.info("✅ Events have stopped") + + logger.info("🚀 Running conversation again...") + conversation.send_message("Great! 
Now delete that file.") + conversation.run() + logger.info("✅ Second task completed!") + + cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost + print(f"EXAMPLE_COST: {cost}") + finally: + print("\n🧹 Cleaning up conversation...") + conversation.close() +``` + + + + +--- + +## 2) VS Code in Docker Sandbox + +> A ready-to-run example is available [here](#ready-to-run-example-vs-code)! + +VS Code with Docker demonstrates how to enable VS Code Web integration in a Docker-sandboxed environment. This allows you to access a full VS Code editor running in the container, making it easy to inspect, edit, and manage files that the agent is working with. + +### Key Concepts + +#### VS Code-Enabled DockerWorkspace + +The workspace is configured with extra ports for VS Code access: + +```python icon="python" focus={1, 5} +with DockerWorkspace( + server_image="ghcr.io/openhands/agent-server:latest-python", + host_port=18010, + platform="linux/arm64", # or "linux/amd64" depending on your architecture + extra_ports=True, # Expose extra ports for VSCode and VNC +) as workspace: + """Extra ports allows you to access VSCode at localhost:18011""" +``` + +The `extra_ports=True` setting exposes: +- Port `host_port+1`: VS Code Web interface (host_port + 1) +- Port `host_port+2`: VNC viewer for visual access + +If you need to customize the agent-server image, swap in `DockerDevWorkspace` with the same parameters and provide `base_image`/`target` to build on demand. 
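The port layout described above can be captured in a small helper. `sandbox_ports` below is purely illustrative (the SDK does not expose such a function); it simply encodes the `host_port`, `+1`, `+2` convention:

```python
def sandbox_ports(host_port: int) -> dict[str, int]:
    """Derive the extra ports exposed when extra_ports=True, per the convention above."""
    return {
        "agent_server": host_port,  # REST + WebSocket API
        "vscode": host_port + 1,    # VS Code Web interface
        "vnc": host_port + 2,       # noVNC viewer
    }


print(sandbox_ports(18010))  # {'agent_server': 18010, 'vscode': 18011, 'vnc': 18012}
```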
+ +#### VS Code URL Generation + +The example retrieves the VS Code URL with authentication token: + +```python icon="python" +# Get VSCode URL with token +vscode_port = (workspace.host_port or 8010) + 1 +try: + response = httpx.get( + f"{workspace.host}/api/vscode/url", + params={"workspace_dir": workspace.working_dir}, + ) + vscode_data = response.json() + vscode_url = vscode_data.get("url", "").replace( + "localhost:8001", f"localhost:{vscode_port}" + ) +except Exception: + # Fallback if server route not available + folder = ( + f"/{workspace.working_dir}" + if not str(workspace.working_dir).startswith("/") + else str(workspace.working_dir) + ) + vscode_url = f"http://localhost:{vscode_port}/?folder={folder}" +``` + +This generates a properly authenticated URL with the workspace directory pre-opened. + +#### VS Code URL Format + +```text +http://localhost:{vscode_port}/?tkn={token}&folder={workspace_dir} +``` +where: +- `vscode_port`: Usually host_port + 1 (e.g., 8011) +- `token`: Authentication token for security +- `workspace_dir`: Workspace directory to open + +### Ready-to-run Example VS Code + + +This example is available on GitHub: [examples/02_remote_agent_server/05_vscode_with_docker_sandboxed_server.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/02_remote_agent_server/05_vscode_with_docker_sandboxed_server.py) + + + +```python icon="python" expandable examples/02_remote_agent_server/05_vscode_with_docker_sandboxed_server.py +import os +import platform +import time + +import httpx +from pydantic import SecretStr + +from openhands.sdk import LLM, Conversation, get_logger +from openhands.sdk.conversation.impl.remote_conversation import RemoteConversation +from openhands.tools.preset.default import get_default_agent +from openhands.workspace import DockerWorkspace + + +logger = get_logger(__name__) + +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+ +llm = LLM( + usage_id="agent", + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) + + +# Create a Docker-based remote workspace with extra ports for VSCode access +def detect_platform(): + """Detects the correct Docker platform string.""" + machine = platform.machine().lower() + if "arm" in machine or "aarch64" in machine: + return "linux/arm64" + return "linux/amd64" + + +def get_server_image(): + """Get the server image tag, using PR-specific image in CI.""" + platform_str = detect_platform() + arch = "arm64" if "arm64" in platform_str else "amd64" + # If GITHUB_SHA is set (e.g. running in CI of a PR), use that to ensure consistency + # Otherwise, use the latest image from main + github_sha = os.getenv("GITHUB_SHA") + if github_sha: + return f"ghcr.io/openhands/agent-server:{github_sha[:7]}-python-{arch}" + return "ghcr.io/openhands/agent-server:latest-python" + + +server_image = get_server_image() +logger.info(f"Using server image: {server_image}") +with DockerWorkspace( + server_image=server_image, + host_port=18010, + platform=detect_platform(), + extra_ports=True, # Expose extra ports for VSCode and VNC +) as workspace: + """Extra ports allows you to access VSCode at localhost:18011""" + + # Create agent + agent = get_default_agent( + llm=llm, + cli_mode=True, + ) + + # Set up callback collection + received_events: list = [] + last_event_time = {"ts": time.time()} + + def event_callback(event) -> None: + event_type = type(event).__name__ + logger.info(f"🔔 Callback received event: {event_type}\n{event}") + received_events.append(event) + last_event_time["ts"] = time.time() + + # Create RemoteConversation using the workspace + conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + ) + assert isinstance(conversation, RemoteConversation) + + logger.info(f"\n📋 Conversation ID: {conversation.state.id}") + logger.info("📝 Sending 
first message...") + conversation.send_message("Create a simple Python script that prints Hello World") + conversation.run() + + # Get VSCode URL with token + vscode_port = (workspace.host_port or 8010) + 1 + try: + response = httpx.get( + f"{workspace.host}/api/vscode/url", + params={"workspace_dir": workspace.working_dir}, + ) + vscode_data = response.json() + vscode_url = vscode_data.get("url", "").replace( + "localhost:8001", f"localhost:{vscode_port}" + ) + except Exception: + # Fallback if server route not available + folder = ( + f"/{workspace.working_dir}" + if not str(workspace.working_dir).startswith("/") + else str(workspace.working_dir) + ) + vscode_url = f"http://localhost:{vscode_port}/?folder={folder}" + + # Wait for user to explore VSCode + y = None + while y != "y": + y = input( + "\n" + "Because you've enabled extra_ports=True in DockerDevWorkspace, " + "you can open VSCode Web to see the workspace.\n\n" + f"VSCode URL: {vscode_url}\n\n" + "The VSCode should have the OpenHands settings extension installed:\n" + " - Dark theme enabled\n" + " - Auto-save enabled\n" + " - Telemetry disabled\n" + " - Auto-updates disabled\n\n" + "Press 'y' and Enter to exit and terminate the workspace.\n" + ">> " + ) +``` + + + +--- + +## 3) Browser in Docker Sandbox +> A ready-to-run example is available [here](#ready-to-run-example-browser)! + +Browser with Docker demonstrates how to enable browser automation capabilities in a Docker-sandboxed environment. This allows agents to browse websites, interact with web content, and perform web automation tasks while maintaining complete isolation from your host system. 
+ +### Key Concepts + +#### Browser-Enabled DockerWorkspace + +The workspace is configured with extra ports for browser access: + +```python icon="python" focus={1-5} +with DockerWorkspace( + server_image="ghcr.io/openhands/agent-server:latest-python", + host_port=8010, + platform=detect_platform(), + extra_ports=True, # Expose extra ports for VSCode and VNC +) as workspace: + """Extra ports allows you to check localhost:8012 for VNC""" +``` + +The `extra_ports=True` setting exposes additional ports for: +- Port `host_port+1`: VS Code Web interface +- Port `host_port+2`: VNC viewer for browser visualization + +If you need to pre-build a custom browser image, replace `DockerWorkspace` with `DockerDevWorkspace` and provide `base_image`/`target` to build before launch. + + +#### Enabling Browser Tools + +Browser tools are enabled by setting `cli_mode=False`: + +```python icon="python" focus={2, 4} +# Create agent with browser tools enabled +agent = get_default_agent( + llm=llm, + cli_mode=False, # CLI mode = False will enable browser tools +) +``` + +When `cli_mode=False`, the agent gains access to browser automation tools for web interaction. + +When VNC is available and `extra_ports=True`, the browser will be opened in the VNC desktop to visualize agent's work. You can watch the browser in real-time via VNC. 
Demo video: + + +#### VNC Access + +The VNC interface provides real-time visual access to the browser: + +```text +http://localhost:8012/vnc.html?autoconnect=1&resize=remote +``` + +- `autoconnect=1`: Automatically connect to VNC server +- `resize=remote`: Automatically adjust resolution + +--- + +### Ready-to-run Example Browser + + +This example is available on GitHub: [examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py) + + +This example shows how to configure `DockerWorkspace` with browser capabilities and VNC access: + +```python icon="python" expandable examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py +import os +import platform +import time + +from pydantic import SecretStr + +from openhands.sdk import LLM, Conversation, get_logger +from openhands.sdk.conversation.impl.remote_conversation import RemoteConversation +from openhands.tools.preset.default import get_default_agent +from openhands.workspace import DockerWorkspace + + +logger = get_logger(__name__) + +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." + +llm = LLM( + usage_id="agent", + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) + + +def detect_platform(): + """Detects the correct Docker platform string.""" + machine = platform.machine().lower() + if "arm" in machine or "aarch64" in machine: + return "linux/arm64" + return "linux/amd64" + + +def get_server_image(): + """Get the server image tag, using PR-specific image in CI.""" + platform_str = detect_platform() + arch = "arm64" if "arm64" in platform_str else "amd64" + # If GITHUB_SHA is set (e.g. 
running in CI of a PR), use that to ensure consistency + # Otherwise, use the latest image from main + github_sha = os.getenv("GITHUB_SHA") + if github_sha: + return f"ghcr.io/openhands/agent-server:{github_sha[:7]}-python-{arch}" + return "ghcr.io/openhands/agent-server:latest-python" + + +# Create a Docker-based remote workspace with extra ports for browser access. +# Use `DockerWorkspace` with a pre-built image or `DockerDevWorkspace` to +# automatically build the image on-demand. +# with DockerDevWorkspace( +# # dynamically build agent-server image +# base_image="nikolaik/python-nodejs:python3.13-nodejs22", +# host_port=8010, +# platform=detect_platform(), +# ) as workspace: +server_image = get_server_image() +logger.info(f"Using server image: {server_image}") +with DockerWorkspace( + server_image=server_image, + host_port=8011, + platform=detect_platform(), + extra_ports=True, # Expose extra ports for VSCode and VNC +) as workspace: + """Extra ports allows you to check localhost:8012 for VNC""" + + # Create agent with browser tools enabled + agent = get_default_agent( + llm=llm, + cli_mode=False, # CLI mode = False will enable browser tools + ) + + # Set up callback collection + received_events: list = [] + last_event_time = {"ts": time.time()} + + def event_callback(event) -> None: + event_type = type(event).__name__ + logger.info(f"🔔 Callback received event: {event_type}\n{event}") + received_events.append(event) + last_event_time["ts"] = time.time() + + # Create RemoteConversation using the workspace + conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + ) + assert isinstance(conversation, RemoteConversation) + + logger.info(f"\n📋 Conversation ID: {conversation.state.id}") + logger.info("📝 Sending first message...") + conversation.send_message( + "Could you go to https://openhands.dev/ blog page and summarize main " + "points of the latest blog?" 
+ ) + conversation.run() + + cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost + print(f"EXAMPLE_COST: {cost}") + + if os.getenv("CI"): + logger.info( + "CI environment detected; skipping interactive prompt and closing workspace." # noqa: E501 + ) + else: + # Wait for user confirm to exit when running locally + y = None + while y != "y": + y = input( + "Because you've enabled extra_ports=True in DockerDevWorkspace, " + "you can open a browser tab to see the *actual* browser OpenHands " + "is interacting with via VNC.\n\n" + "Link: http://localhost:8012/vnc.html?autoconnect=1&resize=remote\n\n" + "Press 'y' and Enter to exit and terminate the workspace.\n" + ">> " + ) +``` + + + +## Next Steps + +- **[Local Agent Server](/sdk/guides/agent-server/local-server)** +- **[Agent Server Overview](/sdk/guides/agent-server/overview)** - Architecture and implementation details +- **[API Sandboxed Server](/sdk/guides/agent-server/api-sandbox)** - Connect to hosted API service +- **[Agent Server Package Architecture](/sdk/arch/agent-server)** - Remote execution architecture + +### Local Agent Server +Source: https://docs.openhands.dev/sdk/guides/agent-server/local-server.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +The Local Agent Server demonstrates how to run a remote agent server locally and connect to it using `RemoteConversation`. This pattern is useful for local development, testing, and scenarios where you want to separate the client code from the agent execution environment. 
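A core piece of this pattern is waiting for the server to accept requests before connecting. The sketch below shows a generic readiness poll using only the standard library; the `/health` path is an assumption for illustration, so substitute whatever readiness route your server actually exposes:

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(base_url: str, path: str = "/health", timeout: float = 60.0) -> bool:
    """Poll an HTTP endpoint until it answers 200, or give up after `timeout` seconds.

    The "/health" path is an assumption; check your server's actual readiness route.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(base_url + path, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet; retry
        time.sleep(0.5)
    return False
```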
+ +## Key Concepts + +### Managed API Server + +The ready-to-run example includes a `ManagedAPIServer` context manager that handles starting and stopping the server subprocess: + +```python icon="python" focus={1, 2, 4, 5} +class ManagedAPIServer: + """Context manager for subprocess-managed OpenHands API server.""" + + def __enter__(self): + """Start the API server subprocess.""" + self.process = subprocess.Popen( + [ + "python", + "-m", + "openhands.agent_server", + "--port", + str(self.port), + "--host", + self.host, + ], + stdout=subprocess.PIPE, + stderr=subprocess.PIPE, + text=True, + env={"LOG_JSON": "true", **os.environ}, + ) +``` + +The server starts with `python -m openhands.agent_server` and automatically handles health checks to ensure it's ready before proceeding. + +### Remote Workspace + +When connecting to a remote server, you need to provide a `Workspace` that connects to that server: + +```python icon="python" +workspace = Workspace(host=server.base_url) +result = workspace.execute_command("pwd") +``` + +When `host` is provided, the `Workspace` returns an instance of `RemoteWorkspace` ([source](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/workspace.py)). +The `Workspace` object communicates with the remote server's API to execute commands and manage files. + +### RemoteConversation + +When you pass a remote `Workspace` to `Conversation`, it automatically becomes a `RemoteConversation` ([source](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/conversation.py)): + +```python icon="python" focus={1, 3, 7} +conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + visualize=True, +) +assert isinstance(conversation, RemoteConversation) +``` + +`RemoteConversation` handles communication with the remote agent server over WebSocket for real-time event streaming. 
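The examples on this page wait for a quiet period (no events for a couple of seconds) before inspecting results. That polling pattern can be factored into a small helper; `QuietWait` below is a sketch, not an SDK class:

```python
import time


class QuietWait:
    """Track the last event arrival and wait until no events for `idle` seconds."""

    def __init__(self) -> None:
        self.last_ts = time.monotonic()

    def on_event(self, event) -> None:
        # Register this as a conversation callback alongside your own handlers.
        self.last_ts = time.monotonic()

    def wait(self, idle: float = 2.0, timeout: float = 60.0) -> bool:
        """Block until `idle` seconds pass with no events; False if `timeout` hits first."""
        deadline = time.monotonic() + timeout
        while time.monotonic() - self.last_ts < idle:
            if time.monotonic() > deadline:
                return False
            time.sleep(0.05)
        return True
```

Pass `quiet.on_event` in the `callbacks` list, then call `quiet.wait()` after `conversation.run()`.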
+ +### Event Callbacks + +Callbacks receive events in real-time as they happen on the remote server: + +```python icon="python" +def event_callback(event): + """Callback to capture events for testing.""" + event_type = type(event).__name__ + logger.info(f"🔔 Callback received event: {event_type}\n{event}") + received_events.append(event) + event_tracker["last_event_time"] = time.time() +``` + +This enables monitoring agent activity, tracking progress, and implementing custom event handling logic. + +### Conversation State + +The conversation state provides access to all events and status: + +```python icon="python" +# Count total events using state.events +total_events = len(conversation.state.events) +logger.info(f"📈 Total events in conversation: {total_events}") + +# Get recent events (last 5) using state.events +all_events = conversation.state.events +recent_events = all_events[-5:] if len(all_events) >= 5 else all_events +``` + +This allows you to inspect the conversation history, analyze agent behavior, and build custom monitoring tools. 
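As a quick illustration of working with the event list, the snippet below tallies events by type with `collections.Counter`. The event classes here are stand-ins for the SDK's real event types; the same one-liner works on `conversation.state.events`:

```python
from collections import Counter


# Stand-in classes for illustration; real events come from conversation.state.events.
class MessageEvent: pass
class ActionEvent: pass
class ObservationEvent: pass


events = [MessageEvent(), ActionEvent(), ObservationEvent(), ActionEvent()]

counts = Counter(type(e).__name__ for e in events)
print(counts)  # Counter({'ActionEvent': 2, 'MessageEvent': 1, 'ObservationEvent': 1})
```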
+ +## Ready-to-run Example + + +This example is available on GitHub: [examples/02_remote_agent_server/01_convo_with_local_agent_server.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/02_remote_agent_server/01_convo_with_local_agent_server.py) + + +This example shows how to programmatically start a local agent server and interact with it through a `RemoteConversation`: + +```python icon="python" expandable examples/02_remote_agent_server/01_convo_with_local_agent_server.py +import os +import subprocess +import sys +import threading +import time + +from pydantic import SecretStr + +from openhands.sdk import LLM, Conversation, RemoteConversation, Workspace, get_logger +from openhands.sdk.event import ConversationStateUpdateEvent +from openhands.tools.preset.default import get_default_agent + + +logger = get_logger(__name__) + + +def _stream_output(stream, prefix, target_stream): + """Stream output from subprocess to target stream with prefix.""" + try: + for line in iter(stream.readline, ""): + if line: + target_stream.write(f"[{prefix}] {line}") + target_stream.flush() + except Exception as e: + print(f"Error streaming {prefix}: {e}", file=sys.stderr) + finally: + stream.close() + + +class ManagedAPIServer: + """Context manager for subprocess-managed OpenHands API server.""" + + def __init__(self, port: int = 8000, host: str = "127.0.0.1"): + self.port: int = port + self.host: str = host + self.process: subprocess.Popen[str] | None = None + self.base_url: str = f"http://{host}:{port}" + self.stdout_thread: threading.Thread | None = None + self.stderr_thread: threading.Thread | None = None + + def __enter__(self): + """Start the API server subprocess.""" + print(f"Starting OpenHands API server on {self.base_url}...") + + # Start the server process + self.process = subprocess.Popen( + [ + "python", + "-m", + "openhands.agent_server", + "--port", + str(self.port), + "--host", + self.host, + ], + stdout=subprocess.PIPE, + stderr=subprocess.PIPE, + 
text=True, + env={"LOG_JSON": "true", **os.environ}, + ) + + # Start threads to stream stdout and stderr + assert self.process is not None + assert self.process.stdout is not None + assert self.process.stderr is not None + self.stdout_thread = threading.Thread( + target=_stream_output, + args=(self.process.stdout, "SERVER", sys.stdout), + daemon=True, + ) + self.stderr_thread = threading.Thread( + target=_stream_output, + args=(self.process.stderr, "SERVER", sys.stderr), + daemon=True, + ) + + self.stdout_thread.start() + self.stderr_thread.start() + + # Wait for server to be ready + max_retries = 30 + for i in range(max_retries): + try: + import httpx + + response = httpx.get(f"{self.base_url}/health", timeout=1.0) + if response.status_code == 200: + print(f"API server is ready at {self.base_url}") + return self + except Exception: + pass + + assert self.process is not None + if self.process.poll() is not None: + # Process has terminated + raise RuntimeError( + "Server process terminated unexpectedly. " + "Check the server logs above for details." + ) + + time.sleep(1) + + raise RuntimeError(f"Server failed to start after {max_retries} seconds") + + def __exit__(self, exc_type, exc_val, exc_tb): + """Stop the API server subprocess.""" + if self.process: + print("Stopping API server...") + self.process.terminate() + try: + self.process.wait(timeout=5) + except subprocess.TimeoutExpired: + print("Force killing API server...") + self.process.kill() + self.process.wait() + + # Wait for streaming threads to finish (they're daemon threads, + # so they'll stop automatically) + # But give them a moment to flush any remaining output + time.sleep(0.5) + print("API server stopped.") + + +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+ +llm = LLM( + usage_id="agent", + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) +title_gen_llm = LLM( + usage_id="title-gen-llm", + model=os.getenv("LLM_MODEL", "openhands/gpt-5-mini-2025-08-07"), + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) + +# Use managed API server +with ManagedAPIServer(port=8001) as server: + # Create agent + agent = get_default_agent( + llm=llm, + cli_mode=True, # Disable browser tools for simplicity + ) + + # Define callbacks to test the WebSocket functionality + received_events = [] + event_tracker = {"last_event_time": time.time()} + + def event_callback(event): + """Callback to capture events for testing.""" + event_type = type(event).__name__ + logger.info(f"🔔 Callback received event: {event_type}\n{event}") + received_events.append(event) + event_tracker["last_event_time"] = time.time() + + # Create RemoteConversation with callbacks + # NOTE: Workspace is required for RemoteConversation + workspace = Workspace(host=server.base_url) + result = workspace.execute_command("pwd") + logger.info( + f"Command '{result.command}' completed with exit code {result.exit_code}" + ) + logger.info(f"Output: {result.stdout}") + + conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[event_callback], + ) + assert isinstance(conversation, RemoteConversation) + + try: + logger.info(f"\n📋 Conversation ID: {conversation.state.id}") + + # Send first message and run + logger.info("📝 Sending first message...") + conversation.send_message( + "Read the current repo and write 3 facts about the project into FACTS.txt." 
+ ) + + # Generate title using a specific LLM + title = conversation.generate_title(max_length=60, llm=title_gen_llm) + logger.info(f"Generated conversation title: {title}") + + logger.info("🚀 Running conversation...") + conversation.run() + + logger.info("✅ First task completed!") + logger.info(f"Agent status: {conversation.state.execution_status}") + + # Wait for events to stop coming (no events for 2 seconds) + logger.info("⏳ Waiting for events to stop...") + while time.time() - event_tracker["last_event_time"] < 2.0: + time.sleep(0.1) + logger.info("✅ Events have stopped") + + logger.info("🚀 Running conversation again...") + conversation.send_message("Great! Now delete that file.") + conversation.run() + logger.info("✅ Second task completed!") + + # Demonstrate state.events functionality + logger.info("\n" + "=" * 50) + logger.info("📊 Demonstrating State Events API") + logger.info("=" * 50) + + # Count total events using state.events + total_events = len(conversation.state.events) + logger.info(f"📈 Total events in conversation: {total_events}") + + # Get recent events (last 5) using state.events + logger.info("\n🔍 Getting last 5 events using state.events...") + all_events = conversation.state.events + recent_events = all_events[-5:] if len(all_events) >= 5 else all_events + + for i, event in enumerate(recent_events, 1): + event_type = type(event).__name__ + timestamp = getattr(event, "timestamp", "Unknown") + logger.info(f" {i}. 
{event_type} at {timestamp}") + + # Let's see what the actual event types are + logger.info("\n🔍 Event types found:") + event_types = set() + for event in recent_events: + event_type = type(event).__name__ + event_types.add(event_type) + for event_type in sorted(event_types): + logger.info(f" - {event_type}") + + # Print all ConversationStateUpdateEvent + logger.info("\n🗂️ ConversationStateUpdateEvent events:") + for event in conversation.state.events: + if isinstance(event, ConversationStateUpdateEvent): + logger.info(f" - {event}") + + cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost + print(f"EXAMPLE_COST: {cost}") + + finally: + # Clean up + print("\n🧹 Cleaning up conversation...") + conversation.close() +``` + + + +## Next Steps + +- **[Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandbox)** - Run server in Docker for isolation +- **[API Sandboxed Server](/sdk/guides/agent-server/api-sandbox)** - Connect to hosted API service +- **[Agent Server Overview](/sdk/guides/agent-server/overview)** - Architecture and implementation details +- **[Agent Server Package Architecture](/sdk/arch/agent-server)** - Remote execution architecture + +### Overview +Source: https://docs.openhands.dev/sdk/guides/agent-server/overview.md + +Remote Agent Servers package the Software Agent SDK into containers you can deploy anywhere (Kubernetes, VMs, on‑prem, any cloud) with strong isolation. The remote path uses the exact same SDK API as local—switching is just changing the workspace argument; your Conversation code stays the same. 
+ + +For example, switching from a local workspace to a Docker‑based remote agent server: + +```python icon="python" lines +# Local → Docker +conversation = Conversation(agent=agent, workspace=os.getcwd()) # [!code --] +from openhands.workspace import DockerWorkspace # [!code ++] +with DockerWorkspace( # [!code ++] + server_image="ghcr.io/openhands/agent-server:latest-python", # [!code ++] +) as workspace: # [!code ++] + conversation = Conversation(agent=agent, workspace=workspace) # [!code ++] +``` + +Use `DockerWorkspace` with the pre-built agent server image for the fastest startup. When you need to build from a custom base image, switch to [`DockerDevWorkspace`](/sdk/guides/agent-server/docker-sandbox). + +Or switching to an API‑based remote workspace (via [OpenHands Runtime API](https://runtime.all-hands.dev/)): + +```python icon="python" lines +# Local → Remote API +conversation = Conversation(agent=agent, workspace=os.getcwd()) # [!code --] +from openhands.workspace import APIRemoteWorkspace # [!code ++] +with APIRemoteWorkspace( # [!code ++] + runtime_api_url="https://runtime.eval.all-hands.dev", # [!code ++] + runtime_api_key="YOUR_API_KEY", # [!code ++] + server_image="ghcr.io/openhands/agent-server:latest-python", # [!code ++] +) as workspace: # [!code ++] + conversation = Conversation(agent=agent, workspace=workspace) # [!code ++] +``` + + +## What is a Remote Agent Server? 
+

A Remote Agent Server is an HTTP/WebSocket server that:
- **Packages the Software Agent SDK into a container** you can deploy on your own infrastructure (Kubernetes, VMs, on-prem, or cloud)
- **Runs agents** on dedicated infrastructure
- **Manages workspaces** (Docker containers or remote sandboxes)
- **Streams events** to clients via WebSocket
- **Handles command and file operations** (execute command, upload, download); see the [base class](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/base.py) for details
- **Provides isolation** between different agent executions

Think of it as the "backend" for your agent, while your Python code acts as the "frontend" client.

{/*
Same interfaces as local:
[BaseConversation](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/base.py),
[ConversationStateProtocol](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/base.py),
[EventsListBase](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/events_list_base.py). Server-backed impl:
[RemoteConversation](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/impl/remote_conversation.py).
 */}


## Architecture Overview

Remote Agent Servers follow a simple three-part architecture:

```mermaid
graph TD
    Client[Client Code] -->|HTTP / WebSocket| Server[Agent Server]
    Server --> Workspace[Workspace]

    subgraph Workspace Types
        Workspace --> Local[Local Folder]
        Workspace --> Docker[Docker Container]
        Workspace --> API[Remote Sandbox via API]
    end

    Local --> Files[File System]
    Docker --> Container[Isolated Runtime]
    API --> Cloud[Cloud Infrastructure]

    style Client fill:#e1f5fe
    style Server fill:#fff3e0
    style Workspace fill:#e8f5e8
```

1. 
**Client (Python SDK)** — Your application creates and controls conversations using the SDK. +2. **Agent Server** — A lightweight HTTP/WebSocket service that runs the agent and manages workspace execution. +3. **Workspace** — An isolated environment (local, Docker, or remote VM) where the agent code runs. + +The same SDK API works across all three workspace types—you just switch which workspace the conversation connects to. + +## How Remote Conversations Work + +Each step in the diagram maps directly to how the SDK and server interact: + +### 1. Workspace Connection → *(Client → Server)* + +When you create a conversation with a remote workspace (e.g., `DockerWorkspace` or `APIRemoteWorkspace`), the SDK automatically starts or connects to an agent server inside that workspace: + +```python icon="python" +with DockerWorkspace( + server_image="ghcr.io/openhands/agent-server:latest" +) as workspace: + conversation = Conversation(agent=agent, workspace=workspace) +``` + +This turns the local `Conversation` into a **[RemoteConversation](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/impl/remote_conversation.py)** that speaks to the agent server over HTTP/WebSocket. + + +### 2. Server Initialization → *(Server → Workspace)* + +Once the workspace starts: +- It launches the agent server process. +- Waits for it to be ready. +- Shares the server URL with the SDK client. + +You don’t need to manage this manually—the workspace context handles startup and teardown automatically. + +### 3. Event Streaming → *(Bidirectional WebSocket)* + +The client and agent server maintain a live WebSocket connection for streaming events: + +```python icon="python" +def on_event(event): + print(f"Received: {type(event).__name__}") + +conversation = Conversation( + agent=agent, + workspace=workspace, + callbacks=[on_event], +) +``` + +This allows you to see real-time updates from the running agent as it executes tasks inside the workspace. 
+

### 4. File and Command Operations → *(Server ↔ Workspace)*

The workspace supports file and command operations via the agent server API ([base class](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/workspace/base.py)), ensuring isolation and consistent behavior:

```python icon="python"
workspace.file_upload(local_path, remote_path)
workspace.file_download(remote_path, local_path)
result = workspace.execute_command("ls -la")
print(result.stdout)
```

These commands are proxied through the agent server, whether it’s a Docker container or a remote VM, keeping your client code environment-agnostic.

### Summary

The architecture makes remote execution seamless:
- Your **client code** stays the same.
- The **agent server** manages execution and streaming.
- The **workspace** provides secure, isolated runtime environments.

Switching from local to remote is just a matter of swapping the workspace class—no code rewrites needed.

## Next Steps

Explore different deployment options:

- **[Local Agent Server](/sdk/guides/agent-server/local-server)** - Run agent server in the same process
- **[Docker Sandboxed Server](/sdk/guides/agent-server/docker-sandbox)** - Run agent server in isolated Docker containers
- **[API Sandboxed Server](/sdk/guides/agent-server/api-sandbox)** - Connect to hosted agent server via API

For architectural details:
- **[Agent Server Package Architecture](/sdk/arch/agent-server)** - Remote execution architecture and deployment

### Stuck Detector
Source: https://docs.openhands.dev/sdk/guides/agent-stuck-detector.md

import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx";

> A ready-to-run example is available [here](#ready-to-run-example)!

The Stuck Detector automatically identifies when an agent enters unproductive patterns such as repeating the same actions, encountering repeated errors, or engaging in monologues. 
By analyzing the conversation history after the last user message, it detects five types of stuck patterns: + +1. **Repeating Action-Observation Cycles**: The same action produces the same observation repeatedly (4+ times) +2. **Repeating Action-Error Cycles**: The same action repeatedly results in errors (3+ times) +3. **Agent Monologue**: The agent sends multiple consecutive messages without user input or meaningful progress (3+ messages) +4. **Alternating Patterns**: Two different action-observation pairs alternate in a ping-pong pattern (6+ cycles) +5. **Context Window Errors**: Repeated context window errors that indicate memory management issues + +When enabled (which is the default), the stuck detector monitors the conversation in real-time and can automatically halt execution when stuck patterns are detected, preventing infinite loops and wasted resources. + + + For more information about the detection algorithms and how pattern matching works, refer to the [StuckDetector source code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/stuck_detector.py). + + + +## How It Works + +In the [ready-to-run example](#ready-to-run-example), the agent is deliberately given a task designed to trigger stuck detection - executing the same `ls` +command 5 times in a row. The stuck detector analyzes the event history and identifies the repetitive pattern: + +1. The conversation proceeds normally until the agent starts repeating actions +2. After detecting the pattern (4 identical action-observation pairs), the stuck detector flags the conversation as stuck +3. The conversation can then handle this gracefully, either by stopping execution or taking corrective action + +The example demonstrates that stuck detection is enabled by default (`stuck_detection=True`), and you can check the +stuck status at any point using `conversation.stuck_detector.is_stuck()`. 
+ +## Pattern Detection + +The stuck detector compares events based on their semantic content rather than object identity. For example: +- **Actions** are compared by their tool name, action content, and thought (ignoring IDs and metrics) +- **Observations** are compared by their observation content and tool name +- **Errors** are compared by their error messages +- **Messages** are compared by their content and source + +This allows the detector to identify truly repetitive behavior while ignoring superficial differences like timestamps or event IDs. + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/20_stuck_detector.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/20_stuck_detector.py) + + + +```python icon="python" expandable examples/01_standalone_sdk/20_stuck_detector.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.tools.preset.default import get_default_agent + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +agent = get_default_agent(llm=llm) + +llm_messages = [] + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +# Create conversation with built-in stuck detection +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=os.getcwd(), + # This is by default True, shown here for clarity of the example + stuck_detection=True, +) + +# Send a task that will be caught by stuck detection +conversation.send_message( + "Please execute 'ls' command 5 times, each in its own " + "action without any thought and then exit at the 6th step." +) + +# Run the conversation - stuck detection happens automatically +conversation.run() + +assert conversation.stuck_detector is not None +final_stuck_check = conversation.stuck_detector.is_stuck() +print(f"Final stuck status: {final_stuck_check}") + +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + + +## Next Steps + +- **[Conversation Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Manual execution control +- **[Hello World](/sdk/guides/hello-world)** - Learn the basics of the SDK + +### Theory of Mind (TOM) Agent +Source: https://docs.openhands.dev/sdk/guides/agent-tom-agent.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +## Overview + +Tom (Theory of Mind) Agent provides advanced user understanding capabilities that help your agent interpret vague instructions and adapt to user preferences over time. 
Built on research in user mental modeling, Tom agents can: + +- Understand unclear or ambiguous user requests +- Provide personalized guidance based on user modeling +- Build long-term user preference profiles +- Adapt responses based on conversation history + +This is particularly useful when: +- User instructions are vague or incomplete +- You need to infer user intent from minimal context +- Building personalized experiences across multiple conversations +- Understanding user preferences and working patterns + +## Research Foundation + +Tom agent is based on the TOM-SWE research paper on user mental modeling for software engineering agents: + +```bibtex Citation +@misc{zhou2025tomsweusermentalmodeling, + title={TOM-SWE: User Mental Modeling For Software Engineering Agents}, + author={Xuhui Zhou and Valerie Chen and Zora Zhiruo Wang and Graham Neubig and Maarten Sap and Xingyao Wang}, + year={2025}, + eprint={2510.21903}, + archivePrefix={arXiv}, + primaryClass={cs.SE}, + url={https://arxiv.org/abs/2510.21903}, +} +``` + + +Paper: [TOM-SWE on arXiv](https://arxiv.org/abs/2510.21903) + + +## Quick Start + + +This example is available on GitHub: [examples/01_standalone_sdk/30_tom_agent.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/30_tom_agent.py) + + +```python icon="python" expandable examples/01_standalone_sdk/30_tom_agent.py +"""Example demonstrating Tom agent with Theory of Mind capabilities. + +This example shows how to set up an agent with Tom tools for getting +personalized guidance based on user modeling. 
Tom tools include: +- TomConsultTool: Get guidance for vague or unclear tasks +- SleeptimeComputeTool: Index conversations for user modeling +""" + +import os + +from pydantic import SecretStr + +from openhands.sdk import LLM, Agent, Conversation +from openhands.sdk.tool import Tool +from openhands.tools.preset.default import get_default_tools +from openhands.tools.tom_consult import ( + SleeptimeComputeAction, + SleeptimeComputeObservation, + SleeptimeComputeTool, + TomConsultTool, +) + + +# Configure LLM +api_key: str | None = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." + +llm: LLM = LLM( + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + api_key=os.getenv("LLM_API_KEY"), + base_url=os.getenv("LLM_BASE_URL", None), + usage_id="agent", + drop_params=True, +) + +# Build tools list with Tom tools +# Note: Tom tools are automatically registered on import (PR #862) +tools = get_default_tools(enable_browser=False) + +# Configure Tom tools with parameters +tom_params: dict[str, bool | str] = { + "enable_rag": True, # Enable RAG in Tom agent +} + +# Add LLM configuration for Tom tools (uses same LLM as main agent) +tom_params["llm_model"] = llm.model +if llm.api_key: + if isinstance(llm.api_key, SecretStr): + tom_params["api_key"] = llm.api_key.get_secret_value() + else: + tom_params["api_key"] = llm.api_key +if llm.base_url: + tom_params["api_base"] = llm.base_url + +# Add both Tom tools to the agent +tools.append(Tool(name=TomConsultTool.name, params=tom_params)) +tools.append(Tool(name=SleeptimeComputeTool.name, params=tom_params)) + +# Create agent with Tom capabilities +# This agent can consult Tom for personalized guidance +# Note: Tom's user modeling data will be stored in ~/.openhands/ +agent: Agent = Agent(llm=llm, tools=tools) + +# Start conversation +cwd: str = os.getcwd() +PERSISTENCE_DIR = os.path.expanduser("~/.openhands") +CONVERSATIONS_DIR = os.path.join(PERSISTENCE_DIR, 
"conversations") +conversation = Conversation( + agent=agent, workspace=cwd, persistence_dir=CONVERSATIONS_DIR +) + +# Optionally run sleeptime compute to index existing conversations +# This builds user preferences and patterns from conversation history +# Using execute_tool allows running tools before conversation.run() +print("\nRunning sleeptime compute to index conversations...") +try: + sleeptime_result = conversation.execute_tool( + "sleeptime_compute", SleeptimeComputeAction() + ) + # Cast to the expected observation type for type-safe access + if isinstance(sleeptime_result, SleeptimeComputeObservation): + print(f"Result: {sleeptime_result.message}") + print(f"Sessions processed: {sleeptime_result.sessions_processed}") + else: + print(f"Result: {sleeptime_result.text}") +except KeyError as e: + print(f"Tool not available: {e}") + +# Send a potentially vague message where Tom consultation might help +conversation.send_message( + "I need to debug some code but I'm not sure where to start. " + + "Can you help me figure out the best approach?" 
+) +conversation.run() + +print("\n" + "=" * 80) +print("Tom agent consultation example completed!") +print("=" * 80) + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") + + +# Optional: Index this conversation for Tom's user modeling +# This builds user preferences and patterns from conversation history +# Uncomment the lines below to index the conversation: +# +# conversation.send_message("Please index this conversation using sleeptime_compute") +# conversation.run() +# print("\nConversation indexed for user modeling!") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +## Tom Tools + +### TomConsultTool + +The consultation tool provides personalized guidance when the agent encounters vague or unclear user requests: + +```python icon="python" +# The agent can automatically call this tool when needed +# Example: User says "I need to debug something" +# Tom analyzes the vague request and provides specific guidance +``` + +Key features: +- Analyzes conversation history for context +- Provides personalized suggestions based on user modeling +- Helps disambiguate vague instructions +- Adapts to user communication patterns + +### SleeptimeComputeTool + +The indexing tool processes conversation history to build user preference profiles: + +```python icon="python" +# Index conversations for future personalization +sleeptime_compute_tool = conversation.agent.tools_map.get("sleeptime_compute") +if sleeptime_compute_tool: + result = sleeptime_compute_tool.executor( + SleeptimeComputeAction(), conversation + ) +``` + +Key features: +- Processes conversation history into user models +- Stores preferences in `~/.openhands/` directory +- Builds understanding of user patterns over time +- Enables long-term personalization across sessions + +## Configuration + +### RAG Support + +Enable retrieval-augmented generation for enhanced context awareness: + +```python icon="python" +tom_params = { + 
"enable_rag": True, # Enable RAG for better context retrieval +} +``` + +### Custom LLM for Tom + +You can optionally use a different LLM for Tom's internal reasoning: + +```python icon="python" +# Use the same LLM as main agent +tom_params["llm_model"] = llm.model +tom_params["api_key"] = llm.api_key.get_secret_value() + +# Or configure a separate LLM for Tom +tom_llm = LLM(model="gpt-4", api_key=SecretStr("different-key")) +tom_params["llm_model"] = tom_llm.model +tom_params["api_key"] = tom_llm.api_key.get_secret_value() +``` + +## Data Storage + +Tom stores user modeling data persistently in `~/.openhands/`: + + + + + + + + + + + + + + + + + +where +- `user_models/` stores user preference profiles, with each user having their own subdirectory containing `user_model.json` (the current user model). +- `conversations/` contains indexed conversation data + +This persistent storage enables Tom to: +- Remember user preferences across sessions +- Track which conversations have been indexed +- Build long-term understanding of user patterns + +## Use Cases + +### 1. Handling Vague Requests + +When a user provides minimal information: + +```python icon="python" +conversation.send_message("Help me with that bug") +# Tom analyzes history to determine which bug and suggest approach +``` + +### 2. Personalized Recommendations + +Tom adapts suggestions based on past interactions: + +```python icon="python" +# After multiple conversations, Tom learns: +# - User prefers minimal explanations +# - User typically works with Python +# - User values efficiency over verbosity +``` + +### 3. Intent Inference + +Understanding what the user really wants: + +```python icon="python" +conversation.send_message("Make it better") +# Tom infers from context what "it" is and how to improve it +``` + +## Best Practices + +1. **Enable RAG**: For better context awareness, always enable RAG: + ```python icon="python" + tom_params = {"enable_rag": True} + ``` + +2. 
**Index Regularly**: Run sleeptime compute after important conversations to build better user models + +3. **Provide Context**: Even with Tom, providing more context leads to better results + +4. **Monitor Data**: Check `~/.openhands/` periodically to understand what's being learned + +5. **Privacy Considerations**: Be aware that conversation data is stored locally for user modeling + +## Next Steps + +- **[Agent Delegation](/sdk/guides/agent-delegation)** - Combine Tom with sub-agents for complex workflows +- **[Context Condenser](/sdk/guides/context-condenser)** - Manage long conversation histories effectively +- **[Custom Tools](/sdk/guides/custom-tools)** - Create tools that work with Tom's insights + +### Browser Session Recording +Source: https://docs.openhands.dev/sdk/guides/browser-session-recording.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +The browser session recording feature allows you to capture your agent's browser interactions and replay them later using [rrweb](https://github.com/rrweb-io/rrweb). This is useful for debugging, auditing, and understanding how your agent interacts with web pages. + +## How It Works + +The recording feature uses rrweb to capture DOM mutations, mouse movements, scrolling, and other browser events. The recordings are saved as JSON files that can be replayed using rrweb-player or the online viewer. + +The [ready-to-run example](#ready-to-run-example) demonstrates: + +1. **Starting a recording**: Use `browser_start_recording` to begin capturing browser events +2. **Browsing and interacting**: Navigate to websites and perform actions while recording +3. **Stopping the recording**: Use `browser_stop_recording` to stop and save the recording + +The recording files are automatically saved to the persistence directory when the recording is stopped. 
+ +## Replaying Recordings + +After recording a session, you can replay it using: + +- **rrweb-player**: A standalone player component - [GitHub](https://github.com/rrweb-io/rrweb/tree/master/packages/rrweb-player) +- **Online viewer**: Upload your recording at [rrweb.io/demo](https://www.rrweb.io/) + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/38_browser_session_recording.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/38_browser_session_recording.py) + + +```python icon="python" expandable examples/01_standalone_sdk/38_browser_session_recording.py +"""Browser Session Recording Example + +This example demonstrates how to use the browser session recording feature +to capture and save a recording of the agent's browser interactions using rrweb. + +The recording can be replayed later using rrweb-player to visualize the agent's +browsing session. + +The recording will be automatically saved to the persistence directory when +browser_stop_recording is called. You can replay it with: + - rrweb-player: https://github.com/rrweb-io/rrweb/tree/master/packages/rrweb-player + - Online viewer: https://www.rrweb.io/ +""" + +import json +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.tool import Tool +from openhands.tools.browser_use import BrowserToolSet +from openhands.tools.browser_use.definition import BROWSER_RECORDING_OUTPUT_DIR + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools - including browser tools with recording capability +cwd = os.getcwd() +tools = [ + Tool(name=BrowserToolSet.name), +] + +# Agent +agent = Agent(llm=llm, tools=tools) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +# Create conversation with persistence_dir set to save browser recordings +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=cwd, + persistence_dir="./.conversations", +) + +# The prompt instructs the agent to: +# 1. Start recording the browser session +# 2. Browse to a website and perform some actions +# 3. Stop recording (auto-saves to file) +PROMPT = """ +Please complete the following task to demonstrate browser session recording: + +1. First, use `browser_start_recording` to begin recording the browser session. + +2. Then navigate to https://docs.openhands.dev/ and: + - Get the page content + - Scroll down the page + - Get the browser state to see interactive elements + +3. Next, navigate to https://docs.openhands.dev/openhands/usage/cli/installation and: + - Get the page content + - Scroll down to see more content + +4. Finally, use `browser_stop_recording` to stop the recording. + Events are automatically saved. 
+""" + +print("=" * 80) +print("Browser Session Recording Example") +print("=" * 80) +print("\nTask: Record an agent's browser session and save it for replay") +print("\nStarting conversation with agent...\n") + +conversation.send_message(PROMPT) +conversation.run() + +print("\n" + "=" * 80) +print("Conversation finished!") +print("=" * 80) + +# Check if the recording files were created +# Recordings are saved in BROWSER_RECORDING_OUTPUT_DIR/recording-{timestamp}/ +if os.path.exists(BROWSER_RECORDING_OUTPUT_DIR): + # Find recording subdirectories (they start with "recording-") + recording_dirs = sorted( + [ + d + for d in os.listdir(BROWSER_RECORDING_OUTPUT_DIR) + if d.startswith("recording-") + and os.path.isdir(os.path.join(BROWSER_RECORDING_OUTPUT_DIR, d)) + ] + ) + + if recording_dirs: + # Process the most recent recording directory + latest_recording = recording_dirs[-1] + recording_path = os.path.join(BROWSER_RECORDING_OUTPUT_DIR, latest_recording) + json_files = sorted( + [f for f in os.listdir(recording_path) if f.endswith(".json")] + ) + + print(f"\n✓ Recording saved to: {recording_path}") + print(f"✓ Number of files: {len(json_files)}") + + # Count total events across all files + total_events = 0 + all_event_types: dict[int | str, int] = {} + total_size = 0 + + for json_file in json_files: + filepath = os.path.join(recording_path, json_file) + file_size = os.path.getsize(filepath) + total_size += file_size + + with open(filepath) as f: + events = json.load(f) + + # Events are stored as a list in each file + if isinstance(events, list): + total_events += len(events) + for event in events: + event_type = event.get("type", "unknown") + all_event_types[event_type] = all_event_types.get(event_type, 0) + 1 + + print(f" - {json_file}: {len(events)} events, {file_size} bytes") + + print(f"✓ Total events: {total_events}") + print(f"✓ Total size: {total_size} bytes") + if all_event_types: + print(f"✓ Event types: {all_event_types}") + + print("\nTo replay this 
recording, you can use:") + print( + " - rrweb-player: " + "https://github.com/rrweb-io/rrweb/tree/master/packages/rrweb-player" + ) + else: + print(f"\n✗ No recording directories found in: {BROWSER_RECORDING_OUTPUT_DIR}") + print(" The agent may not have completed the recording task.") +else: + print(f"\n✗ Observations directory not found: {BROWSER_RECORDING_OUTPUT_DIR}") + print(" The agent may not have completed the recording task.") + +print("\n" + "=" * 100) +print("Conversation finished.") +print(f"Total LLM messages: {len(llm_messages)}") +print("=" * 100) + +# Report cost +cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost +print(f"Conversation ID: {conversation.id}") +print(f"EXAMPLE_COST: {cost}") +``` + + + +### Context Condenser +Source: https://docs.openhands.dev/sdk/guides/context-condenser.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +## What is a Context Condenser? + +A **context condenser** is a crucial component that addresses one of the most persistent challenges in AI agent development: managing growing conversation context efficiently. As conversations with AI agents grow longer, the cumulative history leads to: + +- **💰 Increased API Costs**: More tokens in the context means higher costs per API call +- **⏱️ Slower Response Times**: Larger contexts take longer to process +- **📉 Reduced Effectiveness**: LLMs become less effective when dealing with excessive irrelevant information + +The context condenser solves this by intelligently summarizing older parts of the conversation while preserving essential information needed for the agent to continue working effectively. + +## Default Implementation: `LLMSummarizingCondenser` + +OpenHands SDK provides `LLMSummarizingCondenser` as the default condenser implementation. 
This condenser uses an LLM to generate summaries of conversation history when it exceeds the configured size limit.
+
+### How It Works
+
+When conversation history exceeds a defined threshold, the LLM-based condenser:
+
+1. **Keeps recent messages intact** - The most recent exchanges remain unchanged for immediate context
+2. **Preserves key information** - Important details like user goals, technical specifications, and critical files are retained
+3. **Summarizes older content** - Earlier parts of the conversation are condensed into concise LLM-generated summaries
+4. **Maintains continuity** - The agent retains awareness of past progress without processing every historical interaction
+
+(Figure: illustration of context condensation; light- and dark-mode variants appear on the docs site.)
+
+This approach achieves remarkable efficiency gains:
+- Up to **2x reduction** in per-turn API costs
+- **Consistent response times** even in long sessions
+- **Equivalent or better performance** on software engineering tasks
+
+Learn more about the implementation and benchmarks in our [blog post on context condensation](https://openhands.dev/blog/openhands-context-condensensation-for-more-efficient-ai-agents).
+
+### Extensibility
+
+The `LLMSummarizingCondenser` extends the `RollingCondenser` base class, which provides a framework for condensers that work with rolling conversation history. You can create custom condensers by extending base classes ([source code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/context/condenser/base.py)):
+
+- **`RollingCondenser`** - For condensers that apply condensation to rolling history
+- **`CondenserBase`** - For more specialized condensation strategies
+
+This architecture allows you to implement custom condensation logic tailored to your specific needs while leveraging the SDK's conversation management infrastructure.
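+The rolling strategy is easy to picture without the SDK. The standalone sketch below mimics the `keep_first`/`max_size` behaviour described above (the function name and summary format are illustrative, not the SDK API):
+
+```python
+def condense_history(events: list[str], max_size: int, keep_first: int) -> list[str]:
+    """Illustrative rolling condensation (not the SDK's implementation)."""
+    if len(events) <= max_size:
+        return events
+    # Budget for recent events: the total allowance minus the kept head
+    # and one slot for the summary itself.
+    recent_budget = max_size - keep_first - 1
+    head = events[:keep_first]          # system prompt, initial user message
+    recent = events[-recent_budget:]    # most recent exchanges stay intact
+    dropped = events[keep_first:-recent_budget]
+    # A real condenser would ask an LLM to summarize `dropped`.
+    summary = f"[summary of {len(dropped)} earlier events]"
+    return head + [summary] + recent
+```
+
+With `max_size=10` and `keep_first=2`, a 15-event history collapses to the 2 head events, one summary entry, and the 7 most recent events.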
+
+
+### Setting Up Condensing
+
+Create an `LLMSummarizingCondenser` to manage the context.
+The condenser automatically truncates conversation history when it exceeds `max_size`, replacing the dropped events with an LLM-generated summary.
+
+This condenser triggers when there are more than `max_size` events in
+the conversation history, and always keeps the first `keep_first` events (system prompts,
+initial user messages) to preserve important context.
+
+```python focus={3-4} icon="python"
+from openhands.sdk.context import LLMSummarizingCondenser
+
+condenser = LLMSummarizingCondenser(
+    llm=llm.model_copy(update={"usage_id": "condenser"}), max_size=10, keep_first=2
+)
+
+# Agent with condenser
+agent = Agent(llm=llm, tools=tools, condenser=condenser)
+```
+
+### Ready-to-run example
+
+
+This example is available on GitHub: [examples/01_standalone_sdk/14_context_condenser.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/14_context_condenser.py)
+
+
+
+Automatically condense conversation history when context length exceeds limits, reducing token usage while preserving important information:
+
+```python icon="python" expandable examples/01_standalone_sdk/14_context_condenser.py
+"""
+To manage context in long-running conversations, the agent can use a context condenser
+that keeps the conversation history within a specified size limit. This example
+demonstrates using the `LLMSummarizingCondenser`, which automatically summarizes
+older parts of the conversation when the history exceeds a defined threshold.
+""" + +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.context.condenser import LLMSummarizingCondenser +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.task_tracker import TaskTrackerTool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +cwd = os.getcwd() +tools = [ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), + Tool(name=TaskTrackerTool.name), +] + +# Create a condenser to manage the context. The condenser will automatically truncate +# conversation history when it exceeds max_size, and replaces the dropped events with an +# LLM-generated summary. This condenser triggers when there are more than ten events in +# the conversation history, and always keeps the first two events (system prompts, +# initial user messages) to preserve important context. 
+condenser = LLMSummarizingCondenser( + llm=llm.model_copy(update={"usage_id": "condenser"}), max_size=10, keep_first=2 +) + +# Agent with condenser +agent = Agent(llm=llm, tools=tools, condenser=condenser) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + persistence_dir="./.conversations", + workspace=".", +) + +# Send multiple messages to demonstrate condensation +print("Sending multiple messages to demonstrate LLM Summarizing Condenser...") + +conversation.send_message( + "Hello! Can you create a Python file named math_utils.py with functions for " + "basic arithmetic operations (add, subtract, multiply, divide)?" +) +conversation.run() + +conversation.send_message( + "Great! Now add a function to calculate the factorial of a number." +) +conversation.run() + +conversation.send_message("Add a function to check if a number is prime.") +conversation.run() + +conversation.send_message( + "Add a function to calculate the greatest common divisor (GCD) of two numbers." +) +conversation.run() + +conversation.send_message( + "Now create a test file to verify all these functions work correctly." +) +conversation.run() + +print("=" * 100) +print("Conversation finished. 
Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +# Conversation persistence +print("Serializing conversation...") + +del conversation + +# Deserialize the conversation +print("Deserializing conversation...") +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + persistence_dir="./.conversations", + workspace=".", +) + +print("Sending message to deserialized conversation...") +conversation.send_message("Finally, clean up by deleting both files.") +conversation.run() + +print("=" * 100) +print("Conversation finished with LLM Summarizing Condenser.") +print(f"Total LLM messages collected: {len(llm_messages)}") +print("\nThe condenser automatically summarized older conversation history") +print("when the conversation exceeded the configured max_size threshold.") +print("This helps manage context length while preserving important information.") + +# Report cost +cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +## Next Steps + +- **[LLM Metrics](/sdk/guides/metrics)** - Track token usage reduction and analyze cost savings + +### Ask Agent Questions +Source: https://docs.openhands.dev/sdk/guides/convo-ask-agent.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +Use `ask_agent()` to get quick responses from the agent about the current conversation state without +interrupting the main execution flow. + +## Key Features + +The `ask_agent()` method provides several important capabilities: + +#### Context-Aware Responses + +The agent has access to the full conversation history when answering questions: + +```python focus={2-3} icon="python" wrap +# Agent can reference what it has done so far +response = conversation.ask_agent( + "Summarize the activity so far in 1 sentence." 
+) +print(f"Response: {response}") +``` + +#### Non-Intrusive Operation + +Questions don't interrupt the main conversation flow - they're processed separately: + +```python focus={4-6} icon="python" wrap +# Start main conversation +thread = threading.Thread(target=conversation.run) +thread.start() + +# Ask questions without affecting main execution +response = conversation.ask_agent("How's the progress?") +``` + +#### Works During and After Execution + +You can ask questions while the agent is running or after it has completed: + +```python focus={3,7} icon="python" wrap +# During execution +time.sleep(2) # Let agent start working +response1 = conversation.ask_agent("Have you finished running?") + +# After completion +thread.join() +response2 = conversation.ask_agent("What did you accomplish?") +``` + +### Use Cases + +- **Progress Monitoring**: Check on long-running tasks +- **Status Updates**: Get real-time information about agent activities +- **User Interfaces**: Provide sidebar information in chat applications + +## Ready-to-run Example + + + This example is available on GitHub: + [examples/01_standalone_sdk/28_ask_agent_example.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/28_ask_agent_example.py) + + +Example demonstrating the ask_agent functionality for getting sidebar replies +from the agent for a running conversation. + +This example shows how to use `ask_agent()` to get quick responses from the agent +about the current conversation state without interrupting the main execution flow. + +```python icon="python" expandable examples/01_standalone_sdk/28_ask_agent_example.py +""" +Example demonstrating the ask_agent functionality for getting sidebar replies +from the agent for a running conversation. + +This example shows how to use ask_agent() to get quick responses from the agent +about the current conversation state without interrupting the main execution flow. 
+""" + +import os +import threading +import time +from datetime import datetime + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, +) +from openhands.sdk.conversation import ConversationVisualizerBase +from openhands.sdk.event import Event +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.task_tracker import TaskTrackerTool +from openhands.tools.terminal import TerminalTool + + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +cwd = os.getcwd() +tools = [ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), + Tool(name=TaskTrackerTool.name), +] + + +class MinimalVisualizer(ConversationVisualizerBase): + """A minimal visualizer that print the raw events as they occur.""" + + count = 0 + + def on_event(self, event: Event) -> None: + """Handle events for minimal progress visualization.""" + print(f"\n\n[EVENT {self.count}] {type(event).__name__}") + self.count += 1 + + +# Agent +agent = Agent(llm=llm, tools=tools) +conversation = Conversation( + agent=agent, workspace=cwd, visualizer=MinimalVisualizer, max_iteration_per_run=5 +) + + +def timestamp() -> str: + return datetime.now().strftime("%H:%M:%S") + + +print("=== Ask Agent Example ===") +print("This example demonstrates asking questions during conversation execution") + +# Step 1: Build conversation context +print(f"\n[{timestamp()}] Building conversation context...") +conversation.send_message("Explore the current directory and describe the architecture") + +# Step 2: Start conversation in background thread +print(f"[{timestamp()}] Starting conversation in background thread...") +thread = 
threading.Thread(target=conversation.run) +thread.start() + +# Give the agent time to start processing +time.sleep(2) + +# Step 3: Use ask_agent while conversation is running +print(f"\n[{timestamp()}] Using ask_agent while conversation is processing...") + +# Ask context-aware questions +questions_and_responses = [] + +question_1 = "Summarize the activity so far in 1 sentence." +print(f"\n[{timestamp()}] Asking: {question_1}") +response1 = conversation.ask_agent(question_1) +questions_and_responses.append((question_1, response1)) +print(f"Response: {response1}") + +time.sleep(1) + +question_2 = "How's the progress?" +print(f"\n[{timestamp()}] Asking: {question_2}") +response2 = conversation.ask_agent(question_2) +questions_and_responses.append((question_2, response2)) +print(f"Response: {response2}") + +time.sleep(1) + +question_3 = "Have you finished running?" +print(f"\n[{timestamp()}] {question_3}") +response3 = conversation.ask_agent(question_3) +questions_and_responses.append((question_3, response3)) +print(f"Response: {response3}") + +# Step 4: Wait for conversation to complete +print(f"\n[{timestamp()}] Waiting for conversation to complete...") +thread.join() + +# Step 5: Verify conversation state wasn't affected +final_event_count = len(conversation.state.events) +# Step 6: Ask a final question after conversation completion +print(f"\n[{timestamp()}] Asking final question after completion...") +final_response = conversation.ask_agent( + "Can you summarize what you accomplished in this conversation?" +) +print(f"Final response: {final_response}") + +# Step 7: Summary +print("\n" + "=" * 60) +print("SUMMARY OF ASK_AGENT DEMONSTRATION") +print("=" * 60) + +print("\nQuestions and Responses:") +for i, (question, response) in enumerate(questions_and_responses, 1): + print(f"\n{i}. Q: {question}") + print(f" A: {response[:100]}{'...' if len(response) > 100 else ''}") + +final_truncated = final_response[:100] + ("..." 
if len(final_response) > 100 else "") +print(f"\nFinal Question Response: {final_truncated}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost:.4f}") +``` + + + + +## Next Steps + +- **[Send Messages While Running](/sdk/guides/convo-send-message-while-running)** - Interrupt and redirect agent execution +- **[Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Control execution flow +- **[Custom Visualizers](/sdk/guides/convo-custom-visualizer)** - Monitor conversation progress + +### Conversation with Async +Source: https://docs.openhands.dev/sdk/guides/convo-async.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +### Concurrent Agents + +Run multiple agent tasks in parallel using `asyncio.gather()`: + +```python icon="python" wrap +async def main(): + loop = asyncio.get_running_loop() + callback = AsyncCallbackWrapper(callback_coro, loop) + + # Create multiple conversation tasks running in parallel + tasks = [ + loop.run_in_executor(None, run_conversation, callback), + loop.run_in_executor(None, run_conversation, callback), + loop.run_in_executor(None, run_conversation, callback) + ] + results = await asyncio.gather(*tasks) +``` + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/11_async.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/11_async.py) + + +This example demonstrates usage of a Conversation in an async context +(e.g.: From a fastapi server). The conversation is run in a background +thread and a callback with results is executed in the main runloop + +```python icon="python" expandable examples/01_standalone_sdk/11_async.py +""" +This example demonstrates usage of a Conversation in an async context +(e.g.: From a fastapi server). 
The conversation is run in a background +thread and a callback with results is executed in the main runloop +""" + +import asyncio +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.conversation.types import ConversationCallbackType +from openhands.sdk.tool import Tool +from openhands.sdk.utils.async_utils import AsyncCallbackWrapper +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.task_tracker import TaskTrackerTool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +cwd = os.getcwd() +tools = [ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), + Tool(name=TaskTrackerTool.name), +] + +# Agent +agent = Agent(llm=llm, tools=tools) + +llm_messages = [] # collect raw LLM messages + + +# Callback coroutine +async def callback_coro(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +# Synchronous run conversation +def run_conversation(callback: ConversationCallbackType): + conversation = Conversation(agent=agent, callbacks=[callback]) + + conversation.send_message( + "Hello! Can you create a new Python file named hello.py that prints " + "'Hello, World!'? Use task tracker to plan your steps." + ) + conversation.run() + + conversation.send_message("Great! 
Now delete that file.") + conversation.run() + + +async def main(): + loop = asyncio.get_running_loop() + + # Create the callback + callback = AsyncCallbackWrapper(callback_coro, loop) + + # Run the conversation in a background thread and wait for it to finish... + await loop.run_in_executor(None, run_conversation, callback) + + print("=" * 100) + print("Conversation finished. Got the following LLM messages:") + for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + + # Report cost + cost = llm.metrics.accumulated_cost + print(f"EXAMPLE_COST: {cost}") + + +if __name__ == "__main__": + asyncio.run(main()) +``` + + + +## Next Steps + +- **[Persistence](/sdk/guides/convo-persistence)** - Save and restore conversation state +- **[Send Message While Processing](/sdk/guides/convo-send-message-while-running)** - Interrupt running agents + +### Custom Visualizer +Source: https://docs.openhands.dev/sdk/guides/convo-custom-visualizer.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +The SDK provides flexible visualization options. You can use the default rich-formatted visualizer, customize it with highlighting patterns, or build completely custom visualizers by subclassing `ConversationVisualizerBase`. 
+ +## Visualizer Configuration Options + +The `visualizer` parameter in `Conversation` controls how events are displayed: + +```python icon="python" focus={4-5, 7-8, 10-11, 13, 18, 20, 25} +from openhands.sdk import Conversation +from openhands.sdk.conversation import DefaultConversationVisualizer, ConversationVisualizerBase + +# Option 1: Use default visualizer (enabled by default) +conversation = Conversation(agent=agent, workspace=workspace) + +# Option 2: Disable visualization +conversation = Conversation(agent=agent, workspace=workspace, visualizer=None) + +# Option 3: Pass a visualizer class (will be instantiated automatically) +conversation = Conversation(agent=agent, workspace=workspace, visualizer=DefaultConversationVisualizer) + +# Option 4: Pass a configured visualizer instance +custom_viz = DefaultConversationVisualizer( + name="MyAgent", + highlight_regex={r"^Reasoning:": "bold cyan"} +) +conversation = Conversation(agent=agent, workspace=workspace, visualizer=custom_viz) + +# Option 5: Use custom visualizer class +class MyVisualizer(ConversationVisualizerBase): + def on_event(self, event): + print(f"Event: {event}") + +conversation = Conversation(agent=agent, workspace=workspace, visualizer=MyVisualizer()) +``` + +## Customizing the Default Visualizer + +`DefaultConversationVisualizer` uses Rich panels and supports customization through configuration: + +```python icon="python" focus={3-14, 19} +from openhands.sdk.conversation import DefaultConversationVisualizer + +# Configure highlighting patterns using regex +custom_visualizer = DefaultConversationVisualizer( + name="MyAgent", # Prefix panel titles with agent name + highlight_regex={ + r"^Reasoning:": "bold cyan", # Lines starting with "Reasoning:" + r"^Thought:": "bold green", # Lines starting with "Thought:" + r"^Action:": "bold yellow", # Lines starting with "Action:" + r"\[ERROR\]": "bold red", # Error markers anywhere + r"\*\*(.*?)\*\*": "bold", # Markdown bold **text** + }, + 
skip_user_messages=False, # Show user messages +) + +conversation = Conversation( + agent=agent, + workspace=workspace, + visualizer=custom_visualizer +) +``` + +**When to use**: Perfect for customizing colors and highlighting without changing the panel-based layout. + +## Creating Custom Visualizers + +For complete control over visualization, subclass `ConversationVisualizerBase`: + +```python icon="python" focus={4, 11, 28} +from openhands.sdk.conversation import ConversationVisualizerBase +from openhands.sdk.event import ActionEvent, ObservationEvent, AgentErrorEvent, Event + +class MinimalVisualizer(ConversationVisualizerBase): + """A minimal visualizer that prints raw event information.""" + + def __init__(self, name: str | None = None): + super().__init__(name=name) + self.step_count = 0 + + def on_event(self, event: Event) -> None: + """Handle each event.""" + if isinstance(event, ActionEvent): + self.step_count += 1 + tool_name = event.tool_name or "unknown" + print(f"Step {self.step_count}: {tool_name}") + + elif isinstance(event, ObservationEvent): + print(f" → Result received") + + elif isinstance(event, AgentErrorEvent): + print(f"❌ Error: {event.error}") + +# Use your custom visualizer +conversation = Conversation( + agent=agent, + workspace=workspace, + visualizer=MinimalVisualizer(name="Agent") +) +``` + +### Key Methods + +**`__init__(self, name: str | None = None)`** +- Initialize your visualizer with optional configuration +- `name` parameter is available from the base class for agent identification +- Call `super().__init__(name=name)` to initialize the base class + +**`initialize(self, state: ConversationStateProtocol)`** +- Called automatically by `Conversation` after state is created +- Provides access to conversation state and statistics via `self._state` +- Override if you need custom initialization, but call `super().initialize(state)` + +**`on_event(self, event: Event)`** *(required)* +- Called for each conversation event +- Implement your 
visualization logic here +- Access conversation stats via `self.conversation_stats` property + +**When to use**: When you need a completely different output format, custom state tracking, or integration with external systems. + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/26_custom_visualizer.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/26_custom_visualizer.py) + + +```python icon="python" expandable examples/01_standalone_sdk/26_custom_visualizer.py +"""Custom Visualizer Example + +This example demonstrates how to create and use a custom visualizer by subclassing +ConversationVisualizer. This approach provides: +- Clean, testable code with class-based state management +- Direct configuration (just pass the visualizer instance to visualizer parameter) +- Reusable visualizer that can be shared across conversations + +This demonstrates how you can pass a ConversationVisualizer instance directly +to the visualizer parameter for clean, reusable visualization logic. +""" + +import logging +import os + +from pydantic import SecretStr + +from openhands.sdk import LLM, Conversation +from openhands.sdk.conversation.visualizer import ConversationVisualizerBase +from openhands.sdk.event import ( + Event, +) +from openhands.tools.preset.default import get_default_agent + + +class MinimalVisualizer(ConversationVisualizerBase): + """A minimal visualizer that print the raw events as they occur.""" + + def on_event(self, event: Event) -> None: + """Handle events for minimal progress visualization.""" + print(f"\n\n[EVENT] {type(event).__name__}: {event.model_dump_json()[:200]}...") + + +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + model=model, + api_key=SecretStr(api_key), + base_url=base_url, + usage_id="agent", +) +agent = get_default_agent(llm=llm, cli_mode=True) + +# ============================================================================ +# Configure Visualization +# ============================================================================ +# Set logging level to reduce verbosity +logging.getLogger().setLevel(logging.WARNING) + +# Start a conversation with custom visualizer +cwd = os.getcwd() +conversation = Conversation( + agent=agent, + workspace=cwd, + visualizer=MinimalVisualizer(), +) + +# Send a message and let the agent run +print("Sending task to agent...") +conversation.send_message("Write 3 facts about the current project into FACTS.txt.") +conversation.run() +print("Task completed!") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost:.4f}") +``` + + + +## Next Steps + +Now that you understand custom visualizers, explore these related topics: + +- **[Events](/sdk/arch/events)** - Learn more about different event types +- **[Conversation Metrics](/sdk/guides/metrics)** - Track LLM usage, costs, and performance data +- **[Send Messages While Running](/sdk/guides/convo-send-message-while-running)** - Interactive conversations with real-time updates +- **[Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Control agent execution flow with custom logic + +### Pause and Resume +Source: https://docs.openhands.dev/sdk/guides/convo-pause-and-resume.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +### Pausing Execution + +Pause the agent from another thread or after a delay using `conversation.pause()`, and +Resume the paused conversation after performing operations by calling `conversation.run()` again. 
+ +```python icon="python" focus={9, 15} wrap +import time +thread = threading.Thread(target=conversation.run) +thread.start() + +print("Letting agent work for 5 seconds...") +time.sleep(5) + +print("Pausing the agent...") +conversation.pause() + +print("Waiting for 5 seconds...") +time.sleep(5) + +print("Resuming the execution...") +conversation.run() +``` + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/09_pause_example.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/09_pause_example.py) + + +Pause agent execution mid-task by calling `conversation.pause()`: + +```python icon="python" expandable examples/01_standalone_sdk/09_pause_example.py +import os +import threading +import time + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, +) +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +tools = [ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), +] + +# Agent +agent = Agent(llm=llm, tools=tools) +conversation = Conversation(agent, workspace=os.getcwd()) + +print("=" * 60) +print("Pause and Continue Example") +print("=" * 60) +print() + +# Phase 1: Start a long-running task +print("Phase 1: Starting agent with a task...") +conversation.send_message( + "Create a file called countdown.txt and write numbers from 100 down to 1, " + "one number per line. After you finish, summarize what you did." 
+) + +print(f"Initial status: {conversation.state.execution_status}") +print() + +# Start the agent in a background thread +thread = threading.Thread(target=conversation.run) +thread.start() + +# Let the agent work for a few seconds +print("Letting agent work for 2 seconds...") +time.sleep(2) + +# Phase 2: Pause the agent +print() +print("Phase 2: Pausing the agent...") +conversation.pause() + +# Wait for the thread to finish (it will stop when paused) +thread.join() + +print(f"Agent status after pause: {conversation.state.execution_status}") +print() + +# Phase 3: Send a new message while paused +print("Phase 3: Sending a new message while agent is paused...") +conversation.send_message( + "Actually, stop working on countdown.txt. Instead, create a file called " + "hello.txt with just the text 'Hello, World!' in it." +) +print() + +# Phase 4: Resume the agent with .run() +print("Phase 4: Resuming agent with .run()...") +print(f"Status before resume: {conversation.state.execution_status}") + +# Resume execution +conversation.run() + +print(f"Final status: {conversation.state.execution_status}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + + + +## Next Steps + +- **[Persistence](/sdk/guides/convo-persistence)** - Save and restore conversation state +- **[Send Message While Processing](/sdk/guides/convo-send-message-while-running)** - Interrupt running agents + +### Persistence +Source: https://docs.openhands.dev/sdk/guides/convo-persistence.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +## How to use Persistence + +Save conversation state to disk and restore it later for long-running or multi-session workflows. 
+ +### Saving State + +Create a conversation with a unique ID to enable persistence: + +```python focus={3-4,10-11} icon="python" wrap +import uuid + +conversation_id = uuid.uuid4() +persistence_dir = "./.conversations" + +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=cwd, + persistence_dir=persistence_dir, + conversation_id=conversation_id, +) +conversation.send_message("Start long task") +conversation.run() # State automatically saved +``` + +### Restoring State + +Restore a conversation using the same ID and persistence directory: + +```python focus={9-10} icon="python" +# Later, in a different session +del conversation + +# Deserialize the conversation +print("Deserializing conversation...") +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=cwd, + persistence_dir=persistence_dir, + conversation_id=conversation_id, +) + +conversation.send_message("Continue task") +conversation.run() # Continues from saved state +``` + +## What Gets Persisted + +The conversation state includes information that allows seamless restoration: + +- **Message History**: Complete event log including user messages, agent responses, and system events +- **Agent Configuration**: LLM settings, tools, MCP servers, and agent parameters +- **Execution State**: Current agent status (idle, running, paused, etc.), iteration count, and stuck detection settings +- **Tool Outputs**: Results from bash commands, file operations, and other tool executions +- **Statistics**: LLM usage metrics like token counts and API calls +- **Workspace Context**: Working directory and file system state +- **Activated Skills**: [Skills](/sdk/guides/skill) that have been enabled during the conversation +- **Secrets**: Managed credentials and API keys +- **Agent State**: Custom runtime state stored by agents (see [Agent State](#agent-state) below) + + + For the complete implementation details, see the [ConversationState 
class](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/conversation/state.py) in the source code.
+
+
+## Persistence Directory Structure
+
+When you set a `persistence_dir`, your conversation will be persisted to a directory structure where each
+conversation has its own subdirectory. By default, the persistence directory is `workspace/conversations/`
+(unless you specify a custom path).
+
+**Directory structure:**
+
+- `<persistence_dir>/`
+  - `<conversation_id>/`
+    - `base_state.json`
+    - `events/`
+      - `event-00000-<event_id>.json`
+      - `event-00001-<event_id>.json`
+
+Each conversation directory contains:
+- **`base_state.json`**: The core conversation state including agent configuration, execution status, statistics, and metadata
+- **`events/`**: A subdirectory containing individual event files, each named with a sequential index and event ID (e.g., `event-00000-abc123.json`)
+
+The collection of event files in the `events/` directory represents the same trajectory data you would find in the `trajectory.json` file from OpenHands V0, but split into individual files for better performance and granular access.
+
+## Ready-to-run Example
+
+
+This example is available on GitHub: [examples/01_standalone_sdk/10_persistence.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/10_persistence.py)
+
+
+```python icon="python" expandable examples/01_standalone_sdk/10_persistence.py
+import os
+import uuid
+
+from pydantic import SecretStr
+
+from openhands.sdk import (
+    LLM,
+    Agent,
+    Conversation,
+    Event,
+    LLMConvertibleEvent,
+    get_logger,
+)
+from openhands.sdk.tool import Tool
+from openhands.tools.file_editor import FileEditorTool
+from openhands.tools.terminal import TerminalTool
+
+
+logger = get_logger(__name__)
+
+# Configure LLM
+api_key = os.getenv("LLM_API_KEY")
+assert api_key is not None, "LLM_API_KEY environment variable is not set."
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +cwd = os.getcwd() +tools = [ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), +] + +# Add MCP Tools +mcp_config = { + "mcpServers": { + "fetch": {"command": "uvx", "args": ["mcp-server-fetch"]}, + } +} +# Agent +agent = Agent(llm=llm, tools=tools, mcp_config=mcp_config) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation_id = uuid.uuid4() +persistence_dir = "./.conversations" + +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=cwd, + persistence_dir=persistence_dir, + conversation_id=conversation_id, +) +conversation.send_message( + "Read https://github.com/OpenHands/OpenHands. Then write 3 facts " + "about the project into FACTS.txt." +) +conversation.run() + +conversation.send_message("Great! Now delete that file.") +conversation.run() + +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +# Conversation persistence +print("Serializing conversation...") + +del conversation + +# Deserialize the conversation +print("Deserializing conversation...") +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=cwd, + persistence_dir=persistence_dir, + conversation_id=conversation_id, +) + +print("Sending message to deserialized conversation...") +conversation.send_message("Hey what did you create? 
Return an agent finish action") +conversation.run() + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + + +## Reading serialized events + +Convert persisted events into LLM-ready messages for reuse or analysis. + + +This example is available on GitHub: [examples/01_standalone_sdk/36_event_json_to_openai_messages.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/36_event_json_to_openai_messages.py) + + +```python icon="python" expandable examples/01_standalone_sdk/36_event_json_to_openai_messages.py +"""Load persisted events and convert them into LLM-ready messages.""" + +import json +import os +import uuid +from pathlib import Path + +from pydantic import SecretStr + + +conversation_id = uuid.uuid4() +persistence_root = Path(".conversations") +log_dir = ( + persistence_root / "logs" / "event-json-to-openai-messages" / conversation_id.hex +) + +os.environ.setdefault("LOG_JSON", "true") +os.environ.setdefault("LOG_TO_FILE", "true") +os.environ.setdefault("LOG_DIR", str(log_dir)) +os.environ.setdefault("LOG_LEVEL", "INFO") + +from openhands.sdk import ( # noqa: E402 + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + Tool, +) +from openhands.sdk.logger import get_logger, setup_logging # noqa: E402 +from openhands.tools.terminal import TerminalTool # noqa: E402 + + +setup_logging(log_to_file=True, log_dir=str(log_dir)) +logger = get_logger(__name__) + +api_key = os.getenv("LLM_API_KEY") +if not api_key: + raise RuntimeError("LLM_API_KEY environment variable is not set.") + +llm = LLM( + usage_id="agent", + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) + +agent = Agent( + llm=llm, + tools=[Tool(name=TerminalTool.name)], +) + +###### +# Create a conversation that persists its events +###### + +conversation = Conversation( + agent=agent, + workspace=os.getcwd(), + 
persistence_dir=str(persistence_root), + conversation_id=conversation_id, +) + +conversation.send_message( + "Use the terminal tool to run `pwd` and write the output to tool_output.txt. " + "Reply with a short confirmation once done." +) +conversation.run() + +conversation.send_message( + "Without using any tools, summarize in one sentence what you did." +) +conversation.run() + +assert conversation.state.persistence_dir is not None +persistence_dir = Path(conversation.state.persistence_dir) +event_dir = persistence_dir / "events" + +event_paths = sorted(event_dir.glob("event-*.json")) + +if not event_paths: + raise RuntimeError("No event files found. Was persistence enabled?") + +###### +# Read from serialized events +###### + + +events = [Event.model_validate_json(path.read_text()) for path in event_paths] + +convertible_events = [ + event for event in events if isinstance(event, LLMConvertibleEvent) +] +llm_messages = LLMConvertibleEvent.events_to_messages(convertible_events) + +if llm.uses_responses_api(): + logger.info("Formatting messages for the OpenAI Responses API.") + instructions, input_items = llm.format_messages_for_responses(llm_messages) + logger.info("Responses instructions:\n%s", instructions) + logger.info("Responses input:\n%s", json.dumps(input_items, indent=2)) +else: + logger.info("Formatting messages for the OpenAI Chat Completions API.") + chat_messages = llm.format_messages_for_llm(llm_messages) + logger.info("Chat Completions messages:\n%s", json.dumps(chat_messages, indent=2)) + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + + +## How State Persistence Works + +The SDK uses an **automatic persistence** system that saves state changes immediately when they occur. This ensures that conversation state is always recoverable, even if the process crashes unexpectedly. + +### Auto-Save Mechanism + +When you modify any public field on `ConversationState`, the SDK automatically: + +1. 
Detects the field change via a custom `__setattr__` implementation +2. Serializes the entire base state to `base_state.json` +3. Triggers any registered state change callbacks + +This happens transparently—you don't need to call any save methods manually. + +```python +# These changes are automatically persisted: +conversation.state.execution_status = ConversationExecutionStatus.RUNNING +conversation.state.max_iterations = 100 +``` + +### Events vs Base State + +The persistence system separates data into two categories: + +| Category | Storage | Contents | +|----------|---------|----------| +| **Base State** | `base_state.json` | Agent configuration, execution status, statistics, secrets, agent_state | +| **Events** | `events/event-*.json` | Message history, tool calls, observations, all conversation events | + +Events are appended incrementally (one file per event), while base state is overwritten on each change. This design optimizes for: +- **Fast event appends**: No need to rewrite the entire history +- **Atomic state updates**: Base state is always consistent +- **Efficient restoration**: Events can be loaded lazily + + + +## Next Steps + +- **[Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Control execution flow +- **[Async Operations](/sdk/guides/convo-async)** - Non-blocking operations + +### Send Message While Running +Source: https://docs.openhands.dev/sdk/guides/convo-send-message-while-running.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + + + +This example is available on GitHub: [examples/01_standalone_sdk/18_send_message_while_processing.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/18_send_message_while_processing.py) + + +Send additional messages to a running agent mid-execution to provide corrections, updates, or additional context: + +```python icon="python" expandable examples/01_standalone_sdk/18_send_message_while_processing.py +""" +Example demonstrating 
that user messages can be sent and processed while +an agent is busy. + +This example demonstrates a key capability of the OpenHands agent system: the ability +to receive and process new user messages even while the agent is actively working on +a previous task. This is made possible by the agent's event-driven architecture. + +Demonstration Flow: +1. Send initial message asking agent to: + - Write "Message 1 sent at [time], written at [CURRENT_TIME]" + - Wait 3 seconds + - Write "Message 2 sent at [time], written at [CURRENT_TIME]" + [time] is the time the message was sent to the agent + [CURRENT_TIME] is the time the agent writes the line +2. Start agent processing in a background thread +3. While agent is busy (during the 3-second delay), send a second message asking to add: + - "Message 3 sent at [time], written at [CURRENT_TIME]" +4. Verify that all three lines are processed and included in the final document + +Expected Evidence: +The final document will contain three lines with dual timestamps: +- "Message 1 sent at HH:MM:SS, written at HH:MM:SS" (from initial message, written immediately) +- "Message 2 sent at HH:MM:SS, written at HH:MM:SS" (from initial message, written after 3-second delay) +- "Message 3 sent at HH:MM:SS, written at HH:MM:SS" (from second message sent during delay) + +The timestamps will show that Message 3 was sent while the agent was running, +but was still successfully processed and written to the document. 
+ +This proves that: +- The second user message was sent while the agent was processing the first task +- The agent successfully received and processed the second message +- The agent's event system allows for real-time message integration during processing + +Key Components Demonstrated: +- Conversation.send_message(): Adds messages to events list immediately +- Agent.step(): Processes all events including newly added messages +- Threading: Allows message sending while agent is actively processing +""" # noqa + +import os +import threading +import time +from datetime import datetime + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, +) +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +cwd = os.getcwd() +tools = [ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), +] + +# Agent +agent = Agent(llm=llm, tools=tools) +conversation = Conversation(agent) + + +def timestamp() -> str: + return datetime.now().strftime("%H:%M:%S") + + +print("=== Send Message While Processing Example ===") + +# Step 1: Send initial message +start_time = timestamp() +conversation.send_message( + f"Create a file called document.txt and write this first sentence: " + f"'Message 1 sent at {start_time}, written at [CURRENT_TIME].' " + f"Replace [CURRENT_TIME] with the actual current time when you write the line. 
" + f"Then wait 3 seconds and write 'Message 2 sent at {start_time}, written at [CURRENT_TIME].'" # noqa +) + +# Step 2: Start agent processing in background +thread = threading.Thread(target=conversation.run) +thread.start() + +# Step 3: Wait then send second message while agent is processing +time.sleep(2) # Give agent time to start working + +second_time = timestamp() + +conversation.send_message( + f"Please also add this second sentence to document.txt: " + f"'Message 3 sent at {second_time}, written at [CURRENT_TIME].' " + f"Replace [CURRENT_TIME] with the actual current time when you write this line." +) + +# Wait for completion +thread.join() + +# Verification +document_path = os.path.join(cwd, "document.txt") +if os.path.exists(document_path): + with open(document_path) as f: + content = f.read() + + print("\nDocument contents:") + print("─────────────────────") + print(content) + print("─────────────────────") + + # Check if both messages were processed + if "Message 1" in content and "Message 2" in content: + print("\nSUCCESS: Agent processed both messages!") + print( + "This proves the agent received the second message while processing the first task." 
# noqa + ) + else: + print("\nWARNING: Agent may not have processed the second message") + + # Clean up + os.remove(document_path) +else: + print("WARNING: Document.txt was not created") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +### Sending Messages During Execution + +As shown in the example above, use threading to send messages while the agent is running: + +```python icon="python" +# Start agent processing in background +thread = threading.Thread(target=conversation.run) +thread.start() + +# Wait then send second message while agent is processing +time.sleep(2) # Give agent time to start working + +second_time = timestamp() + +conversation.send_message( + f"Please also add this second sentence to document.txt: " + f"'Message 3 sent at {second_time}, written at [CURRENT_TIME].' " + f"Replace [CURRENT_TIME] with the actual current time when you write this line." +) + +# Wait for completion +thread.join() +``` + +The key steps are: +1. Start `conversation.run()` in a background thread +2. Send additional messages using `conversation.send_message()` while the agent is processing +3. Use `thread.join()` to wait for completion + +The agent receives and incorporates the new message mid-execution, allowing for real-time corrections and dynamic guidance. + +## Next Steps + +- **[Pause and Resume](/sdk/guides/convo-pause-and-resume)** - Control execution flow +- **[Async Operations](/sdk/guides/convo-async)** - Non-blocking operations + +### Critic (Experimental) +Source: https://docs.openhands.dev/sdk/guides/critic.md + + +**This feature is highly experimental** and subject to change. The API, configuration, and behavior may evolve significantly based on feedback and testing. + + +> A ready-to-run example is available [here](#ready-to-run-example)! + + +## What is a Critic? 
+ +A **critic** is an evaluator that analyzes agent actions and conversation history to predict the quality or success probability of agent decisions. The critic runs alongside the agent and provides: + +- **Quality scores**: Probability scores between 0.0 and 1.0 indicating predicted success +- **Real-time feedback**: Scores computed during agent execution, not just at completion +- **Iterative refinement**: Automatic retry with follow-up prompts when scores are below threshold + +You can use critic scores to build automated workflows, such as triggering the agent to reflect on and fix its previous solution when the critic indicates poor task performance. + + +This critic is a more advanced extension of the approach described in our blog post [SOTA on SWE-Bench Verified with Inference-Time Scaling and Critic Model](https://openhands.dev/blog/sota-on-swe-bench-verified-with-inference-time-scaling-and-critic-model). A technical report with detailed evaluation metrics is forthcoming. + + +## Quick Start + +When using the OpenHands LLM Provider (`llm-proxy.*.all-hands.dev`), the critic is **automatically configured** - no additional setup required. 
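+To build intuition before the details below: a critic result carries a `score` between 0.0 and 1.0, counts as success at or above a threshold, and the refinement loop retries with follow-up feedback until the threshold is met or an iteration cap is reached. The sketch below illustrates only that control flow in plain Python; `MockCriticResult` and `run_with_refinement` are simplified stand-ins for illustration, not the SDK's actual `CriticResult` or refinement machinery.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class MockCriticResult:
+    """Simplified stand-in for a critic result (not the SDK class)."""
+
+    score: float  # predicted success probability in [0.0, 1.0]
+    message: str | None = None
+
+    @property
+    def success(self) -> bool:
+        # Mirrors the documented semantics: success means score >= 0.5.
+        return self.score >= 0.5
+
+
+def run_with_refinement(evaluate, success_threshold=0.6, max_iterations=3):
+    """Retry until the score meets the threshold or iterations run out."""
+    result = MockCriticResult(0.0)
+    for iteration in range(1, max_iterations + 1):
+        result = evaluate(iteration)  # stand-in for: agent works, critic scores it
+        if result.score >= success_threshold:
+            break  # score is good enough: stop refining
+    return iteration, result
+
+
+# Simulated critic scores: the first attempt falls short, the second passes.
+scores = {1: 0.45, 2: 0.72}
+iteration, result = run_with_refinement(
+    lambda i: MockCriticResult(scores[i]), success_threshold=0.7
+)
+print(iteration, result.score, result.success)  # 2 0.72 True
+```
+
+In the real SDK, `Conversation.run()` drives this loop for you when the agent's critic carries an `IterativeRefinementConfig`, as described later in this section.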
+ +## Understanding Critic Results + +Critic evaluations produce scores and feedback: + +- **`score`**: Float between 0.0 and 1.0 representing predicted success probability +- **`message`**: Optional feedback with detailed probabilities +- **`success`**: Boolean property (True if score >= 0.5) + +Results are automatically displayed in the conversation visualizer: + +![Critic results in SDK visualizer](./assets/critic-sdk-visualizer.png) + +### Accessing Results Programmatically + +```python icon="python" focus={4-7} +from openhands.sdk import Event, ActionEvent, MessageEvent + +def callback(event: Event): + if isinstance(event, (ActionEvent, MessageEvent)): + if event.critic_result is not None: + print(f"Critic score: {event.critic_result.score:.3f}") + print(f"Success: {event.critic_result.success}") + +conversation = Conversation(agent=agent, callbacks=[callback]) +``` + +## Iterative Refinement with a Critic + +The critic supports **automatic iterative refinement** - when the agent finishes a task but the critic score is below a threshold, the conversation automatically continues with a follow-up prompt asking the agent to improve its work. + +### How It Works + +1. Agent completes a task and calls `FinishAction` +2. Critic evaluates the result and produces a score +3. If score < `success_threshold`, a follow-up prompt is sent automatically +4. Agent continues working to address issues +5. 
Process repeats until score meets threshold or `max_iterations` is reached + +### Configuration + +Use `IterativeRefinementConfig` to enable automatic retries: + +```python icon="python" focus={1,4-7,12} +from openhands.sdk.critic import APIBasedCritic, IterativeRefinementConfig + +# Configure iterative refinement +iterative_config = IterativeRefinementConfig( + success_threshold=0.7, # Retry if score < 70% + max_iterations=3, # Maximum retry attempts +) + +# Attach to critic +critic = APIBasedCritic( + server_url="https://llm-proxy.eval.all-hands.dev/vllm", + api_key=api_key, + model_name="critic", + iterative_refinement=iterative_config, +) +``` + +### Parameters + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `success_threshold` | `float` | `0.6` | Score threshold (0-1) to consider task successful | +| `max_iterations` | `int` | `3` | Maximum number of iterations before giving up | + +### Custom Follow-up Prompts + +By default, the critic generates a generic follow-up prompt. You can customize this by subclassing `CriticBase` and overriding `get_followup_prompt()`: + +```python icon="python" focus={4-12} +from openhands.sdk.critic.base import CriticBase, CriticResult + +class CustomCritic(APIBasedCritic): + def get_followup_prompt(self, critic_result: CriticResult, iteration: int) -> str: + score_percent = critic_result.score * 100 + return f""" +Your solution scored {score_percent:.1f}% (iteration {iteration}). + +Please review your work carefully: +1. Check that all requirements are met +2. Verify tests pass +3. 
Fix any issues and try again +""" +``` + +### Example Workflow + +Here's what happens during iterative refinement: + +``` +Iteration 1: + → Agent creates files, runs tests + → Agent calls FinishAction + → Critic evaluates: score = 0.45 (below 0.7 threshold) + → Follow-up prompt sent automatically + +Iteration 2: + → Agent reviews and fixes issues + → Agent calls FinishAction + → Critic evaluates: score = 0.72 (above threshold) + → ✅ Success! Conversation ends +``` + +## Troubleshooting + +### Critic Evaluations Not Appearing + +- Verify the critic is properly configured and passed to the Agent +- Ensure you're using the OpenHands LLM Provider (`llm-proxy.*.all-hands.dev`) + +### API Authentication Errors + +- Verify `LLM_API_KEY` is set correctly +- Check that the API key has not expired + +### Iterative Refinement Not Triggering + +- Ensure `iterative_refinement` config is attached to the critic +- Check that `success_threshold` is set appropriately (higher values trigger more retries) +- Verify the agent is using `FinishAction` to complete tasks + +## Ready-to-run Example + + +The critic model is hosted by the OpenHands LLM Provider and is currently free to use. This example is available on GitHub: [examples/01_standalone_sdk/34_critic_example.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/34_critic_example.py) + + +This example demonstrates iterative refinement with a moderately complex task - creating a Python word statistics tool with specific edge case requirements. The critic evaluates whether all requirements are met and triggers retries if needed. + +```python icon="python" expandable examples/01_standalone_sdk/34_critic_example.py +"""Iterative Refinement with Critic Model Example. + +This is EXPERIMENTAL. + +This example demonstrates how to use a critic model to shepherd an agent through +complex, multi-step tasks. 
The critic evaluates the agent's progress and provides +feedback that can trigger follow-up prompts when the agent hasn't completed the +task successfully. + +Key concepts demonstrated: +1. Setting up a critic with IterativeRefinementConfig for automatic retry +2. Conversation.run() automatically handles retries based on critic scores +3. Custom follow-up prompt generation via critic.get_followup_prompt() +4. Iterating until the task is completed successfully or max iterations reached + +For All-Hands LLM proxy (llm-proxy.*.all-hands.dev), the critic is auto-configured +using the same base_url with /vllm suffix and "critic" as the model name. +""" + +import os +import re +import tempfile +from pathlib import Path + +from openhands.sdk import LLM, Agent, Conversation, Tool +from openhands.sdk.critic import APIBasedCritic, IterativeRefinementConfig +from openhands.sdk.critic.base import CriticBase +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.task_tracker import TaskTrackerTool +from openhands.tools.terminal import TerminalTool + + +# Configuration +# Higher threshold (70%) makes it more likely the agent needs multiple iterations, +# which better demonstrates how iterative refinement works. +# Adjust as needed to see different behaviors. +SUCCESS_THRESHOLD = float(os.getenv("CRITIC_SUCCESS_THRESHOLD", "0.7")) +MAX_ITERATIONS = int(os.getenv("MAX_ITERATIONS", "3")) + + +def get_required_env(name: str) -> str: + value = os.getenv(name) + if value: + return value + raise ValueError( + f"Missing required environment variable: {name}. " + f"Set {name} before running this example." + ) + + +def get_default_critic(llm: LLM) -> CriticBase | None: + """Auto-configure critic for All-Hands LLM proxy. 
+ + When the LLM base_url matches `llm-proxy.*.all-hands.dev`, returns an + APIBasedCritic configured with: + - server_url: {base_url}/vllm + - api_key: same as LLM + - model_name: "critic" + + Args: + llm: The LLM instance to derive critic configuration from. + + Returns: + An APIBasedCritic if the LLM is configured for All-Hands proxy, + None otherwise. + + Example: + llm = LLM( + model="anthropic/claude-sonnet-4-5", + api_key=api_key, + base_url="https://llm-proxy.eval.all-hands.dev", + ) + critic = get_default_critic(llm) + if critic is None: + # Fall back to explicit configuration + critic = APIBasedCritic( + server_url="https://my-critic-server.com", + api_key="my-api-key", + model_name="my-critic-model", + ) + """ + base_url = llm.base_url + api_key = llm.api_key + if base_url is None or api_key is None: + return None + + # Match: llm-proxy.{env}.all-hands.dev (e.g., staging, prod, eval) + pattern = r"^https?://llm-proxy\.[^./]+\.all-hands\.dev" + if not re.match(pattern, base_url): + return None + + return APIBasedCritic( + server_url=f"{base_url.rstrip('/')}/vllm", + api_key=api_key, + model_name="critic", + ) + + +# Task prompt designed to be moderately complex with subtle requirements. +# The task is simple enough to complete in 1-2 iterations, but has specific +# requirements that are easy to miss - triggering critic feedback. +INITIAL_TASK_PROMPT = """\ +Create a Python word statistics tool called `wordstats` that analyzes text files. 
+ +## Structure + +Create directory `wordstats/` with: +- `stats.py` - Main module with `analyze_file(filepath)` function +- `cli.py` - Command-line interface +- `tests/test_stats.py` - Unit tests + +## Requirements for stats.py + +The `analyze_file(filepath)` function must return a dict with these EXACT keys: +- `lines`: total line count (including empty lines) +- `words`: word count +- `chars`: character count (including whitespace) +- `unique_words`: count of unique words (case-insensitive) + +### Important edge cases (often missed!): +1. Empty files must return all zeros, not raise an exception +2. Hyphenated words count as ONE word (e.g., "well-known" = 1 word) +3. Numbers like "123" or "3.14" are NOT counted as words +4. Contractions like "don't" count as ONE word +5. File not found must raise FileNotFoundError with a clear message + +## Requirements for cli.py + +When run as `python cli.py `: +- Print each stat on its own line: "Lines: X", "Words: X", etc. +- Exit with code 1 if file not found, printing error to stderr +- Exit with code 0 on success + +## Required Tests (test_stats.py) + +Write tests that verify: +1. Basic counting on normal text +2. Empty file returns all zeros +3. Hyphenated words counted correctly +4. Numbers are excluded from word count +5. FileNotFoundError raised for missing files + +## Verification Steps + +1. Create a sample file `sample.txt` with this EXACT content (no trailing newline): +``` +Hello world! +This is a well-known test file. + +It has 5 lines, including empty ones. +Numbers like 42 and 3.14 don't count as words. +``` + +2. Run: `python wordstats/cli.py sample.txt` + Expected output: + - Lines: 5 + - Words: 21 + - Chars: 130 + - Unique words: 21 + +3. Run the tests: `python -m pytest wordstats/tests/ -v` + ALL tests must pass. 
+ +The task is complete ONLY when: +- All files exist +- The CLI outputs the correct stats for sample.txt +- All 5+ tests pass +""" + + +llm_api_key = get_required_env("LLM_API_KEY") +llm = LLM( + # Use a weaker model to increase likelihood of needing multiple iterations + model="anthropic/claude-haiku-4-5", + api_key=llm_api_key, + top_p=0.95, + base_url=os.getenv("LLM_BASE_URL", None), +) + +# Setup critic with iterative refinement config +# The IterativeRefinementConfig tells Conversation.run() to automatically +# retry the task if the critic score is below the threshold +iterative_config = IterativeRefinementConfig( + success_threshold=SUCCESS_THRESHOLD, + max_iterations=MAX_ITERATIONS, +) + +# Auto-configure critic for All-Hands proxy or use explicit env vars +critic = get_default_critic(llm) +if critic is None: + print("⚠️ No All-Hands LLM proxy detected, trying explicit env vars...") + critic = APIBasedCritic( + server_url=get_required_env("CRITIC_SERVER_URL"), + api_key=get_required_env("CRITIC_API_KEY"), + model_name=get_required_env("CRITIC_MODEL_NAME"), + iterative_refinement=iterative_config, + ) +else: + # Add iterative refinement config to the auto-configured critic + critic = critic.model_copy(update={"iterative_refinement": iterative_config}) + +# Create agent with critic (iterative refinement is built into the critic) +agent = Agent( + llm=llm, + tools=[ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), + Tool(name=TaskTrackerTool.name), + ], + critic=critic, +) + +# Create workspace +workspace = Path(tempfile.mkdtemp(prefix="critic_demo_")) +print(f"📁 Created workspace: {workspace}") + +# Create conversation - iterative refinement is handled automatically +# by Conversation.run() based on the critic's config +conversation = Conversation( + agent=agent, + workspace=str(workspace), +) + +print("\n" + "=" * 70) +print("🚀 Starting Iterative Refinement with Critic Model") +print("=" * 70) +print(f"Success threshold: 
{SUCCESS_THRESHOLD:.0%}")
+print(f"Max iterations: {MAX_ITERATIONS}")
+
+# Send the task and run - Conversation.run() handles retries automatically
+conversation.send_message(INITIAL_TASK_PROMPT)
+conversation.run()
+
+# Print additional info about created files
+print("\nCreated files:")
+for path in sorted(workspace.rglob("*")):
+    if path.is_file():
+        relative = path.relative_to(workspace)
+        print(f"  - {relative}")
+
+# Report cost
+cost = llm.metrics.accumulated_cost
+print(f"\nEXAMPLE_COST: {cost:.4f}")
+```
+
+```bash Running the Example icon="terminal"
+LLM_BASE_URL="https://llm-proxy.eval.all-hands.dev" LLM_API_KEY="$LLM_API_KEY" \
+  uv run python examples/01_standalone_sdk/34_critic_example.py
+```
+
+### Example Output
+
+```
+📁 Created workspace: /tmp/critic_demo_abc123
+
+======================================================================
+🚀 Starting Iterative Refinement with Critic Model
+======================================================================
+Success threshold: 70%
+Max iterations: 3
+
+... agent works on the task ...
+ +✓ Critic evaluation: score=0.758, success=True + +Created files: + - sample.txt + - wordstats/cli.py + - wordstats/stats.py + - wordstats/tests/test_stats.py + +EXAMPLE_COST: 0.0234 +``` + +## Next Steps + +- **[Observability](/sdk/guides/observability)** - Monitor and log agent behavior +- **[Metrics](/sdk/guides/metrics)** - Collect performance metrics +- **[Stuck Detector](/sdk/guides/agent-stuck-detector)** - Detect unproductive agent patterns + +### Custom Tools +Source: https://docs.openhands.dev/sdk/guides/custom-tools.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> The ready-to-run example is available [here](#ready-to-run-example)! + +## Understanding the Tool System + +The SDK's tool system is built around three core components: + +1. **Action** - Defines input parameters (what the tool accepts) +2. **Observation** - Defines output data (what the tool returns) +3. **Executor** - Implements the tool's logic (what the tool does) + +These components are tied together by a **ToolDefinition** that registers the tool with the agent. + +## Built-in Tools + +The tools package ([source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools)) provides a bunch of built-in tools that follow these patterns. + +```python icon="python" wrap +from openhands.tools import BashTool, FileEditorTool +from openhands.tools.preset import get_default_tools + +# Use specific tools +agent = Agent(llm=llm, tools=[BashTool.create(), FileEditorTool.create()]) + +# Or use preset +tools = get_default_tools() +agent = Agent(llm=llm, tools=tools) +``` + + +See [source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools) for the complete list of available tools and design philosophy. 
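The three-component pattern is independent of the SDK itself. As a rough structural sketch in plain Python (standard library only — these classes are illustrative stand-ins, not the real `openhands.sdk` `Action`/`Observation`/`ToolExecutor`/`ToolDefinition`, which add pydantic validation, LLM output formatting, and registration on top of this shape):

```python
from dataclasses import dataclass

# Structural sketch only: stand-ins for the SDK's tool components.


@dataclass
class EchoAction:  # "Action": the tool's input parameters
    text: str


@dataclass
class EchoObservation:  # "Observation": the tool's output data
    result: str


class EchoExecutor:  # "Executor": the tool's logic
    def __call__(self, action: EchoAction) -> EchoObservation:
        return EchoObservation(result=action.text.upper())


@dataclass
class ToolDefinitionSketch:  # Ties action, observation, and executor together
    name: str
    action_type: type
    observation_type: type
    executor: EchoExecutor


tool = ToolDefinitionSketch(
    name="echo",
    action_type=EchoAction,
    observation_type=EchoObservation,
    executor=EchoExecutor(),
)
obs = tool.executor(EchoAction(text="hello"))
print(obs.result)  # prints "HELLO"
```

The sections that follow show the same four pieces written against the real SDK types.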
+ + +## Creating a Custom Tool + +Here's a minimal example of creating a custom grep tool: + + + + ### Define the Action + Defines input parameters (what the tool accepts) + + ```python icon="python" wrap + class GrepAction(Action): + pattern: str = Field(description="Regex to search for") + path: str = Field( + default=".", + description="Directory to search (absolute or relative)" + ) + include: str | None = Field( + default=None, + description="Optional glob to filter files (e.g. '*.py')" + ) + ``` + + + ### Define the Observation + Defines output data (what the tool returns) + + ```python icon="python" wrap + class GrepObservation(Observation): + matches: list[str] = Field(default_factory=list) + files: list[str] = Field(default_factory=list) + count: int = 0 + + @property + def to_llm_content(self) -> Sequence[TextContent | ImageContent]: + if not self.count: + return [TextContent(text="No matches found.")] + files_list = "\n".join(f"- {f}" for f in self.files[:20]) + sample = "\n".join(self.matches[:10]) + more = "\n..." if self.count > 10 else "" + ret = ( + f"Found {self.count} matching lines.\n" + f"Files:\n{files_list}\n" + f"Sample:\n{sample}{more}" + ) + return [TextContent(text=ret)] + ``` + + The to_llm_content() property formats observations for the LLM. 
+ + + + ### Define the Executor + Implements the tool’s logic (what the tool does) + + ```python icon="python" wrap + class GrepExecutor(ToolExecutor[GrepAction, GrepObservation]): + def __init__(self, terminal: TerminalExecutor): + self.terminal: TerminalExecutor = terminal + + def __call__( + self, + action: GrepAction, + conversation=None, + ) -> GrepObservation: + root = os.path.abspath(action.path) + pat = shlex.quote(action.pattern) + root_q = shlex.quote(root) + + # Use grep -r; add --include when provided + if action.include: + inc = shlex.quote(action.include) + cmd = f"grep -rHnE --include {inc} {pat} {root_q}" + else: + cmd = f"grep -rHnE {pat} {root_q}" + cmd += " 2>/dev/null | head -100" + result = self.terminal(TerminalAction(command=cmd)) + + matches: list[str] = [] + files: set[str] = set() + + # grep returns exit code 1 when no matches; treat as empty + output_text = result.text + + if output_text.strip(): + for line in output_text.strip().splitlines(): + matches.append(line) + # Expect "path:line:content" + # take the file part before first ":" + file_path = line.split(":", 1)[0] + if file_path: + files.add(os.path.abspath(file_path)) + + return GrepObservation( + matches=matches, + files=sorted(files), + count=len(matches), + ) + ``` + + + ### Finally, define the tool + ```python icon="python" wrap + class GrepTool(ToolDefinition[GrepAction, GrepObservation]): + """Custom grep tool that searches file contents using regular expressions.""" + + @classmethod + def create( + cls, + conv_state, + terminal_executor: TerminalExecutor | None = None + ) -> Sequence[ToolDefinition]: + """Create GrepTool instance with a GrepExecutor. + + Args: + conv_state: Conversation state to get + working directory from. + terminal_executor: Optional terminal executor to reuse. + If not provided, a new one will be created. + + Returns: + A sequence containing a single GrepTool instance. 
+ """ + if terminal_executor is None: + terminal_executor = TerminalExecutor( + working_dir=conv_state.workspace.working_dir + ) + grep_executor = GrepExecutor(terminal_executor) + + return [ + cls( + description=_GREP_DESCRIPTION, + action_type=GrepAction, + observation_type=GrepObservation, + executor=grep_executor, + ) + ] + ``` + + + +## Good to know +### Tool Registration +Tools are registered using `register_tool()` and referenced by name: + +```python icon="python" wrap +# Register a simple tool class +register_tool("FileEditorTool", FileEditorTool) + +# Register a factory function that creates multiple tools +register_tool("BashAndGrepToolSet", _make_bash_and_grep_tools) + +# Use registered tools by name +tools = [ + Tool(name="FileEditorTool"), + Tool(name="BashAndGrepToolSet"), +] +``` + +### Factory Functions +Tool factory functions receive `conv_state` as a parameter, allowing access to workspace information: + +```python icon="python" wrap +def _make_bash_and_grep_tools(conv_state) -> list[ToolDefinition]: + """Create execute_bash and custom grep tools sharing one executor.""" + bash_executor = BashExecutor( + working_dir=conv_state.workspace.working_dir + ) + # Create and configure tools... 
+ return [bash_tool, grep_tool] +``` + +### Shared Executors +Multiple tools can share executors for efficiency and state consistency: + +```python icon="python" wrap +bash_executor = BashExecutor(working_dir=conv_state.workspace.working_dir) +bash_tool = execute_bash_tool.set_executor(executor=bash_executor) + +grep_executor = GrepExecutor(bash_executor) +grep_tool = ToolDefinition( + name="grep", + description=_GREP_DESCRIPTION, + action_type=GrepAction, + observation_type=GrepObservation, + executor=grep_executor, +) +``` + +## When to Create Custom Tools + +Create custom tools when you need to: +- Combine multiple operations into a single, structured interface +- Add typed parameters with validation +- Format complex outputs for LLM consumption +- Integrate with external APIs or services + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/02_custom_tools.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/02_custom_tools.py) + + +```python icon="python" expandable examples/01_standalone_sdk/02_custom_tools.py +"""Advanced example showing explicit executor usage and custom grep tool.""" + +import os +import shlex +from collections.abc import Sequence + +from pydantic import Field, SecretStr + +from openhands.sdk import ( + LLM, + Action, + Agent, + Conversation, + Event, + ImageContent, + LLMConvertibleEvent, + Observation, + TextContent, + ToolDefinition, + get_logger, +) +from openhands.sdk.tool import ( + Tool, + ToolExecutor, + register_tool, +) +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import ( + TerminalAction, + TerminalExecutor, + TerminalTool, +) + + +logger = get_logger(__name__) + +# --- Action / Observation --- + + +class GrepAction(Action): + pattern: str = Field(description="Regex to search for") + path: str = Field( + default=".", description="Directory to search (absolute or relative)" + ) + include: str | None = Field( + 
default=None, description="Optional glob to filter files (e.g. '*.py')" + ) + + +class GrepObservation(Observation): + matches: list[str] = Field(default_factory=list) + files: list[str] = Field(default_factory=list) + count: int = 0 + + @property + def to_llm_content(self) -> Sequence[TextContent | ImageContent]: + if not self.count: + return [TextContent(text="No matches found.")] + files_list = "\n".join(f"- {f}" for f in self.files[:20]) + sample = "\n".join(self.matches[:10]) + more = "\n..." if self.count > 10 else "" + ret = ( + f"Found {self.count} matching lines.\n" + f"Files:\n{files_list}\n" + f"Sample:\n{sample}{more}" + ) + return [TextContent(text=ret)] + + +# --- Executor --- + + +class GrepExecutor(ToolExecutor[GrepAction, GrepObservation]): + def __init__(self, terminal: TerminalExecutor): + self.terminal: TerminalExecutor = terminal + + def __call__(self, action: GrepAction, conversation=None) -> GrepObservation: # noqa: ARG002 + root = os.path.abspath(action.path) + pat = shlex.quote(action.pattern) + root_q = shlex.quote(root) + + # Use grep -r; add --include when provided + if action.include: + inc = shlex.quote(action.include) + cmd = f"grep -rHnE --include {inc} {pat} {root_q} 2>/dev/null | head -100" + else: + cmd = f"grep -rHnE {pat} {root_q} 2>/dev/null | head -100" + + result = self.terminal(TerminalAction(command=cmd)) + + matches: list[str] = [] + files: set[str] = set() + + # grep returns exit code 1 when no matches; treat as empty + output_text = result.text + + if output_text.strip(): + for line in output_text.strip().splitlines(): + matches.append(line) + # Expect "path:line:content" — take the file part before first ":" + file_path = line.split(":", 1)[0] + if file_path: + files.add(os.path.abspath(file_path)) + + return GrepObservation(matches=matches, files=sorted(files), count=len(matches)) + + +# Tool description +_GREP_DESCRIPTION = """Fast content search tool. 
+* Searches file contents using regular expressions +* Supports full regex syntax (eg. "log.*Error", "function\\s+\\w+", etc.) +* Filter files by pattern with the include parameter (eg. "*.js", "*.{ts,tsx}") +* Returns matching file paths sorted by modification time. +* Only the first 100 results are returned. Consider narrowing your search with stricter regex patterns or provide path parameter if you need more results. +* Use this tool when you need to find files containing specific patterns +* When you are doing an open ended search that may require multiple rounds of globbing and grepping, use the Agent tool instead +""" # noqa: E501 + + +# --- Tool Definition --- + + +class GrepTool(ToolDefinition[GrepAction, GrepObservation]): + """A custom grep tool that searches file contents using regular expressions.""" + + @classmethod + def create( + cls, conv_state, terminal_executor: TerminalExecutor | None = None + ) -> Sequence[ToolDefinition]: + """Create GrepTool instance with a GrepExecutor. + + Args: + conv_state: Conversation state to get working directory from. + terminal_executor: Optional terminal executor to reuse. If not provided, + a new one will be created. + + Returns: + A sequence containing a single GrepTool instance. + """ + if terminal_executor is None: + terminal_executor = TerminalExecutor( + working_dir=conv_state.workspace.working_dir + ) + grep_executor = GrepExecutor(terminal_executor) + + return [ + cls( + description=_GREP_DESCRIPTION, + action_type=GrepAction, + observation_type=GrepObservation, + executor=grep_executor, + ) + ] + + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools - demonstrating both simplified and advanced patterns +cwd = os.getcwd() + + +def _make_bash_and_grep_tools(conv_state) -> list[ToolDefinition]: + """Create terminal and custom grep tools sharing one executor.""" + + terminal_executor = TerminalExecutor(working_dir=conv_state.workspace.working_dir) + # terminal_tool = terminal_tool.set_executor(executor=terminal_executor) + terminal_tool = TerminalTool.create(conv_state, executor=terminal_executor)[0] + + # Use the GrepTool.create() method with shared terminal_executor + grep_tool = GrepTool.create(conv_state, terminal_executor=terminal_executor)[0] + + return [terminal_tool, grep_tool] + + +register_tool("BashAndGrepToolSet", _make_bash_and_grep_tools) + +tools = [ + Tool(name=FileEditorTool.name), + Tool(name="BashAndGrepToolSet"), +] + +# Agent +agent = Agent(llm=llm, tools=tools) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=cwd +) + +conversation.send_message( + "Hello! Can you use the grep tool to find all files " + "containing the word 'class' in this project, then create a summary file listing them? " # noqa: E501 + "Use the pattern 'class' to search and include only Python files with '*.py'." # noqa: E501 +) +conversation.run() + +conversation.send_message("Great! Now delete that file.") +conversation.run() + +print("=" * 100) +print("Conversation finished. 
Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +## Next Steps + +- **[Model Context Protocol (MCP) Integration](/sdk/guides/mcp)** - Use Model Context Protocol servers +- **[Tools Package Source Code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools)** - Built-in tools implementation + +### Assign Reviews +Source: https://docs.openhands.dev/sdk/guides/github-workflows/assign-reviews.md + +> The reference workflow is available [here](#reference-workflow)! + +Automate pull request triage by intelligently assigning reviewers based on git blame analysis, notifying reviewers of pending PRs, and prompting authors on stale pull requests. The agent performs three sequential checks: pinging reviewers on clean PRs awaiting review (3+ days), reminding authors on stale PRs (5+ days), and auto-assigning reviewers based on code ownership for unassigned PRs. + +## How it works + +It relies on the basic action workflow (`01_basic_action`) which provides a flexible template for running arbitrary agent tasks in GitHub Actions. + +**Core Components:** +- **`agent_script.py`** - Python script that initializes the OpenHands agent with configurable LLM settings and executes tasks based on provided prompts +- **`workflow.yml`** - GitHub Actions workflow that sets up the environment, installs dependencies, and runs the agent + +**Prompt Options:** +1. **`PROMPT_STRING`** - Direct inline text for simple prompts (used in this example) +2. **`PROMPT_LOCATION`** - URL or file path for external prompts + +The workflow downloads the agent script, validates configuration, runs the task, and uploads execution logs as artifacts. + +## Assign Reviews Use Case + +This specific implementation uses the basic action template to handle three PR management scenarios: + +**1. 
Need Reviewer Action** +- Identifies PRs waiting for review +- Notifies reviewers to take action + +**2. Need Author Action** +- Finds stale PRs with no activity for 5+ days +- Prompts authors to update, request review, or close + +**3. Need Reviewers** +- Detects non-draft PRs without assigned reviewers (created 1+ day ago, CI passing) +- Uses git blame analysis to identify relevant contributors +- Automatically assigns reviewers based on file ownership and contribution history +- Balances reviewer workload across team members + +## Quick Start + + + + ```bash icon="terminal" + cp examples/03_github_workflows/01_basic_action/assign-reviews.yml .github/workflows/assign-reviews.yml + ``` + + + Go to `GitHub Settings → Secrets → Actions`, and add `LLM_API_KEY` + (get from https://docs.openhands.dev/openhands/usage/llms/openhands-llms). + + + Go to `GitHub Settings → Actions → General → Workflow permissions` and enable "Read and write permissions". + + + The default is: Daily at 12 PM UTC. + + + +## Features + +- **Intelligent Assignment** - Uses git blame to identify relevant reviewers based on code ownership +- **Automated Notifications** - Sends contextual reminders to reviewers and authors +- **Workload Balancing** - Distributes review requests evenly across team members +- **Scheduled & Manual** - Runs daily automatically or on-demand via workflow dispatch + +## Reference Workflow + + +This example is available on GitHub: [examples/03_github_workflows/01_basic_action/assign-reviews.yml](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/assign-reviews.yml) + + +```yaml icon="yaml" expandable examples/03_github_workflows/01_basic_action/assign-reviews.yml +--- +# To set this up: +# 1. Change the name below to something relevant to your task +# 2. Modify the "env" section below with your prompt +# 3. Add your LLM_API_KEY to the repository secrets +# 4. Commit this file to your repository +# 5. 
Trigger the workflow manually or set up a schedule +name: Assign Reviews + +on: + # Manual trigger + workflow_dispatch: + # Scheduled trigger (disabled by default, uncomment and customize as needed) + schedule: + # Run at 12 PM UTC every day + - cron: 0 12 * * * + +permissions: + contents: write + pull-requests: write + issues: write + +jobs: + run-task: + runs-on: ubuntu-24.04 + env: + # Configuration (modify these values as needed) + AGENT_SCRIPT_URL: https://raw.githubusercontent.com/OpenHands/agent-sdk/main/examples/03_github_workflows/01_basic_action/agent_script.py + # Provide either PROMPT_LOCATION (URL/file) OR PROMPT_STRING (direct text), not both + # Option 1: Use a URL or file path for the prompt + PROMPT_LOCATION: '' + # PROMPT_LOCATION: 'https://example.com/prompts/maintenance.txt' + # Option 2: Use direct text for the prompt + PROMPT_STRING: > + Use GITHUB_TOKEN and the github API to organize open pull requests and issues in the repo. + Read the sections below in order, and perform each in order. Do NOT take action + on the same issue or PR twice. + + # Issues with needs-info - Check for OP Response + + Find all open issues that have the "needs-info" label. For each issue: + 1. Identify the original poster (issue author) + 2. Check if there are any comments from the original poster AFTER the "needs-info" label was added + 3. To determine when the label was added, use: GET /repos/{owner}/{repo}/issues/{issue_number}/timeline + and look for "labeled" events with the label "needs-info" + 4. If the original poster has commented after the label was added: + - Remove the "needs-info" label + - Add the "needs-triage" label + - Post a comment: "[Automatic Post]: The issue author has provided additional information. Moving back to needs-triage for review." + + # Issues with needs-triage + + Find all open issues that have the "needs-triage" label. For each issue that has been in this state for more than 4 days since the last + activity: + 1. 
First, check if the issue has already been triaged by verifying it does NOT have: + - The "enhancement" label + - Any "priority" label (priority:low, priority:medium, priority:high, etc.) + 2. If the issue has already been triaged (has enhancement or priority label), remove the needs-triage label + 3. For issues that have NOT been triaged yet: + - Read the issue description and comments + - Determine if it requires maintainer attention by checking: + * Is it a bug report, feature request, or question? + * Does it have enough information to be actionable? + * Has a maintainer already commented? + * Is the last comment older than 4 days? + - If it needs maintainer attention and no maintainer has commented: + * Find an appropriate maintainer based on the issue topic and recent activity + * Tag them with: "[Automatic Post]: This issue has been waiting for triage. @{maintainer}, could you please take a look when you have + a chance?" + + # Need Reviewer Action + + Find all open PRs where: + 1. The PR is waiting for review (there are no open review comments or change requests) + 2. The PR is in a "clean" state (CI passing, no merge conflicts) + 3. The PR is not marked as draft (draft: false) + 4. The PR has had no activity (comments, commits, reviews) for more than 3 days. + + In this case, send a message to the reviewers: + [Automatic Post]: This PR seems to be currently waiting for review. + {reviewer_names}, could you please take a look when you have a chance? + + # Need Author Action + + Find all open PRs where the most recent change or comment was made on the pull + request more than 5 days ago (use 14 days if the PR is marked as draft). + + And send a message to the author: + + [Automatic Post]: It has been a while since there was any activity on this PR. + {author}, are you still working on it? If so, please go ahead, if not then + please request review, close it, or request that someone else follow up. 
+ + # Need Reviewers + + Find all open pull requests that: + 1. Have no reviewers assigned to them. + 2. Are not marked as draft. + 3. Were created more than 1 day ago. + 4. CI is passing and there are no merge conflicts. + + For each of these pull requests, read the git blame information for the files, + and find the most recent and active contributors to the file/location of the changes. + Assign one of these people as a reviewer, but try not to assign too many reviews to + any single person. Add this message: + + [Automatic Post]: I have assigned {reviewer} as a reviewer based on git blame information. + Thanks in advance for the help! + + LLM_MODEL: + LLM_BASE_URL: + steps: + - name: Checkout repository + uses: actions/checkout@v5 + + - name: Set up Python + uses: actions/setup-python@v6 + with: + python-version: '3.13' + + - name: Install uv + uses: astral-sh/setup-uv@v7 + with: + enable-cache: true + + - name: Install OpenHands dependencies + run: | + # Install OpenHands SDK and tools from git repository + uv pip install --system "openhands-sdk @ git+https://github.com/OpenHands/agent-sdk.git@main#subdirectory=openhands-sdk" + uv pip install --system "openhands-tools @ git+https://github.com/OpenHands/agent-sdk.git@main#subdirectory=openhands-tools" + + - name: Check required configuration + env: + LLM_API_KEY: ${{ secrets.LLM_API_KEY }} + run: | + if [ -z "$LLM_API_KEY" ]; then + echo "Error: LLM_API_KEY secret is not set." + exit 1 + fi + + # Check that exactly one of PROMPT_LOCATION or PROMPT_STRING is set + if [ -n "$PROMPT_LOCATION" ] && [ -n "$PROMPT_STRING" ]; then + echo "Error: Both PROMPT_LOCATION and PROMPT_STRING are set." + echo "Please provide only one in the env section of the workflow file." + exit 1 + fi + + if [ -z "$PROMPT_LOCATION" ] && [ -z "$PROMPT_STRING" ]; then + echo "Error: Neither PROMPT_LOCATION nor PROMPT_STRING is set." + echo "Please set one in the env section of the workflow file." 
+ exit 1 + fi + + if [ -n "$PROMPT_LOCATION" ]; then + echo "Prompt location: $PROMPT_LOCATION" + else + echo "Using inline PROMPT_STRING (${#PROMPT_STRING} characters)" + fi + echo "LLM model: $LLM_MODEL" + if [ -n "$LLM_BASE_URL" ]; then + echo "LLM base URL: $LLM_BASE_URL" + fi + + - name: Run task + env: + LLM_API_KEY: ${{ secrets.LLM_API_KEY }} + PYTHONPATH: '' + run: | + echo "Running agent script: $AGENT_SCRIPT_URL" + + # Download script if it's a URL + if [[ "$AGENT_SCRIPT_URL" =~ ^https?:// ]]; then + echo "Downloading agent script from URL..." + curl -sSL "$AGENT_SCRIPT_URL" -o /tmp/agent_script.py + AGENT_SCRIPT_PATH="/tmp/agent_script.py" + else + AGENT_SCRIPT_PATH="$AGENT_SCRIPT_URL" + fi + + # Run with appropriate prompt argument + if [ -n "$PROMPT_LOCATION" ]; then + echo "Using prompt from: $PROMPT_LOCATION" + uv run python "$AGENT_SCRIPT_PATH" "$PROMPT_LOCATION" + else + echo "Using PROMPT_STRING (${#PROMPT_STRING} characters)" + uv run python "$AGENT_SCRIPT_PATH" + fi + + - name: Upload logs as artifact + uses: actions/upload-artifact@v4 + if: always() + with: + name: openhands-task-logs + path: | + *.log + output/ + retention-days: 7 +``` + +## Related Files + +- [Agent Script](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/agent_script.py) +- [Workflow File](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/assign-reviews.yml) +- [Basic Action README](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/01_basic_action/README.md) + +### PR Review +Source: https://docs.openhands.dev/sdk/guides/github-workflows/pr-review.md + +> The reference workflow is available [here](#reference-workflow)! + +Automatically review pull requests, providing feedback on code quality, security, and best practices. 
Reviews can be triggered in two ways:
+- Requesting `openhands-agent` as a reviewer
+- Adding the `review-this` label to the PR
+
+
+The reference workflow triggers on either the "review-this" label or when the openhands-agent account is requested as a reviewer. In OpenHands organization repositories, openhands-agent has access, so this works as-is. In your own repositories, requesting openhands-agent will only work if that account is added as a collaborator or is part of a team with access. If you don't plan to grant access, use the label trigger instead, or change the condition to a reviewer handle that exists in your repo.
+
+
+## Quick Start
+
+```bash
+# 1. Copy workflow to your repository
+cp examples/03_github_workflows/02_pr_review/workflow.yml .github/workflows/pr-review.yml
+
+# 2. Configure secrets in GitHub Settings → Secrets
+# Add: LLM_API_KEY
+
+# 3. (Optional) Create a "review-this" label in your repository
+# Go to Issues → Labels → New label
+# You can also trigger reviews by requesting "openhands-agent" as a reviewer
+```
+
+## Features
+
+- **Fast Reviews** - Review results are posted on the PR within two to three minutes
+- **Comprehensive Analysis** - Analyzes the changes in the context of the repository, covering code quality, security, and best practices
+- **GitHub Integration** - Posts comments directly to the PR
+- **Customizable** - Add your own code review guidelines without forking
+
+## Security
+
+- Users with write access (maintainers) can trigger reviews by requesting `openhands-agent` as a reviewer or adding the `review-this` label.
+- Maintainers should read the PR before triggering a review, to make sure it is safe to run.
+
+## Customizing the Code Review
+
+Instead of forking `agent_script.py`, you can customize the code review behavior by adding a skill file to your repository. This is the **recommended approach** for customization.
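In practice, wiring up a custom skill amounts to committing a single markdown file with YAML frontmatter. A minimal sketch (the guideline text below is a placeholder; the file name and frontmatter fields mirror the example skill described in this guide):

```shell
# Create the recommended skills directory at the repository root
mkdir -p .agents/skills

# Write a skill file with frontmatter. Using a unique name (rather than
# "code-review") supplements the default skill instead of overriding it.
cat > .agents/skills/custom-codereview-guide.md <<'EOF'
---
name: custom-codereview-guide
description: Project-specific review guidelines
triggers:
- /codereview
---

# Project-Specific Review Guidelines

- Require a changelog entry for user-facing changes
EOF
```

Once committed, no workflow changes should be needed; the agent picks the skill up from the repository checkout.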
+ +### How It Works + +The PR review agent uses skills from the [OpenHands/extensions](https://github.com/OpenHands/extensions) repository by default. You can add your project-specific guidelines alongside the default skill by creating a custom skill file. + + +**Skill paths**: Place skills in `.agents/skills/` (recommended). The legacy path `.openhands/skills/` is also supported. See [Skill Loading Precedence](/overview/skills#skill-loading-precedence) for details. + + +### Example: Custom Code Review Skill + +Create `.agents/skills/custom-codereview-guide.md` in your repository: + +```markdown +--- +name: custom-codereview-guide +description: Project-specific review guidelines for MyProject +triggers: +- /codereview +--- + +# MyProject-Specific Review Guidelines + +In addition to general code review practices, check for: + +## Project Conventions + +- All API endpoints must have OpenAPI documentation +- Database migrations must be reversible +- Feature flags required for new features + +## Architecture Rules + +- No direct database access from controllers +- All external API calls must go through the gateway service + +## Communication Style + +- Be direct and constructive +- Use GitHub suggestion syntax for code fixes +``` + + +**Note**: These rules supplement the default `code-review` skill, not replace it. + + + +**How skill merging works**: Using a unique name like `custom-codereview-guide` allows BOTH your custom skill AND the default `code-review` skill to be triggered by `/codereview`. When triggered, skill content is concatenated into the agent's context (public skills first, then your custom skills). There is no smart merging—if guidelines conflict, the agent sees both and must reconcile them. + +If your skill has `name: code-review` (matching the public skill's name), it will completely **override** the default public skill instead of supplementing it. 
+ + + +**Migrating from override to supplement**: If you previously created a skill with `name: code-review` to override the default, rename it (e.g., to `my-project-review`) to receive guidelines from both skills instead. + + +### Benefits of Custom Skills + +1. **No forking required**: Keep using the official SDK while customizing behavior +2. **Version controlled**: Your review guidelines live in your repository +3. **Easy updates**: SDK updates don't overwrite your customizations +4. **Team alignment**: Everyone uses the same review standards +5. **Composable**: Add project-specific rules alongside default guidelines + + +See the [software-agent-sdk's own custom-codereview-guide skill](https://github.com/OpenHands/software-agent-sdk/blob/main/.agents/skills/custom-codereview-guide.md) for a complete example. + + +## Reference Workflow + + +This example is available on GitHub: [examples/03_github_workflows/02_pr_review/](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/02_pr_review) + + +```yaml icon="yaml" expandable examples/03_github_workflows/02_pr_review/workflow.yml +--- +# OpenHands PR Review Workflow +# +# To set this up: +# 1. Copy this file to .github/workflows/pr-review.yml in your repository +# 2. Add LLM_API_KEY to repository secrets +# 3. Customize the inputs below as needed +# 4. Commit this file to your repository +# 5. 
Trigger the review by either: +# - Adding the "review-this" label to any PR, OR +# - Requesting openhands-agent as a reviewer +# +# For more information, see: +# https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/02_pr_review +name: PR Review by OpenHands + +on: + # Trigger when a label is added or a reviewer is requested + pull_request: + types: [labeled, review_requested] + +permissions: + contents: read + pull-requests: write + issues: write + +jobs: + pr-review: + # Run when review-this label is added OR openhands-agent is requested as reviewer + if: | + github.event.label.name == 'review-this' || + github.event.requested_reviewer.login == 'openhands-agent' + runs-on: ubuntu-latest + steps: + - name: Checkout for composite action + uses: actions/checkout@v4 + with: + repository: OpenHands/software-agent-sdk + # Use a specific version tag or branch (e.g., 'v1.0.0' or 'main') + ref: main + sparse-checkout: .github/actions/pr-review + + - name: Run PR Review + uses: ./.github/actions/pr-review + with: + # LLM configuration + llm-model: anthropic/claude-sonnet-4-5-20250929 + llm-base-url: '' + # Review style: roasted (other option: standard) + review-style: roasted + # SDK version to use (version tag or branch name) + sdk-version: main + # Secrets + llm-api-key: ${{ secrets.LLM_API_KEY }} + github-token: ${{ secrets.GITHUB_TOKEN }} +``` + +### Action Inputs + +| Input | Description | Required | Default | +|-------|-------------|----------|---------| +| `llm-model` | LLM model to use | Yes | - | +| `llm-base-url` | LLM base URL (optional) | No | `''` | +| `review-style` | Review style: 'standard' or 'roasted' | No | `roasted` | +| `sdk-version` | Git ref for SDK (tag, branch, or commit SHA) | No | `main` | +| `sdk-repo` | SDK repository (owner/repo) | No | `OpenHands/software-agent-sdk` | +| `llm-api-key` | LLM API key | Yes | - | +| `github-token` | GitHub token for API access | Yes | - | + +## Related Files + +- [Agent 
Script](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/02_pr_review/agent_script.py)
+- [Workflow File](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/02_pr_review/workflow.yml)
+- [Prompt Template](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/02_pr_review/prompt.py)
+- [Composite Action](https://github.com/OpenHands/software-agent-sdk/blob/main/.github/actions/pr-review/action.yml)
+
+### TODO Management
+Source: https://docs.openhands.dev/sdk/guides/github-workflows/todo-management.md
+
+> The reference workflow is available [here](#reference-workflow)!
+
+
+Scan your codebase for TODO comments and let the OpenHands Agent implement them, creating a pull request for each TODO and picking relevant reviewers based on code changes and file ownership.
+
+## Quick Start
+
+
+
+  ```bash icon="terminal"
+  cp examples/03_github_workflows/03_todo_management/workflow.yml .github/workflows/todo-management.yml
+  ```
+
+
+  Go to `GitHub Settings → Secrets` and add `LLM_API_KEY`
+  (get from https://docs.openhands.dev/openhands/usage/llms/openhands-llms).
+
+
+  Go to `Settings → Actions → General → Workflow permissions` and enable:
+  - `Read and write permissions`
+  - `Allow GitHub Actions to create and approve pull requests`
+
+
+  Trigger the agent by adding TODO comments to your code.
+
+  Example: `# TODO(openhands): Add input validation for user email`
+
+
+  The workflow is configurable: any identifier can be used in place of `TODO(openhands)`.
+
+
+
+
+
+## Features
+
+- **Scanning** - Finds matching TODO comments with configurable identifiers and extracts the TODO description. 
+- **Implementation** - Sends the TODO description to the OpenHands Agent, which automatically implements it
+- **PR Management** - Creates feature branches and pull requests, and picks the most relevant reviewers
+
+## Best Practices
+
+- **Start Small** - Begin with `MAX_TODOS: 1` to test the workflow
+- **Clear Descriptions** - Write descriptive TODO comments
+- **Review PRs** - Always review the generated PRs before merging
+
+## Reference Workflow
+
+
+This example is available on GitHub: [examples/03_github_workflows/03_todo_management/](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/03_todo_management)
+
+
+```yaml icon="yaml" expandable examples/03_github_workflows/03_todo_management/workflow.yml
+---
+# Automated TODO Management Workflow
+# Make sure to replace the LLM_MODEL and LLM_BASE_URL values below with
+# appropriate values for your LLM setup.
+#
+# This workflow automatically scans for TODO(openhands) comments and creates
+# pull requests to implement them using the OpenHands agent.
+#
+# Setup:
+# 1. Add LLM_API_KEY to repository secrets
+# 2. Ensure GITHUB_TOKEN has appropriate permissions
+# 3. Make sure GitHub Actions are allowed to create and approve PRs
+# 4. Commit this file to .github/workflows/ in your repository
+# 5. 
Configure the schedule or trigger manually + +name: Automated TODO Management + +on: + # Manual trigger + workflow_dispatch: + inputs: + max_todos: + description: Maximum number of TODOs to process in this run + required: false + default: '3' + type: string + todo_identifier: + description: TODO identifier to search for (e.g., TODO(openhands)) + required: false + default: TODO(openhands) + type: string + + # Trigger when 'automatic-todo' label is added to a PR + pull_request: + types: [labeled] + + # Scheduled trigger (disabled by default, uncomment and customize as needed) + # schedule: + # # Run every Monday at 9 AM UTC + # - cron: "0 9 * * 1" + +permissions: + contents: write + pull-requests: write + issues: write + +jobs: + scan-todos: + runs-on: ubuntu-latest + # Only run if triggered manually or if 'automatic-todo' label was added + if: > + github.event_name == 'workflow_dispatch' || + (github.event_name == 'pull_request' && + github.event.label.name == 'automatic-todo') + outputs: + todos: ${{ steps.scan.outputs.todos }} + todo-count: ${{ steps.scan.outputs.todo-count }} + steps: + - name: Checkout repository + uses: actions/checkout@v4 + with: + fetch-depth: 0 # Full history for better context + + - name: Set up Python + uses: actions/setup-python@v5 + with: + python-version: '3.13' + + - name: Copy TODO scanner + run: | + cp examples/03_github_workflows/03_todo_management/scanner.py /tmp/scanner.py + chmod +x /tmp/scanner.py + + - name: Scan for TODOs + id: scan + run: | + echo "Scanning for TODO comments..." + + # Run the scanner and capture output + TODO_IDENTIFIER="${{ github.event.inputs.todo_identifier || 'TODO(openhands)' }}" + python /tmp/scanner.py . 
--identifier "$TODO_IDENTIFIER" > todos.json + + # Count TODOs + TODO_COUNT=$(python -c \ + "import json; data=json.load(open('todos.json')); print(len(data))") + echo "Found $TODO_COUNT $TODO_IDENTIFIER items" + + # Limit the number of TODOs to process + MAX_TODOS="${{ github.event.inputs.max_todos || '3' }}" + if [ "$TODO_COUNT" -gt "$MAX_TODOS" ]; then + echo "Limiting to first $MAX_TODOS TODOs" + python -c " + import json + data = json.load(open('todos.json')) + limited = data[:$MAX_TODOS] + json.dump(limited, open('todos.json', 'w'), indent=2) + " + TODO_COUNT=$MAX_TODOS + fi + + # Set outputs + echo "todos=$(cat todos.json | jq -c .)" >> $GITHUB_OUTPUT + echo "todo-count=$TODO_COUNT" >> $GITHUB_OUTPUT + + # Display found TODOs + echo "## 📋 Found TODOs" >> $GITHUB_STEP_SUMMARY + if [ "$TODO_COUNT" -eq 0 ]; then + echo "No TODO(openhands) comments found." >> $GITHUB_STEP_SUMMARY + else + echo "Found $TODO_COUNT TODO(openhands) items:" \ + >> $GITHUB_STEP_SUMMARY + echo "" >> $GITHUB_STEP_SUMMARY + python -c " + import json + data = json.load(open('todos.json')) + for i, todo in enumerate(data, 1): + print(f'{i}. 
**{todo[\"file\"]}:{todo[\"line\"]}** - ' + + f'{todo[\"description\"]}') + " >> $GITHUB_STEP_SUMMARY + fi + + process-todos: + needs: scan-todos + if: needs.scan-todos.outputs.todo-count > 0 + runs-on: ubuntu-latest + strategy: + matrix: + todo: ${{ fromJson(needs.scan-todos.outputs.todos) }} + max-parallel: 1 # Process one TODO at a time to avoid conflicts + steps: + - name: Checkout repository + uses: actions/checkout@v4 + with: + fetch-depth: 0 + token: ${{ secrets.GITHUB_TOKEN }} + + - name: Switch to feature branch with TODO management files + run: | + git checkout openhands/todo-management-example + git pull origin openhands/todo-management-example + + - name: Set up Python + uses: actions/setup-python@v5 + with: + python-version: '3.13' + + - name: Install uv + uses: astral-sh/setup-uv@v6 + with: + enable-cache: true + + - name: Install OpenHands dependencies + run: | + # Install OpenHands SDK and tools from git repository + uv pip install --system "openhands-sdk @ git+https://github.com/OpenHands/agent-sdk.git@main#subdirectory=openhands-sdk" + uv pip install --system "openhands-tools @ git+https://github.com/OpenHands/agent-sdk.git@main#subdirectory=openhands-tools" + + - name: Copy agent files + run: | + cp examples/03_github_workflows/03_todo_management/agent_script.py agent.py + cp examples/03_github_workflows/03_todo_management/prompt.py prompt.py + chmod +x agent.py + + - name: Configure Git + run: | + git config --global user.name "openhands-bot" + git config --global user.email \ + "openhands-bot@users.noreply.github.com" + + - name: Process TODO + env: + LLM_MODEL: + LLM_BASE_URL: + LLM_API_KEY: ${{ secrets.LLM_API_KEY }} + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + GITHUB_REPOSITORY: ${{ github.repository }} + TODO_FILE: ${{ matrix.todo.file }} + TODO_LINE: ${{ matrix.todo.line }} + TODO_DESCRIPTION: ${{ matrix.todo.description }} + PYTHONPATH: '' + run: | + echo "Processing TODO: $TODO_DESCRIPTION" + echo "File: $TODO_FILE:$TODO_LINE" + + # 
Create a unique branch name for this TODO + BRANCH_NAME="todo/$(echo "$TODO_DESCRIPTION" | \ + sed 's/[^a-zA-Z0-9]/-/g' | \ + sed 's/--*/-/g' | \ + sed 's/^-\|-$//g' | \ + tr '[:upper:]' '[:lower:]' | \ + cut -c1-50)" + echo "Branch name: $BRANCH_NAME" + + # Create and switch to new branch (force create if exists) + git checkout -B "$BRANCH_NAME" + + # Run the agent to process the TODO + # Stay in repository directory for git operations + + # Create JSON payload for the agent + TODO_JSON=$(cat <&1 | tee agent_output.log + AGENT_EXIT_CODE=$? + set -e + + echo "Agent exit code: $AGENT_EXIT_CODE" + echo "Agent output log:" + cat agent_output.log + + # Show files in working directory + echo "Files in working directory:" + ls -la + + # If agent failed, show more details + if [ $AGENT_EXIT_CODE -ne 0 ]; then + echo "Agent failed with exit code $AGENT_EXIT_CODE" + echo "Last 50 lines of agent output:" + tail -50 agent_output.log + exit $AGENT_EXIT_CODE + fi + + # Check if any changes were made + cd "$GITHUB_WORKSPACE" + if git diff --quiet; then + echo "No changes made by agent, skipping PR creation" + exit 0 + fi + + # Commit changes + git add -A + git commit -m "Implement TODO: $TODO_DESCRIPTION + + Automatically implemented by OpenHands agent. + + Co-authored-by: openhands " + + # Push branch + git push origin "$BRANCH_NAME" + + # Create pull request + PR_TITLE="Implement TODO: $TODO_DESCRIPTION" + PR_BODY="## 🤖 Automated TODO Implementation + + This PR automatically implements the following TODO: + + **File:** \`$TODO_FILE:$TODO_LINE\` + **Description:** $TODO_DESCRIPTION + + ### Implementation + The OpenHands agent has analyzed the TODO and implemented the + requested functionality. 
+ + ### Review Notes + - Please review the implementation for correctness + - Test the changes in your development environment + - The original TODO comment will be updated with this PR URL + once merged + + --- + *This PR was created automatically by the TODO Management workflow.*" + + # Create PR using GitHub CLI or API + curl -X POST \ + -H "Authorization: token $GITHUB_TOKEN" \ + -H "Accept: application/vnd.github.v3+json" \ + "https://api.github.com/repos/${{ github.repository }}/pulls" \ + -d "{ + \"title\": \"$PR_TITLE\", + \"body\": \"$PR_BODY\", + \"head\": \"$BRANCH_NAME\", + \"base\": \"${{ github.ref_name }}\" + }" + + summary: + needs: [scan-todos, process-todos] + if: always() + runs-on: ubuntu-latest + steps: + - name: Generate Summary + run: | + echo "# 🤖 TODO Management Summary" >> $GITHUB_STEP_SUMMARY + echo "" >> $GITHUB_STEP_SUMMARY + + TODO_COUNT="${{ needs.scan-todos.outputs.todo-count || '0' }}" + echo "**TODOs Found:** $TODO_COUNT" >> $GITHUB_STEP_SUMMARY + + if [ "$TODO_COUNT" -gt 0 ]; then + echo "**Processing Status:** ✅ Completed" >> $GITHUB_STEP_SUMMARY + echo "" >> $GITHUB_STEP_SUMMARY + echo "Check the pull requests created for each TODO" \ + "implementation." 
>> $GITHUB_STEP_SUMMARY + else + echo "**Status:** ℹ️ No TODOs found to process" \ + >> $GITHUB_STEP_SUMMARY + fi + + echo "" >> $GITHUB_STEP_SUMMARY + echo "---" >> $GITHUB_STEP_SUMMARY + echo "*Workflow completed at $(date)*" >> $GITHUB_STEP_SUMMARY +``` + +## Related Documentation + +- [Agent Script](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/agent_script.py) +- [Scanner Script](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/scanner.py) +- [Workflow File](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/workflow.yml) +- [Prompt Template](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/03_github_workflows/03_todo_management/prompt.py) + +### Hello World +Source: https://docs.openhands.dev/sdk/guides/hello-world.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +## Your First Agent + +This is the most basic example showing how to set up and run an OpenHands agent. + + + + ### LLM Configuration + + Configure the language model that will power your agent: + ```python icon="python" + llm = LLM( + model=model, + api_key=SecretStr(api_key), + base_url=base_url, # Optional + service_id="agent" + ) + ``` + + + ### Select an Agent + Use the preset agent with common built-in tools: + ```python icon="python" + agent = get_default_agent(llm=llm, cli_mode=True) + ``` + The default agent includes `BashTool`, `FileEditorTool`, etc. + + For the complete list of available tools see the + [tools package source code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-tools/openhands/tools). 
+ + + + + ### Start a Conversation + Start a conversation to manage the agent's lifecycle: + ```python icon="python" + conversation = Conversation(agent=agent, workspace=cwd) + conversation.send_message( + "Write 3 facts about the current project into FACTS.txt." + ) + conversation.run() + ``` + + + ### Expected Behavior + When you run this example: + 1. The agent analyzes the current directory + 2. Gathers information about the project + 3. Creates `FACTS.txt` with 3 relevant facts + 4. Completes and exits + + Example output file: + + ```text icon="text" wrap + FACTS.txt + --------- + 1. This is a Python project using the OpenHands Software Agent SDK. + 2. The project includes examples demonstrating various agent capabilities. + 3. The SDK provides tools for file manipulation, bash execution, and more. + ``` + + + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/01_hello_world.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/01_hello_world.py) + + +```python icon="python" wrap expandable examples/01_standalone_sdk/01_hello_world.py +import os + +from openhands.sdk import LLM, Agent, Conversation, Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.task_tracker import TaskTrackerTool +from openhands.tools.terminal import TerminalTool + + +llm = LLM( + model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), + api_key=os.getenv("LLM_API_KEY"), + base_url=os.getenv("LLM_BASE_URL", None), +) + +agent = Agent( + llm=llm, + tools=[ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), + Tool(name=TaskTrackerTool.name), + ], +) + +cwd = os.getcwd() +conversation = Conversation(agent=agent, workspace=cwd) + +conversation.send_message("Write 3 facts about the current project into FACTS.txt.") +conversation.run() +print("All done!") +``` + + + +## Next Steps + +- **[Custom Tools](/sdk/guides/custom-tools)** - Create custom tools for 
specialized needs +- **[Model Context Protocol (MCP)](/sdk/guides/mcp)** - Integrate external MCP servers +- **[Security Analyzer](/sdk/guides/security)** - Add security validation to tool usage + +### Hooks +Source: https://docs.openhands.dev/sdk/guides/hooks.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +## Overview + +Hooks let you observe and customize key lifecycle moments in the SDK without forking core code. Typical uses include: +- Logging and analytics +- Emitting custom metrics +- Auditing or compliance +- Tracing and debugging + +## Hook Types + +| Hook | When it runs | Can block? | +|------|--------------|------------| +| PreToolUse | Before tool execution | Yes (exit 2) | +| PostToolUse | After tool execution | No | +| UserPromptSubmit | Before processing user message | Yes (exit 2) | +| Stop | When agent tries to finish | Yes (exit 2) | +| SessionStart | When conversation starts | No | +| SessionEnd | When conversation ends | No | + +## Key Concepts + +- Registration points: subscribe to events or attach pre/post hooks around LLM calls and tool execution +- Isolation: hooks run outside the agent loop logic, avoiding core modifications +- Composition: enable or disable hooks per environment (local vs. prod) + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/33_hooks](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/33_hooks/) + + +```python icon="python" expandable examples/01_standalone_sdk/33_hooks/33_hooks.py +"""OpenHands Agent SDK — Hooks Example + +Demonstrates the OpenHands hooks system. 
+Hooks are shell scripts that run at key lifecycle events: + +- PreToolUse: Block dangerous commands before execution +- PostToolUse: Log tool usage after execution +- UserPromptSubmit: Inject context into user messages +- Stop: Enforce task completion criteria + +The hook scripts are in the scripts/ directory alongside this file. +""" + +import os +import signal +import tempfile +from pathlib import Path + +from pydantic import SecretStr + +from openhands.sdk import LLM, Conversation +from openhands.sdk.hooks import HookConfig, HookDefinition, HookMatcher +from openhands.tools.preset.default import get_default_agent + + +signal.signal(signal.SIGINT, lambda *_: (_ for _ in ()).throw(KeyboardInterrupt())) + +SCRIPT_DIR = Path(__file__).parent / "hook_scripts" + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") + +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Create temporary workspace with git repo +with tempfile.TemporaryDirectory() as tmpdir: + workspace = Path(tmpdir) + os.system(f"cd {workspace} && git init -q && echo 'test' > file.txt") + + log_file = workspace / "tool_usage.log" + summary_file = workspace / "summary.txt" + + # Configure hooks using the typed approach (recommended) + # This provides better type safety and IDE support + hook_config = HookConfig( + pre_tool_use=[ + HookMatcher( + matcher="terminal", + hooks=[ + HookDefinition( + command=str(SCRIPT_DIR / "block_dangerous.sh"), + timeout=10, + ) + ], + ) + ], + post_tool_use=[ + HookMatcher( + matcher="*", + hooks=[ + HookDefinition( + command=(f"LOG_FILE={log_file} {SCRIPT_DIR / 'log_tools.sh'}"), + timeout=5, + ) + ], + ) + ], + user_prompt_submit=[ + HookMatcher( + hooks=[ + HookDefinition( + command=str(SCRIPT_DIR / "inject_git_context.sh"), + ) + ], 
+ ) + ], + stop=[ + HookMatcher( + hooks=[ + HookDefinition( + command=( + f"SUMMARY_FILE={summary_file} " + f"{SCRIPT_DIR / 'require_summary.sh'}" + ), + ) + ], + ) + ], + ) + + # Alternative: You can also use .from_dict() for loading from JSON config files + # Example with a single hook matcher: + # hook_config = HookConfig.from_dict({ + # "hooks": { + # "PreToolUse": [{ + # "matcher": "terminal", + # "hooks": [{"command": "path/to/script.sh", "timeout": 10}] + # }] + # } + # }) + + agent = get_default_agent(llm=llm) + conversation = Conversation( + agent=agent, + workspace=str(workspace), + hook_config=hook_config, + ) + + # Demo 1: Safe command (PostToolUse logs it) + print("=" * 60) + print("Demo 1: Safe command - logged by PostToolUse") + print("=" * 60) + conversation.send_message("Run: echo 'Hello from hooks!'") + conversation.run() + + if log_file.exists(): + print(f"\n[Log: {log_file.read_text().strip()}]") + + # Demo 2: Dangerous command (PreToolUse blocks it) + print("\n" + "=" * 60) + print("Demo 2: Dangerous command - blocked by PreToolUse") + print("=" * 60) + conversation.send_message("Run: rm -rf /tmp/test") + conversation.run() + + # Demo 3: Context injection + Stop hook enforcement + print("\n" + "=" * 60) + print("Demo 3: Context injection + Stop hook") + print("=" * 60) + print("UserPromptSubmit injects git status; Stop requires summary.txt\n") + conversation.send_message( + "Check what files have changes, then create summary.txt describing the repo." 
+ ) + conversation.run() + + if summary_file.exists(): + print(f"\n[summary.txt: {summary_file.read_text()[:80]}...]") + + print("\n" + "=" * 60) + print("Example Complete!") + print("=" * 60) + + cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost + print(f"\nEXAMPLE_COST: {cost}") +``` + + + +### Hook Scripts + +The example uses external hook scripts in the `hook_scripts/` directory: + + +```bash +#!/bin/bash +# PreToolUse hook: Block dangerous rm -rf commands +# Uses jq for JSON parsing (needed for nested fields like tool_input.command) + +input=$(cat) +command=$(echo "$input" | jq -r '.tool_input.command // ""') + +# Block rm -rf commands +if [[ "$command" =~ "rm -rf" ]]; then + echo '{"decision": "deny", "reason": "rm -rf commands are blocked for safety"}' + exit 2 # Exit code 2 = block the operation +fi + +exit 0 # Exit code 0 = allow the operation +``` + + + +```bash +#!/bin/bash +# PostToolUse hook: Log all tool usage +# Uses OPENHANDS_TOOL_NAME env var (no jq/python needed!) 
+ +# LOG_FILE should be set by the calling script +LOG_FILE="${LOG_FILE:-/tmp/tool_usage.log}" + +echo "[$(date)] Tool used: $OPENHANDS_TOOL_NAME" >> "$LOG_FILE" +exit 0 +``` + + + +```bash +#!/bin/bash +# UserPromptSubmit hook: Inject git status when user asks about code changes + +input=$(cat) + +# Check if user is asking about changes, diff, or git +if echo "$input" | grep -qiE "(changes|diff|git|commit|modified)"; then + # Get git status if in a git repo + if git rev-parse --git-dir > /dev/null 2>&1; then + status=$(git status --short 2>/dev/null | head -10) + if [ -n "$status" ]; then + # Escape for JSON + escaped=$(echo "$status" | sed 's/"/\\"/g' | tr '\n' ' ') + echo "{\"additionalContext\": \"Current git status: $escaped\"}" + fi + fi +fi +exit 0 +``` + + + +```bash +#!/bin/bash +# Stop hook: Require a summary.txt file before allowing agent to finish +# SUMMARY_FILE should be set by the calling script + +SUMMARY_FILE="${SUMMARY_FILE:-./summary.txt}" + +if [ ! -f "$SUMMARY_FILE" ]; then + echo '{"decision": "deny", "additionalContext": "Create summary.txt first."}' + exit 2 +fi +exit 0 +``` + + + +## Next Steps + +- See also: [Metrics and Observability](/sdk/guides/metrics) +- Architecture: [Events](/sdk/arch/events) + +### Iterative Refinement +Source: https://docs.openhands.dev/sdk/guides/iterative-refinement.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> The ready-to-run example is available [here](#ready-to-run-example)! + +## Overview + +Iterative refinement is a powerful pattern where multiple agents work together in a feedback loop: +1. A **refactoring agent** performs the main task (e.g., code conversion) +2. A **critique agent** evaluates the quality and provides detailed feedback +3. 
If quality is below threshold, the refactoring agent tries again with the feedback + +This pattern is useful for: +- Code refactoring and modernization (e.g., COBOL to Java) +- Document translation and localization +- Content generation with quality requirements +- Any task requiring iterative improvement + +## How It Works + +### The Iteration Loop + +The core workflow runs in a loop until quality threshold is met: + +```python icon="python" wrap +QUALITY_THRESHOLD = 90.0 +MAX_ITERATIONS = 5 + +while current_score < QUALITY_THRESHOLD and iteration < MAX_ITERATIONS: + # Phase 1: Refactoring agent converts COBOL to Java + refactoring_agent = get_default_agent(llm=llm, cli_mode=True) + refactoring_conversation = Conversation( + agent=refactoring_agent, + workspace=str(workspace_dir) + ) + refactoring_conversation.send_message(refactoring_prompt) + refactoring_conversation.run() + + # Phase 2: Critique agent evaluates the conversion + critique_agent = get_default_agent(llm=llm, cli_mode=True) + critique_conversation = Conversation( + agent=critique_agent, + workspace=str(workspace_dir) + ) + critique_conversation.send_message(critique_prompt) + critique_conversation.run() + + # Parse score and decide whether to continue + current_score = parse_critique_score(critique_file) + + iteration += 1 +``` + +### Critique Scoring + +The critique agent evaluates each file on four dimensions (0-25 pts each): +- **Correctness**: Does the Java code preserve the original business logic? +- **Code Quality**: Is the code clean and following Java conventions? +- **Completeness**: Are all COBOL features properly converted? +- **Best Practices**: Does it use proper OOP, error handling, and documentation? 
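
The scoring rubric reduces to simple arithmetic: each file's score is the sum of its four dimension scores, and the run passes when the average across files reaches the threshold. A minimal sketch (the helper names here are illustrative, not part of the SDK):

```python
# Illustrative scoring arithmetic only; the dimension names mirror the
# critique rubric above, but these helpers are not SDK APIs.
QUALITY_THRESHOLD = 90.0


def file_score(
    correctness: float,
    code_quality: float,
    completeness: float,
    best_practices: float,
) -> float:
    """Sum the four 0-25 dimension scores into a 0-100 file score."""
    return correctness + code_quality + completeness + best_practices


def run_passes(file_scores: list[float]) -> bool:
    """The run passes when the average file score meets the threshold."""
    return sum(file_scores) / len(file_scores) >= QUALITY_THRESHOLD


scores = [file_score(24, 22, 23, 21), file_score(25, 24, 24, 23)]
print(run_passes(scores))  # one file at 90, one at 96 -> average 93 -> True
```

This is why a single weak conversion can force another iteration even when the other files score well: the average, not the minimum, is compared against the threshold.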
+ +### Feedback Loop + +When the score is below threshold, the refactoring agent receives the critique file location: + +```python icon="python" wrap +if critique_file and critique_file.exists(): + base_prompt += f""" +IMPORTANT: A previous refactoring attempt was evaluated and needs improvement. +Please review the critique at: {critique_file} +Address all issues mentioned in the critique to improve the conversion quality. +""" +``` + +## Customization + +### Adjusting Thresholds + +```python icon="python" wrap +QUALITY_THRESHOLD = 95.0 # Require higher quality +MAX_ITERATIONS = 10 # Allow more iterations +``` + +### Using Real COBOL Files + +The example uses sample files, but you can use real files from the [AWS CardDemo project](https://github.com/aws-samples/aws-mainframe-modernization-carddemo/tree/main/app/cbl). + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/31_iterative_refinement.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/31_iterative_refinement.py) + + +```python icon="python" expandable examples/01_standalone_sdk/31_iterative_refinement.py +#!/usr/bin/env python3 +""" +Iterative Refinement Example: COBOL to Java Refactoring + +This example demonstrates an iterative refinement workflow where: +1. A refactoring agent converts COBOL files to Java files +2. A critique agent evaluates the quality of each conversion and provides scores +3. If the average score is below 90%, the process repeats with feedback + +The workflow continues until the refactoring meets the quality threshold. 
+ +Source COBOL files can be obtained from: +https://github.com/aws-samples/aws-mainframe-modernization-carddemo/tree/main/app/cbl +""" + +import os +import re +import tempfile +from pathlib import Path + +from pydantic import SecretStr + +from openhands.sdk import LLM, Conversation +from openhands.tools.preset.default import get_default_agent + + +QUALITY_THRESHOLD = float(os.getenv("QUALITY_THRESHOLD", "90.0")) +MAX_ITERATIONS = int(os.getenv("MAX_ITERATIONS", "5")) + + +def setup_workspace() -> tuple[Path, Path, Path]: + """Create workspace directories for the refactoring workflow.""" + workspace_dir = Path(tempfile.mkdtemp()) + cobol_dir = workspace_dir / "cobol" + java_dir = workspace_dir / "java" + critique_dir = workspace_dir / "critiques" + + cobol_dir.mkdir(parents=True, exist_ok=True) + java_dir.mkdir(parents=True, exist_ok=True) + critique_dir.mkdir(parents=True, exist_ok=True) + + return workspace_dir, cobol_dir, java_dir + + +def create_sample_cobol_files(cobol_dir: Path) -> list[str]: + """Create sample COBOL files for demonstration. + + In a real scenario, you would clone files from: + https://github.com/aws-samples/aws-mainframe-modernization-carddemo/tree/main/app/cbl + """ + sample_files = { + "CBACT01C.cbl": """ IDENTIFICATION DIVISION. + PROGRAM-ID. CBACT01C. + ***************************************************************** + * Program: CBACT01C - Account Display Program + * Purpose: Display account information for a given account number + ***************************************************************** + ENVIRONMENT DIVISION. + DATA DIVISION. + WORKING-STORAGE SECTION. + 01 WS-ACCOUNT-ID PIC 9(11). + 01 WS-ACCOUNT-STATUS PIC X(1). + 01 WS-ACCOUNT-BALANCE PIC S9(13)V99. + 01 WS-CUSTOMER-NAME PIC X(50). + 01 WS-ERROR-MSG PIC X(80). + + PROCEDURE DIVISION. + PERFORM 1000-INIT. + PERFORM 2000-PROCESS. + PERFORM 3000-TERMINATE. + STOP RUN. + + 1000-INIT. 
+ INITIALIZE WS-ACCOUNT-ID + INITIALIZE WS-ACCOUNT-STATUS + INITIALIZE WS-ACCOUNT-BALANCE + INITIALIZE WS-CUSTOMER-NAME. + + 2000-PROCESS. + DISPLAY "ENTER ACCOUNT NUMBER: " + ACCEPT WS-ACCOUNT-ID + IF WS-ACCOUNT-ID = ZEROS + MOVE "INVALID ACCOUNT NUMBER" TO WS-ERROR-MSG + DISPLAY WS-ERROR-MSG + ELSE + DISPLAY "ACCOUNT: " WS-ACCOUNT-ID + DISPLAY "STATUS: " WS-ACCOUNT-STATUS + DISPLAY "BALANCE: " WS-ACCOUNT-BALANCE + END-IF. + + 3000-TERMINATE. + DISPLAY "PROGRAM COMPLETE". +""", + "CBCUS01C.cbl": """ IDENTIFICATION DIVISION. + PROGRAM-ID. CBCUS01C. + ***************************************************************** + * Program: CBCUS01C - Customer Information Program + * Purpose: Manage customer data operations + ***************************************************************** + ENVIRONMENT DIVISION. + DATA DIVISION. + WORKING-STORAGE SECTION. + 01 WS-CUSTOMER-ID PIC 9(9). + 01 WS-FIRST-NAME PIC X(25). + 01 WS-LAST-NAME PIC X(25). + 01 WS-ADDRESS PIC X(100). + 01 WS-PHONE PIC X(15). + 01 WS-EMAIL PIC X(50). + 01 WS-OPERATION PIC X(1). + 88 OP-ADD VALUE 'A'. + 88 OP-UPDATE VALUE 'U'. + 88 OP-DELETE VALUE 'D'. + 88 OP-DISPLAY VALUE 'V'. + + PROCEDURE DIVISION. + PERFORM 1000-MAIN-PROCESS. + STOP RUN. + + 1000-MAIN-PROCESS. + DISPLAY "CUSTOMER MANAGEMENT SYSTEM" + DISPLAY "A-ADD U-UPDATE D-DELETE V-VIEW" + ACCEPT WS-OPERATION + EVALUATE TRUE + WHEN OP-ADD + PERFORM 2000-ADD-CUSTOMER + WHEN OP-UPDATE + PERFORM 3000-UPDATE-CUSTOMER + WHEN OP-DELETE + PERFORM 4000-DELETE-CUSTOMER + WHEN OP-DISPLAY + PERFORM 5000-DISPLAY-CUSTOMER + WHEN OTHER + DISPLAY "INVALID OPERATION" + END-EVALUATE. + + 2000-ADD-CUSTOMER. + DISPLAY "ADDING NEW CUSTOMER" + ACCEPT WS-CUSTOMER-ID + ACCEPT WS-FIRST-NAME + ACCEPT WS-LAST-NAME + DISPLAY "CUSTOMER ADDED: " WS-CUSTOMER-ID. + + 3000-UPDATE-CUSTOMER. + DISPLAY "UPDATING CUSTOMER" + ACCEPT WS-CUSTOMER-ID + DISPLAY "CUSTOMER UPDATED: " WS-CUSTOMER-ID. + + 4000-DELETE-CUSTOMER. 
+ DISPLAY "DELETING CUSTOMER" + ACCEPT WS-CUSTOMER-ID + DISPLAY "CUSTOMER DELETED: " WS-CUSTOMER-ID. + + 5000-DISPLAY-CUSTOMER. + DISPLAY "DISPLAYING CUSTOMER" + ACCEPT WS-CUSTOMER-ID + DISPLAY "ID: " WS-CUSTOMER-ID + DISPLAY "NAME: " WS-FIRST-NAME " " WS-LAST-NAME. +""", + "CBTRN01C.cbl": """ IDENTIFICATION DIVISION. + PROGRAM-ID. CBTRN01C. + ***************************************************************** + * Program: CBTRN01C - Transaction Processing Program + * Purpose: Process financial transactions + ***************************************************************** + ENVIRONMENT DIVISION. + DATA DIVISION. + WORKING-STORAGE SECTION. + 01 WS-TRANS-ID PIC 9(16). + 01 WS-TRANS-TYPE PIC X(2). + 88 TRANS-CREDIT VALUE 'CR'. + 88 TRANS-DEBIT VALUE 'DB'. + 88 TRANS-TRANSFER VALUE 'TR'. + 01 WS-TRANS-AMOUNT PIC S9(13)V99. + 01 WS-FROM-ACCOUNT PIC 9(11). + 01 WS-TO-ACCOUNT PIC 9(11). + 01 WS-TRANS-DATE PIC 9(8). + 01 WS-TRANS-STATUS PIC X(10). + + PROCEDURE DIVISION. + PERFORM 1000-INITIALIZE. + PERFORM 2000-PROCESS-TRANSACTION. + PERFORM 3000-FINALIZE. + STOP RUN. + + 1000-INITIALIZE. + MOVE ZEROS TO WS-TRANS-ID + MOVE SPACES TO WS-TRANS-TYPE + MOVE ZEROS TO WS-TRANS-AMOUNT + MOVE "PENDING" TO WS-TRANS-STATUS. + + 2000-PROCESS-TRANSACTION. + DISPLAY "ENTER TRANSACTION TYPE (CR/DB/TR): " + ACCEPT WS-TRANS-TYPE + DISPLAY "ENTER AMOUNT: " + ACCEPT WS-TRANS-AMOUNT + EVALUATE TRUE + WHEN TRANS-CREDIT + PERFORM 2100-PROCESS-CREDIT + WHEN TRANS-DEBIT + PERFORM 2200-PROCESS-DEBIT + WHEN TRANS-TRANSFER + PERFORM 2300-PROCESS-TRANSFER + WHEN OTHER + MOVE "INVALID" TO WS-TRANS-STATUS + END-EVALUATE. + + 2100-PROCESS-CREDIT. + DISPLAY "PROCESSING CREDIT" + ACCEPT WS-TO-ACCOUNT + MOVE "COMPLETED" TO WS-TRANS-STATUS + DISPLAY "CREDIT APPLIED TO: " WS-TO-ACCOUNT. + + 2200-PROCESS-DEBIT. + DISPLAY "PROCESSING DEBIT" + ACCEPT WS-FROM-ACCOUNT + MOVE "COMPLETED" TO WS-TRANS-STATUS + DISPLAY "DEBIT FROM: " WS-FROM-ACCOUNT. + + 2300-PROCESS-TRANSFER. 
+ DISPLAY "PROCESSING TRANSFER" + ACCEPT WS-FROM-ACCOUNT + ACCEPT WS-TO-ACCOUNT + MOVE "COMPLETED" TO WS-TRANS-STATUS + DISPLAY "TRANSFER FROM " WS-FROM-ACCOUNT " TO " WS-TO-ACCOUNT. + + 3000-FINALIZE. + DISPLAY "TRANSACTION STATUS: " WS-TRANS-STATUS. +""", + } + + created_files = [] + for filename, content in sample_files.items(): + file_path = cobol_dir / filename + file_path.write_text(content) + created_files.append(filename) + + return created_files + + +def get_refactoring_prompt( + cobol_dir: Path, + java_dir: Path, + cobol_files: list[str], + critique_file: Path | None = None, +) -> str: + """Generate the prompt for the refactoring agent.""" + files_list = "\n".join(f" - {f}" for f in cobol_files) + + base_prompt = f"""Convert the following COBOL files to Java: + +COBOL Source Directory: {cobol_dir} +Java Target Directory: {java_dir} + +Files to convert: +{files_list} + +Requirements: +1. Create a Java class for each COBOL program +2. Preserve the business logic and data structures +3. Use appropriate Java naming conventions (camelCase for methods, PascalCase) +4. Convert COBOL data types to appropriate Java types +5. Implement proper error handling with try-catch blocks +6. Add JavaDoc comments explaining the purpose of each class and method +7. In JavaDoc comments, include traceability to the original COBOL source using + the format: @source : (e.g., @source CBACT01C.cbl:73-77) +8. Create a clean, maintainable object-oriented design +9. Each Java file should be compilable and follow Java best practices + +Read each COBOL file and create the corresponding Java file in the target directory. +""" + + if critique_file and critique_file.exists(): + base_prompt += f""" + +IMPORTANT: A previous refactoring attempt was evaluated and needs improvement. +Please review the critique at: {critique_file} +Address all issues mentioned in the critique to improve the conversion quality. 
+""" + + return base_prompt + + +def get_critique_prompt( + cobol_dir: Path, + java_dir: Path, + cobol_files: list[str], +) -> str: + """Generate the prompt for the critique agent.""" + files_list = "\n".join(f" - {f}" for f in cobol_files) + + return f"""Evaluate the quality of COBOL to Java refactoring. + +COBOL Source Directory: {cobol_dir} +Java Target Directory: {java_dir} + +Original COBOL files: +{files_list} + +Please evaluate each converted Java file against its original COBOL source. + +For each file, assess: +1. Correctness: Does the Java code preserve the original business logic? (0-25 pts) +2. Code Quality: Is the code clean, readable, following Java conventions? (0-25 pts) +3. Completeness: Are all COBOL features properly converted? (0-25 pts) +4. Best Practices: Does it use proper OOP, error handling, documentation? (0-25 pts) + +Create a critique report in the following EXACT format: + +# COBOL to Java Refactoring Critique Report + +## Summary +[Brief overall assessment] + +## File Evaluations + +### [Original COBOL filename] +- **Java File**: [corresponding Java filename or "NOT FOUND"] +- **Correctness**: [score]/25 - [brief explanation] +- **Code Quality**: [score]/25 - [brief explanation] +- **Completeness**: [score]/25 - [brief explanation] +- **Best Practices**: [score]/25 - [brief explanation] +- **File Score**: [total]/100 +- **Issues to Address**: + - [specific issue 1] + - [specific issue 2] + ... + +[Repeat for each file] + +## Overall Score +- **Average Score**: [calculated average of all file scores] +- **Recommendation**: [PASS if average >= 90, NEEDS_IMPROVEMENT otherwise] + +## Priority Improvements +1. [Most critical improvement needed] +2. [Second priority] +3. 
[Third priority] + +Save this report to: {java_dir.parent}/critiques/critique_report.md +""" + + +def parse_critique_score(critique_file: Path) -> float: + """Parse the average score from the critique report.""" + if not critique_file.exists(): + return 0.0 + + content = critique_file.read_text() + + # Look for "Average Score: X" pattern + patterns = [ + r"\*\*Average Score\*\*:\s*(\d+(?:\.\d+)?)", + r"Average Score:\s*(\d+(?:\.\d+)?)", + r"average.*?(\d+(?:\.\d+)?)\s*(?:/100|%|$)", + ] + + for pattern in patterns: + match = re.search(pattern, content, re.IGNORECASE) + if match: + return float(match.group(1)) + + return 0.0 + + +def run_iterative_refinement() -> None: + """Run the iterative refinement workflow.""" + # Setup + api_key = os.getenv("LLM_API_KEY") + assert api_key is not None, "LLM_API_KEY environment variable is not set." + model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") + base_url = os.getenv("LLM_BASE_URL") + + llm = LLM( + model=model, + base_url=base_url, + api_key=SecretStr(api_key), + usage_id="iterative_refinement", + ) + + workspace_dir, cobol_dir, java_dir = setup_workspace() + critique_dir = workspace_dir / "critiques" + + print(f"Workspace: {workspace_dir}") + print(f"COBOL Directory: {cobol_dir}") + print(f"Java Directory: {java_dir}") + print(f"Critique Directory: {critique_dir}") + print() + + # Create sample COBOL files + cobol_files = create_sample_cobol_files(cobol_dir) + print(f"Created {len(cobol_files)} sample COBOL files:") + for f in cobol_files: + print(f" - {f}") + print() + + critique_file = critique_dir / "critique_report.md" + current_score = 0.0 + iteration = 0 + + while current_score < QUALITY_THRESHOLD and iteration < MAX_ITERATIONS: + iteration += 1 + print("=" * 80) + print(f"ITERATION {iteration}") + print("=" * 80) + + # Phase 1: Refactoring + print("\n--- Phase 1: Refactoring Agent ---") + refactoring_agent = get_default_agent(llm=llm, cli_mode=True) + refactoring_conversation = Conversation( + 
agent=refactoring_agent, + workspace=str(workspace_dir), + ) + + previous_critique = critique_file if iteration > 1 else None + refactoring_prompt = get_refactoring_prompt( + cobol_dir, java_dir, cobol_files, previous_critique + ) + + refactoring_conversation.send_message(refactoring_prompt) + refactoring_conversation.run() + print("Refactoring phase complete.") + + # Phase 2: Critique + print("\n--- Phase 2: Critique Agent ---") + critique_agent = get_default_agent(llm=llm, cli_mode=True) + critique_conversation = Conversation( + agent=critique_agent, + workspace=str(workspace_dir), + ) + + critique_prompt = get_critique_prompt(cobol_dir, java_dir, cobol_files) + critique_conversation.send_message(critique_prompt) + critique_conversation.run() + print("Critique phase complete.") + + # Parse the score + current_score = parse_critique_score(critique_file) + print(f"\nCurrent Score: {current_score:.1f}%") + + if current_score >= QUALITY_THRESHOLD: + print(f"\n✓ Quality threshold ({QUALITY_THRESHOLD}%) met!") + else: + print( + f"\n✗ Score below threshold ({QUALITY_THRESHOLD}%). " + "Continuing refinement..." 
+ ) + + # Final summary + print("\n" + "=" * 80) + print("ITERATIVE REFINEMENT COMPLETE") + print("=" * 80) + print(f"Total iterations: {iteration}") + print(f"Final score: {current_score:.1f}%") + print(f"Workspace: {workspace_dir}") + + # List created Java files + print("\nCreated Java files:") + for java_file in java_dir.glob("*.java"): + print(f" - {java_file.name}") + + # Show critique file location + if critique_file.exists(): + print(f"\nFinal critique report: {critique_file}") + + # Report cost + cost = llm.metrics.accumulated_cost + print(f"\nEXAMPLE_COST: {cost}") + + +if __name__ == "__main__": + run_iterative_refinement() +``` + + + +## Next Steps + +- [Agent Delegation](/sdk/guides/agent-delegation) - Parallel task execution with sub-agents +- [Custom Tools](/sdk/guides/custom-tools) - Create specialized tools for your workflow + +### Exception Handling +Source: https://docs.openhands.dev/sdk/guides/llm-error-handling.md + +The SDK normalizes common provider errors into typed, provider‑agnostic exceptions so your application can handle them consistently across OpenAI, Anthropic, Groq, Google, and others. + +This guide explains when these errors occur and shows recommended handling patterns for both direct LLM usage and higher‑level agent/conversation flows. + +## Why typed exceptions? + +LLM providers format errors differently (status codes, messages, exception classes). The SDK maps those into stable types so client apps don’t depend on provider‑specific details. Typical benefits: + +- One code path to handle auth, rate limits, timeouts, service issues, and bad requests +- Clear behavior when conversation history exceeds the context window +- Backward compatibility when you switch providers or SDK versions + +## Quick start: Using agents and conversations + +Agent-driven conversations are the common entry point. 
Exceptions from the underlying LLM calls bubble up from `conversation.run()` and `conversation.send_message(...)` when a condenser is not configured. + +```python icon="python" wrap +from pydantic import SecretStr +from openhands.sdk import Agent, Conversation, LLM +from openhands.sdk.llm.exceptions import ( + LLMError, + LLMAuthenticationError, + LLMRateLimitError, + LLMTimeoutError, + LLMServiceUnavailableError, + LLMBadRequestError, + LLMContextWindowExceedError, +) + +llm = LLM(model="claude-sonnet-4-20250514", api_key=SecretStr("your-key")) +agent = Agent(llm=llm, tools=[]) +conversation = Conversation( + agent=agent, + persistence_dir="./.conversations", + workspace=".", +) + +try: + conversation.send_message( + "Continue the long analysis we started earlier…" + ) + conversation.run() + +except LLMContextWindowExceedError: + # Conversation is longer than the model’s context window + # Options: + # 1) Enable a condenser (recommended for long sessions) + # 2) Shorten inputs or reset conversation + print("Hit the context limit. Consider enabling a condenser.") + +except LLMAuthenticationError: + print( + "Invalid or missing API credentials." + "Check your API key or auth setup." + ) + +except LLMRateLimitError: + print("Rate limit exceeded. Back off and retry later.") + +except LLMTimeoutError: + print("Request timed out. Consider increasing timeout or retrying.") + +except LLMServiceUnavailableError: + print("Service unavailable or connectivity issue. Retry with backoff.") + +except LLMBadRequestError: + print("Bad request to provider. Validate inputs and arguments.") + +except LLMError as e: + # Fallback for other SDK LLM errors (parsing/validation, etc.) + print(f"Unhandled LLM error: {e}") +``` + + + +### Avoiding context‑window errors with a condenser + +If a condenser is configured, the SDK emits a condensation request event instead of raising `LLMContextWindowExceedError`. The agent will summarize older history and continue. 
+ +```python icon="python" focus={5-6, 9-14} wrap +from openhands.sdk.context.condenser import LLMSummarizingCondenser + +condenser = LLMSummarizingCondenser( + llm=llm.model_copy(update={"usage_id": "condenser"}), + max_size=10, + keep_first=2, +) + +agent = Agent(llm=llm, tools=[], condenser=condenser) +conversation = Conversation( + agent=agent, + persistence_dir="./.conversations", + workspace=".", +) +``` + + + See the dedicated guide: [Context Condenser](/sdk/guides/context-condenser). + + +## Handling errors with direct LLM calls + +The same exceptions are raised from both `LLM.completion()` and `LLM.responses()` paths, so you can share handlers. + +### Example: Using `.completion()` + +```python icon="python" wrap +from pydantic import SecretStr +from openhands.sdk import LLM +from openhands.sdk.llm import Message, TextContent +from openhands.sdk.llm.exceptions import ( + LLMError, + LLMAuthenticationError, + LLMRateLimitError, + LLMTimeoutError, + LLMServiceUnavailableError, + LLMBadRequestError, + LLMContextWindowExceedError, +) + +llm = LLM(model="claude-sonnet-4-20250514", api_key=SecretStr("your-key")) + +try: + response = llm.completion([ + Message.user([TextContent(text="Summarize our design doc")]) + ]) + print(response.message) + +except LLMContextWindowExceedError: + print("Context window exceeded. Consider enabling a condenser.") +except LLMAuthenticationError: + print("Invalid or missing API credentials.") +except LLMRateLimitError: + print("Rate limit exceeded. Back off and retry later.") +except LLMTimeoutError: + print("Request timed out. Consider increasing timeout or retrying.") +except LLMServiceUnavailableError: + print("Service unavailable or connectivity issue. Retry with backoff.") +except LLMBadRequestError: + print("Bad request to provider. 
Validate inputs and arguments.") +except LLMError as e: + print(f"Unhandled LLM error: {e}") +``` + +### Example: Using `.responses()` + +```python icon="python" wrap +from pydantic import SecretStr +from openhands.sdk import LLM +from openhands.sdk.llm import Message, TextContent +from openhands.sdk.llm.exceptions import LLMError, LLMContextWindowExceedError + +llm = LLM(model="claude-sonnet-4-20250514", api_key=SecretStr("your-key")) + +try: + resp = llm.responses([ + Message.user( + [TextContent(text="Write a one-line haiku about code.")] + ) + ]) + print(resp.message) +except LLMContextWindowExceedError: + print("Context window exceeded. Consider enabling a condenser.") +except LLMError as e: + print(f"LLM error: {e}") +``` + +## Exception reference + +All exceptions live under `openhands.sdk.llm.exceptions` unless noted. + +| Category | Error | Description | +|--------|------|-------------| +| **Provider / transport (provider-agnostic)** | `LLMContextWindowExceedError` | Conversation exceeds the model’s context window. Without a condenser, thrown for both Chat and Responses paths. | +| | `LLMAuthenticationError` | Invalid or missing credentials (401/403 patterns). | +| | `LLMRateLimitError` | Provider rate limit exceeded. | +| | `LLMTimeoutError` | SDK or lower-level timeout while waiting for the provider. | +| | `LLMServiceUnavailableError` | Temporary connectivity or service outage (e.g., 5xx responses, connection issues). | +| | `LLMBadRequestError` | Client-side request issues (invalid parameters, malformed input). | +| **Response parsing / validation** | `LLMMalformedActionError` | Model returned a malformed action. | +| | `LLMNoActionError` | Model did not return an action when one was expected. | +| | `LLMResponseError` | Could not extract an action from the response. | +| | `FunctionCallConversionError` | Failed converting tool/function call payloads. | +| | `FunctionCallValidationError` | Tool/function call arguments failed validation. 
| +| | `FunctionCallNotExistsError` | Model referenced an unknown tool or function. | +| | `LLMNoResponseError` | Provider returned an empty or invalid response (rare; observed with some Gemini models). | +| **Cancellation** | `UserCancelledError` | A user explicitly aborted the operation. | +| | `OperationCancelled` | A running operation was cancelled programmatically. | + + + All of the above (except the explicit cancellation types) inherit from `LLMError`, so you can implement a catch‑all + for unexpected SDK LLM errors while still keeping fine‑grained handlers for the most common cases. + + +### LLM Fallback Strategy +Source: https://docs.openhands.dev/sdk/guides/llm-fallback.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +`FallbackStrategy` gives your agent automatic resilience: when the primary LLM fails with a transient error (rate limit, timeout, connection issue), the SDK tries alternate LLMs in order. Fallback is **per-call** — each new request always starts with the primary model. + +## Basic Usage + +Attach a `FallbackStrategy` to your primary `LLM`. 
The fallback LLMs are referenced by name from an [LLM Profile Store](/sdk/guides/llm-profile-store): + +```python icon="python" wrap focus={16, 17, 21, 22, 23} +from pydantic import SecretStr +from openhands.sdk import LLM, LLMProfileStore +from openhands.sdk.llm import FallbackStrategy + +# Menage persisted LLM profiles +# default store directory: .openhands/profiles +store = LLMProfileStore() + +fallback_llm = LLM( + usage_id="fallback-1", + model="openai/gpt-4o", + api_key=SecretStr("your-openai-key"), +) +store.save("fallback-1", fallback_llm, include_secrets=True) + +# Configure an LLM with a fallback strategy +primary_llm = LLM( + usage_id="agent-primary", + model="anthropic/claude-sonnet-4-5-20250929", + api_key=SecretStr("your-api-key"), + fallback_strategy=FallbackStrategy( + fallback_llms=["fallback-1"], + ), +) +``` + +## How It Works + +1. The primary LLM handles the request as normal +2. If the call fails with a **transient error**, the `FallbackStrategy` kicks in and tries each fallback LLM in order +3. The first successful fallback response is returned to the caller +4. If all fallbacks fail, the original primary error is raised +5. Token usage and cost from fallback calls are **merged into the primary LLM's metrics**, so you get a unified view of total spend by model + + +Only transient errors trigger fallback. +Non-transient errors (e.g., authentication failures, bad requests) are raised immediately without trying fallbacks. +For a complete list of supported transient errors see the [source code](https://github.com/OpenHands/software-agent-sdk/blob/978dd7d1e3268331b7f8af514e7a7930f98eb8af/openhands-sdk/openhands/sdk/llm/fallback_strategy.py#L29) + + +## Multiple Fallback Levels + +Chain as many fallback LLMs as you need. 
They are tried in list order: + +```python icon="python" wrap focus={5-7} +llm = LLM( + usage_id="agent-primary", + model="anthropic/claude-sonnet-4-5-20250929", + api_key=SecretStr(api_key), + fallback_strategy=FallbackStrategy( + fallback_llms=["fallback-1", "fallback-2"], + ), +) +``` + +If the primary fails, `fallback-1` is tried. If that also fails, `fallback-2` is tried. If all fail, the primary error is raised. + +## Custom Profile Store Directory + +By default, fallback profiles are loaded from `.openhands/profiles`. You can point to a different directory: + +```python icon="python" wrap focus={3} +FallbackStrategy( + fallback_llms=["fallback-1", "fallback-2"], + profile_store_dir="/path/to/my/profiles", +) +``` + +## Metrics + +Fallback costs are automatically merged into the primary LLM's metrics. After a conversation, you can inspect exactly which models were used: + +```python icon="python" wrap +# After running a conversation +metrics = llm.metrics +print(f"Total cost (including fallbacks): ${metrics.accumulated_cost:.6f}") + +for usage in metrics.token_usages: + print(f" model={usage.model} prompt={usage.prompt_tokens} completion={usage.completion_tokens}") +``` + +Individual `token_usage` records carry the fallback model name, so you can distinguish which LLM produced each usage record. + +## Use Cases + +- **Rate limit handling** — When one provider throttles you, seamlessly switch to another +- **High availability** — Keep your agent running during provider outages +- **Cost optimization** — Try a cheaper model first and fall back to a more capable one on failure +- **Cross-provider redundancy** — Spread risk across Anthropic, OpenAI, Google, etc. 
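The per-call flow described in *How It Works* can be condensed into a short conceptual sketch. This is an illustration only, not the SDK's actual implementation: the `TransientError` class and the callable LLM stand-ins below are hypothetical, and the real strategy recognizes many provider-specific exception types (see the source link above).

```python
# Conceptual sketch of per-call fallback -- NOT the SDK's real code.
# TransientError and the callable LLM stand-ins are illustrative only.
class TransientError(Exception):
    """Stand-in for rate limits, timeouts, and connection errors."""


def complete_with_fallback(primary, fallbacks, messages):
    """Try the primary LLM first; on a transient error, try fallbacks in order."""
    try:
        return primary(messages)  # 1. the primary always handles the request first
    except TransientError as primary_error:  # non-transient errors propagate as-is
        for fallback in fallbacks:  # 2. try each fallback in list order
            try:
                return fallback(messages)  # 3. the first success is returned
            except TransientError:
                continue
        raise primary_error  # 4. all failed: surface the original primary error


def flaky(_messages):
    raise TransientError("rate limited")


def healthy(messages):
    return f"ok: {messages}"


print(complete_with_fallback(flaky, [flaky, healthy], "hi"))  # ok: hi
```

Because the strategy is per-call, the next request starts at the primary again; there is no sticky switch to a fallback model.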
+ +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/39_llm_fallback.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/39_llm_fallback.py) + + +```python icon="python" expandable examples/01_standalone_sdk/39_llm_fallback.py +"""Example: Using FallbackStrategy for LLM resilience. + +When the primary LLM fails with a transient error (rate limit, timeout, etc.), +FallbackStrategy automatically tries alternate LLMs in order. Fallback is +per-call: each new request starts with the primary model. Token usage and +cost from fallback calls are merged into the primary LLM's metrics. + +This example: + 1. Saves two fallback LLM profiles to a temporary store. + 2. Configures a primary LLM with a FallbackStrategy pointing at those profiles. + 3. Runs a conversation — if the primary model is unavailable, the agent + transparently falls back to the next available model. +""" + +import os +import tempfile + +from pydantic import SecretStr + +from openhands.sdk import LLM, Agent, Conversation, LLMProfileStore, Tool +from openhands.sdk.llm import FallbackStrategy +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +# Read configuration from environment +api_key = os.getenv("LLM_API_KEY", None) +assert api_key is not None, "LLM_API_KEY environment variable is not set." +base_url = os.getenv("LLM_BASE_URL") +primary_model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") + +# Use a temporary directory so this example doesn't pollute your home folder. +# In real usage you can omit base_dir to use the default (~/.openhands/profiles). 
+profile_store_dir = tempfile.mkdtemp() +store = LLMProfileStore(base_dir=profile_store_dir) + +fallback_1 = LLM( + usage_id="fallback-1", + model=os.getenv("LLM_FALLBACK_MODEL_1", "openai/gpt-4o"), + api_key=SecretStr(os.getenv("LLM_FALLBACK_API_KEY_1", api_key)), + base_url=os.getenv("LLM_FALLBACK_BASE_URL_1", base_url), +) +store.save("fallback-1", fallback_1, include_secrets=True) + +fallback_2 = LLM( + usage_id="fallback-2", + model=os.getenv("LLM_FALLBACK_MODEL_2", "openai/gpt-4o-mini"), + api_key=SecretStr(os.getenv("LLM_FALLBACK_API_KEY_2", api_key)), + base_url=os.getenv("LLM_FALLBACK_BASE_URL_2", base_url), +) +store.save("fallback-2", fallback_2, include_secrets=True) + +print(f"Saved fallback profiles: {store.list()}") + + +# Configure the primary LLM with a FallbackStrategy +primary_llm = LLM( + usage_id="agent-primary", + model=primary_model, + api_key=SecretStr(api_key), + base_url=base_url, + fallback_strategy=FallbackStrategy( + fallback_llms=["fallback-1", "fallback-2"], + profile_store_dir=profile_store_dir, + ), +) + + +# Run a conversation +agent = Agent( + llm=primary_llm, + tools=[ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), + ], +) + +conversation = Conversation(agent=agent, workspace=os.getcwd()) +conversation.send_message("Write a haiku about resilience into HAIKU.txt.") +conversation.run() + + +# Inspect metrics (includes any fallback usage) +metrics = primary_llm.metrics +print(f"Total cost (including fallbacks): ${metrics.accumulated_cost:.6f}") +print(f"Token usage records: {len(metrics.token_usages)}") +for usage in metrics.token_usages: + print( + f" model={usage.model}" + f" prompt={usage.prompt_tokens}" + f" completion={usage.completion_tokens}" + ) + +print(f"EXAMPLE_COST: {metrics.accumulated_cost}") +``` + + + +## Next Steps + +- **[LLM Profile Store](/sdk/guides/llm-profile-store)** — Save and load LLM configurations as reusable profiles +- **[Model Routing](/sdk/guides/llm-routing)** — Route requests 
based on content (e.g., multimodal vs text-only)
+- **[Exception Handling](/sdk/guides/llm-error-handling)** — Handle LLM errors in your application
+- **[LLM Metrics](/sdk/guides/metrics)** — Track token usage and costs across models
+
+### Image Input
+Source: https://docs.openhands.dev/sdk/guides/llm-image-input.md
+
+import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx";
+
+> A ready-to-run example is available [here](#ready-to-run-example)!
+
+
+### Sending Images
+
+The LLM you use must support image input (`llm.vision_is_active()` must return `True`).
+
+Pass images along with text in the message content:
+
+```python focus={14} icon="python" wrap
+from openhands.sdk import ImageContent
+
+IMAGE_URL = "https://github.com/OpenHands/OpenHands/raw/main/docs/static/img/logo.png"
+conversation.send_message(
+    Message(
+        role="user",
+        content=[
+            TextContent(
+                text=(
+                    "Study this image and describe the key elements you see. "
+                    "Summarize them in a short paragraph and suggest a catchy caption."
+                )
+            ),
+            ImageContent(image_urls=[IMAGE_URL]),
+        ],
+    )
+)
+```
+
+This works with multimodal LLMs such as `GPT-4 Vision` and vision-capable `Claude` models.
+
+## Ready-to-run Example
+
+
+This example is available on GitHub: [examples/01_standalone_sdk/17_image_input.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/17_image_input.py)
+
+
+You can send images to multimodal LLMs for vision-based tasks like screenshot analysis, image processing, and visual QA:
+
+```python icon="python" expandable examples/01_standalone_sdk/17_image_input.py
+"""OpenHands Agent SDK — Image Input Example.
+
+This script mirrors the basic setup from ``examples/01_hello_world.py`` but adds
+vision support by sending an image to the agent alongside text instructions.
+""" + +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + ImageContent, + LLMConvertibleEvent, + Message, + TextContent, + get_logger, +) +from openhands.sdk.tool.spec import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.task_tracker import TaskTrackerTool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Configure LLM (vision-capable model) +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="vision-llm", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) +assert llm.vision_is_active(), "The selected LLM model does not support vision input." + +cwd = os.getcwd() + +agent = Agent( + llm=llm, + tools=[ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), + Tool(name=TaskTrackerTool.name), + ], +) + +llm_messages = [] # collect raw LLM messages for inspection + + +def conversation_callback(event: Event) -> None: + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=cwd +) + +IMAGE_URL = "https://github.com/OpenHands/docs/raw/main/openhands/static/img/logo.png" + +conversation.send_message( + Message( + role="user", + content=[ + TextContent( + text=( + "Study this image and describe the key elements you see. " + "Summarize them in a short paragraph and suggest a catchy caption." + ) + ), + ImageContent(image_urls=[IMAGE_URL]), + ], + ) +) +conversation.run() + +conversation.send_message( + "Great! Please save your description and caption into image_report.md." +) +conversation.run() + +print("=" * 100) +print("Conversation finished. 
Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +## Next Steps + +- **[Hello World](/sdk/guides/hello-world)** - Learn basic conversation patterns +- **[Async Operations](/sdk/guides/convo-async)** - Process multiple images concurrently + +### LLM Profile Store +Source: https://docs.openhands.dev/sdk/guides/llm-profile-store.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +The `LLMProfileStore` class provides a centralized mechanism for managing `LLM` configurations. +Define a profile once, reuse it everywhere — across scripts, sessions, and even machines. + +## Benefits +- **Persistence:** Saves model parameters (API keys, temperature, max tokens, ...) to a stable disk format. +- **Reusability:** Import a defined profile into any script or session with a single identifier. +- **Portability:** Simplifies the synchronization of model configurations across different machines or deployment environments. + +## How It Works + + + + ### Create a Store + + The store manages a directory of JSON profile files. By default it uses `~/.openhands/profiles`, + but you can point it anywhere. + + ```python icon="python" focus={3, 4, 6, 7} + from openhands.sdk import LLMProfileStore + + # Default location: ~/.openhands/profiles + store = LLMProfileStore() + + # Or bring your own directory + store = LLMProfileStore(base_dir="./my-profiles") + ``` + + + ### Save a Profile + + Got an LLM configured just right? Save it for later. 
+ + ```python icon="python" focus={11, 12} + from pydantic import SecretStr + from openhands.sdk import LLM, LLMProfileStore + + fast_llm = LLM( + usage_id="fast", + model="anthropic/claude-sonnet-4-5-20250929", + api_key=SecretStr("sk-..."), + temperature=0.0, + ) + + store = LLMProfileStore() + store.save("fast", fast_llm) + ``` + + + API keys are **excluded** by default for security. Pass `include_secrets=True` to the save method if you wish to + persist them; otherwise, they will be read from the environment at load time. + + + + ### Load a Profile + + Next time you need that LLM, just load it: + + ```python icon="python" + # Same model, ready to go. + llm = store.load("fast") + ``` + + + ### List and Clean Up + + See what you've got, delete what you don't need: + + ```python icon="python" focus={1, 3, 4} + print(store.list()) # ['fast.json', 'creative.json'] + + store.delete("creative") + print(store.list()) # ['fast.json'] + ``` + + + +## Good to Know + +Profile names must be simple filenames (no slashes, no dots at the start). + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/37_llm_profile_store.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/37_llm_profile_store.py) + + +```python icon="python" expandable examples/01_standalone_sdk/37_llm_profile_store.py +"""Example: Using LLMProfileStore to save and reuse LLM configurations. + +LLMProfileStore persists LLM configurations as JSON files, so you can define +a profile once and reload it across sessions without repeating setup code. +""" + +import os +import tempfile + +from pydantic import SecretStr + +from openhands.sdk import LLM, LLMProfileStore + + +# Use a temporary directory so this example doesn't pollute your home folder. +# In real usage you can omit base_dir to use the default (~/.openhands/profiles). +store = LLMProfileStore(base_dir=tempfile.mkdtemp()) + + +# 1. 
Create two LLM profiles with different usage + +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +base_url = os.getenv("LLM_BASE_URL") +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") + +fast_llm = LLM( + usage_id="fast", + model=model, + api_key=SecretStr(api_key), + base_url=base_url, + temperature=0.0, +) + +creative_llm = LLM( + usage_id="creative", + model=model, + api_key=SecretStr(api_key), + base_url=base_url, + temperature=0.9, +) + +# 2. Save profiles + +# Note that secrets are excluded by default for safety. +store.save("fast", fast_llm) +store.save("creative", creative_llm) + +# To persist the API key as well, pass `include_secrets=True`: +# store.save("fast", fast_llm, include_secrets=True) + +# 3. List available persisted profiles + +print(f"Stored profiles: {store.list()}") + +# 4. Load a profile + +loaded = store.load("fast") +assert isinstance(loaded, LLM) +print( + "Loaded profile. " + f"usage:{loaded.usage_id}, " + f"model: {loaded.model}, " + f"temperature: {loaded.temperature}." +) + +# 5. Delete a profile + +store.delete("creative") +print(f"After deletion: {store.list()}") + +print("EXAMPLE_COST: 0") +``` + + + +## Next Steps + +- **[LLM Registry](/sdk/guides/llm-registry)** - Manage multiple LLMs in memory at runtime +- **[LLM Routing](/sdk/guides/llm-routing)** - Automatically route to different models +- **[Exception Handling](/sdk/guides/llm-error-handling)** - Handle LLM errors gracefully + +### Reasoning +Source: https://docs.openhands.dev/sdk/guides/llm-reasoning.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +View your agent's internal reasoning process for debugging, transparency, and understanding decision-making. + +This guide demonstrates two provider-specific approaches: +1. **Anthropic Extended Thinking** - Claude's thinking blocks for complex reasoning +2. 
**OpenAI Reasoning via Responses API** - GPT's reasoning effort parameter
+
+## Anthropic Extended Thinking
+
+> A ready-to-run example is available [here](#ready-to-run-example-anthropic)!
+
+Anthropic's Claude models support extended thinking, which allows you to access the model's internal reasoning process
+through thinking blocks. This is useful for understanding how Claude approaches complex problems step-by-step.
+
+### How It Works
+
+The key to accessing thinking blocks is to register a callback that checks for `thinking_blocks` in LLM messages:
+
+```python focus={6-11} icon="python" wrap
+def show_thinking(event: Event):
+    if isinstance(event, LLMConvertibleEvent):
+        message = event.to_llm_message()
+        if hasattr(message, "thinking_blocks") and message.thinking_blocks:
+            print(f"🧠 Found {len(message.thinking_blocks)} thinking blocks")
+            for block in message.thinking_blocks:
+                if isinstance(block, RedactedThinkingBlock):
+                    print(f"Redacted: {block.data}")
+                elif isinstance(block, ThinkingBlock):
+                    print(f"Thinking: {block.thinking}")
+
+conversation = Conversation(agent=agent, callbacks=[show_thinking])
+```
+
+### Understanding Thinking Blocks
+
+Claude uses thinking blocks to reason through complex problems step-by-step. There are two types:
+
+- **`ThinkingBlock`** ([related anthropic docs](https://docs.claude.com/en/docs/build-with-claude/extended-thinking#how-extended-thinking-works)): Contains the full reasoning text from Claude's internal thought process
+- **`RedactedThinkingBlock`** ([related anthropic docs](https://docs.claude.com/en/docs/build-with-claude/extended-thinking#thinking-redaction)): Contains redacted or summarized thinking data
+
+By registering a callback with your conversation, you can intercept and display these thinking blocks in real-time,
+giving you insight into how Claude is approaching the problem.
+
+### Ready-to-run Example Anthropic
+
+
+This example is available on GitHub: [examples/01_standalone_sdk/22_anthropic_thinking.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/22_anthropic_thinking.py)
+
+
+```python icon="python" expandable examples/01_standalone_sdk/22_anthropic_thinking.py
+"""Example demonstrating Anthropic's extended thinking feature with thinking blocks."""
+
+import os
+
+from pydantic import SecretStr
+
+from openhands.sdk import (
+    LLM,
+    Agent,
+    Conversation,
+    Event,
+    LLMConvertibleEvent,
+    RedactedThinkingBlock,
+    ThinkingBlock,
+)
+from openhands.sdk.tool import Tool
+from openhands.tools.terminal import TerminalTool
+
+
+# Configure LLM for Anthropic Claude with extended thinking
+api_key = os.getenv("LLM_API_KEY")
+assert api_key is not None, "LLM_API_KEY environment variable is not set."
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929")
+base_url = os.getenv("LLM_BASE_URL")
+
+llm = LLM(
+    usage_id="agent",
+    model=model,
+    base_url=base_url,
+    api_key=SecretStr(api_key),
+)
+
+# Setup agent with bash tool
+agent = Agent(llm=llm, tools=[Tool(name=TerminalTool.name)])
+
+
+# Callback to display thinking blocks
+def show_thinking(event: Event):
+    if isinstance(event, LLMConvertibleEvent):
+        message = event.to_llm_message()
+        if hasattr(message, "thinking_blocks") and message.thinking_blocks:
+            print(f"\n🧠 Found {len(message.thinking_blocks)} thinking blocks")
+            for i, block in enumerate(message.thinking_blocks):
+                if isinstance(block, RedactedThinkingBlock):
+                    print(f"  Block {i + 1}: {block.data}")
+                elif isinstance(block, ThinkingBlock):
+                    print(f"  Block {i + 1}: {block.thinking}")
+
+
+conversation = Conversation(
+    agent=agent, callbacks=[show_thinking], workspace=os.getcwd()
+)
+
+conversation.send_message(
+    "Calculate compound interest for $10,000 at 5% annually, "
+    "compounded quarterly for 3 years. Show your work.",
+)
+conversation.run()
+
+conversation.send_message(
+    "Now, write that number to RESULTs.txt.",
+)
+conversation.run()
+print("✅ Done!")
+
+# Report cost
+cost = llm.metrics.accumulated_cost
+print(f"EXAMPLE_COST: {cost}")
+```
+
+
+
+## OpenAI Reasoning via Responses API
+
+> A ready-to-run example is available [here](#ready-to-run-example-openai)!
+
+OpenAI's latest models (e.g., `GPT-5`, `GPT-5-Codex`) support a [Responses API](https://platform.openai.com/docs/api-reference/responses)
+that provides access to the model's reasoning process.
+By setting the `reasoning_effort` parameter, you can control how much reasoning the model performs and access those reasoning traces.
+
+### How It Works
+
+Configure the LLM with the `reasoning_effort` parameter to enable reasoning:
+
+```python focus={5} icon="python" wrap
+llm = LLM(
+    model="openhands/gpt-5-codex",
+    api_key=SecretStr(api_key),
+    base_url=base_url,
+    # Enable reasoning with effort level
+    reasoning_effort="high",
+)
+```
+
+The `reasoning_effort` parameter can be set to `"none"`, `"low"`, `"medium"`, or `"high"` to control the amount of
+reasoning performed by the model.
+
+Then capture reasoning traces in your callback:
+
+```python focus={3-4} icon="python" wrap
+def conversation_callback(event: Event):
+    if isinstance(event, LLMConvertibleEvent):
+        msg = event.to_llm_message()
+        llm_messages.append(msg)
+```
+
+### Understanding Reasoning Traces
+
+The OpenAI Responses API provides reasoning traces that show how the model approached the problem.
+These traces are available in the LLM messages and can be inspected to understand the model's decision-making process.
+Unlike Anthropic's thinking blocks, OpenAI's reasoning is more tightly integrated with the response generation process.
+ +### Ready-to-run Example OpenAI + + +This example is available on GitHub: [examples/01_standalone_sdk/23_responses_reasoning.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/23_responses_reasoning.py) + + +```python icon="python" expandable examples/01_standalone_sdk/23_responses_reasoning.py +""" +Example: Responses API path via LiteLLM in a Real Agent Conversation + +- Runs a real Agent/Conversation to verify /responses path works +- Demonstrates rendering of Responses reasoning within normal conversation events +""" + +from __future__ import annotations + +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.llm import LLM +from openhands.tools.preset.default import get_default_agent + + +logger = get_logger(__name__) + +api_key = os.getenv("LLM_API_KEY") or os.getenv("OPENAI_API_KEY") +assert api_key, "Set LLM_API_KEY or OPENAI_API_KEY in your environment." 
+ +model = "openhands/gpt-5-mini-2025-08-07" # Use a model that supports Responses API +base_url = os.getenv("LLM_BASE_URL") + +llm = LLM( + model=model, + api_key=SecretStr(api_key), + base_url=base_url, + # Responses-path options + reasoning_effort="high", + # Logging / behavior tweaks + log_completions=False, + usage_id="agent", +) + +print("\n=== Agent Conversation using /responses path ===") +agent = get_default_agent( + llm=llm, + cli_mode=True, # disable browser tools for env simplicity +) + +llm_messages = [] # collect raw LLM-convertible messages for inspection + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=os.getcwd(), +) + +# Keep the tasks short for demo purposes +conversation.send_message("Read the repo and write one fact into FACTS.txt.") +conversation.run() + +conversation.send_message("Now delete FACTS.txt.") +conversation.run() + +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + ms = str(message) + print(f"Message {i}: {ms[:200]}{'...' if len(ms) > 200 else ''}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +## Use Cases + +**Debugging**: Understand why the agent made specific decisions or took certain actions. + +**Transparency**: Show users how the AI arrived at its conclusions. + +**Quality Assurance**: Identify flawed reasoning patterns or logic errors. + +**Learning**: Study how models approach complex problems. 
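For the quality-assurance and debugging use cases above, a small post-processing pass over the captured messages is often all you need. The sketch below uses simplified stand-in objects, not real SDK types; real messages expose `thinking_blocks` as shown in the Anthropic callback earlier.

```python
# Illustrative audit pass over captured messages -- the stub classes below
# are NOT SDK types; real messages carry a `thinking_blocks` attribute.
class StubBlock:
    def __init__(self, thinking):
        self.thinking = thinking


class StubMessage:
    def __init__(self, blocks=None):
        self.thinking_blocks = blocks or []


def extract_reasoning(messages):
    """Collect all reasoning text for later review or automated checks."""
    traces = []
    for message in messages:
        for block in getattr(message, "thinking_blocks", []) or []:
            text = getattr(block, "thinking", None)
            if text:
                traces.append(text)
    return traces


captured = [StubMessage(), StubMessage([StubBlock("convert rate to quarterly first")])]
print(extract_reasoning(captured))  # ['convert rate to quarterly first']
```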
+ +## Next Steps + +- **[Interactive Terminal](/sdk/guides/agent-interactive-terminal)** - Display reasoning in real-time +- **[LLM Metrics](/sdk/guides/metrics)** - Track token usage and performance +- **[Custom Tools](/sdk/guides/custom-tools)** - Add specialized capabilities + +### LLM Registry +Source: https://docs.openhands.dev/sdk/guides/llm-registry.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +Use the LLM registry to manage multiple LLM providers and dynamically switch between models. + +## Using the Registry + +You can add LLMs to the registry using the `.add` method and retrieve them later using the `.get()` method. + +```python icon="python" focus={9,10,13} +main_llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# define the registry and add an LLM +llm_registry = LLMRegistry() +llm_registry.add(main_llm) +... +# retrieve the LLM by its usage ID +llm = llm_registry.get("agent") +``` + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/05_use_llm_registry.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/05_use_llm_registry.py) + + +```python icon="python" expandable examples/01_standalone_sdk/05_use_llm_registry.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + LLMRegistry, + Message, + TextContent, + get_logger, +) +from openhands.sdk.tool import Tool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Configure LLM using LLMRegistry +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") + +# Create LLM instance +main_llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Create LLM registry and add the LLM +llm_registry = LLMRegistry() +llm_registry.add(main_llm) + +# Get LLM from registry +llm = llm_registry.get("agent") + +# Tools +cwd = os.getcwd() +tools = [Tool(name=TerminalTool.name)] + +# Agent +agent = Agent(llm=llm, tools=tools) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=cwd +) + +conversation.send_message("Please echo 'Hello!'") +conversation.run() + +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +print("=" * 100) +print(f"LLM Registry usage IDs: {llm_registry.list_usage_ids()}") + +# Demonstrate getting the same LLM instance from registry +same_llm = llm_registry.get("agent") +print(f"Same LLM instance: {llm is same_llm}") + +# Demonstrate requesting a completion directly from an LLM +resp = llm.completion( + messages=[ + Message(role="user", content=[TextContent(text="Say hello in one word.")]) + ] +) +# Access the response content via OpenHands LLMResponse +msg = resp.message +texts = [c.text for c in msg.content if isinstance(c, TextContent)] +print(f"Direct completion response: {texts[0] if texts else str(msg)}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + + +## Next Steps + +- **[LLM Routing](/sdk/guides/llm-routing)** - Automatically route to different models +- **[LLM Metrics](/sdk/guides/metrics)** - Track token usage and costs + +### Model Routing +Source: 
https://docs.openhands.dev/sdk/guides/llm-routing.md
+
+import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx";
+
+This feature is under active development and more default routers will be available in future releases.
+
+> A ready-to-run example is available [here](#ready-to-run-example)!
+
+### Using the built-in MultimodalRouter
+
+Create the built-in rule-based `MultimodalRouter`, which routes text-only requests to a secondary LLM and multimodal requests (those containing images) to the primary, multimodal-capable LLM:
+
+```python icon="python" wrap focus={13-16}
+primary_llm = LLM(
+    usage_id="agent-primary",
+    model=model,
+    base_url=base_url,
+    api_key=SecretStr(api_key),
+)
+secondary_llm = LLM(
+    usage_id="agent-secondary",
+    model="litellm_proxy/mistral/devstral-small-2507",
+    base_url="https://llm-proxy.eval.all-hands.dev",
+    api_key=SecretStr(api_key),
+)
+multimodal_router = MultimodalRouter(
+    usage_id="multimodal-router",
+    llms_for_routing={"primary": primary_llm, "secondary": secondary_llm},
+)
+```
+
+You may define your own router by extending the `Router` class. See the [base class](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/router/base.py) for details.
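The routing decision itself boils down to a predicate over message content. The snippet below sketches the kind of rule a multimodal router applies; it is a standalone illustration with simplified, dict-based message parts, not the `Router` subclass API (consult the linked base class for the real interface).

```python
# Standalone illustration of a rule-based routing decision.
# The dict-based message parts are simplified assumptions, not SDK types.
def pick_llm_key(content_parts: list[dict]) -> str:
    """Route to "primary" when any part is an image, else "secondary"."""
    has_image = any(part.get("type") == "image" for part in content_parts)
    return "primary" if has_image else "secondary"


print(pick_llm_key([{"type": "text", "text": "hi"}]))  # secondary
print(pick_llm_key([{"type": "image", "url": "cat.png"}]))  # primary
```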
+ +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/19_llm_routing.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/19_llm_routing.py) + + +Automatically route requests to different LLMs based on task characteristics to optimize cost and performance: + +```python icon="python" expandable examples/01_standalone_sdk/19_llm_routing.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + ImageContent, + LLMConvertibleEvent, + Message, + TextContent, + get_logger, +) +from openhands.sdk.llm.router import MultimodalRouter +from openhands.tools.preset.default import get_default_tools + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") + +primary_llm = LLM( + usage_id="agent-primary", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) +secondary_llm = LLM( + usage_id="agent-secondary", + model="openhands/devstral-small-2507", + base_url=base_url, + api_key=SecretStr(api_key), +) +multimodal_router = MultimodalRouter( + usage_id="multimodal-router", + llms_for_routing={"primary": primary_llm, "secondary": secondary_llm}, +) + +# Tools +tools = get_default_tools() # Use our default openhands experience + +# Agent +agent = Agent(llm=multimodal_router, tools=tools) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=os.getcwd() +) + +conversation.send_message( + message=Message( + role="user", + content=[TextContent(text=("Hi there, who trained you?"))], + ) +) 
+conversation.run() + +conversation.send_message( + message=Message( + role="user", + content=[ + ImageContent( + image_urls=["http://images.cocodataset.org/val2017/000000039769.jpg"] + ), + TextContent(text=("What do you see in the image above?")), + ], + ) +) +conversation.run() + +conversation.send_message( + message=Message( + role="user", + content=[TextContent(text=("Who trained you as an LLM?"))], + ) +) +conversation.run() + +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +# Report cost +cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + + +## Next Steps + +- **[LLM Registry](/sdk/guides/llm-registry)** - Manage multiple LLM configurations +- **[LLM Metrics](/sdk/guides/metrics)** - Track token usage and costs + +### LLM Streaming +Source: https://docs.openhands.dev/sdk/guides/llm-streaming.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + + +This is currently only supported for the chat completion endpoint. + + +> A ready-to-run example is available [here](#ready-to-run-example)! + + +Enable real-time display of LLM responses as they're generated, token by token. This guide demonstrates how to use +streaming callbacks to process and display tokens as they arrive from the language model. + + +## How It Works + +Streaming allows you to display LLM responses progressively as the model generates them, rather than waiting for the +complete response. This creates a more responsive user experience, especially for long-form content generation. 
+ + + + ### Enable Streaming on LLM + Configure the LLM with streaming enabled: + + ```python focus={6} icon="python" wrap + llm = LLM( + model="anthropic/claude-sonnet-4-5-20250929", + api_key=SecretStr(api_key), + base_url=base_url, + usage_id="stream-demo", + stream=True, # Enable streaming + ) + ``` + + + ### Define Token Callback + Create a callback function that processes streaming chunks as they arrive: + + ```python icon="python" wrap + def on_token(chunk: ModelResponseStream) -> None: + """Process each streaming chunk as it arrives.""" + choices = chunk.choices + for choice in choices: + delta = choice.delta + if delta is not None: + content = getattr(delta, "content", None) + if isinstance(content, str): + sys.stdout.write(content) + sys.stdout.flush() + ``` + + The callback receives a `ModelResponseStream` object containing: + - **`choices`**: List of response choices from the model + - **`delta`**: Incremental content changes for each choice + - **`content`**: The actual text tokens being streamed + + + ### Register Callback with Conversation + + Pass your token callback to the conversation: + + ```python focus={3} icon="python" wrap + conversation = Conversation( + agent=agent, + token_callbacks=[on_token], # Register streaming callback + workspace=os.getcwd(), + ) + ``` + + The `token_callbacks` parameter accepts a list of callbacks, allowing you to register multiple handlers + if needed (e.g., one for display, another for logging). 
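The multi-callback fan-out described in the last step can be sketched independently of the SDK. The dispatcher below and the plain-string chunks are simplified stand-ins for the real `ModelResponseStream` plumbing, included only to show the pattern:

```python
# Illustrative fan-out: every registered callback sees every streamed chunk.
# Plain strings stand in for ModelResponseStream objects here.
def make_dispatcher(callbacks):
    def dispatch(chunk):
        for callback in callbacks:
            callback(chunk)
    return dispatch


display_buffer, audit_log = [], []
dispatch = make_dispatcher([display_buffer.append, audit_log.append])

for token in ["Hel", "lo", ", ", "world"]:
    dispatch(token)

print("".join(display_buffer))  # Hello, world
print(len(audit_log))  # 4
```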
+ + + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/29_llm_streaming.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/29_llm_streaming.py) + + +```python icon="python" expandable examples/01_standalone_sdk/29_llm_streaming.py +import os +import sys +from typing import Literal + +from pydantic import SecretStr + +from openhands.sdk import ( + Conversation, + get_logger, +) +from openhands.sdk.llm import LLM +from openhands.sdk.llm.streaming import ModelResponseStream +from openhands.tools.preset.default import get_default_agent + + +logger = get_logger(__name__) + + +api_key = os.getenv("LLM_API_KEY") or os.getenv("OPENAI_API_KEY") +if not api_key: + raise RuntimeError("Set LLM_API_KEY or OPENAI_API_KEY in your environment.") + +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + model=model, + api_key=SecretStr(api_key), + base_url=base_url, + usage_id="stream-demo", + stream=True, +) + +agent = get_default_agent(llm=llm, cli_mode=True) + + +# Define streaming states +StreamingState = Literal["thinking", "content", "tool_name", "tool_args"] +# Track state across on_token calls for boundary detection +_current_state: StreamingState | None = None + + +def on_token(chunk: ModelResponseStream) -> None: + """ + Handle all types of streaming tokens including content, + tool calls, and thinking blocks with dynamic boundary detection. 
+ """ + global _current_state + + choices = chunk.choices + for choice in choices: + delta = choice.delta + if delta is not None: + # Handle thinking blocks (reasoning content) + reasoning_content = getattr(delta, "reasoning_content", None) + if isinstance(reasoning_content, str) and reasoning_content: + if _current_state != "thinking": + if _current_state is not None: + sys.stdout.write("\n") + sys.stdout.write("THINKING: ") + _current_state = "thinking" + sys.stdout.write(reasoning_content) + sys.stdout.flush() + + # Handle regular content + content = getattr(delta, "content", None) + if isinstance(content, str) and content: + if _current_state != "content": + if _current_state is not None: + sys.stdout.write("\n") + sys.stdout.write("CONTENT: ") + _current_state = "content" + sys.stdout.write(content) + sys.stdout.flush() + + # Handle tool calls + tool_calls = getattr(delta, "tool_calls", None) + if tool_calls: + for tool_call in tool_calls: + tool_name = ( + tool_call.function.name if tool_call.function.name else "" + ) + tool_args = ( + tool_call.function.arguments + if tool_call.function.arguments + else "" + ) + if tool_name: + if _current_state != "tool_name": + if _current_state is not None: + sys.stdout.write("\n") + sys.stdout.write("TOOL NAME: ") + _current_state = "tool_name" + sys.stdout.write(tool_name) + sys.stdout.flush() + if tool_args: + if _current_state != "tool_args": + if _current_state is not None: + sys.stdout.write("\n") + sys.stdout.write("TOOL ARGS: ") + _current_state = "tool_args" + sys.stdout.write(tool_args) + sys.stdout.flush() + + +conversation = Conversation( + agent=agent, + workspace=os.getcwd(), + token_callbacks=[on_token], +) + +story_prompt = ( + "Tell me a long story about LLM streaming, write it a file, " + "make sure it has multiple paragraphs. " +) +conversation.send_message(story_prompt) +print("Token Streaming:") +print("-" * 100 + "\n") +conversation.run() + +cleanup_prompt = ( + "Thank you. 
Please delete the streaming story file now that I've read it, " + "then confirm the deletion." +) +conversation.send_message(cleanup_prompt) +print("Token Streaming:") +print("-" * 100 + "\n") +conversation.run() + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +## Next Steps + +- **[LLM Error Handling](/sdk/guides/llm-error-handling)** - Handle streaming errors gracefully +- **[Custom Visualizer](/sdk/guides/convo-custom-visualizer)** - Build custom UI for streaming +- **[Interactive Terminal](/sdk/guides/agent-interactive-terminal)** - Display streams in terminal UI + +### LLM Subscriptions +Source: https://docs.openhands.dev/sdk/guides/llm-subscriptions.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + + +OpenAI subscription is the first provider we support. More subscription providers will be added in future releases. + + +> A ready-to-run example is available [here](#ready-to-run-example)! + +Use your existing ChatGPT Plus or Pro subscription to access OpenAI's Codex models without consuming API credits. The SDK handles OAuth authentication, credential caching, and automatic token refresh. + +## How It Works + + + + ### Call subscription_login() + + The `LLM.subscription_login()` class method handles the entire authentication flow: + + ```python icon="python" + from openhands.sdk import LLM + + llm = LLM.subscription_login(vendor="openai", model="gpt-5.2-codex") + ``` + + On first run, this opens your browser for OAuth authentication with OpenAI. After successful login, credentials are cached locally in `~/.openhands/auth/` for future use. + + + ### Use the LLM + + Once authenticated, use the LLM with your agent as usual. The SDK automatically refreshes tokens when they expire. 
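The credential-handling behavior described above amounts to a small decision table. The function below is a conceptual sketch of that documented behavior, not the SDK's actual implementation:

```python
def login_action(has_cached: bool, expired: bool, force_login: bool) -> str:
    """Sketch of the documented subscription_login() credential flow."""
    if force_login or not has_cached:
        return "oauth_login"    # open the browser (or print the URL) for a fresh login
    if expired:
        return "refresh_token"  # silently refresh using the cached credentials
    return "use_cached"         # reuse cached credentials as-is


print(login_action(has_cached=False, expired=False, force_login=False))  # oauth_login
print(login_action(has_cached=True, expired=True, force_login=False))    # refresh_token
print(login_action(has_cached=True, expired=False, force_login=True))    # oauth_login
```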
+ + + +## Supported Models + +The following models are available via ChatGPT subscription: + +| Model | Description | +|-------|-------------| +| `gpt-5.2-codex` | Latest Codex model (default) | +| `gpt-5.2` | GPT-5.2 base model | +| `gpt-5.1-codex-max` | High-capacity Codex model | +| `gpt-5.1-codex-mini` | Lightweight Codex model | + +## Configuration Options + +### Force Fresh Login + +If your cached credentials become stale or you want to switch accounts: + +```python icon="python" +llm = LLM.subscription_login( + vendor="openai", + model="gpt-5.2-codex", + force_login=True, # Always perform fresh OAuth login +) +``` + +### Disable Browser Auto-Open + +For headless environments or when you prefer to manually open the URL: + +```python icon="python" +llm = LLM.subscription_login( + vendor="openai", + model="gpt-5.2-codex", + open_browser=False, # Prints URL to console instead +) +``` + +### Check Subscription Mode + +Verify that the LLM is using subscription-based authentication: + +```python icon="python" +llm = LLM.subscription_login(vendor="openai", model="gpt-5.2-codex") +print(f"Using subscription: {llm.is_subscription}") # True +``` + +## Credential Storage + +Credentials are stored securely in `~/.openhands/auth/`. To clear cached credentials and force a fresh login, delete the files in this directory. + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/35_subscription_login.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/35_subscription_login.py) + + +```python icon="python" expandable examples/01_standalone_sdk/35_subscription_login.py +"""Example: Using ChatGPT subscription for Codex models. + +This example demonstrates how to use your ChatGPT Plus/Pro subscription +to access OpenAI's Codex models without consuming API credits. 
+ +The subscription_login() method handles: +- OAuth PKCE authentication flow +- Credential caching (~/.openhands/auth/) +- Automatic token refresh + +Supported models: +- gpt-5.2-codex +- gpt-5.2 +- gpt-5.1-codex-max +- gpt-5.1-codex-mini + +Requirements: +- Active ChatGPT Plus or Pro subscription +- Browser access for initial OAuth login +""" + +import os + +from openhands.sdk import LLM, Agent, Conversation, Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +# First time: Opens browser for OAuth login +# Subsequent calls: Reuses cached credentials (auto-refreshes if expired) +llm = LLM.subscription_login( + vendor="openai", + model="gpt-5.2-codex", # or "gpt-5.2", "gpt-5.1-codex-max", "gpt-5.1-codex-mini" +) + +# Alternative: Force a fresh login (useful if credentials are stale) +# llm = LLM.subscription_login(vendor="openai", model="gpt-5.2-codex", force_login=True) + +# Alternative: Disable auto-opening browser (prints URL to console instead) +# llm = LLM.subscription_login( +# vendor="openai", model="gpt-5.2-codex", open_browser=False +# ) + +# Verify subscription mode is active +print(f"Using subscription mode: {llm.is_subscription}") + +# Use the LLM with an agent as usual +agent = Agent( + llm=llm, + tools=[ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), + ], +) + +cwd = os.getcwd() +conversation = Conversation(agent=agent, workspace=cwd) + +conversation.send_message("List the files in the current directory.") +conversation.run() +print("Done!") +``` + + + +## Next Steps + +- **[LLM Registry](/sdk/guides/llm-registry)** - Manage multiple LLM configurations +- **[LLM Streaming](/sdk/guides/llm-streaming)** - Stream responses token-by-token +- **[LLM Reasoning](/sdk/guides/llm-reasoning)** - Access model reasoning traces + +### Model Context Protocol +Source: https://docs.openhands.dev/sdk/guides/mcp.md + +import RunExampleCode from 
"/sdk/shared-snippets/how-to-run-example.mdx"; + + + ***MCP*** (Model Context Protocol) is a protocol for exposing tools and resources to AI agents. + Read more about MCP [here](https://modelcontextprotocol.io/). + + + + +## Basic MCP Usage + +> The ready-to-run basic MCP usage example is available [here](#ready-to-run-basic-mcp-usage-example)! + + + + ### MCP Configuration + Configure MCP servers using a dictionary with server names and connection details following [this configuration format](https://gofastmcp.com/clients/client#configuration-format) + + ```python mcp_config icon="python" wrap focus={3-10} + mcp_config = { + "mcpServers": { + "fetch": { + "command": "uvx", + "args": ["mcp-server-fetch"] + }, + "repomix": { + "command": "npx", + "args": ["-y", "repomix@1.4.2", "--mcp"] + }, + } + } + ``` + + + ### Tool Filtering + Use `filter_tools_regex` to control which MCP tools are available to the agent + + ```python filter_tools_regex focus={4-5} icon="python" + agent = Agent( + llm=llm, + tools=tools, + mcp_config=mcp_config, + filter_tools_regex="^(?!repomix)(.*)|^repomix.*pack_codebase.*$", + ) + ``` + + + +## MCP with OAuth + +> The ready-to-run MCP with OAuth example is available [here](#ready-to-run-mcp-with-oauth-example)! 
+ +For MCP servers requiring OAuth authentication: +- Configure OAuth-enabled MCP servers by specifying the URL and auth type +- The SDK automatically handles the OAuth flow when first connecting +- When the agent first attempts to use an OAuth-protected MCP server's tools, the SDK initiates the OAuth flow via [FastMCP](https://gofastmcp.com/servers/auth/authentication) +- User will be prompted to authenticate +- Access tokens are securely stored and automatically refreshed by FastMCP as needed + +```python mcp_config focus={5} icon="python" wrap +mcp_config = { + "mcpServers": { + "Notion": { + "url": "https://mcp.notion.com/mcp", + "auth": "oauth" + } + } +} +``` + +## Ready-to-Run Basic MCP Usage Example + + +This example is available on GitHub: [examples/01_standalone_sdk/07_mcp_integration.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/07_mcp_integration.py) + + +Here's an example integrating MCP servers with an agent: + +```python icon="python" expandable examples/01_standalone_sdk/07_mcp_integration.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.security.llm_analyzer import LLMSecurityAnalyzer +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +cwd = os.getcwd() +tools = [ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), +] + +# Add MCP Tools +mcp_config = { + "mcpServers": { + "fetch": {"command": "uvx", "args": ["mcp-server-fetch"]}, + "repomix": {"command": "npx", "args": ["-y", "repomix@1.4.2", "--mcp"]}, + } +} +# Agent +agent = Agent( + llm=llm, + tools=tools, + mcp_config=mcp_config, + # This regex filters out all repomix tools except pack_codebase + filter_tools_regex="^(?!repomix)(.*)|^repomix.*pack_codebase.*$", +) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +# Conversation +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=cwd, +) +conversation.set_security_analyzer(LLMSecurityAnalyzer()) + +logger.info("Starting conversation with MCP integration...") +conversation.send_message( + "Read https://github.com/OpenHands/OpenHands and write 3 facts " + "about the project into FACTS.txt." +) +conversation.run() + +conversation.send_message("Great! Now delete that file.") +conversation.run() + +print("=" * 100) +print("Conversation finished. 
Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +## Ready-to-Run MCP with OAuth Example + + +This example is available on GitHub: [examples/01_standalone_sdk/08_mcp_with_oauth.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/08_mcp_with_oauth.py) + + +```python icon="python" expandable examples/01_standalone_sdk/08_mcp_with_oauth.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +cwd = os.getcwd() +tools = [ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), +] + +mcp_config = { + "mcpServers": {"Notion": {"url": "https://mcp.notion.com/mcp", "auth": "oauth"}} +} +agent = Agent(llm=llm, tools=tools, mcp_config=mcp_config) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +# Conversation +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], +) + +logger.info("Starting conversation with MCP integration...") +conversation.send_message("Can you search about OpenHands V1 in my notion workspace?") +conversation.run() + +print("=" * 100) +print("Conversation finished. Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") +``` + + + +## Next Steps + +- **[Custom Tools](/sdk/guides/custom-tools)** - Creating native SDK tools +- **[Security Analyzer](/sdk/guides/security)** - Securing tool usage +- **[MCP Package Source Code](https://github.com/OpenHands/software-agent-sdk/tree/main/openhands-sdk/openhands/sdk/mcp)** - MCP integration implementation + +### Metrics Tracking +Source: https://docs.openhands.dev/sdk/guides/metrics.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +## Overview + +The OpenHands SDK provides metrics tracking at two levels: individual LLM metrics and aggregated conversation-level costs: +- You can access detailed metrics from each LLM instance using the `llm.metrics` object to track token usage, costs, and latencies per API call. 
+- For a complete view, use `conversation.conversation_stats` to get aggregated costs across all LLMs used in a conversation, including the primary agent LLM and any auxiliary LLMs (such as those used by the [context condenser](/sdk/guides/context-condenser)). + +## Getting Metrics from Individual LLMs + +> A ready-to-run example is available [here](#ready-to-run-example-llm-metrics)! + +Track token usage, costs, and performance metrics from LLM interactions: + +### Accessing Individual LLM Metrics + +Access metrics directly from the LLM object after running the conversation: + +```python icon="python" focus={3-4} +conversation.run() + +assert llm.metrics is not None +print(f"Final LLM metrics: {llm.metrics.model_dump()}") +``` + +The `llm.metrics` object is an instance of the [Metrics class](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/utils/metrics.py), which provides detailed information including: + +- `accumulated_cost` - Total accumulated cost across all API calls +- `accumulated_token_usage` - Aggregated token usage with fields like: + - `prompt_tokens` - Number of input tokens processed + - `completion_tokens` - Number of output tokens generated + - `cache_read_tokens` - Cache hits (if supported by the model) + - `cache_write_tokens` - Cache writes (if supported by the model) + - `reasoning_tokens` - Reasoning tokens (for models that support extended thinking) + - `context_window` - Context window size used +- `costs` - List of individual cost records per API call +- `token_usages` - List of detailed token usage records per API call +- `response_latencies` - List of response latency metrics per API call + + + For more details on the available metrics and methods, refer to the [source code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/llm/utils/metrics.py). 
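To make the fields above concrete, here is a small sketch that derives a few summary numbers from a metrics dump. The dictionary mirrors the general shape of `llm.metrics.model_dump()` described above, but the numbers are made up for illustration:

```python
# Illustrative data only -- see the Metrics source for the authoritative schema.
sample = {
    "accumulated_cost": 0.0342,
    "accumulated_token_usage": {
        "prompt_tokens": 12_000,
        "completion_tokens": 1_500,
        "cache_read_tokens": 9_000,
        "cache_write_tokens": 2_000,
        "reasoning_tokens": 0,
    },
}

usage = sample["accumulated_token_usage"]
total_tokens = usage["prompt_tokens"] + usage["completion_tokens"]
# Fraction of prompt tokens served from the provider's prompt cache.
cache_hit_rate = usage["cache_read_tokens"] / usage["prompt_tokens"]

print(f"total tokens:   {total_tokens}")        # 13500
print(f"cache hit rate: {cache_hit_rate:.0%}")  # 75%
print(f"cost per 1k:    ${1000 * sample['accumulated_cost'] / total_tokens:.4f}")
```

A high cache-read fraction like this usually indicates prompt caching is working well, which can substantially reduce `accumulated_cost` on models that support it.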
+ + +### Ready-to-run Example (LLM metrics) + +This example is available on GitHub: [examples/01_standalone_sdk/13_get_llm_metrics.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/13_get_llm_metrics.py) + + +```python icon="python" expandable examples/01_standalone_sdk/13_get_llm_metrics.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +cwd = os.getcwd() +tools = [ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), +] + +# Add MCP Tools +mcp_config = {"mcpServers": {"fetch": {"command": "uvx", "args": ["mcp-server-fetch"]}}} + +# Agent +agent = Agent(llm=llm, tools=tools, mcp_config=mcp_config) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +# Conversation +conversation = Conversation( + agent=agent, + callbacks=[conversation_callback], + workspace=cwd, +) + +logger.info("Starting conversation with MCP integration...") +conversation.send_message( + "Read https://github.com/OpenHands/OpenHands and write 3 facts " + "about the project into FACTS.txt." +) +conversation.run() + +conversation.send_message("Great! Now delete that file.") +conversation.run() + +print("=" * 100) +print("Conversation finished. 
Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +assert llm.metrics is not None +print( + f"Conversation finished. Final LLM metrics with details: {llm.metrics.model_dump()}" +) + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +## Using LLM Registry for Cost Tracking + +> A ready-to-run example is available [here](#ready-to-run-example-llm-registry)! + +The [LLM Registry](/sdk/guides/llm-registry) allows you to maintain a centralized registry of LLM instances, each identified by a unique `usage_id`. This is particularly useful for tracking costs across different LLMs used in your application. + +### How the LLM Registry Works + +Each LLM is created with a unique `usage_id` (e.g., "agent", "condenser") that serves as its identifier in the registry. The registry maintains references to all LLM instances, allowing you to: + +1. **Register LLMs**: Add LLM instances to the registry with `llm_registry.add(llm)` +2. **Retrieve LLMs**: Get LLM instances by their usage ID with `llm_registry.get("usage_id")` +3. **List Usage IDs**: View all registered usage IDs with `llm_registry.list_usage_ids()` +4. **Track Costs Separately**: Each LLM's metrics are tracked independently by its usage ID + +This pattern is essential when using multiple LLMs in your application, such as having a primary agent LLM and a separate LLM for context condensing. 
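The registry operations above can be sketched with stand-in objects. The SDK's `LLMRegistry` and `LLM` classes behave analogously but carry much more state; this is only a miniature of the pattern:

```python
from dataclasses import dataclass


@dataclass
class FakeLLM:
    """Stand-in for an SDK LLM: just a usage_id plus a running cost."""

    usage_id: str
    accumulated_cost: float = 0.0


class Registry:
    """Miniature of the usage_id -> LLM registry pattern."""

    def __init__(self) -> None:
        self._llms: dict[str, FakeLLM] = {}

    def add(self, llm: FakeLLM) -> None:
        if llm.usage_id in self._llms:
            raise ValueError(f"usage_id {llm.usage_id!r} already registered")
        self._llms[llm.usage_id] = llm

    def get(self, usage_id: str) -> FakeLLM:
        return self._llms[usage_id]

    def list_usage_ids(self) -> list[str]:
        return list(self._llms)


registry = Registry()
registry.add(FakeLLM(usage_id="agent"))
registry.add(FakeLLM(usage_id="condenser"))

# Costs stay separate because every lookup goes through a usage_id.
registry.get("agent").accumulated_cost += 0.02
registry.get("condenser").accumulated_cost += 0.005

print(registry.list_usage_ids())               # ['agent', 'condenser']
print(registry.get("agent").accumulated_cost)  # 0.02
```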
+ +### Ready-to-run Example (LLM Registry) + +This example is available on GitHub: [examples/01_standalone_sdk/05_use_llm_registry.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/05_use_llm_registry.py) + + + +```python icon="python" expandable examples/01_standalone_sdk/05_use_llm_registry.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + LLMRegistry, + Message, + TextContent, + get_logger, +) +from openhands.sdk.tool import Tool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Configure LLM using LLMRegistry +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") + +# Create LLM instance +main_llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Create LLM registry and add the LLM +llm_registry = LLMRegistry() +llm_registry.add(main_llm) + +# Get LLM from registry +llm = llm_registry.get("agent") + +# Tools +cwd = os.getcwd() +tools = [Tool(name=TerminalTool.name)] + +# Agent +agent = Agent(llm=llm, tools=tools) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=cwd +) + +conversation.send_message("Please echo 'Hello!'") +conversation.run() + +print("=" * 100) +print("Conversation finished. 
Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +print("=" * 100) +print(f"LLM Registry usage IDs: {llm_registry.list_usage_ids()}") + +# Demonstrate getting the same LLM instance from registry +same_llm = llm_registry.get("agent") +print(f"Same LLM instance: {llm is same_llm}") + +# Demonstrate requesting a completion directly from an LLM +resp = llm.completion( + messages=[ + Message(role="user", content=[TextContent(text="Say hello in one word.")]) + ] +) +# Access the response content via OpenHands LLMResponse +msg = resp.message +texts = [c.text for c in msg.content if isinstance(c, TextContent)] +print(f"Direct completion response: {texts[0] if texts else str(msg)}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + +### Getting Aggregated Conversation Costs + + +This example is available on GitHub: [examples/01_standalone_sdk/21_generate_extraneous_conversation_costs.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/21_generate_extraneous_conversation_costs.py) + + +Beyond individual LLM metrics, you can access aggregated costs for an entire conversation using `conversation.conversation_stats`. This is particularly useful when your conversation involves multiple LLMs, such as the main agent LLM and auxiliary LLMs for tasks like context condensing. 
+ +```python icon="python" expandable examples/01_standalone_sdk/21_generate_extraneous_conversation_costs.py +import os + +from pydantic import SecretStr +from tabulate import tabulate + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + LLMSummarizingCondenser, + Message, + TextContent, + get_logger, +) +from openhands.sdk.tool.spec import Tool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Configure LLM using LLMRegistry +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") + +# Create LLM instance +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +llm_condenser = LLM( + model=model, + base_url=base_url, + api_key=SecretStr(api_key), + usage_id="condenser", +) + +# Tools +condenser = LLMSummarizingCondenser(llm=llm_condenser, max_size=10, keep_first=2) + +cwd = os.getcwd() +agent = Agent( + llm=llm, + tools=[ + Tool( + name=TerminalTool.name, + ), + ], + condenser=condenser, +) + +conversation = Conversation(agent=agent, workspace=cwd) +conversation.send_message( + message=Message( + role="user", + content=[TextContent(text="Please echo 'Hello!'")], + ) +) +conversation.run() + +# Demonstrate extraneous costs part of the conversation +second_llm = LLM( + usage_id="demo-secondary", + model=model, + base_url=os.getenv("LLM_BASE_URL"), + api_key=SecretStr(api_key), +) +conversation.llm_registry.add(second_llm) +completion_response = second_llm.completion( + messages=[Message(role="user", content=[TextContent(text="echo 'More spend!'")])] +) + +# Access total spend +spend = conversation.conversation_stats.get_combined_metrics() +print("\n=== Total Spend for Conversation ===\n") +print(f"Accumulated Cost: ${spend.accumulated_cost:.6f}") +if spend.accumulated_token_usage: + print(f"Prompt Tokens: 
{spend.accumulated_token_usage.prompt_tokens}") + print(f"Completion Tokens: {spend.accumulated_token_usage.completion_tokens}") + print(f"Cache Read Tokens: {spend.accumulated_token_usage.cache_read_tokens}") + print(f"Cache Write Tokens: {spend.accumulated_token_usage.cache_write_tokens}") + +spend_per_usage = conversation.conversation_stats.usage_to_metrics +print("\n=== Spend Breakdown by Usage ID ===\n") +rows = [] +for usage_id, metrics in spend_per_usage.items(): + rows.append( + [ + usage_id, + f"${metrics.accumulated_cost:.6f}", + metrics.accumulated_token_usage.prompt_tokens + if metrics.accumulated_token_usage + else 0, + metrics.accumulated_token_usage.completion_tokens + if metrics.accumulated_token_usage + else 0, + ] + ) + +print( + tabulate( + rows, + headers=["Usage ID", "Cost", "Prompt Tokens", "Completion Tokens"], + tablefmt="github", + ) +) + +# Report cost +cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +### Understanding Conversation Stats + +The `conversation.conversation_stats` object provides cost tracking across all LLMs used in a conversation. It is an instance of the [ConversationStats class](https://github.com/OpenHands/software-agent-sdk/blob/32e1e75f7e962033a8fd6773a672612e07bc8c0d/openhands-sdk/openhands/sdk/conversation/conversation_stats.py), which provides the following key features: + +#### Key Methods and Properties + +- **`usage_to_metrics`**: A dictionary mapping usage IDs to their respective `Metrics` objects. This allows you to track costs separately for each LLM used in the conversation. + +- **`get_combined_metrics()`**: Returns a single `Metrics` object that aggregates costs across all LLMs used in the conversation. This gives you the total cost of the entire conversation. + +- **`get_metrics_for_usage(usage_id: str)`**: Retrieves the `Metrics` object for a specific usage ID, allowing you to inspect costs for individual LLMs. 
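Conceptually, `get_combined_metrics()` sums the per-usage entries in `usage_to_metrics`. The sketch below shows that aggregation with made-up numbers and a deliberately reduced set of fields:

```python
# Illustrative per-usage metrics (real Metrics objects carry many more fields).
usage_to_metrics = {
    "agent": {"accumulated_cost": 0.0210, "prompt_tokens": 9_000},
    "condenser": {"accumulated_cost": 0.0031, "prompt_tokens": 1_200},
}


def combined(per_usage: dict[str, dict[str, float]]) -> dict[str, float]:
    """Sum each field across all usage IDs, like get_combined_metrics()."""
    totals: dict[str, float] = {"accumulated_cost": 0.0, "prompt_tokens": 0}
    for metrics in per_usage.values():
        for key in totals:
            totals[key] += metrics[key]
    return totals


totals = combined(usage_to_metrics)
print(f"combined cost:   ${totals['accumulated_cost']:.4f}")  # $0.0241
print(f"combined prompt: {totals['prompt_tokens']}")
```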
+ +```python icon="python" focus={2, 6, 10} +# Get combined metrics for the entire conversation +total_metrics = conversation.conversation_stats.get_combined_metrics() +print(f"Total cost: ${total_metrics.accumulated_cost:.6f}") + +# Get metrics for a specific LLM by usage ID +agent_metrics = conversation.conversation_stats.get_metrics_for_usage("agent") +print(f"Agent cost: ${agent_metrics.accumulated_cost:.6f}") + +# Access all usage IDs and their metrics +for usage_id, metrics in conversation.conversation_stats.usage_to_metrics.items(): + print(f"{usage_id}: ${metrics.accumulated_cost:.6f}") +``` + +## Next Steps + +- **[Context Condenser](/sdk/guides/context-condenser)** - Learn about context management and how it uses separate LLMs +- **[LLM Routing](/sdk/guides/llm-routing)** - Optimize costs with smart routing between different models + +### Observability & Tracing +Source: https://docs.openhands.dev/sdk/guides/observability.md + +> A full setup example is available [here](#example:-full-setup)! + +## Overview + +The OpenHands SDK provides built-in OpenTelemetry (OTEL) tracing support, allowing you to monitor and debug your agent's execution in real-time. You can send traces to any OTLP-compatible observability platform including: + +- **[Laminar](https://laminar.sh/)** - AI-focused observability with browser session replay support +- **[Honeycomb](https://www.honeycomb.io/)** - High-performance distributed tracing +- **Any OTLP-compatible backend** - Including Jaeger, Datadog, New Relic, and more + +The SDK automatically traces: +- Agent execution steps +- Tool calls and executions +- LLM API calls (via LiteLLM integration) +- Browser automation sessions (when using browser-use) +- Conversation lifecycle events + +## Quick Start + +Tracing is automatically enabled when you set the appropriate environment variables. The SDK detects the configuration on startup and initializes tracing without requiring code changes. 
+ +### Using Laminar + +[Laminar](https://laminar.sh/) provides specialized AI observability features including browser session replays when using browser-use tools: + +```bash icon="terminal" wrap +# Set your Laminar project API key +export LMNR_PROJECT_API_KEY="your-laminar-api-key" +``` + +That's it! Run your agent code normally and traces will be sent to Laminar automatically. + +### Using Honeycomb or Other OTLP Backends + +For Honeycomb, Jaeger, or any other OTLP-compatible backend: + +```bash icon="terminal" wrap +# Required: Set the OTLP endpoint +export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="https://api.honeycomb.io:443/v1/traces" + +# Required: Set authentication headers (format: comma-separated key=value pairs, URL-encoded) +export OTEL_EXPORTER_OTLP_TRACES_HEADERS="x-honeycomb-team=your-api-key" + +# Recommended: Explicitly set the protocol (most OTLP backends require HTTP) +export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/protobuf" # use "grpc" only if your backend supports it +``` + +### Alternative Configuration Methods + +You can also use these alternative environment variable formats: + +```bash icon="terminal" wrap +# Short form for endpoint +export OTEL_ENDPOINT="http://localhost:4317" + +# Alternative header format +export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer%20" + +# Alternative protocol specification +export OTEL_EXPORTER="otlp_http" # or "otlp_grpc" +``` + +## How It Works + +The OpenHands SDK uses the [Laminar SDK](https://docs.lmnr.ai/) as its OpenTelemetry instrumentation layer. When you set the environment variables, the SDK: + +1. **Detects Configuration**: Checks for OTEL environment variables on startup +2. **Initializes Tracing**: Configures OpenTelemetry with the appropriate exporter +3. **Instruments Code**: Automatically wraps key functions with tracing decorators +4. **Captures Context**: Associates traces with conversation IDs for session grouping +5. 
**Exports Spans**: Sends trace data to your configured backend + +### What Gets Traced + +The SDK automatically instruments these components: + +- **`agent.step`** - Each iteration of the agent's execution loop +- **Tool Executions** - Individual tool calls with input/output capture +- **LLM Calls** - API requests to language models via LiteLLM +- **Conversation Lifecycle** - Message sending, conversation runs, and title generation +- **Browser Sessions** - When using browser-use, captures session replays (Laminar only) + +### Trace Hierarchy + +Traces are organized hierarchically: + + + + + + + + + + + + + + + +Each conversation gets its own session ID (the conversation UUID), allowing you to group all traces from a single +conversation together in your observability platform. + +Note that in `tool.execute` the tool calls are traced, e.g., `bash`, `file_editor`. + +## Configuration Reference + +### Environment Variables + +The SDK checks for these environment variables (in order of precedence): + +| Variable | Description | Example | +|----------|-------------|---------| +| `LMNR_PROJECT_API_KEY` | Laminar project API key | `your-laminar-api-key` | +| `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` | Full OTLP traces endpoint URL | `https://api.honeycomb.io:443/v1/traces` | +| `OTEL_EXPORTER_OTLP_ENDPOINT` | Base OTLP endpoint (traces path appended) | `http://localhost:4317` | +| `OTEL_ENDPOINT` | Short form endpoint | `http://localhost:4317` | +| `OTEL_EXPORTER_OTLP_TRACES_HEADERS` | Authentication headers for traces | `x-honeycomb-team=YOUR_API_KEY` | +| `OTEL_EXPORTER_OTLP_HEADERS` | General authentication headers | `Authorization=Bearer%20TOKEN` | +| `OTEL_EXPORTER_OTLP_TRACES_PROTOCOL` | Protocol for traces endpoint | `http/protobuf`, `grpc` | +| `OTEL_EXPORTER` | Short form protocol | `otlp_http`, `otlp_grpc` | + +### Header Format + +Headers should be comma-separated `key=value` pairs with URL encoding for special characters: + +```bash icon="terminal" wrap +# Single 
header +export OTEL_EXPORTER_OTLP_TRACES_HEADERS="x-honeycomb-team=abc123" + +# Multiple headers +export OTEL_EXPORTER_OTLP_TRACES_HEADERS="Authorization=Bearer%20abc123,X-Custom-Header=value" +``` + +### Protocol Options + +The SDK supports both HTTP and gRPC protocols: + +- **`http/protobuf`** or **`otlp_http`** - HTTP with protobuf encoding (recommended for most backends) +- **`grpc`** or **`otlp_grpc`** - gRPC with protobuf encoding (use only if your backend supports gRPC) + +## Platform-Specific Configuration + +### Laminar Setup + +1. Sign up at [laminar.sh](https://laminar.sh/) +2. Create a project and copy your API key +3. Set the environment variable: + +```bash icon="terminal" wrap +export LMNR_PROJECT_API_KEY="your-laminar-api-key" +``` + +**Browser Session Replay**: When using Laminar with browser-use tools, session replays are automatically captured, allowing you to see exactly what the browser automation did. + +### Honeycomb Setup + +1. Sign up at [honeycomb.io](https://www.honeycomb.io/) +2. Get your API key from the account settings +3. 
Configure the environment: + +```bash icon="terminal" wrap +export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="https://api.honeycomb.io:443/v1/traces" +export OTEL_EXPORTER_OTLP_TRACES_HEADERS="x-honeycomb-team=YOUR_API_KEY" +export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/protobuf" +``` + +### Jaeger Setup + +For local development with Jaeger: + +```bash icon="terminal" wrap +# Start Jaeger all-in-one container +docker run -d --name jaeger \ + -p 4317:4317 \ + -p 16686:16686 \ + jaegertracing/all-in-one:latest + +# Configure SDK +export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="http://localhost:4317" +export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="grpc" +``` + +Access the Jaeger UI at http://localhost:16686 + +### Generic OTLP Collector + +For other backends, use their OTLP endpoint: + +```bash icon="terminal" wrap +export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="https://your-otlp-collector:4317/v1/traces" +export OTEL_EXPORTER_OTLP_TRACES_HEADERS="Authorization=Bearer%20YOUR_TOKEN" +export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/protobuf" +``` + +## Advanced Usage + +### Disabling Observability + +To disable tracing, simply unset all OTEL environment variables: + +```bash icon="terminal" wrap +unset LMNR_PROJECT_API_KEY +unset OTEL_EXPORTER_OTLP_TRACES_ENDPOINT +unset OTEL_EXPORTER_OTLP_ENDPOINT +unset OTEL_ENDPOINT +``` + +The SDK will automatically skip all tracing instrumentation with minimal overhead. + +### Custom Span Attributes + +The SDK automatically adds these attributes to spans: + +- **`conversation_id`** - UUID of the conversation +- **`tool_name`** - Name of the tool being executed +- **`action.kind`** - Type of action being performed +- **`session_id`** - Groups all traces from one conversation + +### Debugging Tracing Issues + +If traces aren't appearing in your observability platform: + +1. 
**Verify Environment Variables**: + ```python icon="python" wrap + import os + + otel_endpoint = os.getenv('OTEL_EXPORTER_OTLP_TRACES_ENDPOINT') + otel_headers = os.getenv('OTEL_EXPORTER_OTLP_TRACES_HEADERS') + + print(f"OTEL Endpoint: {otel_endpoint}") + print(f"OTEL Headers: {otel_headers}") + ``` + +2. **Check SDK Logs**: The SDK logs observability initialization at debug level: + ```python icon="python" wrap + import logging + + logging.basicConfig(level=logging.DEBUG) + ``` + +3. **Test Connectivity**: Ensure your application can reach the OTLP endpoint: + ```bash icon="terminal" wrap + curl -v https://api.honeycomb.io:443/v1/traces + ``` + +4. **Validate Headers**: Check that authentication headers are properly URL-encoded + +## Troubleshooting + +### Traces Not Appearing + +**Problem**: No traces showing up in observability platform + +**Solutions**: +- Verify environment variables are set correctly +- Check network connectivity to OTLP endpoint +- Ensure authentication headers are valid +- Look for SDK initialization logs at debug level + +### High Trace Volume + +**Problem**: Too many spans being generated + +**Solutions**: +- Configure sampling at the collector level +- For Laminar with non-browser tools, browser instrumentation is automatically disabled +- Use backend-specific filtering rules + +### Performance Impact + +**Problem**: Concerned about tracing overhead + +**Solutions**: +- Tracing has minimal overhead when properly configured +- Disable tracing in development by unsetting environment variables +- Use asynchronous exporters (default in most OTLP configurations) + +## Example: Full Setup + + +This example is available on GitHub: [examples/01_standalone_sdk/27_observability_laminar.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/27_observability_laminar.py) + + +```python icon="python" expandable examples/01_standalone_sdk/27_observability_laminar.py +""" +Observability & Laminar example + +This example 
demonstrates enabling OpenTelemetry tracing with Laminar in the +OpenHands SDK. Set LMNR_PROJECT_API_KEY and run the script to see traces. +""" + +import os + +from pydantic import SecretStr + +from openhands.sdk import LLM, Agent, Conversation, Tool +from openhands.tools.terminal import TerminalTool + + +# Tip: Set LMNR_PROJECT_API_KEY in your environment before running, e.g.: +# export LMNR_PROJECT_API_KEY="your-laminar-api-key" +# For non-Laminar OTLP backends, set OTEL_* variables instead. + +# Configure LLM and Agent +api_key = os.getenv("LLM_API_KEY") +model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + model=model, + api_key=SecretStr(api_key) if api_key else None, + base_url=base_url, + usage_id="agent", +) + +agent = Agent( + llm=llm, + tools=[Tool(name=TerminalTool.name)], +) + +# Create conversation and run a simple task +conversation = Conversation(agent=agent, workspace=".") +conversation.send_message("List the files in the current directory and print them.") +conversation.run() +print( + "All done! Check your Laminar dashboard for traces " + "(session is the conversation UUID)." +) +``` + +```bash Running the Example +export LMNR_PROJECT_API_KEY="your-laminar-api-key" +cd software-agent-sdk +uv run python examples/01_standalone_sdk/27_observability_laminar.py +``` + +## Next Steps + +- **[Metrics Tracking](/sdk/guides/metrics)** - Monitor token usage and costs alongside traces +- **[LLM Registry](/sdk/guides/llm-registry)** - Track multiple LLMs used in your application +- **[Security](/sdk/guides/security)** - Add security validation to your traced agent executions + +### Plugins +Source: https://docs.openhands.dev/sdk/guides/plugins.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +Plugins provide a way to package and distribute multiple agent components together. 
A single plugin can include:
+
+- **Skills**: Specialized knowledge and workflows
+- **Hooks**: Event handlers for tool lifecycle
+- **MCP Config**: External tool server configurations
+- **Agents**: Specialized agent definitions
+- **Commands**: Slash commands
+
+The plugin format is compatible with the [Claude Code plugin structure](https://github.com/anthropics/claude-code/tree/main/plugins).
+
+## Plugin Structure
+
+See the [example_plugins directory](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/05_skills_and_plugins/02_loading_plugins/example_plugins) for a complete working plugin structure.
+
+A plugin follows this directory structure (representative layout based on the components listed above; see the linked example_plugins directory for an exact working example):
+
+    plugin-name/
+    ├── .plugin/
+    │   └── plugin.json      # required plugin manifest
+    ├── skills/              # skill markdown files
+    ├── hooks/
+    │   └── hooks.json       # hook definitions
+    ├── agents/              # specialized agent definitions
+    ├── commands/            # slash commands
+    └── .mcp.json            # MCP server configuration
+
+Note that the plugin metadata, i.e., `plugin-name/.plugin/plugin.json`, is required.
+
+### Plugin Manifest
+
+The manifest file `plugin-name/.plugin/plugin.json` defines plugin metadata:
+
+```json icon="file-code" wrap
+{
+  "name": "code-quality",
+  "version": "1.0.0",
+  "description": "Code quality tools and workflows",
+  "author": "openhands",
+  "license": "MIT",
+  "repository": "https://github.com/example/code-quality-plugin"
+}
+```
+
+### Skills
+
+Skills are defined in markdown files with YAML frontmatter:
+
+```markdown icon="file-code"
+---
+name: python-linting
+description: Instructions for linting Python code
+trigger:
+  type: keyword
+  keywords:
+    - lint
+    - linting
+    - code quality
+---
+
+# Python Linting Skill
+
+Run ruff to check for issues:
+
+\`\`\`bash
+ruff check .
+\`\`\` +``` + +### Hooks + +Hooks are defined in `hooks/hooks.json`: + +```json icon="file-code" wrap +{ + "hooks": { + "PostToolUse": [ + { + "matcher": "file_editor", + "hooks": [ + { + "type": "command", + "command": "echo 'File edited: $OPENHANDS_TOOL_NAME'", + "timeout": 5 + } + ] + } + ] + } +} +``` + +### MCP Configuration + +MCP servers are configured in `.mcp.json`: + +```json wrap icon="file-code" +{ + "mcpServers": { + "fetch": { + "command": "uvx", + "args": ["mcp-server-fetch"] + } + } +} +``` + +## Using Plugin Components + +> The ready-to-run example is available [here](#ready-to-run-example)! + +Brief explanation on how to use a plugin with an agent. + + + + ### Loading a Plugin + First, load the desired plugins. + + ```python icon="python" + from openhands.sdk.plugin import Plugin + + # Load a single plugin + plugin = Plugin.load("/path/to/plugin") + + # Load all plugins from a directory + plugins = Plugin.load_all("/path/to/plugins") + ``` + + + ### Accessing Components + You can access the different plugin components to see which ones are available. + + ```python icon="python" + # Skills + for skill in plugin.skills: + print(f"Skill: {skill.name}") + + # Hooks configuration + if plugin.hooks: + print(f"Hooks configured: {plugin.hooks}") + + # MCP servers + if plugin.mcp_config: + servers = plugin.mcp_config.get("mcpServers", {}) + print(f"MCP servers: {list(servers.keys())}") + ``` + + + ### Using with an Agent + You can now feed your agent with your preferred plugin. 
+ + ```python focus={3,10,17} icon="python" + # Create agent context with plugin skills + agent_context = AgentContext( + skills=plugin.skills, + ) + + # Create agent with plugin MCP config + agent = Agent( + llm=llm, + tools=tools, + mcp_config=plugin.mcp_config or {}, + agent_context=agent_context, + ) + + # Create conversation with plugin hooks + conversation = Conversation( + agent=agent, + hook_config=plugin.hooks, + ) + ``` + + + +## Ready-to-run Example + + +This example is available on GitHub: [examples/05_skills_and_plugins/02_loading_plugins/main.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/05_skills_and_plugins/02_loading_plugins/main.py) + + +```python icon="python" expandable examples/05_skills_and_plugins/02_loading_plugins/main.py +"""Example: Loading Plugins via Conversation + +Demonstrates the recommended way to load plugins using the `plugins` parameter +on Conversation. Plugins bundle skills, hooks, and MCP config together. + +For full documentation, see: https://docs.all-hands.dev/sdk/guides/plugins +""" + +import os +import sys +import tempfile +from pathlib import Path + +from pydantic import SecretStr + +from openhands.sdk import LLM, Agent, Conversation +from openhands.sdk.plugin import PluginSource +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +# Locate example plugin directory +script_dir = Path(__file__).parent +plugin_path = script_dir / "example_plugins" / "code-quality" + +# Define plugins to load +# Supported sources: local path, "github:owner/repo", or git URL +# Optional: ref (branch/tag/commit), repo_path (for monorepos) +plugins = [ + PluginSource(source=str(plugin_path)), + # PluginSource(source="github:org/security-plugin", ref="v2.0.0"), + # PluginSource(source="github:org/monorepo", repo_path="plugins/logging"), +] + +# Check for API key +api_key = os.getenv("LLM_API_KEY") +if not api_key: + 
print("Set LLM_API_KEY to run this example") + print("EXAMPLE_COST: 0") + sys.exit(0) + +# Configure LLM and Agent +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +llm = LLM( + usage_id="plugin-demo", + model=model, + api_key=SecretStr(api_key), + base_url=os.getenv("LLM_BASE_URL"), +) +agent = Agent( + llm=llm, tools=[Tool(name=TerminalTool.name), Tool(name=FileEditorTool.name)] +) + +# Create conversation with plugins - skills, MCP config, and hooks are merged +# Note: Plugins are loaded lazily on first send_message() or run() call +with tempfile.TemporaryDirectory() as tmpdir: + conversation = Conversation( + agent=agent, + workspace=tmpdir, + plugins=plugins, + ) + + # Test: The "lint" keyword triggers the python-linting skill + # This first send_message() call triggers lazy plugin loading + conversation.send_message("How do I lint Python code? Brief answer please.") + + # Verify skills were loaded from the plugin (after lazy loading) + skills = ( + conversation.agent.agent_context.skills + if conversation.agent.agent_context + else [] + ) + print(f"Loaded {len(skills)} skill(s) from plugins") + + conversation.run() + + print(f"EXAMPLE_COST: {llm.metrics.accumulated_cost:.4f}") +``` + + + + +## Next Steps + +- **[Skills](/sdk/guides/skill)** - Learn more about skills and triggers +- **[Hooks](/sdk/guides/hooks)** - Understand hook event types +- **[MCP Integration](/sdk/guides/mcp)** - Configure external tool servers + +### Secret Registry +Source: https://docs.openhands.dev/sdk/guides/secrets.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +> A ready-to-run example is available [here](#ready-to-run-example)! + +The Secret Registry provides a secure way to handle sensitive data in your agent's workspace. +It automatically detects secret references in bash commands, injects them as environment variables when needed, +and masks secret values in command outputs to prevent accidental exposure. 
+ +### Injecting Secrets + +Use the `update_secrets()` method to add secrets to your conversation. + + +Secrets can be provided as static strings or as callable functions that dynamically retrieve values, enabling integration with external secret stores and credential management systems: + +```python focus={4,11} icon="python" wrap +from openhands.sdk.conversation.secret_source import SecretSource + +# Static secret +conversation.update_secrets({"SECRET_TOKEN": "my-secret-token-value"}) + +# Dynamic secret using SecretSource +class MySecretSource(SecretSource): + def get_value(self) -> str: + return "callable-based-secret" + +conversation.update_secrets({"SECRET_FUNCTION_TOKEN": MySecretSource()}) +``` + +## Ready-to-run Example + + +This example is available on GitHub: [examples/01_standalone_sdk/12_custom_secrets.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/12_custom_secrets.py) + + +```python icon="python" expandable examples/01_standalone_sdk/12_custom_secrets.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, +) +from openhands.sdk.secret import SecretSource +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929")
+base_url = os.getenv("LLM_BASE_URL")
+llm = LLM(
+    usage_id="agent",
+    model=model,
+    base_url=base_url,
+    api_key=SecretStr(api_key),
+)
+
+# Tools
+tools = [
+    Tool(name=TerminalTool.name),
+    Tool(name=FileEditorTool.name),
+]
+
+# Agent
+agent = Agent(llm=llm, tools=tools)
+conversation = Conversation(agent)
+
+
+class MySecretSource(SecretSource):
+    def get_value(self) -> str:
+        return "callable-based-secret"
+
+
+conversation.update_secrets(
+    {"SECRET_TOKEN": "my-secret-token-value", "SECRET_FUNCTION_TOKEN": MySecretSource()}
+)
+
+conversation.send_message("just echo $SECRET_TOKEN")
+
+conversation.run()
+
+conversation.send_message("just echo $SECRET_FUNCTION_TOKEN")
+
+conversation.run()
+
+# Report cost
+cost = llm.metrics.accumulated_cost
+print(f"EXAMPLE_COST: {cost}")
+```
+
+## Next Steps
+
+- **[MCP Integration](/sdk/guides/mcp)** - Connect to MCP
+- **[Security Analyzer](/sdk/guides/security)** - Add security validation
+
+### Security & Action Confirmation
+Source: https://docs.openhands.dev/sdk/guides/security.md
+
+import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx";
+
+Agent actions can be controlled through two complementary mechanisms: **confirmation policies**, which determine when user
+approval is required, and **security analyzers**, which evaluate action risk levels. Together, they provide flexible control over agent behavior while maintaining safety.
+
+## Confirmation Policy
+
+> A ready-to-run example is available [here](#ready-to-run-example-confirmation)!
+
+Confirmation policies control whether actions require user approval before execution. They provide a simple way to ensure safe agent operation by requiring explicit permission before actions run.
+
+### Setting Confirmation Policy
+
+Set the confirmation policy on your conversation:
+
+```python icon="python" focus={4}
+from openhands.sdk.security.confirmation_policy import AlwaysConfirm
+
+conversation = Conversation(agent=agent, workspace=".")
+conversation.set_confirmation_policy(AlwaysConfirm())
+```
+
+Available policies:
+- **`AlwaysConfirm()`** - Require approval for all actions
+- **`NeverConfirm()`** - Execute all actions without approval
+- **`ConfirmRisky()`** - Only require approval for risky actions (requires a security analyzer)
+
+### Custom Confirmation Handler
+
+Implement your approval logic by checking the conversation's execution status:
+
+```python icon="python" focus={2-3,5}
+while conversation.state.execution_status != ConversationExecutionStatus.FINISHED:
+    if conversation.state.execution_status == ConversationExecutionStatus.WAITING_FOR_CONFIRMATION:
+        pending = ConversationState.get_unmatched_actions(conversation.state.events)
+        if not confirm_in_console(pending):
+            conversation.reject_pending_actions("User rejected")
+            continue
+    conversation.run()
+```
+
+### Rejecting Actions
+
+Provide feedback when rejecting to help the agent try a different approach:
+
+```python icon="python" focus={2-5}
+if not user_approved:
+    conversation.reject_pending_actions(
+        "User rejected because actions seem too risky. "
+        "Please try a safer approach."
+ ) +``` + +### Ready-to-run Example Confirmation + + +Full confirmation example: [examples/01_standalone_sdk/04_confirmation_mode_example.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/04_confirmation_mode_example.py) + + +Require user approval before executing agent actions: + +```python icon="python" expandable examples/01_standalone_sdk/04_confirmation_mode_example.py +"""OpenHands Agent SDK — Confirmation Mode Example""" + +import os +import signal +from collections.abc import Callable + +from pydantic import SecretStr + +from openhands.sdk import LLM, BaseConversation, Conversation +from openhands.sdk.conversation.state import ( + ConversationExecutionStatus, + ConversationState, +) +from openhands.sdk.security.confirmation_policy import AlwaysConfirm, NeverConfirm +from openhands.sdk.security.llm_analyzer import LLMSecurityAnalyzer +from openhands.tools.preset.default import get_default_agent + + +# Make ^C a clean exit instead of a stack trace +signal.signal(signal.SIGINT, lambda *_: (_ for _ in ()).throw(KeyboardInterrupt())) + + +def _print_action_preview(pending_actions) -> None: + print(f"\n🔍 Agent created {len(pending_actions)} action(s) awaiting confirmation:") + for i, action in enumerate(pending_actions, start=1): + snippet = str(action.action)[:100].replace("\n", " ") + print(f" {i}. {action.tool_name}: {snippet}...") + + +def confirm_in_console(pending_actions) -> bool: + """ + Return True to approve, False to reject. + Default to 'no' on EOF/KeyboardInterrupt (matches original behavior). + """ + _print_action_preview(pending_actions) + while True: + try: + ans = ( + input("\nDo you want to execute these actions? 
(yes/no): ") + .strip() + .lower() + ) + except (EOFError, KeyboardInterrupt): + print("\n❌ No input received; rejecting by default.") + return False + + if ans in ("yes", "y"): + print("✅ Approved — executing actions…") + return True + if ans in ("no", "n"): + print("❌ Rejected — skipping actions…") + return False + print("Please enter 'yes' or 'no'.") + + +def run_until_finished(conversation: BaseConversation, confirmer: Callable) -> None: + """ + Drive the conversation until FINISHED. + If WAITING_FOR_CONFIRMATION, ask the confirmer; + on reject, call reject_pending_actions(). + Preserves original error if agent waits but no actions exist. + """ + while conversation.state.execution_status != ConversationExecutionStatus.FINISHED: + if ( + conversation.state.execution_status + == ConversationExecutionStatus.WAITING_FOR_CONFIRMATION + ): + pending = ConversationState.get_unmatched_actions(conversation.state.events) + if not pending: + raise RuntimeError( + "⚠️ Agent is waiting for confirmation but no pending actions " + "were found. This should not happen." + ) + if not confirmer(pending): + conversation.reject_pending_actions("User rejected the actions") + # Let the agent produce a new step or finish + continue + + print("▶️ Running conversation.run()…") + conversation.run() + + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +agent = get_default_agent(llm=llm) +conversation = Conversation(agent=agent, workspace=os.getcwd()) + +# Conditionally add security analyzer based on environment variable +add_security_analyzer = bool(os.getenv("ADD_SECURITY_ANALYZER", "").strip()) +if add_security_analyzer: + print("Agent security analyzer added.") + conversation.set_security_analyzer(LLMSecurityAnalyzer()) + +# 1) Confirmation mode ON +conversation.set_confirmation_policy(AlwaysConfirm()) +print("\n1) Command that will likely create actions…") +conversation.send_message("Please list the files in the current directory using ls -la") +run_until_finished(conversation, confirm_in_console) + +# 2) A command the user may choose to reject +print("\n2) Command the user may choose to reject…") +conversation.send_message("Please create a file called 'dangerous_file.txt'") +run_until_finished(conversation, confirm_in_console) + +# 3) Simple greeting (no actions expected) +print("\n3) Simple greeting (no actions expected)…") +conversation.send_message("Just say hello to me") +run_until_finished(conversation, confirm_in_console) + +# 4) Disable confirmation mode and run commands directly +print("\n4) Disable confirmation mode and run a command…") +conversation.set_confirmation_policy(NeverConfirm()) +conversation.send_message("Please echo 'Hello from confirmation mode example!'") +conversation.run() + +conversation.send_message( + "Please delete any file that was created during this conversation." 
+)
+conversation.run()
+
+print("\n=== Example Complete ===")
+print("Key points:")
+print(
+    "- conversation.run() creates actions; confirmation mode "
+    "sets execution_status=WAITING_FOR_CONFIRMATION"
+)
+print("- User confirmation is handled via a single reusable function")
+print("- Rejection uses conversation.reject_pending_actions() and the loop continues")
+print("- Simple responses work normally without actions")
+print("- Confirmation policy is toggled with conversation.set_confirmation_policy()")
+```
+
+---
+
+## Security Analyzer
+
+Security analyzers evaluate the risk of agent actions before execution, helping protect against potentially dangerous operations. They analyze each action and assign a security risk level:
+
+- **LOW** - Safe operations with minimal security impact
+- **MEDIUM** - Moderate security impact, review recommended
+- **HIGH** - Significant security impact, requires confirmation
+- **UNKNOWN** - Risk level could not be determined
+
+Security analyzers work in conjunction with confirmation policies (like `ConfirmRisky()`) to determine whether user approval is needed before executing an action. This provides an additional layer of safety for autonomous agent operations.
+
+### LLM Security Analyzer
+
+> A ready-to-run example is available [here](#ready-to-run-example-security-analyzer)!
+
+The **LLMSecurityAnalyzer** is the default implementation provided in the agent-sdk. It leverages the LLM's understanding of action context to provide lightweight security analysis. The LLM can annotate actions with security risk levels during generation, which the analyzer then uses to make security decisions.
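The way these risk levels interact with a `ConfirmRisky`-style policy can be pictured as a small decision function. This is an illustrative sketch only, not the SDK's implementation; in particular, treating UNKNOWN as requiring confirmation is an assumption made here. Check `openhands.sdk.security.confirmation_policy` for the actual thresholds:

```python
from enum import Enum


class SecurityRisk(Enum):
    # Mirrors the risk levels described above (illustrative local copy,
    # not the SDK's own SecurityRisk class).
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    UNKNOWN = "unknown"


def confirm_risky(risk: SecurityRisk) -> bool:
    """Pause for user approval on HIGH risk and, conservatively, on UNKNOWN;
    let LOW and MEDIUM actions execute without interruption."""
    return risk in (SecurityRisk.HIGH, SecurityRisk.UNKNOWN)
```

With `NeverConfirm()` the analyzer's verdict is recorded but never blocks execution; with `AlwaysConfirm()` every action pauses regardless of risk.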
+
+#### Security Analyzer Configuration
+
+Create an LLM-based security analyzer to review actions before execution:
+
+```python icon="python" focus={9}
+from openhands.sdk import LLM
+from openhands.sdk.security.llm_analyzer import LLMSecurityAnalyzer
+security_llm = LLM(  # separate LLM dedicated to risk analysis
+    usage_id="security-analyzer",
+    model=model,
+    base_url=base_url,
+    api_key=SecretStr(api_key),
+)
+security_analyzer = LLMSecurityAnalyzer(llm=security_llm)
+agent = Agent(llm=llm, tools=tools, security_analyzer=security_analyzer)  # `llm` is the main agent LLM
+```
+
+The security analyzer:
+- Reviews each action before execution
+- Flags potentially dangerous operations
+- Can be configured with custom security policies
+- Uses a separate LLM to avoid conflicts with the main agent
+
+#### Ready-to-run Example Security Analyzer
+
+Full security analyzer example: [examples/01_standalone_sdk/16_llm_security_analyzer.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/16_llm_security_analyzer.py)
+
+Automatically analyze agent actions for security risks before execution:
+
+```python icon="python" expandable examples/01_standalone_sdk/16_llm_security_analyzer.py
+"""OpenHands Agent SDK — LLM Security Analyzer Example (Simplified)
+
+This example shows how to use the LLMSecurityAnalyzer to automatically
+evaluate security risks of actions before execution.
+""" + +import os +import signal +from collections.abc import Callable + +from pydantic import SecretStr + +from openhands.sdk import LLM, Agent, BaseConversation, Conversation +from openhands.sdk.conversation.state import ( + ConversationExecutionStatus, + ConversationState, +) +from openhands.sdk.security.confirmation_policy import ConfirmRisky +from openhands.sdk.security.llm_analyzer import LLMSecurityAnalyzer +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +# Clean ^C exit: no stack trace noise +signal.signal(signal.SIGINT, lambda *_: (_ for _ in ()).throw(KeyboardInterrupt())) + + +def _print_blocked_actions(pending_actions) -> None: + print(f"\n🔒 Security analyzer blocked {len(pending_actions)} high-risk action(s):") + for i, action in enumerate(pending_actions, start=1): + snippet = str(action.action)[:100].replace("\n", " ") + print(f" {i}. {action.tool_name}: {snippet}...") + + +def confirm_high_risk_in_console(pending_actions) -> bool: + """ + Return True to approve, False to reject. + Matches original behavior: default to 'no' on EOF/KeyboardInterrupt. + """ + _print_blocked_actions(pending_actions) + while True: + try: + ans = ( + input( + "\nThese actions were flagged as HIGH RISK. " + "Do you want to execute them anyway? (yes/no): " + ) + .strip() + .lower() + ) + except (EOFError, KeyboardInterrupt): + print("\n❌ No input received; rejecting by default.") + return False + + if ans in ("yes", "y"): + print("✅ Approved — executing high-risk actions...") + return True + if ans in ("no", "n"): + print("❌ Rejected — skipping high-risk actions...") + return False + print("Please enter 'yes' or 'no'.") + + +def run_until_finished_with_security( + conversation: BaseConversation, confirmer: Callable[[list], bool] +) -> None: + """ + Drive the conversation until FINISHED. + - If WAITING_FOR_CONFIRMATION: ask the confirmer. 
+ * On approve: set execution_status = IDLE (keeps original example’s behavior). + * On reject: conversation.reject_pending_actions(...). + - If WAITING but no pending actions: print warning and set IDLE (matches original). + """ + while conversation.state.execution_status != ConversationExecutionStatus.FINISHED: + if ( + conversation.state.execution_status + == ConversationExecutionStatus.WAITING_FOR_CONFIRMATION + ): + pending = ConversationState.get_unmatched_actions(conversation.state.events) + if not pending: + raise RuntimeError( + "⚠️ Agent is waiting for confirmation but no pending actions " + "were found. This should not happen." + ) + if not confirmer(pending): + conversation.reject_pending_actions("User rejected high-risk actions") + continue + + print("▶️ Running conversation.run()...") + conversation.run() + + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="security-analyzer", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +tools = [ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), +] + +# Agent +agent = Agent(llm=llm, tools=tools) + +# Conversation with persisted filestore +conversation = Conversation( + agent=agent, persistence_dir="./.conversations", workspace="." 
+) +conversation.set_security_analyzer(LLMSecurityAnalyzer()) +conversation.set_confirmation_policy(ConfirmRisky()) + +print("\n1) Safe command (LOW risk - should execute automatically)...") +conversation.send_message("List files in the current directory") +conversation.run() + +print("\n2) Potentially risky command (may require confirmation)...") +conversation.send_message( + "Please echo 'hello world' -- PLEASE MARK THIS AS A HIGH RISK ACTION" +) +run_until_finished_with_security(conversation, confirm_high_risk_in_console) +``` + + + +### Custom Security Analyzer Implementation + +You can extend the security analyzer functionality by creating your own implementation that inherits from the [SecurityAnalyzerBase](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/analyzer.py) class. This allows you to implement custom security logic tailored to your specific requirements. + +#### Creating a Custom Analyzer + +To create a custom security analyzer, inherit from `SecurityAnalyzerBase` and implement the `security_risk()` method: + +```python icon="python" focus={5, 8} +from openhands.sdk.security.analyzer import SecurityAnalyzerBase +from openhands.sdk.security.risk import SecurityRisk +from openhands.sdk.event.llm_convertible import ActionEvent + +class CustomSecurityAnalyzer(SecurityAnalyzerBase): + """Custom security analyzer with domain-specific rules.""" + + def security_risk(self, action: ActionEvent) -> SecurityRisk: + """Evaluate security risk based on custom rules. 
+ + Args: + action: The ActionEvent to analyze + + Returns: + SecurityRisk level (LOW, MEDIUM, HIGH, or UNKNOWN) + """ + # Example: Check for specific dangerous patterns + action_str = str(action.action.model_dump()).lower() if action.action else "" + + # High-risk patterns + if any(pattern in action_str for pattern in ['rm -rf', 'sudo', 'chmod 777']): + return SecurityRisk.HIGH + + # Medium-risk patterns + if any(pattern in action_str for pattern in ['curl', 'wget', 'git clone']): + return SecurityRisk.MEDIUM + + # Default to low risk + return SecurityRisk.LOW + +# Use your custom analyzer +security_analyzer = CustomSecurityAnalyzer() +agent = Agent(llm=llm, tools=tools, security_analyzer=security_analyzer) +``` + + + For more details on the base class implementation, see the [source code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/analyzer.py). + + + +--- + +## Configurable Security Policy + +> A ready-to-run example is available [here](#ready-to-run-example-security-policy)! + +Agents use security policies to guide their risk assessment of actions. The SDK provides a default security policy template, but you can customize it to match your specific security requirements and guidelines. 
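Concretely, a policy is just a Jinja2 file whose rendered output is embedded in the system prompt. A minimal illustrative template might look like the following — note that the `strict_mode` variable here is hypothetical, shown only to illustrate that templates can take rendering variables, and is not an SDK-provided value:

```jinja2
{# my_security_policy.j2 — illustrative sketch only, not the SDK default #}
# Security Risk Policy

When a tool supports the `security_risk` parameter, classify each action:

- **LOW**: read-only actions (viewing files, running calculations).
- **MEDIUM**: reversible, workspace-scoped changes (edits, package installs).
- **HIGH**: network access, credential use, or system-level modifications.

{% if strict_mode %}
Treat any MEDIUM action as HIGH until the user has approved it once.
{% endif %}
```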
+ + +### Using Custom Security Policies + +You can provide a custom security policy template when creating an agent: + +```python focus={9-13} icon="python" +from openhands.sdk import Agent, LLM + +llm = LLM( + usage_id="agent", + model="anthropic/claude-sonnet-4-5-20250929", + api_key=SecretStr(api_key), +) + +# Provide a custom security policy template file +agent = Agent( + llm=llm, + tools=tools, + security_policy_filename="my_security_policy.j2", +) +``` + +Custom security policies allow you to: +- Define organization-specific risk assessment guidelines +- Set custom thresholds for security risk levels +- Add domain-specific security rules +- Tailor risk evaluation to your use case + +The security policy is provided as a Jinja2 template that gets rendered into the agent's system prompt, guiding how it evaluates the security risk of its actions. + +### Ready-to-run Example Security Policy + + +Full configurable security policy example: [examples/01_standalone_sdk/32_configurable_security_policy.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/32_configurable_security_policy.py) + + +Define custom security risk guidelines for your agent: + +```python icon="python" expandable examples/01_standalone_sdk/32_configurable_security_policy.py +"""OpenHands Agent SDK — Configurable Security Policy Example + +This example demonstrates how to use a custom security policy template +with an agent. Security policies define risk assessment guidelines that +help agents evaluate the safety of their actions. + +By default, agents use the built-in security_policy.j2 template. This +example shows how to: +1. Use the default security policy +2. Provide a custom security policy template embedded in the script +3. 
Apply the custom policy to guide agent behavior +""" + +import os +import tempfile +from pathlib import Path + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Define a custom security policy template inline +CUSTOM_SECURITY_POLICY = ( + "# 🔐 Custom Security Risk Policy\n" + "When using tools that support the security_risk parameter, assess the " + "safety risk of your actions:\n" + "\n" + "- **LOW**: Safe read-only actions.\n" + " - Viewing files, calculations, documentation.\n" + "- **MEDIUM**: Moderate container-scoped actions.\n" + " - File modifications, package installations.\n" + "- **HIGH**: Potentially dangerous actions.\n" + " - Network access, system modifications, data exfiltration.\n" + "\n" + "**Custom Rules**\n" + "- Always prioritize user data safety.\n" + "- Escalate to **HIGH** for any external data transmission.\n" +) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +cwd = os.getcwd() +tools = [ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), +] + +# Example 1: Agent with default security policy +print("=" * 100) +print("Example 1: Agent with default security policy") +print("=" * 100) +default_agent = Agent(llm=llm, tools=tools) +print(f"Security policy filename: {default_agent.security_policy_filename}") +print("\nDefault security policy is embedded in the agent's system message.") + +# Example 2: Agent with custom security policy +print("\n" + "=" * 100) +print("Example 2: Agent with custom security policy") +print("=" * 100) + +# Create a temporary file for the custom security policy +with tempfile.NamedTemporaryFile( + mode="w", suffix=".j2", delete=False, encoding="utf-8" +) as temp_file: + temp_file.write(CUSTOM_SECURITY_POLICY) + custom_policy_path = temp_file.name + +try: + # Create agent with custom security policy (using absolute path) + custom_agent = Agent( + llm=llm, + tools=tools, + security_policy_filename=custom_policy_path, + ) + print(f"Security policy filename: {custom_agent.security_policy_filename}") + print("\nCustom security policy loaded from temporary file.") + + # Verify the custom policy is in the system message + system_message = custom_agent.static_system_message + if "Custom Security Risk Policy" in system_message: + print("✓ Custom security policy successfully embedded in system message.") + else: + print("✗ Custom security policy not found in system message.") + + # Run a conversation with the custom agent + print("\n" + "=" * 100) + print("Running conversation with custom security policy") + print("=" * 100) + + llm_messages = [] # collect raw LLM messages + + def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + 
llm_messages.append(event.to_llm_message()) + + conversation = Conversation( + agent=custom_agent, + callbacks=[conversation_callback], + workspace=".", + ) + + conversation.send_message( + "Please create a simple Python script named hello.py that prints " + "'Hello, World!'. Make sure to follow security best practices." + ) + conversation.run() + + print("\n" + "=" * 100) + print("Conversation finished.") + print(f"Total LLM messages: {len(llm_messages)}") + print("=" * 100) + + # Report cost + cost = conversation.conversation_stats.get_combined_metrics().accumulated_cost + print(f"EXAMPLE_COST: {cost}") + +finally: + # Clean up temporary file + Path(custom_policy_path).unlink(missing_ok=True) + +print("\n" + "=" * 100) +print("Example Summary") +print("=" * 100) +print("This example demonstrated:") +print("1. Using the default security policy (security_policy.j2)") +print("2. Creating a custom security policy template") +print("3. Applying the custom policy via security_policy_filename parameter") +print("4. Running a conversation with the custom security policy") +print( + "\nYou can customize security policies to match your organization's " + "specific requirements." +) +``` + + + +## Next Steps + +- **[Custom Tools](/sdk/guides/custom-tools)** - Build secure custom tools +- **[Custom Secrets](/sdk/guides/secrets)** - Secure credential management + +### Agent Skills & Context +Source: https://docs.openhands.dev/sdk/guides/skill.md + +import RunExampleCode from "/sdk/shared-snippets/how-to-run-example.mdx"; + +This guide shows how to implement skills in the SDK. For conceptual overview, see [Skills Overview](/overview/skills). + +OpenHands supports an **extended version** of the [AgentSkills standard](https://agentskills.io/specification) with optional keyword triggers. 
+ +## Context Loading Methods + +| Method | When Content Loads | Use Case | +|--------|-------------------|----------| +| **Always-loaded** | At conversation start | Repository rules, coding standards | +| **Trigger-loaded** | When keywords match | Specialized tasks, domain knowledge | +| **Progressive disclosure** | Agent reads on demand | Large reference docs (AgentSkills) | + +## Always-Loaded Context + +Content that's always in the system prompt. + +### Option 1: `AGENTS.md` (Auto-loaded) + +Place `AGENTS.md` at your repo root - it's loaded automatically. See [Permanent Context](/overview/skills/repo). + +```python icon="python" focus={3, 4} +from openhands.sdk.context.skills import load_project_skills + +# Automatically finds AGENTS.md, CLAUDE.md, GEMINI.md at workspace root +skills = load_project_skills(workspace_dir="/path/to/repo") +agent_context = AgentContext(skills=skills) +``` + +### Option 2: Inline Skill (Code-defined) + +```python icon="python" focus={5-11} +from openhands.sdk import AgentContext +from openhands.sdk.context import Skill + +agent_context = AgentContext( + skills=[ + Skill( + name="code-style", + content="Always use type hints in Python.", + trigger=None, # No trigger = always loaded + ), + ] +) +``` + +## Trigger-Loaded Context + +Content injected when keywords appear in user messages. See [Keyword-Triggered Skills](/overview/skills/keyword). + +```python icon="python" focus={6} +from openhands.sdk.context import Skill, KeywordTrigger + +Skill( + name="encryption-helper", + content="Use the encrypt.sh script to encrypt messages.", + trigger=KeywordTrigger(keywords=["encrypt", "decrypt"]), +) +``` + +When user says "encrypt this", the content is injected into the message: + +```xml icon="file" + +The following information has been included based on a keyword match for "encrypt". +Skill location: /path/to/encryption-helper + +Use the encrypt.sh script to encrypt messages. 
+ +``` + +## Progressive Disclosure (AgentSkills Standard) + +For the agent to trigger skills, use the [AgentSkills standard](https://agentskills.io/specification) `SKILL.md` format. The agent sees a summary and reads full content on demand. + +```python icon="python" +from openhands.sdk.context.skills import load_skills_from_dir + +# Load SKILL.md files from a directory +_, _, agent_skills = load_skills_from_dir("/path/to/skills") +agent_context = AgentContext(skills=list(agent_skills.values())) +``` + +Skills are listed in the system prompt: +```xml icon="file" + + + code-style + Project coding standards. + /path/to/code-style/SKILL.md + + +``` + + +Add `triggers` to a SKILL.md for **both** progressive disclosure AND automatic injection when keywords match. + + +--- + +## Full Example + + +Full example: [examples/01_standalone_sdk/03_activate_skill.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/03_activate_skill.py) + + +```python icon="python" expandable examples/01_standalone_sdk/03_activate_skill.py +import os + +from pydantic import SecretStr + +from openhands.sdk import ( + LLM, + Agent, + AgentContext, + Conversation, + Event, + LLMConvertibleEvent, + get_logger, +) +from openhands.sdk.context import ( + KeywordTrigger, + Skill, +) +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +logger = get_logger(__name__) + +# Configure LLM +api_key = os.getenv("LLM_API_KEY") +assert api_key is not None, "LLM_API_KEY environment variable is not set." 
+model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +base_url = os.getenv("LLM_BASE_URL") +llm = LLM( + usage_id="agent", + model=model, + base_url=base_url, + api_key=SecretStr(api_key), +) + +# Tools +cwd = os.getcwd() +tools = [ + Tool( + name=TerminalTool.name, + ), + Tool(name=FileEditorTool.name), +] + +# AgentContext provides flexible ways to customize prompts: +# 1. Skills: Inject instructions (always-active or keyword-triggered) +# 2. system_message_suffix: Append text to the system prompt +# 3. user_message_suffix: Append text to each user message +# +# For complete control over the system prompt, you can also use Agent's +# system_prompt_filename parameter to provide a custom Jinja2 template: +# +# agent = Agent( +# llm=llm, +# tools=tools, +# system_prompt_filename="/path/to/custom_prompt.j2", +# system_prompt_kwargs={"cli_mode": True, "repo": "my-project"}, +# ) +# +# See: https://docs.openhands.dev/sdk/guides/skill#customizing-system-prompts +agent_context = AgentContext( + skills=[ + Skill( + name="repo.md", + content="When you see this message, you should reply like " + "you are a grumpy cat forced to use the internet.", + # source is optional - identifies where the skill came from + # You can set it to be the path of a file that contains the skill content + source=None, + # trigger determines when the skill is active + # trigger=None means always active (repo skill) + trigger=None, + ), + Skill( + name="flarglebargle", + content=( + 'IMPORTANT! The user has said the magic word "flarglebargle". 
' + "You must only respond with a message telling them how smart they are" + ), + source=None, + # KeywordTrigger = activated when keywords appear in user messages + trigger=KeywordTrigger(keywords=["flarglebargle"]), + ), + ], + # system_message_suffix is appended to the system prompt (always active) + system_message_suffix="Always finish your response with the word 'yay!'", + # user_message_suffix is appended to each user message + user_message_suffix="The first character of your response should be 'I'", + # You can also enable automatic load skills from + # public registry at https://github.com/OpenHands/extensions + load_public_skills=True, +) + +# Agent +agent = Agent(llm=llm, tools=tools, agent_context=agent_context) + +llm_messages = [] # collect raw LLM messages + + +def conversation_callback(event: Event): + if isinstance(event, LLMConvertibleEvent): + llm_messages.append(event.to_llm_message()) + + +conversation = Conversation( + agent=agent, callbacks=[conversation_callback], workspace=cwd +) + +print("=" * 100) +print("Checking if the repo skill is activated.") +conversation.send_message("Hey are you a grumpy cat?") +conversation.run() + +print("=" * 100) +print("Now sending flarglebargle to trigger the knowledge skill!") +conversation.send_message("flarglebargle!") +conversation.run() + +print("=" * 100) +print("Now triggering public skill 'github'") +conversation.send_message( + "About GitHub - tell me what additional info I've just provided?" +) +conversation.run() + +print("=" * 100) +print("Conversation finished. 
Got the following LLM messages:") +for i, message in enumerate(llm_messages): + print(f"Message {i}: {str(message)[:200]}") + +# Report cost +cost = llm.metrics.accumulated_cost +print(f"EXAMPLE_COST: {cost}") +``` + + + +### Creating Skills + +Skills are defined with a name, content (the instructions), and an optional trigger: + +```python icon="python" focus={3-14} +agent_context = AgentContext( + skills=[ + Skill( + name="AGENTS.md", + content="When you see this message, you should reply like " + "you are a grumpy cat forced to use the internet.", + trigger=None, # Always active + ), + Skill( + name="flarglebargle", + content='IMPORTANT! The user has said the magic word "flarglebargle". ' + "You must only respond with a message telling them how smart they are", + trigger=KeywordTrigger(keywords=["flarglebargle"]), + ), + ] +) +``` + +### Keyword Triggers + +Use `KeywordTrigger` to activate skills only when specific words appear: + +```python icon="python" focus={4} +Skill( + name="magic-word", + content="Special instructions when magic word is detected", + trigger=KeywordTrigger(keywords=["flarglebargle", "sesame"]), +) +``` + + +## File-Based Skills (`SKILL.md`) + +For reusable skills, use the [AgentSkills standard](https://agentskills.io/specification) directory format. 
+ + +Full example: [examples/05_skills_and_plugins/01_loading_agentskills/main.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/05_skills_and_plugins/01_loading_agentskills/main.py) + + +### Directory Structure + +Each skill is a directory containing: + + + + + + + + + + + + + + + + +where + +| Component | Required | Description | +|-------|----------|-------------| +| `SKILL.md` | Yes | Skill definition with frontmatter | +| `scripts/` | No | Executable scripts | +| `references/` | No | Reference documentation | +| `assets/` | No | Static assets | + + + +### `SKILL.md` Format + +The `SKILL.md` file defines the skill with YAML frontmatter: + +```md icon="markdown" +--- +name: my-skill # Required (standard) +description: > # Required (standard) + A brief description of what this skill does and when to use it. +license: MIT # Optional (standard) +compatibility: Requires bash # Optional (standard) +metadata: # Optional (standard) + author: your-name + version: "1.0" +triggers: # Optional (OpenHands extension) + - keyword1 + - keyword2 +--- + +# Skill Content + +Instructions and documentation for the agent... +``` + +#### Frontmatter Fields + +| Field | Required | Description | +|-------|----------|-------------| +| `name` | Yes | Skill identifier (lowercase + hyphens) | +| `description` | Yes | What the skill does (shown to agent) | +| `triggers` | No | Keywords that auto-activate this skill (**OpenHands extension**) | +| `license` | No | License name | +| `compatibility` | No | Environment requirements | +| `metadata` | No | Custom key-value pairs | + + +Add `triggers` to make your SKILL.md keyword-activated by matching a user prompt. Without triggers, the skill can only be triggered by the agent, not the user. 
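To make the trigger semantics concrete, here is a tiny self-contained model of keyword activation — an illustration of the matching idea only, not the SDK's actual `KeywordTrigger` implementation (`ToySkill` and `active_skills` are names invented for this sketch):

```python
from dataclasses import dataclass, field


@dataclass
class ToySkill:
    """Toy stand-in for a Skill: content plus optional trigger keywords."""

    name: str
    content: str
    keywords: list[str] = field(default_factory=list)  # empty => always loaded


def active_skills(user_message: str, skills: list[ToySkill]) -> list[str]:
    """Names of skills whose content would be injected for this message."""
    text = user_message.lower()
    return [
        s.name
        for s in skills
        if not s.keywords or any(kw.lower() in text for kw in s.keywords)
    ]


skills = [
    ToySkill("code-style", "Always use type hints.", keywords=[]),
    ToySkill("rot13", "Run ./scripts/encrypt.sh", keywords=["encrypt", "decrypt"]),
]

print(active_skills("Please encrypt this note", skills))  # both skills match
print(active_skills("Fix the login bug", skills))  # only the always-loaded skill
```

Skills without keywords behave like always-loaded context; skills with keywords are injected only when a keyword appears in the user message.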
+ + +### Loading Skills + +Use `load_skills_from_dir()` to load all skills from a directory: + +```python icon="python" expandable examples/05_skills_and_plugins/01_loading_agentskills/main.py +"""Example: Loading Skills from Disk (AgentSkills Standard) + +This example demonstrates how to load skills following the AgentSkills standard +from a directory on disk. + +Skills are modular, self-contained packages that extend an agent's capabilities +by providing specialized knowledge, workflows, and tools. They follow the +AgentSkills standard which includes: +- SKILL.md file with frontmatter metadata (name, description, triggers) +- Optional resource directories: scripts/, references/, assets/ + +The example_skills/ directory contains two skills: +- rot13-encryption: Has triggers (encrypt, decrypt) - listed in + AND content auto-injected when triggered +- code-style-guide: No triggers - listed in for on-demand access + +All SKILL.md files follow the AgentSkills progressive disclosure model: +they are listed in with name, description, and location. +Skills with triggers get the best of both worlds: automatic content injection +when triggered, plus the agent can proactively read them anytime. 
+""" + +import os +import sys +from pathlib import Path + +from pydantic import SecretStr + +from openhands.sdk import LLM, Agent, AgentContext, Conversation +from openhands.sdk.context.skills import ( + discover_skill_resources, + load_skills_from_dir, +) +from openhands.sdk.tool import Tool +from openhands.tools.file_editor import FileEditorTool +from openhands.tools.terminal import TerminalTool + + +# Get the directory containing this script +script_dir = Path(__file__).parent +example_skills_dir = script_dir / "example_skills" + +# ========================================================================= +# Part 1: Loading Skills from a Directory +# ========================================================================= +print("=" * 80) +print("Part 1: Loading Skills from a Directory") +print("=" * 80) + +print(f"Loading skills from: {example_skills_dir}") + +# Discover resources in the skill directory +skill_subdir = example_skills_dir / "rot13-encryption" +resources = discover_skill_resources(skill_subdir) +print("\nDiscovered resources in rot13-encryption/:") +print(f" - scripts: {resources.scripts}") +print(f" - references: {resources.references}") +print(f" - assets: {resources.assets}") + +# Load skills from the directory +repo_skills, knowledge_skills, agent_skills = load_skills_from_dir(example_skills_dir) + +print("\nLoaded skills from directory:") +print(f" - Repo skills: {list(repo_skills.keys())}") +print(f" - Knowledge skills: {list(knowledge_skills.keys())}") +print(f" - Agent skills (SKILL.md): {list(agent_skills.keys())}") + +# Access the loaded skill and show all AgentSkills standard fields +if agent_skills: + skill_name = next(iter(agent_skills)) + loaded_skill = agent_skills[skill_name] + print(f"\nDetails for '{skill_name}' (AgentSkills standard fields):") + print(f" - Name: {loaded_skill.name}") + desc = loaded_skill.description or "" + print(f" - Description: {desc[:70]}...") + print(f" - License: {loaded_skill.license}") + print(f" - 
Compatibility: {loaded_skill.compatibility}") + print(f" - Metadata: {loaded_skill.metadata}") + if loaded_skill.resources: + print(" - Resources:") + print(f" - Scripts: {loaded_skill.resources.scripts}") + print(f" - References: {loaded_skill.resources.references}") + print(f" - Assets: {loaded_skill.resources.assets}") + print(f" - Skill root: {loaded_skill.resources.skill_root}") + +# ========================================================================= +# Part 2: Using Skills with an Agent +# ========================================================================= +print("\n" + "=" * 80) +print("Part 2: Using Skills with an Agent") +print("=" * 80) + +# Check for API key +api_key = os.getenv("LLM_API_KEY") +if not api_key: + print("Skipping agent demo (LLM_API_KEY not set)") + print("\nTo run the full demo, set the LLM_API_KEY environment variable:") + print(" export LLM_API_KEY=your-api-key") + sys.exit(0) + +# Configure LLM +model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") +llm = LLM( + usage_id="skills-demo", + model=model, + api_key=SecretStr(api_key), + base_url=os.getenv("LLM_BASE_URL"), +) + +# Create agent context with loaded skills +agent_context = AgentContext( + skills=list(agent_skills.values()), + # Disable public skills for this demo to keep output focused + load_public_skills=False, +) + +# Create agent with tools so it can read skill resources +tools = [ + Tool(name=TerminalTool.name), + Tool(name=FileEditorTool.name), +] +agent = Agent(llm=llm, tools=tools, agent_context=agent_context) + +# Create conversation +conversation = Conversation(agent=agent, workspace=os.getcwd()) + +# Test the skill (triggered by "encrypt" keyword) +# The skill provides instructions and a script for ROT13 encryption +print("\nSending message with 'encrypt' keyword to trigger skill...") +conversation.send_message("Encrypt the message 'hello world'.") +conversation.run() + +print(f"\nTotal cost: ${llm.metrics.accumulated_cost:.4f}") 
+print(f"EXAMPLE_COST: {llm.metrics.accumulated_cost:.4f}") +``` + + + + +### Key Functions + +#### `load_skills_from_dir()` + +Loads all skills from a directory, returning three dictionaries: + +```python icon="python" focus={3} +from openhands.sdk.context.skills import load_skills_from_dir + +repo_skills, knowledge_skills, agent_skills = load_skills_from_dir(skills_dir) +``` + +- **repo_skills**: Skills from `repo.md` files (always active) +- **knowledge_skills**: Skills from `knowledge/` subdirectories +- **agent_skills**: Skills from `SKILL.md` files (AgentSkills standard) + +#### `discover_skill_resources()` + +Discovers resource files in a skill directory: + +```python icon="python" focus={3} +from openhands.sdk.context.skills import discover_skill_resources + +resources = discover_skill_resources(skill_dir) +print(resources.scripts) # List of script files +print(resources.references) # List of reference files +print(resources.assets) # List of asset files +print(resources.skill_root) # Path to skill directory +``` + +### Skill Location in Prompts + +The `` element in `` follows the AgentSkills standard, allowing agents to read the full skill content on demand. When a triggered skill is activated, the content is injected with the location path: + +``` + +The following information has been included based on a keyword match for "encrypt". + +Skill location: /path/to/rot13-encryption +(Use this path to resolve relative file references in the skill content below) + +[skill content from SKILL.md] + +``` + +This enables skills to reference their own scripts and resources using relative paths like `./scripts/encrypt.sh`. + +### Example Skill: ROT13 Encryption + +Here's a skill with triggers (OpenHands extension): + +**SKILL.md:** +```markdown icon="markdown" +--- +name: rot13-encryption +description: > + This skill helps encrypt and decrypt messages using ROT13 cipher. 
+triggers: + - encrypt + - decrypt + - cipher +--- + +# ROT13 Encryption Skill + +Run the [encrypt.sh](scripts/encrypt.sh) script with your message: + +\`\`\`bash +./scripts/encrypt.sh "your message" +\`\`\` +``` + +**scripts/encrypt.sh:** +```bash icon="sh" +#!/bin/bash +echo "$1" | tr 'A-Za-z' 'N-ZA-Mn-za-m' +``` + +When the user says "encrypt", the skill is triggered and the agent can use the provided script. + +## Loading Public Skills + +OpenHands maintains a [public skills repository](https://github.com/OpenHands/extensions) with community-contributed skills. You can automatically load these skills without waiting for SDK updates. + +### Automatic Loading via AgentContext + +Enable public skills loading in your `AgentContext`: + +```python icon="python" focus={2} +agent_context = AgentContext( + load_public_skills=True, # Auto-load from public registry + skills=[ + # Your custom skills here + ] +) +``` + +When enabled, the SDK will: +1. Clone or update the public skills repository to `~/.openhands/cache/skills/` on first run +2. Load all available skills from the repository +3. Merge them with your explicitly defined skills + +### Skill Naming and Triggers + +**Skill Precedence by Name**: If a skill name conflicts, your explicitly defined skills take precedence over public skills. For example, if you define a skill named `code-review`, the public `code-review` skill will be skipped entirely. + +**Multiple Skills with Same Trigger**: Skills with different names but the same trigger can coexist and will ALL be activated when the trigger matches. To add project-specific guidelines alongside public skills, use a unique name (e.g., `custom-codereview-guide` instead of `code-review`). Both skills will be triggered together. 
+ +```python icon="python" +# Both skills will be triggered by "/codereview" +agent_context = AgentContext( + load_public_skills=True, # Loads public "code-review" skill + skills=[ + Skill( + name="custom-codereview-guide", # Different name = coexists + content="Project-specific guidelines...", + trigger=KeywordTrigger(keywords=["/codereview"]), + ), + ] +) +``` + + +**Skill Activation Behavior**: When multiple skills share a trigger, all matching skills are loaded. Content is concatenated into the agent's context with public skills first, then explicitly defined skills. There is no smart merging—if guidelines conflict, the agent sees both. + + +### Programmatic Loading + +You can also load public skills manually and have more control: + +```python icon="python" +from openhands.sdk.context.skills import load_public_skills + +# Load all public skills +public_skills = load_public_skills() + +# Use with AgentContext +agent_context = AgentContext(skills=public_skills) + +# Or combine with custom skills +my_skills = [ + Skill(name="custom", content="Custom instructions", trigger=None) +] +agent_context = AgentContext(skills=my_skills + public_skills) +``` + +### Custom Skills Repository + +You can load skills from your own repository: + +```python icon="python" focus={3-7} +from openhands.sdk.context.skills import load_public_skills + +# Load from a custom repository +custom_skills = load_public_skills( + repo_url="https://github.com/my-org/my-skills", + branch="main" +) +``` + +### How It Works + +The `load_public_skills()` function uses git-based caching for efficiency: + +- **First run**: Clones the skills repository to `~/.openhands/cache/skills/public-skills/` +- **Subsequent runs**: Pulls the latest changes to keep skills up-to-date +- **Offline mode**: Uses the cached version if network is unavailable + +This approach is more efficient than fetching individual skill files via HTTP and ensures you always have access to the latest community skills. 
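The loading rules described above — explicitly defined skills shadow same-named public skills, and all skills sharing a trigger are concatenated public-first — can be sketched as a small merge function. This is an illustrative model of the documented behavior, not the SDK's actual loader:

```python
def merge_skills(explicit, public):
    """Public skills first, then explicit; explicit names shadow public ones."""
    explicit_names = {s["name"] for s in explicit}
    kept_public = [s for s in public if s["name"] not in explicit_names]
    return kept_public + explicit


public = [
    {"name": "code-review", "trigger": "/codereview"},
    {"name": "github", "trigger": "github"},
]
explicit = [
    {"name": "code-review", "trigger": "/codereview"},  # shadows the public copy
    {"name": "custom-codereview-guide", "trigger": "/codereview"},  # coexists
]

merged = merge_skills(explicit, public)
print([s["name"] for s in merged])
# All skills sharing "/codereview" would activate together:
print([s["name"] for s in merged if s["trigger"] == "/codereview"])
```

Because there is no smart merging, both `code-review` entries' guidelines would reach the agent side by side if they shared a trigger and distinct names.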
+ + +Explore available public skills at [github.com/OpenHands/extensions](https://github.com/OpenHands/extensions). These skills cover various domains like GitHub integration, Python development, debugging, and more. + + +## Customizing Agent Context + +### Message Suffixes + +Append custom instructions to the system prompt or user messages via `AgentContext`: + +```python icon="python" +agent_context = AgentContext( + system_message_suffix=""" + +Repository: my-project +Branch: feature/new-api + + """.strip(), + user_message_suffix="Remember to explain your reasoning." +) +``` + +- **`system_message_suffix`**: Appended to system prompt (always active, combined with repo skills) +- **`user_message_suffix`**: Appended to each user message + +### Replacing the Entire System Prompt + +For complete control, provide a custom Jinja2 template via the `Agent` class: + +```python icon="python" focus={6} +from openhands.sdk import Agent + +agent = Agent( + llm=llm, + tools=tools, + system_prompt_filename="/path/to/custom_system_prompt.j2", # Absolute path + system_prompt_kwargs={"cli_mode": True, "repo_name": "my-project"} +) +``` + +**Custom template example** (`custom_system_prompt.j2`): + +```jinja2 +You are a helpful coding assistant for {{ repo_name }}. + +{% if cli_mode %} +You are running in CLI mode. Keep responses concise. 
+{% endif %} + +Follow these guidelines: +- Write clean, well-documented code +- Consider edge cases and error handling +- Suggest tests when appropriate +``` + +**Key points:** +- Use relative filenames (e.g., `"system_prompt.j2"`) to load from the agent's prompts directory +- Use absolute paths (e.g., `"/path/to/prompt.j2"`) to load from any location +- Pass variables to the template via `system_prompt_kwargs` +- The `system_message_suffix` from `AgentContext` is automatically appended after your custom prompt + +## Next Steps + +- **[Custom Tools](/sdk/guides/custom-tools)** - Create specialized tools +- **[MCP Integration](/sdk/guides/mcp)** - Connect external tool servers +- **[Confirmation Mode](/sdk/guides/security)** - Add execution approval + +## OpenHands CLI + +### OpenHands Cloud +Source: https://docs.openhands.dev/openhands/usage/cli/cloud.md + +## Overview + +The OpenHands CLI provides commands to interact with [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud) directly from your terminal. You can: + +- Authenticate with your OpenHands Cloud account +- Create new cloud conversations +- Use cloud resources without the web interface + +## Authentication + +### Login + +Authenticate with OpenHands Cloud using OAuth 2.0 Device Flow: + +```bash +openhands login +``` + +This opens a browser window for authentication. After successful login, your credentials are stored locally. 
+ +#### Custom Server URL + +For self-hosted or enterprise deployments: + +```bash +openhands login --server-url https://your-openhands-server.com +``` + +You can also set the server URL via environment variable: + +```bash +export OPENHANDS_CLOUD_URL=https://your-openhands-server.com +openhands login +``` + +### Logout + +Log out from OpenHands Cloud: + +```bash +# Log out from all servers +openhands logout + +# Log out from a specific server +openhands logout --server-url https://app.all-hands.dev +``` + +## Creating Cloud Conversations + +Create a new conversation in OpenHands Cloud: + +```bash +# With a task +openhands cloud -t "Review the codebase and suggest improvements" + +# From a file +openhands cloud -f task.txt +``` + +### Options + +| Option | Description | +|--------|-------------| +| `-t, --task TEXT` | Initial task to seed the conversation | +| `-f, --file PATH` | Path to a file whose contents seed the conversation | +| `--server-url URL` | OpenHands server URL (default: https://app.all-hands.dev) | + +### Examples + +```bash +# Create a cloud conversation with a task +openhands cloud -t "Fix the authentication bug in login.py" + +# Create from a task file +openhands cloud -f requirements.txt + +# Use a custom server +openhands cloud --server-url https://custom.server.com -t "Add unit tests" + +# Combine with environment variable +export OPENHANDS_CLOUD_URL=https://enterprise.openhands.dev +openhands cloud -t "Refactor the database module" +``` + +## Workflow + +A typical workflow with OpenHands Cloud: + +1. **Login once**: + ```bash + openhands login + ``` + +2. **Create conversations as needed**: + ```bash + openhands cloud -t "Your task here" + ``` + +3. 
**Continue in the web interface** at [app.all-hands.dev](https://app.all-hands.dev) or your custom server + +## Environment Variables + +| Variable | Description | +|----------|-------------| +| `OPENHANDS_CLOUD_URL` | Default server URL for cloud operations | + +## Cloud vs Local + +| Feature | Cloud (`openhands cloud`) | Local (`openhands`) | +|---------|---------------------------|---------------------| +| Compute | Cloud-hosted | Your machine | +| Persistence | Cloud storage | Local files | +| Collaboration | Share via link | Local only | +| Setup | Just login | Configure LLM & runtime | +| Cost | Subscription/usage-based | Your LLM API costs | + + +Use OpenHands Cloud for collaboration, on-the-go access, or when you don't want to manage infrastructure. Use the local CLI for privacy, offline work, or custom configurations. + + +## See Also + +- [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud) - Full cloud documentation +- [Cloud UI](/openhands/usage/cloud/cloud-ui) - Web interface guide +- [Cloud API](/openhands/usage/cloud/cloud-api) - Programmatic access + +### Command Reference +Source: https://docs.openhands.dev/openhands/usage/cli/command-reference.md + +## Basic Usage + +```bash +openhands [OPTIONS] [COMMAND] +``` + +## Global Options + +| Option | Description | +|--------|-------------| +| `-v, --version` | Show version number and exit | +| `-t, --task TEXT` | Initial task to seed the conversation | +| `-f, --file PATH` | Path to a file whose contents seed the conversation | +| `--resume [ID]` | Resume a conversation. 
If no ID provided, lists recent conversations | +| `--last` | Resume the most recent conversation (use with `--resume`) | +| `--exp` | Use textual-based UI (now default, kept for compatibility) | +| `--headless` | Run in headless mode (no UI, requires `--task` or `--file`) | +| `--json` | Enable JSONL output (requires `--headless`) | +| `--always-approve` | Auto-approve all actions without confirmation | +| `--llm-approve` | Use LLM-based security analyzer for action approval | +| `--override-with-envs` | Apply environment variables (`LLM_API_KEY`, `LLM_MODEL`, `LLM_BASE_URL`) to override stored settings | +| `--exit-without-confirmation` | Exit without showing confirmation dialog | + +## Subcommands + +### serve + +Launch the OpenHands GUI server using Docker. + +```bash +openhands serve [OPTIONS] +``` + +| Option | Description | +|--------|-------------| +| `--mount-cwd` | Mount the current working directory into the container | +| `--gpu` | Enable GPU support via nvidia-docker | + +**Examples:** +```bash +openhands serve +openhands serve --mount-cwd +openhands serve --gpu +openhands serve --mount-cwd --gpu +``` + +### web + +Launch the CLI as a web application accessible via browser. + +```bash +openhands web [OPTIONS] +``` + +| Option | Default | Description | +|--------|---------|-------------| +| `--host` | `0.0.0.0` | Host to bind the web server to | +| `--port` | `12000` | Port to bind the web server to | +| `--debug` | `false` | Enable debug mode | + +**Examples:** +```bash +openhands web +openhands web --port 8080 +openhands web --host 127.0.0.1 --port 3000 +openhands web --debug +``` + +### cloud + +Create a new conversation in OpenHands Cloud. 
+
+```bash
+openhands cloud [OPTIONS]
+```
+
+| Option | Description |
+|--------|-------------|
+| `-t, --task TEXT` | Initial task to seed the conversation |
+| `-f, --file PATH` | Path to a file whose contents seed the conversation |
+| `--server-url URL` | OpenHands server URL (default: https://app.all-hands.dev) |
+
+**Examples:**
+```bash
+openhands cloud -t "Fix the bug"
+openhands cloud -f task.txt
+openhands cloud --server-url https://custom.server.com -t "Task"
+```
+
+### acp
+
+Start the Agent Client Protocol server for IDE integrations.
+
+```bash
+openhands acp [OPTIONS]
+```
+
+| Option | Description |
+|--------|-------------|
+| `--resume [ID]` | Resume a conversation by ID |
+| `--last` | Resume the most recent conversation |
+| `--always-approve` | Auto-approve all actions |
+| `--llm-approve` | Use LLM-based security analyzer |
+| `--streaming` | Enable token-by-token streaming |
+
+**Examples:**
+```bash
+openhands acp
+openhands acp --llm-approve
+openhands acp --resume abc123def456
+openhands acp --resume --last
+```
+
+### mcp
+
+Manage Model Context Protocol server configurations.
+
+```bash
+openhands mcp [OPTIONS]
+```
+
+#### mcp add
+
+Add a new MCP server.
+
+```bash
+openhands mcp add <name> --transport <type> [OPTIONS] <command-or-url> [-- args...]
+```
+
+| Option | Description |
+|--------|-------------|
+| `--transport` | Transport type: `http`, `sse`, or `stdio` (required) |
+| `--header` | HTTP header for http/sse (format: `"Key: Value"`, repeatable) |
+| `--env` | Environment variable for stdio (format: `KEY=value`, repeatable) |
+| `--auth` | Authentication method (e.g., `oauth`) |
+| `--enabled` | Enable immediately (default) |
+| `--disabled` | Add in disabled state |
+
+**Examples:**
+```bash
+openhands mcp add my-api --transport http https://api.example.com/mcp
+openhands mcp add my-api --transport http --header "Authorization: Bearer token" https://api.example.com
+openhands mcp add local --transport stdio python -- -m my_server
+openhands mcp add local --transport stdio --env "API_KEY=secret" python -- -m server
+```
+
+#### mcp list
+
+List all configured MCP servers.
+
+```bash
+openhands mcp list
+```
+
+#### mcp get
+
+Get details for a specific MCP server.
+
+```bash
+openhands mcp get <name>
+```
+
+#### mcp remove
+
+Remove an MCP server configuration.
+
+```bash
+openhands mcp remove <name>
+```
+
+#### mcp enable
+
+Enable an MCP server.
+
+```bash
+openhands mcp enable <name>
+```
+
+#### mcp disable
+
+Disable an MCP server.
+
+```bash
+openhands mcp disable <name>
+```
+
+### login
+
+Authenticate with OpenHands Cloud.
+
+```bash
+openhands login [OPTIONS]
+```
+
+| Option | Description |
+|--------|-------------|
+| `--server-url URL` | OpenHands server URL (default: https://app.all-hands.dev) |
+
+**Examples:**
+```bash
+openhands login
+openhands login --server-url https://enterprise.openhands.dev
+```
+
+### logout
+
+Log out from OpenHands Cloud.
+ +```bash +openhands logout [OPTIONS] +``` + +| Option | Description | +|--------|-------------| +| `--server-url URL` | Server URL to log out from (if not specified, logs out from all) | + +**Examples:** +```bash +openhands logout +openhands logout --server-url https://app.all-hands.dev +``` + +## Interactive Commands + +Commands available inside the CLI (prefix with `/`): + +| Command | Description | +|---------|-------------| +| `/help` | Display available commands | +| `/new` | Start a new conversation | +| `/history` | Toggle conversation history | +| `/confirm` | Configure confirmation settings | +| `/condense` | Condense conversation history | +| `/skills` | View loaded skills, hooks, and MCPs | +| `/feedback` | Send anonymous feedback about CLI | +| `/exit` | Exit the application | + +## Command Palette + +Press `Ctrl+P` (or `Ctrl+\`) to open the command palette for quick access to: + +| Option | Description | +|--------|-------------| +| **History** | Toggle conversation history panel | +| **Keys** | Show keyboard shortcuts | +| **MCP** | View MCP server configurations | +| **Maximize** | Maximize/restore window | +| **Plan** | View agent plan | +| **Quit** | Quit the application | +| **Screenshot** | Take a screenshot | +| **Settings** | Configure LLM model, API keys, and other settings | +| **Theme** | Toggle color theme | + +## Changing Your Model + +### Via Settings UI + +1. Press `Ctrl+P` to open the command palette +2. Select **Settings** +3. Choose your LLM provider and model +4. Save changes (no restart required) + +### Via Configuration File + +Edit `~/.openhands/agent_settings.json` and change the `model` field: + +```json +{ + "llm": { + "model": "claude-sonnet-4-5-20250929", + "api_key": "...", + "base_url": "..." 
+ } +} +``` + +### Via Environment Variables + +Temporarily override your model without changing saved configuration: + +```bash +export LLM_MODEL="gpt-4o" +export LLM_API_KEY="your-api-key" +openhands --override-with-envs +``` + +Changes made with `--override-with-envs` are not persisted. + +## Environment Variables + +| Variable | Description | +|----------|-------------| +| `LLM_API_KEY` | API key for your LLM provider | +| `LLM_MODEL` | Model to use (requires `--override-with-envs`) | +| `LLM_BASE_URL` | Custom LLM base URL (requires `--override-with-envs`) | +| `OPENHANDS_CLOUD_URL` | Default cloud server URL | +| `OPENHANDS_VERSION` | Docker image version for `openhands serve` | + +## Exit Codes + +| Code | Meaning | +|------|---------| +| `0` | Success | +| `1` | Error or task failed | +| `2` | Invalid arguments | + +## Configuration Files + +| File | Purpose | +|------|---------| +| `~/.openhands/agent_settings.json` | LLM configuration and agent settings | +| `~/.openhands/cli_config.json` | CLI preferences (e.g., critic enabled) | +| `~/.openhands/mcp.json` | MCP server configurations | +| `~/.openhands/conversations/` | Conversation history | + +## See Also + +- [Installation](/openhands/usage/cli/installation) - Install the CLI +- [Quick Start](/openhands/usage/cli/quick-start) - Get started +- [MCP Servers](/openhands/usage/cli/mcp-servers) - Configure MCP servers + +### Critic (Experimental) +Source: https://docs.openhands.dev/openhands/usage/cli/critic.md + + +**This feature is highly experimental** and subject to change. The API, configuration, and behavior may evolve significantly based on feedback and testing. + + +## Overview + +If you're using the [OpenHands LLM Provider](/openhands/usage/llms/openhands-llms), an experimental **critic feature** is automatically enabled to predict task success in real-time. 
+ +For detailed information about the critic feature, including programmatic access and advanced usage, see the [SDK Critic Guide](/sdk/guides/critic). + + +## What is the Critic? + +The critic is an LLM-based evaluator that analyzes agent actions and conversation history to predict the quality or success probability of agent decisions. It provides: + +- **Quality scores**: Probability scores between 0.0 and 1.0 indicating predicted success +- **Real-time feedback**: Scores computed during agent execution, not just at completion + + + +![Critic output in CLI](./screenshots/critic-cli-output.png) + +## Pricing + +The critic feature is **free during the public beta phase** for all OpenHands LLM Provider users. + +## Disabling the Critic + +If you prefer not to use the critic feature, you can disable it in your settings: + +1. Open the command palette with `Ctrl+P` +2. Select **Settings** +3. Navigate to the **CLI Settings** tab +4. Toggle off **Enable Critic (Experimental)** + +![Critic settings in CLI](./screenshots/critic-cli-settings.png) + +### GUI Server +Source: https://docs.openhands.dev/openhands/usage/cli/gui-server.md + +## Overview + +The `openhands serve` command launches the full OpenHands GUI server using Docker. This provides the same rich web interface as [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud), but running locally on your machine. + +```bash +openhands serve +``` + + +This requires Docker to be installed and running on your system. + + +## Prerequisites + +- [Docker](https://docs.docker.com/get-docker/) installed and running +- Sufficient disk space for Docker images (~2GB) + +## Basic Usage + +```bash +# Launch the GUI server +openhands serve + +# The server will be available at http://localhost:3000 +``` + +The command will: +1. Check Docker requirements +2. Pull the required Docker images +3. Start the OpenHands GUI server +4. 
Display the URL to access the interface + +## Options + +| Option | Description | +|--------|-------------| +| `--mount-cwd` | Mount the current working directory into the container | +| `--gpu` | Enable GPU support via nvidia-docker | + +## Mounting Your Workspace + +To give OpenHands access to your local files: + +```bash +# Mount current directory +openhands serve --mount-cwd +``` + +This mounts your current directory to `/workspace` in the container, allowing the agent to read and modify your files. + + +Navigate to your project directory before running `openhands serve --mount-cwd` to give OpenHands access to your project files. + + +## GPU Support + +For tasks that benefit from GPU acceleration: + +```bash +openhands serve --gpu +``` + +This requires: +- NVIDIA GPU +- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) installed +- Docker configured for GPU support + +## Examples + +```bash +# Basic GUI server +openhands serve + +# Mount current project and enable GPU +cd /path/to/your/project +openhands serve --mount-cwd --gpu +``` + +## How It Works + +The `openhands serve` command: + +1. **Pulls Docker images**: Downloads the OpenHands runtime and application images +2. **Starts containers**: Runs the OpenHands server in a Docker container +3. **Exposes port 3000**: Makes the web interface available at `http://localhost:3000` +4. **Shares settings**: Uses your `~/.openhands` directory for configuration + +## Stopping the Server + +Press `Ctrl+C` in the terminal where you started the server to stop it gracefully. 
+ +## Comparison: GUI Server vs Web Interface + +| Feature | `openhands serve` | `openhands web` | +|---------|-------------------|-----------------| +| Interface | Full web GUI | Terminal UI in browser | +| Dependencies | Docker required | None | +| Resources | Full container (~2GB) | Lightweight | +| Features | All GUI features | CLI features only | +| Best for | Rich GUI experience | Quick terminal access | + +## Troubleshooting + +### Docker Not Running + +``` +❌ Docker daemon is not running. +Please start Docker and try again. +``` + +**Solution**: Start Docker Desktop or the Docker daemon. + +### Permission Denied + +``` +Got permission denied while trying to connect to the Docker daemon socket +``` + +**Solution**: Add your user to the docker group: +```bash +sudo usermod -aG docker $USER +# Then log out and back in +``` + +### Port Already in Use + +If port 3000 is already in use, stop the conflicting service or use a different setup. Currently, the port is not configurable via CLI. + +## See Also + +- [Local GUI Setup](/openhands/usage/run-openhands/local-setup) - Detailed GUI setup guide +- [Web Interface](/openhands/usage/cli/web-interface) - Lightweight browser access +- [Docker Sandbox](/openhands/usage/sandboxes/docker) - Docker sandbox configuration details + +### Headless Mode +Source: https://docs.openhands.dev/openhands/usage/cli/headless.md + +## Overview + +Headless mode runs OpenHands without the interactive terminal UI, making it ideal for: +- CI/CD pipelines +- Automated scripting +- Integration with other tools +- Batch processing + +```bash +openhands --headless -t "Your task here" +``` + +## Requirements + +- Must specify a task with `--task` or `--file` + + +**Headless mode always runs in `always-approve` mode.** The agent will execute all actions without any confirmation. This cannot be changed—`--llm-approve` is not available in headless mode. 
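Because the agent runs unattended, headless mode pairs naturally with the `--json` flag described below: capture the event stream to a file and post-process it in your pipeline. A minimal Python sketch follows; the event fields mirror the illustrative examples on this page, and the real schema may differ.

```python
import json

def summarize_events(path):
    """Tally event types from a JSONL log produced by --headless --json."""
    counts = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            event = json.loads(line)
            kind = event.get("type", "unknown")
            counts[kind] = counts.get(kind, 0) + 1
    return counts

# Demo against a tiny hand-written log; a real run would produce output.jsonl:
with open("sample.jsonl", "w", encoding="utf-8") as f:
    f.write('{"type": "action", "action": "write", "path": "app.py"}\n')
    f.write('{"type": "observation", "content": "File created successfully"}\n')

print(summarize_events("sample.jsonl"))  # -> {'action': 1, 'observation': 1}
```

A CI step could fail the build when the tally looks wrong, or inspect individual events for error markers.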
+ + +## Basic Usage + +```bash +# Run a task in headless mode +openhands --headless -t "Write a Python script that prints hello world" + +# Load task from a file +openhands --headless -f task.txt +``` + +## JSON Output Mode + +The `--json` flag enables structured JSONL (JSON Lines) output, streaming events as they occur: + +```bash +openhands --headless --json -t "Create a simple Flask app" +``` + +Each line is a JSON object representing an agent event: + +```json +{"type": "action", "action": "write", "path": "app.py", ...} +{"type": "observation", "content": "File created successfully", ...} +{"type": "action", "action": "run", "command": "python app.py", ...} +``` + +### Use Cases for JSON Output + +- **CI/CD pipelines**: Parse events to determine success/failure +- **Automated processing**: Feed output to other tools +- **Logging**: Capture structured logs for analysis +- **Integration**: Connect OpenHands with other systems + +### Example: Capture Output to File + +```bash +openhands --headless --json -t "Add unit tests" > output.jsonl +``` + +## See Also + +- [Terminal Mode](/openhands/usage/cli/terminal) - Interactive CLI usage +- [Command Reference](/openhands/usage/cli/command-reference) - All CLI options + +### JetBrains IDEs +Source: https://docs.openhands.dev/openhands/usage/cli/ide/jetbrains.md + +[JetBrains IDEs](https://www.jetbrains.com/) support the Agent Client Protocol through JetBrains AI Assistant. + +## Supported IDEs + +This guide applies to all JetBrains IDEs: + +- IntelliJ IDEA +- PyCharm +- WebStorm +- GoLand +- Rider +- CLion +- PhpStorm +- RubyMine +- DataGrip +- And other JetBrains IDEs + +## Prerequisites + +Before configuring JetBrains IDEs: + +1. **OpenHands CLI installed** - See [Installation](/openhands/usage/cli/installation) +2. **LLM settings configured** - Run `openhands` and use `/settings` +3. **JetBrains IDE version 25.3 or later** +4. 
**JetBrains AI Assistant enabled** in your IDE + + +JetBrains AI Assistant is required for ACP support. Make sure it's enabled in your IDE. + + +## Configuration + +### Step 1: Create the ACP Configuration File + +Create or edit the file `$HOME/.jetbrains/acp.json`: + + + + ```bash + mkdir -p ~/.jetbrains + nano ~/.jetbrains/acp.json + ``` + + + Create the file at `C:\Users\\.jetbrains\acp.json` + + + +### Step 2: Add the Configuration + +Add the following JSON: + +```json +{ + "agent_servers": { + "OpenHands": { + "command": "openhands", + "args": ["acp"], + "env": {} + } + } +} +``` + +### Step 3: Use OpenHands in Your IDE + +Follow the [JetBrains ACP instructions](https://www.jetbrains.com/help/ai-assistant/acp.html) to open and use an agent in your JetBrains IDE. + +## Advanced Configuration + +### LLM-Approve Mode + +For automatic LLM-based approval: + +```json +{ + "agent_servers": { + "OpenHands": { + "command": "openhands", + "args": ["acp", "--llm-approve"], + "env": {} + } + } +} +``` + +### Auto-Approve Mode + +For automatic approval of all actions (use with caution): + +```json +{ + "agent_servers": { + "OpenHands": { + "command": "openhands", + "args": ["acp", "--always-approve"], + "env": {} + } + } +} +``` + +### Resume a Conversation + +Resume a specific conversation: + +```json +{ + "agent_servers": { + "OpenHands (Resume)": { + "command": "openhands", + "args": ["acp", "--resume", "abc123def456"], + "env": {} + } + } +} +``` + +Resume the latest conversation: + +```json +{ + "agent_servers": { + "OpenHands (Latest)": { + "command": "openhands", + "args": ["acp", "--resume", "--last"], + "env": {} + } + } +} +``` + +### Multiple Configurations + +Add multiple configurations for different use cases: + +```json +{ + "agent_servers": { + "OpenHands": { + "command": "openhands", + "args": ["acp"], + "env": {} + }, + "OpenHands (Auto-Approve)": { + "command": "openhands", + "args": ["acp", "--always-approve"], + "env": {} + }, + "OpenHands (Resume 
Latest)": { + "command": "openhands", + "args": ["acp", "--resume", "--last"], + "env": {} + } + } +} +``` + +### Environment Variables + +Pass environment variables to the agent: + +```json +{ + "agent_servers": { + "OpenHands": { + "command": "openhands", + "args": ["acp"], + "env": { + "LLM_API_KEY": "your-api-key" + } + } + } +} +``` + +## Troubleshooting + +### "Agent not found" or "Command failed" + +1. Verify OpenHands CLI is installed: + ```bash + openhands --version + ``` + +2. If the command is not found, ensure OpenHands CLI is in your PATH or reinstall it following the [Installation guide](/openhands/usage/cli/installation) + +### "AI Assistant not available" + +1. Ensure you have JetBrains IDE version 25.3 or later +2. Enable AI Assistant: `Settings > Plugins > AI Assistant` +3. Restart the IDE after enabling + +### Agent doesn't respond + +1. Check your LLM settings: + ```bash + openhands + # Use /settings to configure + ``` + +2. Test ACP mode in terminal: + ```bash + openhands acp + # Should start without errors + ``` + +### Configuration not applied + +1. Verify the config file location: `~/.jetbrains/acp.json` +2. Validate JSON syntax (no trailing commas, proper quotes) +3. Restart your JetBrains IDE + +### Finding Your Conversation ID + +To resume conversations, first find the ID: + +```bash +openhands --resume +``` + +This displays recent conversations with their IDs: + +``` +Recent Conversations: +-------------------------------------------------------------------------------- + 1. 
abc123def456 (2h ago) + Fix the login bug in auth.py +-------------------------------------------------------------------------------- +``` + +## See Also + +- [IDE Integration Overview](/openhands/usage/cli/ide/overview) - ACP concepts and other IDEs +- [JetBrains ACP Documentation](https://www.jetbrains.com/help/ai-assistant/acp.html) - Official JetBrains ACP guide +- [Resume Conversations](/openhands/usage/cli/resume) - Find conversation IDs + +### IDE Integration Overview +Source: https://docs.openhands.dev/openhands/usage/cli/ide/overview.md + + +IDE integration via ACP is experimental and may have limitations. Please report any issues on the [OpenHands-CLI repo](https://github.com/OpenHands/OpenHands-CLI/issues). + + + +**Windows Users:** IDE integrations require the OpenHands CLI, which only runs on Linux, macOS, or Windows with WSL. Please [install WSL](https://learn.microsoft.com/en-us/windows/wsl/install) and run your IDE from within WSL, or use a WSL-aware terminal configuration. + + +## What is the Agent Client Protocol (ACP)? + +The [Agent Client Protocol (ACP)](https://agentclientprotocol.com/protocol/overview) is a standardized communication protocol that enables code editors and IDEs to interact with AI agents. ACP defines how clients (like code editors) and agents (like OpenHands) communicate through a JSON-RPC 2.0 interface. + +## Supported IDEs + +| IDE | Support Level | Setup Guide | +|-----|---------------|-------------| +| [Zed](/openhands/usage/cli/ide/zed) | Native | Built-in ACP support | +| [Toad](/openhands/usage/cli/ide/toad) | Native | Universal terminal interface | +| [VS Code](/openhands/usage/cli/ide/vscode) | Community Extension | Via VSCode ACP extension | +| [JetBrains](/openhands/usage/cli/ide/jetbrains) | Native | IntelliJ, PyCharm, WebStorm, etc. | + +## Prerequisites + +Before using OpenHands with any IDE, you must: + +1. 
**Install OpenHands CLI** following the [installation instructions](/openhands/usage/cli/installation) + +2. **Configure your LLM settings** using the `/settings` command: + ```bash + openhands + # Then use /settings to configure + ``` + +The ACP integration will reuse the credentials and configuration from your CLI settings stored in `~/.openhands/settings.json`. + +## How It Works + +```mermaid +graph LR + IDE[Your IDE] -->|ACP Protocol| CLI[OpenHands CLI] + CLI -->|API Calls| LLM[LLM Provider] + CLI -->|Commands| Runtime[Sandbox Runtime] +``` + +1. Your IDE launches `openhands acp` as a subprocess +2. Communication happens via JSON-RPC 2.0 over stdio +3. OpenHands uses your configured LLM and runtime settings +4. Results are displayed in your IDE's interface + +## The ACP Command + +The `openhands acp` command starts OpenHands as an ACP server: + +```bash +# Basic ACP server +openhands acp + +# With LLM-based approval +openhands acp --llm-approve + +# Resume a conversation +openhands acp --resume + +# Resume the latest conversation +openhands acp --resume --last +``` + +### ACP Options + +| Option | Description | +|--------|-------------| +| `--resume [ID]` | Resume a conversation by ID | +| `--last` | Resume the most recent conversation | +| `--always-approve` | Auto-approve all actions | +| `--llm-approve` | Use LLM-based security analyzer | +| `--streaming` | Enable token-by-token streaming | + +## Confirmation Modes + +OpenHands ACP supports three confirmation modes to control how agent actions are approved: + +### Always Ask (Default) + +The agent will request user confirmation before executing each tool call or prompt turn. This provides maximum control and safety. + +```bash +openhands acp # defaults to always-ask mode +``` + +### Always Approve + +The agent will automatically approve all actions without asking for confirmation. Use this mode when you trust the agent to make decisions autonomously. 
+
+```bash
+openhands acp --always-approve
+```
+
+### LLM-Based Approval
+
+The agent uses an LLM-based security analyzer to evaluate each action. Only actions predicted to be high-risk will require user confirmation, while low-risk actions are automatically approved.
+
+```bash
+openhands acp --llm-approve
+```
+
+### Changing Modes During a Session
+
+You can change the confirmation mode during an active session using slash commands:
+
+| Command | Description |
+|---------|-------------|
+| `/confirm always-ask` | Switch to always-ask mode |
+| `/confirm always-approve` | Switch to always-approve mode |
+| `/confirm llm-approve` | Switch to LLM-based approval mode |
+| `/help` | Show all available slash commands |
+
+
+The confirmation mode setting persists for the duration of the session but will reset to the default (or command-line specified mode) when you start a new session.
+
+
+## Choosing an IDE
+
+- **[Zed](/openhands/usage/cli/ide/zed)** - High-performance editor with native ACP support. Best for speed and simplicity.
+- **[Toad](/openhands/usage/cli/ide/toad)** - Universal terminal interface. Works with any terminal, consistent experience.
+- **[VS Code](/openhands/usage/cli/ide/vscode)** - Popular editor with community extension. Great for VS Code users.
+- **[JetBrains](/openhands/usage/cli/ide/jetbrains)** - IntelliJ, PyCharm, WebStorm, etc. Best for JetBrains ecosystem users.
+
+## Resuming Conversations in IDEs
+
+You can resume previous conversations in ACP mode. Since ACP mode doesn't display an interactive list, first find your conversation ID:
+
+```bash
+openhands --resume
+```
+
+This shows your recent conversations:
+
+```
+Recent Conversations:
+--------------------------------------------------------------------------------
+ 1. abc123def456 (2h ago)
+    Fix the login bug in auth.py
+
+ 2. xyz789ghi012 (yesterday)
+    Add unit tests for the user service
+--------------------------------------------------------------------------------
+```
+
+Then configure your IDE to use `--resume <id>` or `--resume --last`. See each IDE's documentation for specific configuration.
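The JSON-RPC 2.0 framing underneath ACP is easy to picture. In the sketch below, each message is serialized as one line of JSON for transport over stdio; the method name and params are hypothetical stand-ins rather than verbatim ACP payloads (see the protocol specification for the real message set).

```python
import json

def jsonrpc_request(req_id, method, params):
    """Frame a JSON-RPC 2.0 request as a single line of JSON."""
    msg = {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}
    return json.dumps(msg) + "\n"

# Hypothetical prompt an editor could write to the agent subprocess's stdin:
line = jsonrpc_request(1, "session/prompt", {"text": "Fix the login bug in auth.py"})
print(line, end="")
```

The agent replies with response objects carrying the same `id`, which is how the editor matches results back to requests.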
+ +## See Also + +- [ACP Documentation](https://agentclientprotocol.com/protocol/overview) - Full protocol specification +- [Terminal Mode](/openhands/usage/cli/terminal) - Use OpenHands in the terminal +- [Resume Conversations](/openhands/usage/cli/resume) - Detailed resume guide + +### Toad Terminal +Source: https://docs.openhands.dev/openhands/usage/cli/ide/toad.md + +[Toad](https://github.com/Textualize/toad) is a universal terminal interface for AI agents, created by [Will McGugan](https://willmcgugan.github.io/), the creator of the popular Python libraries [Rich](https://github.com/Textualize/rich) and [Textual](https://github.com/Textualize/textual). + +The name comes from "**t**extual c**ode**"—combining the Textual framework with coding assistance. + +![Toad Terminal Interface](https://willmcgugan.github.io/images/toad-released/toad-1.png) + +## Why Toad? + +Toad provides a modern terminal user experience that addresses several limitations common to existing terminal-based AI tools: + +- **No flickering or visual artifacts** - Toad can update partial regions of the screen without redrawing everything +- **Scrollback that works** - You can scroll back through your conversation history and interact with previous outputs +- **A unified experience** - Instead of learning different interfaces for different AI agents, Toad provides a consistent experience across all supported agents through ACP + +OpenHands is included as a recommended agent in Toad's agent store. + +## Prerequisites + +Before using Toad with OpenHands: + +1. **OpenHands CLI installed** - See [Installation](/openhands/usage/cli/installation) +2. **LLM settings configured** - Run `openhands` and use `/settings` + +## Installation + +Install Toad using [uv](https://docs.astral.sh/uv/): + +```bash +uvx batrachian-toad +``` + +For more installation options and documentation, visit [batrachian.ai](https://www.batrachian.ai/). 
+ +## Setup + +### Using the Agent Store + +The easiest way to set up OpenHands with Toad: + +1. Launch Toad: `uvx batrachian-toad` +2. Open Toad's agent store +3. Find **OpenHands** in the list of recommended agents +4. Click **Install** to set up OpenHands +5. Select OpenHands and start a conversation + +The install process runs: +```bash +uv tool install openhands --python 3.12 && openhands login +``` + +### Manual Configuration + +You can also launch Toad directly with OpenHands: + +```bash +toad acp "openhands acp" +``` + +## Usage + +### Basic Usage + +```bash +# Launch Toad with OpenHands +toad acp "openhands acp" +``` + +### With Command Line Arguments + +Pass OpenHands CLI flags through Toad: + +```bash +# Use LLM-based approval mode +toad acp "openhands acp --llm-approve" + +# Auto-approve all actions +toad acp "openhands acp --always-approve" +``` + +### Resume a Conversation + +Resume a specific conversation by ID: + +```bash +toad acp "openhands acp --resume abc123def456" +``` + +Resume the most recent conversation: + +```bash +toad acp "openhands acp --resume --last" +``` + + +Find your conversation IDs by running `openhands --resume` in a regular terminal. + + +## Advanced Configuration + +### Combined Options + +```bash +# Resume with LLM approval +toad acp "openhands acp --resume --last --llm-approve" +``` + +### Environment Variables + +Pass environment variables to OpenHands: + +```bash +LLM_API_KEY=your-key toad acp "openhands acp" +``` + +## Troubleshooting + +### "openhands" command not found + +Ensure OpenHands is installed: +```bash +uv tool install openhands --python 3.12 +``` + +Verify it's in your PATH: +```bash +which openhands +``` + +### Agent doesn't respond + +1. Check your LLM settings: `openhands` then `/settings` +2. Verify your API key is valid +3. Check network connectivity to your LLM provider + +### Conversation not persisting + +Conversations are stored in `~/.openhands/conversations`. 
Ensure this directory exists and is writable. + +## See Also + +- [IDE Integration Overview](/openhands/usage/cli/ide/overview) - ACP concepts and other IDEs +- [Toad Documentation](https://www.batrachian.ai/) - Official Toad documentation +- [Terminal Mode](/openhands/usage/cli/terminal) - Use OpenHands directly in terminal +- [Resume Conversations](/openhands/usage/cli/resume) - Find conversation IDs + +### VS Code +Source: https://docs.openhands.dev/openhands/usage/cli/ide/vscode.md + +[VS Code](https://code.visualstudio.com/) can connect to ACP-compatible agents through the [VSCode ACP](https://marketplace.visualstudio.com/items?itemName=omercnet.vscode-acp) community extension. + + +VS Code does not have native ACP support. This extension is maintained by [Omer Cohen](https://github.com/omercnet) and is not officially supported by OpenHands or Microsoft. + + +## Prerequisites + +Before configuring VS Code: + +1. **OpenHands CLI installed** - See [Installation](/openhands/usage/cli/installation) +2. **LLM settings configured** - Run `openhands` and use `/settings` +3. **VS Code** - Download from [code.visualstudio.com](https://code.visualstudio.com/) + +## Installation + +### Step 1: Install the Extension + +1. Open VS Code +2. Go to Extensions (`Cmd+Shift+X` on Mac or `Ctrl+Shift+X` on Windows/Linux) +3. Search for **"VSCode ACP"** +4. Click **Install** + +Or install directly from the [VS Code Marketplace](https://marketplace.visualstudio.com/items?itemName=omercnet.vscode-acp). + +### Step 2: Connect to OpenHands + +1. Click the **VSCode ACP** icon in the Activity Bar (left sidebar) +2. Click **Connect** to start a session +3. Select **OpenHands** from the agent dropdown +4. Start chatting with OpenHands! + +## How It Works + +The VSCode ACP extension auto-detects installed agents by checking your system PATH. If OpenHands CLI is properly installed, it will appear in the agent dropdown automatically. 
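That PATH lookup is essentially what Python's `shutil.which` does, so you can verify discoverability from a script. This is a sketch of the idea, not the extension's actual code:

```python
import shutil

def agent_on_path(command="openhands"):
    """Return the resolved executable path, or None if the command is not on PATH."""
    return shutil.which(command)

print(agent_on_path() or "openhands not found on PATH")
```

If this prints "not found", fix your PATH (or reinstall the CLI) before expecting the extension's dropdown to list OpenHands.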
+

The extension runs `openhands acp` as a subprocess and communicates via the Agent Client Protocol.

## Verification

Ensure OpenHands is discoverable:

```bash
which openhands
# Should return a path like /Users/you/.local/bin/openhands
```

If the command is not found, install OpenHands CLI:
```bash
uv tool install openhands --python 3.12
```

## Advanced Usage

### Custom Arguments

The VSCode ACP extension may support custom launch arguments. Check the extension's settings for options to pass flags like `--llm-approve`.

### Resume Conversations

To resume a conversation, you may need to:

1. Find your conversation ID: `openhands --resume`
2. Configure the extension to use custom arguments (if supported)
3. Or use the terminal directly: `openhands acp --resume <conversation-id>`


The VSCode ACP extension's feature set depends on the extension maintainer. Check the [extension documentation](https://marketplace.visualstudio.com/items?itemName=omercnet.vscode-acp) for the latest capabilities.


## Troubleshooting

### OpenHands Not Appearing in Dropdown

1. Verify OpenHands is installed and in PATH:
   ```bash
   which openhands
   openhands --version
   ```

2. Restart VS Code after installing OpenHands

3. Check if the extension recognizes agents:
   - Look for any error messages in the extension panel
   - Check the VS Code Developer Tools (`Help > Toggle Developer Tools`)

### Connection Failed

1. Ensure your LLM settings are configured:
   ```bash
   openhands
   # Use /settings to configure
   ```

2. Check that `openhands acp` works in terminal:
   ```bash
   openhands acp
   # Should start without errors (Ctrl+C to exit)
   ```

### Extension Not Working

1. Update to the latest version of the extension
2. Check for VS Code updates
3.
Report issues on the [extension's GitHub](https://github.com/omercnet) + +## Limitations + +Since this is a community extension: + +- Feature availability may vary +- Support depends on the extension maintainer +- Not all OpenHands CLI flags may be accessible through the UI + +For the most control over OpenHands, consider using: +- [Terminal Mode](/openhands/usage/cli/terminal) - Direct CLI usage +- [Zed](/openhands/usage/cli/ide/zed) - Native ACP support + +## See Also + +- [IDE Integration Overview](/openhands/usage/cli/ide/overview) - ACP concepts and other IDEs +- [VSCode ACP Extension](https://marketplace.visualstudio.com/items?itemName=omercnet.vscode-acp) - Extension marketplace page +- [Terminal Mode](/openhands/usage/cli/terminal) - Use OpenHands in terminal + +### Zed IDE +Source: https://docs.openhands.dev/openhands/usage/cli/ide/zed.md + +[Zed](https://zed.dev/) is a high-performance code editor with built-in support for the Agent Client Protocol. + + + +## Prerequisites + +Before configuring Zed, ensure you have: + +1. **OpenHands CLI installed** - See [Installation](/openhands/usage/cli/installation) +2. **LLM settings configured** - Run `openhands` and use `/settings` +3. **Zed editor** - Download from [zed.dev](https://zed.dev/) + +## Configuration + +### Step 1: Open Agent Settings + +1. Open Zed +2. Press `Cmd+Shift+P` (Mac) or `Ctrl+Shift+P` (Windows/Linux) to open the command palette +3. Search for `agent: open settings` + +![Zed Command Palette](/openhands/static/img/acp-zed-settings.png) + +### Step 2: Add OpenHands as an Agent + +1. On the right side, click `+ Add Agent` +2. Select `Add Custom Agent` + +![Zed Add Custom Agent](/openhands/static/img/acp-zed-add-agent.png) + +### Step 3: Configure the Agent + +Add the following configuration to the `agent_servers` field: + +```json +{ + "agent_servers": { + "OpenHands": { + "command": "uvx", + "args": [ + "openhands", + "acp" + ], + "env": {} + } + } +} +``` + +### Step 4: Save and Use + +1. 
Save the settings file +2. You can now use OpenHands within Zed! + +![Zed Use OpenHands Agent](/openhands/static/img/acp-zed-use-openhands.png) + +## Advanced Configuration + +### LLM-Approve Mode + +For automatic LLM-based approval of actions: + +```json +{ + "agent_servers": { + "OpenHands (LLM Approve)": { + "command": "uvx", + "args": [ + "openhands", + "acp", + "--llm-approve" + ], + "env": {} + } + } +} +``` + +### Resume a Specific Conversation + +To resume a previous conversation: + +```json +{ + "agent_servers": { + "OpenHands (Resume)": { + "command": "uvx", + "args": [ + "openhands", + "acp", + "--resume", + "abc123def456" + ], + "env": {} + } + } +} +``` + +Replace `abc123def456` with your actual conversation ID. Find conversation IDs by running `openhands --resume` in your terminal. + +### Resume Latest Conversation + +```json +{ + "agent_servers": { + "OpenHands (Latest)": { + "command": "uvx", + "args": [ + "openhands", + "acp", + "--resume", + "--last" + ], + "env": {} + } + } +} +``` + +### Multiple Configurations + +You can add multiple OpenHands configurations for different use cases: + +```json +{ + "agent_servers": { + "OpenHands": { + "command": "uvx", + "args": ["openhands", "acp"], + "env": {} + }, + "OpenHands (Auto-Approve)": { + "command": "uvx", + "args": ["openhands", "acp", "--always-approve"], + "env": {} + }, + "OpenHands (Resume Latest)": { + "command": "uvx", + "args": ["openhands", "acp", "--resume", "--last"], + "env": {} + } + } +} +``` + +## Troubleshooting + +### Accessing Debug Logs + +If you encounter issues: + +1. Open the command palette (`Cmd+Shift+P` or `Ctrl+Shift+P`) +2. Type and select `acp debug log` +3. Review the logs for errors or warnings +4. 
Restart the conversation to reload connections after configuration changes + +### Common Issues + +**"openhands" command not found** + +Ensure OpenHands is installed and in your PATH: +```bash +which openhands +# Should return a path like /Users/you/.local/bin/openhands +``` + +If using `uvx`, ensure uv is installed: +```bash +uv --version +``` + +**Agent doesn't start** + +1. Check that your LLM settings are configured: run `openhands` and verify `/settings` +2. Verify the configuration JSON syntax is valid +3. Check the ACP debug logs for detailed errors + +**Conversation doesn't persist** + +Conversations are stored in `~/.openhands/conversations`. Ensure this directory is writable. + + +After making configuration changes, restart the conversation in Zed to apply them. + + +## See Also + +- [IDE Integration Overview](/openhands/usage/cli/ide/overview) - ACP concepts and other IDEs +- [Zed Documentation](https://zed.dev/docs) - Official Zed documentation +- [Resume Conversations](/openhands/usage/cli/resume) - Find conversation IDs + +### Installation +Source: https://docs.openhands.dev/openhands/usage/cli/installation.md + + +**Windows Users:** The OpenHands CLI requires WSL (Windows Subsystem for Linux). Native Windows is not officially supported. Please [install WSL](https://learn.microsoft.com/en-us/windows/wsl/install) first, then run all commands inside your WSL terminal. See [Windows Without WSL](/openhands/usage/windows-without-wsl) for an experimental, community-maintained alternative. + + +## Installation Methods + + + + Requires Python 3.12+ and [uv](https://docs.astral.sh/uv/) installed. 
+ + **Install OpenHands:** + ```bash + uv tool install openhands --python 3.12 + ``` + + **Run OpenHands:** + ```bash + openhands + ``` + + **Upgrade OpenHands:** + ```bash + uv tool upgrade openhands --python 3.12 + ``` + + + Install the OpenHands CLI binary with the install script: + + ```bash + curl -fsSL https://install.openhands.dev/install.sh | sh + ``` + + Then run: + ```bash + openhands + ``` + + + Your system may require you to allow permissions to run the executable. + + + When running the OpenHands CLI on Mac, you may get a warning that says "openhands can't be opened because Apple + cannot check it for malicious software." + + 1. Open `System Settings`. + 2. Go to `Privacy & Security`. + 3. Scroll down to `Security` and click `Allow Anyway`. + 4. Rerun the OpenHands CLI. + + ![mac-security](/openhands/static/img/cli-security-mac.png) + + + + + + 1. Set the following environment variable in your terminal: + - `SANDBOX_VOLUMES` to specify the directory you want OpenHands to access ([See using SANDBOX_VOLUMES for more info](/openhands/usage/sandboxes/docker#using-sandbox_volumes)) + + 2. Ensure you have configured your settings before starting: + - Set up `~/.openhands/settings.json` with your LLM configuration + + 3. Run the following command: + + ```bash + docker run -it \ + --pull=always \ + -e AGENT_SERVER_IMAGE_REPOSITORY=ghcr.io/openhands/agent-server \ + -e AGENT_SERVER_IMAGE_TAG=1.11.4-python \ + -e SANDBOX_USER_ID=$(id -u) \ + -e SANDBOX_VOLUMES=$SANDBOX_VOLUMES \ + -v /var/run/docker.sock:/var/run/docker.sock \ + -v ~/.openhands:/root/.openhands \ + --add-host host.docker.internal:host-gateway \ + --name openhands-cli-$(date +%Y%m%d%H%M%S) \ + python:3.12-slim \ + bash -c "pip install uv && uv tool install openhands --python 3.12 && openhands" + ``` + + The `-e SANDBOX_USER_ID=$(id -u)` is passed to the Docker command to ensure the sandbox user matches the host user's + permissions. 
This prevents the agent from creating root-owned files in the mounted workspace. + + + +## First Run + +The first time you run the CLI, it will take you through configuring the required LLM settings. These will be saved +for future sessions in `~/.openhands/settings.json`. + +The conversation history will be saved in `~/.openhands/conversations`. + + +If you're upgrading from a CLI version before release 1.0.0, you'll need to redo your settings setup as the +configuration format has changed. + + +## Next Steps + +- [Quick Start](/openhands/usage/cli/quick-start) - Learn the basics of using the CLI +- [MCP Servers](/openhands/usage/cli/mcp-servers) - Configure MCP servers + +### MCP Servers +Source: https://docs.openhands.dev/openhands/usage/cli/mcp-servers.md + +## Overview + +[Model Context Protocol (MCP)](https://modelcontextprotocol.io/) servers provide additional tools and context to OpenHands agents. You can add HTTP/SSE servers with authentication or stdio-based local servers to extend what OpenHands can do. + +The CLI provides two ways to manage MCP servers: +1. **CLI commands** (`openhands mcp`) - Manage servers from the command line +2. **Interactive command** (`/mcp`) - View server status within a conversation + + +If you're upgrading from a version before release 1.0.0, you'll need to redo your MCP server configuration as the format has changed from TOML to JSON. 
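Because the configuration is now a JSON file (`~/.openhands/mcp.json`; see Manual Configuration below), a malformed hand edit is easy to catch up front. A hedged sketch using only `python3 -m json.tool`:

```shell
# Sanity-check the MCP config's JSON syntax before starting a conversation;
# a malformed file is a common reason servers fail to load.
CFG="${CFG:-$HOME/.openhands/mcp.json}"
if [ ! -f "$CFG" ]; then
  echo "no mcp.json at $CFG (no MCP servers configured yet)"
elif python3 -m json.tool "$CFG" >/dev/null 2>&1; then
  echo "mcp.json is valid JSON"
else
  echo "mcp.json is NOT valid JSON"
fi
```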
+

## MCP Commands

### List Servers

View all configured MCP servers:

```bash
openhands mcp list
```

### Get Server Details

View details for a specific server:

```bash
openhands mcp get <name>
```

### Remove a Server

Remove a server configuration:

```bash
openhands mcp remove <name>
```

### Enable/Disable Servers

Control which servers are active:

```bash
# Enable a server
openhands mcp enable <name>

# Disable a server
openhands mcp disable <name>
```

## Adding Servers

### HTTP/SSE Servers

Add remote servers with HTTP or SSE transport:

```bash
openhands mcp add <name> --transport http <url>
```

#### With Bearer Token Authentication

```bash
openhands mcp add my-api --transport http \
  --header "Authorization: Bearer your-token" \
  https://api.example.com/mcp
```

#### With API Key Authentication

```bash
openhands mcp add weather-api --transport http \
  --header "X-API-Key: your-api-key" \
  https://weather.api.com
```

#### With Multiple Headers

```bash
openhands mcp add secure-api --transport http \
  --header "Authorization: Bearer token123" \
  --header "X-Client-ID: client456" \
  https://api.example.com
```

#### With OAuth Authentication

```bash
openhands mcp add notion-server --transport http \
  --auth oauth \
  https://mcp.notion.com/mcp
```

### Stdio Servers

Add local servers that communicate via stdio:

```bash
openhands mcp add <name> --transport stdio <command> -- [args...]
+
```

#### Basic Example

```bash
openhands mcp add local-server --transport stdio \
  python -- -m my_mcp_server
```

#### With Environment Variables

```bash
openhands mcp add local-server --transport stdio \
  --env "API_KEY=secret123" \
  --env "DATABASE_URL=postgresql://localhost/mydb" \
  python -- -m my_mcp_server --config config.json
```

#### Add in Disabled State

```bash
openhands mcp add my-server --transport stdio --disabled \
  node -- my-server.js
```

### Command Reference

```bash
openhands mcp add <name> --transport <transport> [options] <url-or-command> [-- args...]
```

| Option | Description |
|--------|-------------|
| `--transport` | Transport type: `http`, `sse`, or `stdio` (required) |
| `--header` | HTTP header for http/sse (format: `"Key: Value"`, repeatable) |
| `--env` | Environment variable for stdio (format: `KEY=value`, repeatable) |
| `--auth` | Authentication method (e.g., `oauth`) |
| `--enabled` | Enable immediately (default) |
| `--disabled` | Add in disabled state |

## Example: Web Search with Tavily

Add web search capability using [Tavily's MCP server](https://docs.tavily.com/documentation/mcp):

```bash
openhands mcp add tavily --transport stdio \
  npx -- -y mcp-remote "https://mcp.tavily.com/mcp/?tavilyApiKey=<your-api-key>"
```

## Manual Configuration

You can also manually edit the MCP configuration file at `~/.openhands/mcp.json`.
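Before editing the file by hand, it is worth keeping a copy so a bad edit is easy to roll back. A minimal sketch (the timestamped `.bak` naming is just one convention, not anything OpenHands requires):

```shell
# Keep a timestamped backup of the MCP config before hand-editing it.
CFG="${CFG:-$HOME/.openhands/mcp.json}"
if [ -f "$CFG" ]; then
  cp "$CFG" "$CFG.bak.$(date +%Y%m%d%H%M%S)" && echo "backup written"
else
  echo "nothing to back up at $CFG"
fi
```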
+

### Configuration Format

The file uses the [MCP configuration format](https://gofastmcp.com/clients/client#configuration-format):

```json
{
  "mcpServers": {
    "server-name": {
      "command": "command-to-run",
      "args": ["arg1", "arg2"],
      "env": {
        "ENV_VAR": "value"
      }
    }
  }
}
```

### Example Configuration

```json
{
  "mcpServers": {
    "tavily-remote": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://mcp.tavily.com/mcp/?tavilyApiKey=your-api-key"
      ]
    },
    "local-tools": {
      "command": "python",
      "args": ["-m", "my_mcp_tools"],
      "env": {
        "DEBUG": "true"
      }
    }
  }
}
```

## Interactive `/mcp` Command

Within an OpenHands conversation, use `/mcp` to view server status:

- **View active servers**: Shows which MCP servers are currently active in the conversation
- **View pending changes**: If `mcp.json` has been modified, shows which servers will be mounted when the conversation restarts


The `/mcp` command is read-only. Use `openhands mcp` commands to modify server configurations.


## Workflow

1. **Add servers** using `openhands mcp add`
2. **Start a conversation** with `openhands`
3. **Check status** with `/mcp` inside the conversation
4. **Use the tools** provided by your MCP servers

The agent will automatically have access to tools provided by enabled MCP servers.

## Troubleshooting

### Server Not Appearing

1. Verify the server is enabled:
   ```bash
   openhands mcp list
   ```

2. Check the configuration:
   ```bash
   openhands mcp get <name>
   ```

3. Restart the conversation to load new configurations

### Server Fails to Start

1. Test the command manually:
   ```bash
   # For stdio servers
   python -m my_mcp_server

   # For HTTP servers, check the URL is reachable
   curl https://api.example.com/mcp
   ```

2. Check environment variables and credentials

3.
Review error messages in the CLI output + +### Configuration File Location + +The MCP configuration is stored at: +- **Config file**: `~/.openhands/mcp.json` + +## See Also + +- [Model Context Protocol](https://modelcontextprotocol.io/) - Official MCP documentation +- [MCP Server Settings](/openhands/usage/settings/mcp-settings) - GUI MCP configuration +- [Command Reference](/openhands/usage/cli/command-reference) - Full CLI command reference + +### Quick Start +Source: https://docs.openhands.dev/openhands/usage/cli/quick-start.md + + +**Windows Users:** The CLI requires WSL. See [Installation](/openhands/usage/cli/installation) for details. + + +## Overview + +The OpenHands CLI provides multiple ways to interact with the OpenHands AI agent: + +| Mode | Command | Best For | +|------|---------|----------| +| [Terminal (CLI)](/openhands/usage/cli/terminal) | `openhands` | Interactive development | +| [Headless](/openhands/usage/cli/headless) | `openhands --headless` | Scripts & automation | +| [Web Interface](/openhands/usage/cli/web-interface) | `openhands web` | Browser-based terminal UI | +| [GUI Server](/openhands/usage/cli/gui-server) | `openhands serve` | Full web GUI | +| [IDE Integration](/openhands/usage/cli/ide/overview) | `openhands acp` | Zed, VS Code, JetBrains | + + + +## Your First Conversation + +**Set up your account** (first time only): + + + + ```bash + openhands login + ``` + This authenticates with OpenHands Cloud and fetches your settings. + + + The CLI will prompt you to configure your LLM provider and API key on first run. + + + +1. **Start the CLI:** + ```bash + openhands + ``` + +2. **Enter a task:** + ``` + Create a Python script that prints "Hello, World!" + ``` + +3. **Watch OpenHands work:** + The agent will create the file and show you the results. 
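For the example task above, the finished script is a one-liner. A sketch of the expected result so you can verify the agent's work yourself (`hello.py` is an assumed filename — the agent picks the actual name):

```shell
# Recreate the expected artifact of the first-conversation example,
# then run it the same way you would check the agent's output.
cat > hello.py <<'EOF'
print("Hello, World!")
EOF
python3 hello.py   # prints: Hello, World!
```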
+ +## Controls + +Once inside the CLI, use these controls: + +| Control | Description | +|---------|-------------| +| `Ctrl+P` | Open command palette (access Settings, MCP status) | +| `Esc` | Pause the running agent | +| `Ctrl+Q` or `/exit` | Exit the CLI | + +## Starting with a Task + +You can start the CLI with an initial task: + +```bash +# Start with a task +openhands -t "Fix the bug in auth.py" + +# Start with a task from a file +openhands -f task.txt +``` + +## Resuming Conversations + +Resume a previous conversation: + +```bash +# List recent conversations and select one +openhands --resume + +# Resume the most recent conversation +openhands --resume --last + +# Resume a specific conversation by ID +openhands --resume abc123def456 +``` + +For more details, see [Resume Conversations](/openhands/usage/cli/resume). + +## Next Steps + + + + Learn about the interactive terminal interface + + + Use OpenHands in Zed, VS Code, or JetBrains + + + Automate tasks with scripting + + + Add tools via Model Context Protocol + + + +### Resume Conversations +Source: https://docs.openhands.dev/openhands/usage/cli/resume.md + +## Overview + +OpenHands CLI automatically saves your conversation history in `~/.openhands/conversations`. You can resume any previous conversation to continue where you left off. + +## Listing Previous Conversations + +To see a list of your recent conversations, run: + +```bash +openhands --resume +``` + +This displays up to 15 recent conversations with their IDs, timestamps, and a preview of the first user message: + +``` +Recent Conversations: +-------------------------------------------------------------------------------- + 1. abc123def456 (2h ago) + Fix the login bug in auth.py + + 2. xyz789ghi012 (yesterday) + Add unit tests for the user service + + 3. 
mno345pqr678 (3 days ago)
    Refactor the database connection module
--------------------------------------------------------------------------------
To resume a conversation, use: openhands --resume <conversation-id>
```

## Resuming a Specific Conversation

To resume a specific conversation, use the `--resume` flag with the conversation ID:

```bash
openhands --resume <conversation-id>
```

For example:

```bash
openhands --resume abc123def456
```

## Resuming the Latest Conversation

To quickly resume your most recent conversation without looking up the ID, use the `--last` flag:

```bash
openhands --resume --last
```

This automatically finds and resumes the most recent conversation.

## How It Works

When you resume a conversation:

1. OpenHands loads the full conversation history from disk
2. The agent has access to all previous context, including:
   - Your previous messages and requests
   - The agent's responses and actions
   - Any files that were created or modified
3. You can continue the conversation as if you never left


The conversation history is stored locally on your machine. If you delete the `~/.openhands/conversations` directory, your conversation history will be lost.
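The `--last` selection can be approximated from the storage layout alone. A hedged sketch that picks the most recently modified conversation directory (choosing by mtime is an assumption about how "most recent" is decided):

```shell
# List conversation directories newest-first and take the top entry,
# approximating what `openhands --resume --last` resolves to.
CONV_DIR="${CONV_DIR:-$HOME/.openhands/conversations}"
latest=$(ls -1t "$CONV_DIR" 2>/dev/null | head -n 1)
echo "most recent conversation: ${latest:-none found}"
```

You can then pass the printed ID to `openhands --resume`.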
+ + +## Resuming in Different Modes + +### Terminal Mode + +```bash +openhands --resume abc123def456 +openhands --resume --last +``` + +### ACP Mode (IDEs) + +```bash +openhands acp --resume abc123def456 +openhands acp --resume --last +``` + +For IDE-specific configurations, see: +- [Zed](/openhands/usage/cli/ide/zed#resume-a-specific-conversation) +- [Toad](/openhands/usage/cli/ide/toad#resume-a-conversation) +- [JetBrains](/openhands/usage/cli/ide/jetbrains#resume-a-conversation) + +### With Confirmation Modes + +Combine `--resume` with confirmation mode flags: + +```bash +# Resume with LLM-based approval +openhands --resume abc123def456 --llm-approve + +# Resume with auto-approve +openhands --resume --last --always-approve +``` + +## Tips + + +**Copy the conversation ID**: When you exit a conversation, OpenHands displays the conversation ID. Copy this for later use. + + + +**Use descriptive first messages**: The conversation list shows a preview of your first message, so starting with a clear description helps you identify conversations later. + + +## Storage Location + +Conversations are stored in: + +``` +~/.openhands/conversations/ +├── abc123def456/ +│ └── conversation.json +├── xyz789ghi012/ +│ └── conversation.json +└── ... +``` + +## See Also + +- [Terminal Mode](/openhands/usage/cli/terminal) - Interactive CLI usage +- [IDE Integration](/openhands/usage/cli/ide/overview) - Resuming in IDEs +- [Command Reference](/openhands/usage/cli/command-reference) - Full CLI reference + +### Terminal (CLI) +Source: https://docs.openhands.dev/openhands/usage/cli/terminal.md + +## Overview + +The Command Line Interface (CLI) is the default mode when you run `openhands`. It provides a rich, interactive experience directly in your terminal. 
+ +```bash +openhands +``` + +## Features + +- **Real-time interaction**: Type natural language tasks and receive instant feedback +- **Live status monitoring**: Watch the agent's progress as it works +- **Command palette**: Press `Ctrl+P` to access settings, MCP status, and more + +## Command Palette + +Press `Ctrl+P` to open the command palette, then select from the dropdown options: + +| Option | Description | +|--------|-------------| +| **Settings** | Open the settings configuration menu | +| **MCP** | View MCP server status | + +## Controls + +| Control | Action | +|---------|--------| +| `Ctrl+P` | Open command palette | +| `Esc` | Pause the running agent | +| `Ctrl+Q` or `/exit` | Exit the CLI | + +## Starting with a Task + +Start a conversation with an initial task: + +```bash +# Provide a task directly +openhands -t "Create a REST API for user management" + +# Load task from a file +openhands -f requirements.txt +``` + +## Confirmation Modes + +Control how the agent requests approval for actions: + +```bash +# Default: Always ask for confirmation +openhands + +# Auto-approve all actions (use with caution) +openhands --always-approve + +# Use LLM-based security analyzer +openhands --llm-approve +``` + +## Resuming Conversations + +Resume previous conversations: + +```bash +# List recent conversations +openhands --resume + +# Resume the most recent +openhands --resume --last + +# Resume a specific conversation +openhands --resume abc123def456 +``` + +For more details, see [Resume Conversations](/openhands/usage/cli/resume). + +## Tips + + +Press `Ctrl+P` and select **Settings** to quickly adjust your LLM configuration without restarting the CLI. + + + +Press `Esc` to pause the agent if it's going in the wrong direction, then provide clarification. 
+


## See Also

- [Quick Start](/openhands/usage/cli/quick-start) - Get started with the CLI
- [MCP Servers](/openhands/usage/cli/mcp-servers) - Configure MCP servers
- [Headless Mode](/openhands/usage/cli/headless) - Run without UI for automation

### Web Interface
Source: https://docs.openhands.dev/openhands/usage/cli/web-interface.md

## Overview

The `openhands web` command launches the CLI's terminal interface as a web application, accessible through your browser. This is useful when you want to:
- Access the CLI remotely
- Share your terminal session
- Use the CLI on devices without a full terminal

```bash
openhands web
```


This is different from `openhands serve`, which launches the full GUI web application. The web interface runs the same terminal UI experience you see in the terminal, just in a browser.


## Basic Usage

```bash
# Start on default port (12000)
openhands web

# Access at http://localhost:12000
```

## Options

| Option | Default | Description |
|--------|---------|-------------|
| `--host` | `0.0.0.0` | Host address to bind to |
| `--port` | `12000` | Port number to use |
| `--debug` | `false` | Enable debug mode |

## Examples

```bash
# Custom port
openhands web --port 8080

# Bind to localhost only (more secure)
openhands web --host 127.0.0.1

# Enable debug mode
openhands web --debug

# Full example with custom host and port
openhands web --host 0.0.0.0 --port 3000
```

## Remote Access

To access the web interface from another machine:

1. Start with `--host 0.0.0.0` to bind to all interfaces:
   ```bash
   openhands web --host 0.0.0.0 --port 12000
   ```

2. Access from another machine using the host's IP:
   ```
   http://<host-ip>:12000
   ```


When exposing the web interface to the network, ensure you have appropriate security measures in place. The web interface provides full access to OpenHands capabilities.
+ + +## Use Cases + +### Development on Remote Servers + +Access OpenHands on a remote development server through your local browser: + +```bash +# On remote server +openhands web --host 0.0.0.0 --port 12000 + +# On local machine, use SSH tunnel +ssh -L 12000:localhost:12000 user@remote-server + +# Access at http://localhost:12000 +``` + +### Sharing Sessions + +Run the web interface on a shared server for team access: + +```bash +openhands web --host 0.0.0.0 --port 8080 +``` + +## Comparison: Web Interface vs GUI Server + +| Feature | `openhands web` | `openhands serve` | +|---------|-----------------|-------------------| +| Interface | Terminal UI in browser | Full web GUI | +| Dependencies | None | Docker required | +| Resources | Lightweight | Full container | +| Best for | Quick access | Rich GUI experience | + +## See Also + +- [Terminal Mode](/openhands/usage/cli/terminal) - Direct terminal usage +- [GUI Server](/openhands/usage/cli/gui-server) - Full web GUI with Docker +- [Command Reference](/openhands/usage/cli/command-reference) - All CLI options + +## OpenHands Web App Server + +### About OpenHands +Source: https://docs.openhands.dev/openhands/usage/about.md + +## Research Strategy + +Achieving full replication of production-grade applications with LLMs is a complex endeavor. Our strategy involves: + +- **Core Technical Research:** Focusing on foundational research to understand and improve the technical aspects of code generation and handling. +- **Task Planning:** Developing capabilities for bug detection, codebase management, and optimization. +- **Evaluation:** Establishing comprehensive evaluation metrics to better understand and improve our agents. + +## Default Agent + +Our default Agent is currently the [CodeActAgent](./agents), which is capable of generating code and handling files. + +## Built With + +OpenHands is built using a combination of powerful frameworks and libraries, providing a robust foundation for its +development. 
Here are the key technologies used in the project: + +![FastAPI](https://img.shields.io/badge/FastAPI-black?style=for-the-badge) ![uvicorn](https://img.shields.io/badge/uvicorn-black?style=for-the-badge) ![LiteLLM](https://img.shields.io/badge/LiteLLM-black?style=for-the-badge) ![Docker](https://img.shields.io/badge/Docker-black?style=for-the-badge) ![Ruff](https://img.shields.io/badge/Ruff-black?style=for-the-badge) ![MyPy](https://img.shields.io/badge/MyPy-black?style=for-the-badge) ![LlamaIndex](https://img.shields.io/badge/LlamaIndex-black?style=for-the-badge) ![React](https://img.shields.io/badge/React-black?style=for-the-badge) + +Please note that the selection of these technologies is in progress, and additional technologies may be added or +existing ones may be removed as the project evolves. We strive to adopt the most suitable and efficient tools to +enhance the capabilities of OpenHands. + +## License + +Distributed under MIT [License](https://github.com/OpenHands/OpenHands/blob/main/LICENSE). + +### Configuration Options +Source: https://docs.openhands.dev/openhands/usage/advanced/configuration-options.md + + + This page documents the current V1 configuration model. + + Legacy config.toml / “runtime” configuration docs have been moved + to the Legacy (V0) section of the Web tab. + + +## Where configuration lives in V1 + +Most user-facing configuration is done via the **Settings** UI in the Web app +(LLM provider/model, integrations, MCP, secrets, etc.). + +For self-hosted deployments and advanced workflows, OpenHands also supports +environment-variable configuration. + +## Common V1 environment variables + +These are some commonly used variables in V1 deployments: + +- **LLM credentials** + - LLM_API_KEY + - LLM_MODEL + +- **Persistence** + - OH_PERSISTENCE_DIR: where OpenHands stores local state (defaults to + ~/.openhands). 
+

- **Public URL (optional)**
  - OH_WEB_URL: the externally reachable URL of your OpenHands instance
    (used for callbacks in some deployments).

- **Sandbox workspace mounting**
  - SANDBOX_VOLUMES: mount host directories into the sandbox (see
    [Docker Sandbox](/openhands/usage/sandboxes/docker)).

- **Sandbox image selection**
  - AGENT_SERVER_IMAGE_REPOSITORY
  - AGENT_SERVER_IMAGE_TAG


## Sandbox provider selection

Some deployments still use the legacy RUNTIME environment variable to
choose which sandbox provider to use:

- RUNTIME=docker (default)
- RUNTIME=process (aka legacy RUNTIME=local)
- RUNTIME=remote

See [Sandboxes overview](/openhands/usage/sandboxes/overview) for details.

## Need legacy options?

If you are looking for the old config.toml reference or V0 “runtime”
providers, see:

- Web → Legacy (V0) → V0 Configuration Options
- Web → Legacy (V0) → V0 Runtime Configuration

### Custom Sandbox
Source: https://docs.openhands.dev/openhands/usage/advanced/custom-sandbox-guide.md


  These settings are only available in [Local GUI](/openhands/usage/run-openhands/local-setup). OpenHands Cloud uses managed sandbox environments.


The sandbox is where the agent performs its tasks. Instead of running commands directly on your computer
(which could be risky), the agent runs them inside a Docker container.

The default OpenHands sandbox (`python-nodejs:python3.12-nodejs22`
from [nikolaik/python-nodejs](https://hub.docker.com/r/nikolaik/python-nodejs)) comes with some software
preinstalled, such as Python and Node.js, but you may need to install additional software for your project.

You have two options for customization:

- Use an existing image with the required software.
- Create your own custom Docker image.

If you choose the first option, you can skip the `Create Your Docker Image` section.

## Create Your Docker Image

Your custom Docker image must be Debian-based.
+

For example, if you want OpenHands to have `ruby` installed, you could create a `Dockerfile` with the following content:

```dockerfile
FROM nikolaik/python-nodejs:python3.12-nodejs22

# Install required packages
RUN apt-get update && apt-get install -y ruby
```

Or you could use a Ruby-specific base image:

```dockerfile
FROM ruby:latest
```

Save this file in a folder. Then, build your Docker image (e.g., named `custom-image`) by navigating to the folder in
the terminal and running:
```bash
docker build -t custom-image .
```

This will produce a new image called `custom-image`, which will be available in Docker.

## Using the Docker Command

When running OpenHands using [the docker command](/openhands/usage/run-openhands/local-setup#start-the-app), replace
the `AGENT_SERVER_IMAGE_REPOSITORY` and `AGENT_SERVER_IMAGE_TAG` environment variables with `-e SANDBOX_BASE_CONTAINER_IMAGE=<custom-image-name>`:

```commandline
docker run -it --rm --pull=always \
    -e SANDBOX_BASE_CONTAINER_IMAGE=custom-image \
    ...
```

## Using the Development Workflow

### Setup

First, ensure you can run OpenHands by following the instructions in [Development.md](https://github.com/OpenHands/OpenHands/blob/main/Development.md).

### Specify the Base Sandbox Image

In the `config.toml` file within the OpenHands directory, set the `base_container_image` to the image you want to use.
This can be an image you’ve already pulled or one you’ve built:

```toml
[core]
...
[sandbox]
base_container_image="custom-image"
```

### Additional Configuration Options

The `config.toml` file supports several other options for customizing your sandbox:

```toml
[core]
# Install additional dependencies when the runtime is built
# Can contain any valid shell commands
# If you need the path to the Python interpreter in any of these commands, you can use the $OH_INTERPRETER_PATH variable
runtime_extra_deps = """
pip install numpy pandas
apt-get update && apt-get install -y ffmpeg
"""

# Set environment variables for the runtime
# Useful for configuration that needs to be available at runtime
runtime_startup_env_vars = { DATABASE_URL = "postgresql://user:pass@localhost/db" }

# Specify platform for multi-architecture builds (e.g., "linux/amd64" or "linux/arm64")
platform = "linux/amd64"
```

### Run

Run OpenHands by running `make run` in the top-level directory.

### Search Engine Setup
Source: https://docs.openhands.dev/openhands/usage/advanced/search-engine-setup.md

## Setting Up Search Engine in OpenHands

OpenHands can be configured to use [Tavily](https://tavily.com/) as a search engine, which allows the agent to
search the web for information when needed. This capability enhances the agent's ability to provide up-to-date
information and solve problems that require external knowledge.


  Tavily is configured as a search engine by default in OpenHands Cloud!


### Getting a Tavily API Key

To use the search functionality in OpenHands, you'll need to obtain a Tavily API key:

1. Visit [Tavily's website](https://tavily.com/) and sign up for an account.
2. Navigate to the API section in your dashboard.
3. Generate a new API key.
4. Copy the API key (it should start with `tvly-`).

### Configuring Search in OpenHands

Once you have your Tavily API key, you can configure OpenHands to use it:

#### In the OpenHands UI

1. Open OpenHands and navigate to the `Settings > LLM` page.
2.
Enter your Tavily API key (starting with `tvly-`) in the `Search API Key (Tavily)` field. +3. Click `Save` to apply the changes. + + + The search API key field is optional. If you don't provide a key, the search functionality will not be available to + the agent. + + +#### Using Configuration Files + +If you're running OpenHands in headless mode or via CLI, you can configure the search API key in your configuration file: + +```toml +# In your OpenHands config file +[core] +search_api_key = "tvly-your-api-key-here" +``` + +### How Search Works in OpenHands + +When the search engine is configured: + +- The agent can decide to search the web when it needs external information. +- Search queries are sent to Tavily's API via [Tavily's MCP server](https://github.com/tavily-ai/tavily-mcp) which + includes a variety of [tools](https://docs.tavily.com/documentation/api-reference/introduction) (search, extract, crawl, map). +- Results are returned and incorporated into the agent's context. +- The agent can use this information to provide more accurate and up-to-date responses. + +### Limitations + +- Search results depend on Tavily's coverage and freshness. +- Usage may be subject to Tavily's rate limits and pricing tiers. +- The agent will only search when it determines that external information is needed. + +### Troubleshooting + +If you encounter issues with the search functionality: + +- Verify that your API key is correct and active. +- Check that your API key starts with `tvly-`. +- Ensure you have an active internet connection. +- Check Tavily's status page for any service disruptions. 

### Main Agent and Capabilities
Source: https://docs.openhands.dev/openhands/usage/agents.md

## CodeActAgent

### Description

This agent implements the CodeAct idea ([paper](https://arxiv.org/abs/2402.01030), [tweet](https://twitter.com/xingyaow_/status/1754556835703751087)) that consolidates LLM agents’ **act**ions into a
unified **code** action space for both _simplicity_ and _performance_.

The conceptual idea is illustrated below. At each turn, the agent can:

1. **Converse**: Communicate with humans in natural language to ask for clarification, confirmation, etc.
2. **CodeAct**: Choose to perform the task by executing code

- Execute any valid Linux `bash` command
- Execute any valid `Python` code with [an interactive Python interpreter](https://ipython.org/). This is simulated through a `bash` command; see the plugin system below for more details.

![image](https://github.com/OpenHands/OpenHands/assets/38853559/92b622e3-72ad-4a61-8f41-8c040b6d5fb3)

### Demo

https://github.com/OpenHands/OpenHands/assets/38853559/f592a192-e86c-4f48-ad31-d69282d5f6ac

_Example of CodeActAgent with `gpt-4-turbo-2024-04-09` performing a data science task (linear regression)_.

### REST API (V1)
Source: https://docs.openhands.dev/openhands/usage/api/v1.md


  OpenHands is in a transition period: legacy (V0) endpoints still exist alongside
  the new /api/v1 endpoints.

  If you need the legacy OpenAPI reference, see the Legacy (V0) section in the Web tab.


## Overview

OpenHands V1 REST endpoints are mounted under:

- /api/v1

These endpoints back the current Web UI and are intended for newer integrations.

## Key resources

The V1 API is organized around a few core concepts:

- **App conversations**: create/list conversations and access conversation metadata.
  - POST /api/v1/app-conversations
  - GET /api/v1/app-conversations

- **Sandboxes**: list/start/pause/resume the execution environments that power conversations.
+ - GET /api/v1/sandboxes/search + - POST /api/v1/sandboxes + - POST /api/v1/sandboxes/{id}/pause + - POST /api/v1/sandboxes/{id}/resume + +- **Sandbox specs**: list the available sandbox “templates” (e.g., Docker image presets). + - GET /api/v1/sandbox-specs/search + +### Backend Architecture +Source: https://docs.openhands.dev/openhands/usage/architecture/backend.md + +This is a high-level overview of the system architecture. The system is divided into two main components: the frontend and the backend. The frontend is responsible for handling user interactions and displaying the results. The backend is responsible for handling the business logic and executing the agents. + +# System overview + +```mermaid +flowchart LR + U["User"] --> FE["Frontend (SPA)"] + FE -- "HTTP/WS" --> BE["OpenHands Backend"] + BE --> ES["EventStream"] + BE --> ST["Storage"] + BE --> RT["Runtime Interface"] + BE --> LLM["LLM Providers"] + + subgraph Runtime + direction TB + RT --> DRT["Docker Runtime"] + RT --> LRT["Local Runtime"] + RT --> RRT["Remote Runtime"] + DRT --> AES["Action Execution Server"] + LRT --> AES + RRT --> AES + AES --> Bash["Bash Session"] + AES --> Jupyter["Jupyter Plugin"] + AES --> Browser["BrowserEnv"] + end +``` + +This Overview is simplified to show the main components and their interactions. For a more detailed view of the backend architecture, see the Backend Architecture section below. 

# Backend Architecture


```mermaid
classDiagram
    class Agent {
        <<abstract>>
        +sandbox_plugins: list[PluginRequirement]
    }
    class CodeActAgent {
        +tools
    }
    Agent <|-- CodeActAgent

    class EventStream
    class Observation
    class Action
    Action --> Observation
    Agent --> EventStream

    class Runtime {
        +connect()
        +send_action_for_execution()
    }
    class ActionExecutionClient {
        +_send_action_server_request()
    }
    class DockerRuntime
    class LocalRuntime
    class RemoteRuntime
    Runtime <|-- ActionExecutionClient
    ActionExecutionClient <|-- DockerRuntime
    ActionExecutionClient <|-- LocalRuntime
    ActionExecutionClient <|-- RemoteRuntime

    class ActionExecutionServer {
        +/execute_action
        +/alive
    }
    class BashSession
    class JupyterPlugin
    class BrowserEnv
    ActionExecutionServer --> BashSession
    ActionExecutionServer --> JupyterPlugin
    ActionExecutionServer --> BrowserEnv

    Agent --> Runtime
    Runtime ..> ActionExecutionServer : REST
```

+ Updating this Diagram +
+ We maintain architecture diagrams inline with Mermaid in this MDX. + + Guidance: + - Edit the Mermaid blocks directly (flowchart/classDiagram). + - Quote labels and edge text for GitHub preview compatibility. + - Keep relationships concise and reflect stable abstractions (agents, runtime client/server, plugins). + - Verify accuracy against code: + - openhands/runtime/impl/action_execution/action_execution_client.py + - openhands/runtime/impl/docker/docker_runtime.py + - openhands/runtime/impl/local/local_runtime.py + - openhands/runtime/action_execution_server.py + - openhands/runtime/plugins/* + - Build docs locally or view on GitHub to confirm diagrams render. + +
+

### Runtime Architecture
Source: https://docs.openhands.dev/openhands/usage/architecture/runtime.md

The OpenHands Docker Runtime is the core component that enables secure and flexible execution of the AI agent's actions.
It creates a sandboxed environment using Docker, where arbitrary code can be run safely without risking the host system.

## Why do we need a sandboxed runtime?

OpenHands needs to execute arbitrary code in a secure, isolated environment for several reasons:

1. Security: Executing untrusted code can pose significant risks to the host system. A sandboxed environment prevents malicious code from accessing or modifying the host system's resources
2. Consistency: A sandboxed environment ensures that code execution is consistent across different machines and setups, eliminating "it works on my machine" issues
3. Resource Control: Sandboxing allows for better control over resource allocation and usage, preventing runaway processes from affecting the host system
4. Isolation: Different projects or users can work in isolated environments without interfering with each other or the host system
5. Reproducibility: Sandboxed environments make it easier to reproduce bugs and issues, as the execution environment is consistent and controllable

## How does the Runtime work?

The OpenHands Runtime system uses a client-server architecture implemented with Docker containers.
Here's an overview of how it works: + +```mermaid +graph TD + A[User-provided Custom Docker Image] --> B[OpenHands Backend] + B -->|Builds| C[OH Runtime Image] + C -->|Launches| D[Action Executor] + D -->|Initializes| E[Browser] + D -->|Initializes| F[Bash Shell] + D -->|Initializes| G[Plugins] + G -->|Initializes| L[Jupyter Server] + + B -->|Spawn| H[Agent] + B -->|Spawn| I[EventStream] + I <--->|Execute Action to + Get Observation + via REST API + | D + + H -->|Generate Action| I + I -->|Obtain Observation| H + + subgraph "Docker Container" + D + E + F + G + L + end +``` + +1. User Input: The user provides a custom base Docker image +2. Image Building: OpenHands builds a new Docker image (the "OH runtime image") based on the user-provided image. This new image includes OpenHands-specific code, primarily the "runtime client" +3. Container Launch: When OpenHands starts, it launches a Docker container using the OH runtime image +4. Action Execution Server Initialization: The action execution server initializes an `ActionExecutor` inside the container, setting up necessary components like a bash shell and loading any specified plugins +5. Communication: The OpenHands backend (client: `openhands/runtime/impl/action_execution/action_execution_client.py`; runtimes: `openhands/runtime/impl/docker/docker_runtime.py`, `openhands/runtime/impl/local/local_runtime.py`) communicates with the action execution server over RESTful API, sending actions and receiving observations +6. Action Execution: The runtime client receives actions from the backend, executes them in the sandboxed environment, and sends back observations +7. Observation Return: The action execution server sends execution results back to the OpenHands backend as observations + +The role of the client: + +- It acts as an intermediary between the OpenHands backend and the sandboxed environment +- It executes various types of actions (shell commands, file operations, Python code, etc.) 
safely within the container
- It manages the state of the sandboxed environment, including the current working directory and loaded plugins
- It formats and returns observations to the backend, ensuring a consistent interface for processing results

## How OpenHands builds and maintains OH Runtime images

OpenHands' approach to building and managing runtime images ensures efficiency, consistency, and flexibility in creating and maintaining Docker images for both production and development environments.

Check out the [relevant code](https://github.com/OpenHands/OpenHands/blob/main/openhands/runtime/utils/runtime_build.py) if you are interested in more details.

### Image Tagging System

OpenHands uses a three-tag system for its runtime images to balance reproducibility with flexibility.
The tags are:

- **Versioned Tag**: `oh_v{openhands_version}_{base_image}` (e.g.: `oh_v0.9.9_nikolaik_s_python-nodejs_t_python3.12-nodejs22`)
- **Lock Tag**: `oh_v{openhands_version}_{16_digit_lock_hash}` (e.g.: `oh_v0.9.9_1234567890abcdef`)
- **Source Tag**: `oh_v{openhands_version}_{16_digit_lock_hash}_{16_digit_source_hash}`
  (e.g.: `oh_v0.9.9_1234567890abcdef_1234567890abcdef`)

#### Source Tag - Most Specific

This is the first 16 digits of the MD5 of the directory hash for the source directory. This gives a hash
for only the OpenHands source code.

#### Lock Tag

This hash is built from the first 16 digits of the MD5 of:

- The name of the base image upon which the image was built (e.g.: `nikolaik/python-nodejs:python3.12-nodejs22`)
- The content of the `pyproject.toml` included in the image.
- The content of the `poetry.lock` included in the image.

This effectively gives a hash for the dependencies of OpenHands independent of the source code.

#### Versioned Tag - Most Generic

This tag is a concatenation of the OpenHands version and the base image name (transformed to fit tag standards).

#### Build Process

When generating an image...

- **No re-build**: OpenHands first checks whether an image with the same **most specific source tag** exists. If there is such an image,
  no build is performed - the existing image is used.
- **Fastest re-build**: OpenHands next checks whether an image with the **generic lock tag** exists. If there is such an image,
  OpenHands builds a new image based upon it, bypassing all installation steps (like `poetry install` and
  `apt-get`) except a final operation to copy the current source code. The new image is tagged with a
  **source** tag only.
- **Ok-ish re-build**: If neither a **source** nor **lock** tag exists, an image will be built based upon the **versioned** tag image.
  In the versioned tag image, most dependencies should already be installed, saving time.
- **Slowest re-build**: If none of the three tags exists, a brand new image is built based upon the base
  image (which is a slower operation). This new image is tagged with all the **source**, **lock**, and **versioned** tags.

This tagging approach allows OpenHands to efficiently manage both development and production environments.

1. Identical source code and Dockerfile always produce the same image (via hash-based tags)
2. The system can quickly rebuild images when minor changes occur (by leveraging recent compatible images)
3. The **lock** tag (e.g., `runtime:oh_v0.9.3_1234567890abcdef`) always points to the latest build for a particular base image, dependency, and OpenHands version combination

## Volume mounts: named volumes and overlay

OpenHands supports both bind mounts and Docker named volumes in SandboxConfig.volumes:

- Bind mount: "/abs/host/path:/container/path[:mode]"
- Named volume: "volume:<name>:/container/path[:mode]" or any non-absolute host spec treated as a named volume

Overlay mode (copy-on-write layer) is supported for bind mounts by appending ":overlay" to the mode (e.g., ":ro,overlay").
+To enable overlay COW, set SANDBOX_VOLUME_OVERLAYS to a writable host directory; per-container upper/work dirs are created under it. If SANDBOX_VOLUME_OVERLAYS is unset, overlay mounts are skipped. + +Implementation references: +- openhands/runtime/impl/docker/docker_runtime.py (named volumes in _build_docker_run_args; overlay mounts in _process_overlay_mounts) +- openhands/core/config/sandbox_config.py (volumes field) + + +## Runtime Plugin System + +The OpenHands Runtime supports a plugin system that allows for extending functionality and customizing the runtime environment. Plugins are initialized when the action execution server starts up inside the runtime. + +## Ports and URLs + +- Host port allocation uses file-locked ranges for stability and concurrency: + - Main runtime port: find_available_port_with_lock on configured range + - VSCode port: SandboxConfig.sandbox.vscode_port if provided, else find_available_port_with_lock in VSCODE_PORT_RANGE + - App ports: two additional ranges for plugin/web apps +- DOCKER_HOST_ADDR (if set) adjusts how URLs are formed for LocalRuntime/Docker environments. 
+- VSCode URL is exposed with a connection token from the action execution server endpoint /vscode/connection_token and rendered as: + - Docker/Local: `http://localhost:{port}/?tkn={token}&folder={workspace_mount_path_in_sandbox}` + - RemoteRuntime: `scheme://vscode-{host}/?tkn={token}&folder={workspace_mount_path_in_sandbox}` + +References: +- openhands/runtime/impl/docker/docker_runtime.py (port ranges, locking, DOCKER_HOST_ADDR, vscode_url) +- openhands/runtime/impl/local/local_runtime.py (vscode_url factory) +- openhands/runtime/impl/remote/remote_runtime.py (vscode_url mapping) +- openhands/runtime/action_execution_server.py (/vscode/connection_token) + + +Examples: +- Jupyter: openhands/runtime/plugins/jupyter/__init__.py (JupyterPlugin, Kernel Gateway) +- VS Code: openhands/runtime/plugins/vscode/* (VSCodePlugin, exposes tokenized URL) +- Agent Skills: openhands/runtime/plugins/agent_skills/* + +Key aspects of the plugin system: + +1. Plugin Definition: Plugins are defined as Python classes that inherit from a base `Plugin` class +2. Plugin Registration: Available plugins are registered in `openhands/runtime/plugins/__init__.py` via `ALL_PLUGINS` +3. Plugin Specification: Plugins are associated with `Agent.sandbox_plugins: list[PluginRequirement]`. Users can specify which plugins to load when initializing the runtime +4. Initialization: Plugins are initialized asynchronously when the runtime starts and are accessible to actions +5. Usage: Plugins extend capabilities (e.g., Jupyter for IPython cells); the server exposes any web endpoints (ports) via host port mapping + +### Repository Customization +Source: https://docs.openhands.dev/openhands/usage/customization/repository.md + +## Skills (formerly Microagents) + +Skills allow you to extend OpenHands prompts with information specific to your project and define how OpenHands +should function. See [Skills Overview](/overview/skills) for more information. 


## Setup Script
You can add a `.openhands/setup.sh` file, which will run every time OpenHands begins working with your repository.
This is an ideal location for installing dependencies, setting environment variables, and performing other setup tasks.

For example:
```bash
#!/bin/bash
export MY_ENV_VAR="my value"
sudo apt-get update
sudo apt-get install -y lsof
cd frontend && npm install ; cd ..
```

## Pre-commit Script
You can add a `.openhands/pre-commit.sh` file to create a custom git pre-commit hook that runs before each commit.
This can be used to enforce code quality standards, run tests, or perform other checks before allowing commits.

For example:
```bash
#!/bin/bash
# Run linting checks
cd frontend && npm run lint
if [ $? -ne 0 ]; then
  echo "Frontend linting failed. Please fix the issues before committing."
  exit 1
fi

# Run tests (note: we are still in frontend/, so step back up first)
cd ../backend && pytest tests/unit
if [ $? -ne 0 ]; then
  echo "Backend tests failed. Please fix the issues before committing."
  exit 1
fi

exit 0
```

### Debugging
Source: https://docs.openhands.dev/openhands/usage/developers/debugging.md

The following is intended as a primer on debugging OpenHands for development purposes.

## Server / VSCode

The following `launch.json` will allow debugging the agent, controller and server elements, but not the sandbox (which runs inside Docker).
It will ignore any changes inside the `workspace/` directory: + +``` +{ + "version": "0.2.0", + "configurations": [ + { + "name": "OpenHands CLI", + "type": "debugpy", + "request": "launch", + "module": "openhands.cli.main", + "justMyCode": false + }, + { + "name": "OpenHands WebApp", + "type": "debugpy", + "request": "launch", + "module": "uvicorn", + "args": [ + "openhands.server.listen:app", + "--reload", + "--reload-exclude", + "${workspaceFolder}/workspace", + "--port", + "3000" + ], + "justMyCode": false + } + ] +} +``` + +More specific debugging configurations which include more parameters may be specified: + +``` + ... + { + "name": "Debug CodeAct", + "type": "debugpy", + "request": "launch", + "module": "openhands.core.main", + "args": [ + "-t", + "Ask me what your task is.", + "-d", + "${workspaceFolder}/workspace", + "-c", + "CodeActAgent", + "-l", + "llm.o1", + "-n", + "prompts" + ], + "justMyCode": false + } + ... +``` + +Values in the snippet above can be updated such that: + + * *t*: the task + * *d*: the openhands workspace directory + * *c*: the agent + * *l*: the LLM config (pre-defined in config.toml) + * *n*: session name (e.g. eventstream name) + +### Development Overview +Source: https://docs.openhands.dev/openhands/usage/developers/development-overview.md + +## Core Documentation + +### Project Fundamentals +- **Main Project Overview** (`/README.md`) + The primary entry point for understanding OpenHands, including features and basic setup instructions. + +- **Development Guide** (`/Development.md`) + Guide for developers working on OpenHands, including setup, requirements, and development workflows. + +- **Contributing Guidelines** (`/CONTRIBUTING.md`) + Essential information for contributors, covering code style, PR process, and contribution workflows. + +### Component Documentation + +#### Frontend +- **Frontend Application** (`/frontend/README.md`) + Complete guide for setting up and developing the React-based frontend application. 
+ +#### Backend +- **Backend Implementation** (`/openhands/README.md`) + Detailed documentation of the Python backend implementation and architecture. + +- **Server Documentation** (`/openhands/server/README.md`) + Server implementation details, API documentation, and service architecture. + +- **Runtime Environment** (`/openhands/runtime/README.md`) + Documentation covering the runtime environment, execution model, and runtime configurations. + +#### Infrastructure +- **Container Documentation** (`/containers/README.md`) + Information about Docker containers, deployment strategies, and container management. + +### Testing and Evaluation +- **Unit Testing Guide** (`/tests/unit/README.md`) + Instructions for writing, running, and maintaining unit tests. + +- **Evaluation Framework** (`/evaluation/README.md`) + Documentation for the evaluation framework, benchmarks, and performance testing. + +### Advanced Features +- **Skills (formerly Microagents) Architecture** (`/microagents/README.md`) + Detailed information about the skills architecture, implementation, and usage. + +### Documentation Standards +- **Documentation Style Guide** (`/docs/DOC_STYLE_GUIDE.md`) + Standards and guidelines for writing and maintaining project documentation. + +## Getting Started with Development + +If you're new to developing with OpenHands, we recommend following this sequence: + +1. Start with the main `README.md` to understand the project's purpose and features +2. Review the `CONTRIBUTING.md` guidelines if you plan to contribute +3. Follow the setup instructions in `Development.md` +4. Dive into specific component documentation based on your area of interest: + - Frontend developers should focus on `/frontend/README.md` + - Backend developers should start with `/openhands/README.md` + - Infrastructure work should begin with `/containers/README.md` + +## Documentation Updates + +When making changes to the codebase, please ensure that: +1. 
Relevant documentation is updated to reflect your changes +2. New features are documented in the appropriate README files +3. Any API changes are reflected in the server documentation +4. Documentation follows the style guide in `/docs/DOC_STYLE_GUIDE.md` + +### Evaluation Harness +Source: https://docs.openhands.dev/openhands/usage/developers/evaluation-harness.md + +This guide provides an overview of how to integrate your own evaluation benchmark into the OpenHands framework. + +## Setup Environment and LLM Configuration + +Please follow instructions [here](https://github.com/OpenHands/OpenHands/blob/main/Development.md) to setup your local development environment. +OpenHands in development mode uses `config.toml` to keep track of most configurations. + +Here's an example configuration file you can use to define and use multiple LLMs: + +```toml +[llm] +# IMPORTANT: add your API key here, and set the model to the one you want to evaluate +model = "claude-3-5-sonnet-20241022" +api_key = "sk-XXX" + +[llm.eval_gpt4_1106_preview_llm] +model = "gpt-4-1106-preview" +api_key = "XXX" +temperature = 0.0 + +[llm.eval_some_openai_compatible_model_llm] +model = "openai/MODEL_NAME" +base_url = "https://OPENAI_COMPATIBLE_URL/v1" +api_key = "XXX" +temperature = 0.0 +``` + + +## How to use OpenHands in the command line + +OpenHands can be run from the command line using the following format: + +```bash +poetry run python ./openhands/core/main.py \ + -i \ + -t "" \ + -c \ + -l +``` + +For example: + +```bash +poetry run python ./openhands/core/main.py \ + -i 10 \ + -t "Write me a bash script that prints hello world." \ + -c CodeActAgent \ + -l llm +``` + +This command runs OpenHands with: +- A maximum of 10 iterations +- The specified task description +- Using the CodeActAgent +- With the LLM configuration defined in the `llm` section of your `config.toml` file + +## How does OpenHands work + +The main entry point for OpenHands is in `openhands/core/main.py`. 
Here's a simplified flow of how it works: + +1. Parse command-line arguments and load the configuration +2. Create a runtime environment using `create_runtime()` +3. Initialize the specified agent +4. Run the controller using `run_controller()`, which: + - Attaches the runtime to the agent + - Executes the agent's task + - Returns a final state when complete + +The `run_controller()` function is the core of OpenHands's execution. It manages the interaction between the agent, the runtime, and the task, handling things like user input simulation and event processing. + + +## Easiest way to get started: Exploring Existing Benchmarks + +We encourage you to review the various evaluation benchmarks available in the [`evaluation/benchmarks/` directory](https://github.com/OpenHands/benchmarks) of our repository. + +To integrate your own benchmark, we suggest starting with the one that most closely resembles your needs. This approach can significantly streamline your integration process, allowing you to build upon existing structures and adapt them to your specific requirements. + +## How to create an evaluation workflow + + +To create an evaluation workflow for your benchmark, follow these steps: + +1. Import relevant OpenHands utilities: + ```python + import openhands.agenthub + from evaluation.utils.shared import ( + EvalMetadata, + EvalOutput, + make_metadata, + prepare_dataset, + reset_logger_for_multiprocessing, + run_evaluation, + ) + from openhands.controller.state.state import State + from openhands.core.config import ( + AppConfig, + SandboxConfig, + get_llm_config_arg, + parse_arguments, + ) + from openhands.core.logger import openhands_logger as logger + from openhands.core.main import create_runtime, run_controller + from openhands.events.action import CmdRunAction + from openhands.events.observation import CmdOutputObservation, ErrorObservation + from openhands.runtime.runtime import Runtime + ``` + +2. 
Create a configuration:
   ```python
   def get_config(instance: pd.Series, metadata: EvalMetadata) -> AppConfig:
       config = AppConfig(
           default_agent=metadata.agent_class,
           runtime='docker',
           max_iterations=metadata.max_iterations,
           sandbox=SandboxConfig(
               base_container_image='your_container_image',
               enable_auto_lint=True,
               timeout=300,
           ),
       )
       config.set_llm_config(metadata.llm_config)
       return config
   ```

3. Initialize the runtime and set up the evaluation environment:
   ```python
   def initialize_runtime(runtime: Runtime, instance: pd.Series):
       # Set up your evaluation environment here
       # For example, setting environment variables, preparing files, etc.
       pass
   ```

4. Create a function to process each instance:
   ```python
   from openhands.utils.async_utils import call_async_from_sync
   # Defined as async so the evaluation helper below can be awaited
   async def process_instance(instance: pd.Series, metadata: EvalMetadata) -> EvalOutput:
       config = get_config(instance, metadata)
       runtime = create_runtime(config)
       call_async_from_sync(runtime.connect)
       initialize_runtime(runtime, instance)

       instruction = get_instruction(instance, metadata)

       state = run_controller(
           config=config,
           task_str=instruction,
           runtime=runtime,
           fake_user_response_fn=your_user_response_function,
       )

       # Evaluate the agent's actions
       evaluation_result = await evaluate_agent_actions(runtime, instance)

       return EvalOutput(
           instance_id=instance.instance_id,
           instruction=instruction,
           test_result=evaluation_result,
           metadata=metadata,
           history=compatibility_for_eval_history_pairs(state.history),
           metrics=state.metrics.get() if state.metrics else None,
           error=state.last_error if state and state.last_error else None,
       )
   ```

5.
Run the evaluation: + ```python + metadata = make_metadata(llm_config, dataset_name, agent_class, max_iterations, eval_note, eval_output_dir) + output_file = os.path.join(metadata.eval_output_dir, 'output.jsonl') + instances = prepare_dataset(your_dataset, output_file, eval_n_limit) + + await run_evaluation( + instances, + metadata, + output_file, + num_workers, + process_instance + ) + ``` + +This workflow sets up the configuration, initializes the runtime environment, processes each instance by running the agent and evaluating its actions, and then collects the results into an `EvalOutput` object. The `run_evaluation` function handles parallelization and progress tracking. + +Remember to customize the `get_instruction`, `your_user_response_function`, and `evaluate_agent_actions` functions according to your specific benchmark requirements. + +By following this structure, you can create a robust evaluation workflow for your benchmark within the OpenHands framework. + + +## Understanding the `user_response_fn` + +The `user_response_fn` is a crucial component in OpenHands's evaluation workflow. It simulates user interaction with the agent, allowing for automated responses during the evaluation process. This function is particularly useful when you want to provide consistent, predefined responses to the agent's queries or actions. + + +### Workflow and Interaction + +The correct workflow for handling actions and the `user_response_fn` is as follows: + +1. Agent receives a task and starts processing +2. Agent emits an Action +3. If the Action is executable (e.g., CmdRunAction, IPythonRunCellAction): + - The Runtime processes the Action + - Runtime returns an Observation +4. If the Action is not executable (typically a MessageAction): + - The `user_response_fn` is called + - It returns a simulated user response +5. The agent receives either the Observation or the simulated response +6. 
Steps 2-5 repeat until the task is completed or max iterations are reached + +Here's a more accurate visual representation: + +``` + [Agent] + | + v + [Emit Action] + | + v + [Is Action Executable?] + / \ + Yes No + | | + v v + [Runtime] [user_response_fn] + | | + v v + [Return Observation] [Simulated Response] + \ / + \ / + v v + [Agent receives feedback] + | + v + [Continue or Complete Task] +``` + +In this workflow: + +- Executable actions (like running commands or executing code) are handled directly by the Runtime +- Non-executable actions (typically when the agent wants to communicate or ask for clarification) are handled by the `user_response_fn` +- The agent then processes the feedback, whether it's an Observation from the Runtime or a simulated response from the `user_response_fn` + +This approach allows for automated handling of both concrete actions and simulated user interactions, making it suitable for evaluation scenarios where you want to test the agent's ability to complete tasks with minimal human intervention. + +### Example Implementation + +Here's an example of a `user_response_fn` used in the SWE-Bench evaluation: + +```python +def codeact_user_response(state: State | None) -> str: + msg = ( + 'Please continue working on the task on whatever approach you think is suitable.\n' + 'If you think you have solved the task, please first send your answer to user through message and then exit .\n' + 'IMPORTANT: YOU SHOULD NEVER ASK FOR HUMAN HELP.\n' + ) + + if state and state.history: + # check if the agent has tried to talk to the user 3 times, if so, let the agent know it can give up + user_msgs = [ + event + for event in state.history + if isinstance(event, MessageAction) and event.source == 'user' + ] + if len(user_msgs) >= 2: + # let the agent know that it can give up when it has tried 3 times + return ( + msg + + 'If you want to give up, run: exit .\n' + ) + return msg +``` + +This function does the following: + +1. 
Provides a standard message encouraging the agent to continue working +2. Checks how many times the agent has attempted to communicate with the user +3. If the agent has made multiple attempts, it provides an option to give up + +By using this function, you can ensure consistent behavior across multiple evaluation runs and prevent the agent from getting stuck waiting for human input. + +### WebSocket Connection +Source: https://docs.openhands.dev/openhands/usage/developers/websocket-connection.md + +This guide explains how to connect to the OpenHands WebSocket API to receive real-time events and send actions to the agent. + +## Overview + +OpenHands uses [Socket.IO](https://socket.io/) for WebSocket communication between the client and server. The WebSocket connection allows you to: + +1. Receive real-time events from the agent +2. Send user actions to the agent +3. Maintain a persistent connection for ongoing conversations + +## Connecting to the WebSocket + +### Connection Parameters + +When connecting to the WebSocket, you need to provide the following query parameters: + +- `conversation_id`: The ID of the conversation you want to join +- `latest_event_id`: The ID of the latest event you've received (use `-1` for a new connection) +- `providers_set`: (Optional) A comma-separated list of provider types + +### Connection Example + +Here's a basic example of connecting to the WebSocket using JavaScript: + +```javascript +import { io } from "socket.io-client"; + +const socket = io("http://localhost:3000", { + transports: ["websocket"], + query: { + conversation_id: "your-conversation-id", + latest_event_id: -1, + providers_set: "github,gitlab" // Optional + } +}); + +socket.on("connect", () => { + console.log("Connected to OpenHands WebSocket"); +}); + +socket.on("oh_event", (event) => { + console.log("Received event:", event); +}); + +socket.on("connect_error", (error) => { + console.error("Connection error:", error); +}); + +socket.on("disconnect", (reason) => { 
  console.log("Disconnected:", reason);
});
```

## Sending Actions to the Agent

To send an action to the agent, use the `oh_user_action` event:

```javascript
// Send a user message to the agent
socket.emit("oh_user_action", {
  type: "message",
  source: "user",
  message: "Hello, can you help me with my project?"
});
```

## Receiving Events from the Agent

The server emits events using the `oh_event` event type. Here are some common event types you might receive:

- User messages (`source: "user", type: "message"`)
- Agent messages (`source: "agent", type: "message"`)
- File edits (`action: "edit"`)
- File writes (`action: "write"`)
- Command executions (`action: "run"`)

Example event handler:

```javascript
socket.on("oh_event", (event) => {
  if (event.source === "agent" && event.type === "message") {
    console.log("Agent says:", event.message);
  } else if (event.action === "run") {
    console.log("Command executed:", event.args.command);
    console.log("Result:", event.result);
  }
});
```

## Using Websocat for Testing

[Websocat](https://github.com/vi/websocat) is a command-line tool for interacting with WebSockets. It's useful for testing your WebSocket connection without writing a full client application.
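Websocat exchanges raw WebSocket text frames, so the `40`/`42` prefixes in the examples below are Socket.IO packet framing: `40` opens the default namespace (the first example sends `40{}`, with an empty auth payload), and `42` carries an event as a JSON array of `[event_name, payload]`. As a hypothetical illustration (not part of the OpenHands API), such a frame can be built in Python:

```python
import json

def socketio_event_frame(event_name: str, payload: dict) -> str:
    """Build a Socket.IO event frame: '42' + JSON-encoded [event, payload].

    Hypothetical helper for constructing the frames shown in the
    websocat examples; not an official OpenHands client utility.
    """
    return "42" + json.dumps([event_name, payload], separators=(",", ":"))

frame = socketio_event_frame(
    "oh_user_action",
    {"type": "message", "source": "user", "message": "Hello, agent!"},
)
# frame == '42["oh_user_action",{"type":"message","source":"user","message":"Hello, agent!"}]'
```

The resulting string is exactly the shape of the `echo '42[...]'` payloads piped into websocat below.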

### Installation

```bash
# On macOS
brew install websocat

# On Linux
curl -L https://github.com/vi/websocat/releases/download/v1.11.0/websocat.x86_64-unknown-linux-musl > websocat
chmod +x websocat
sudo mv websocat /usr/local/bin/
```

### Connecting to the WebSocket

```bash
# Connect to the WebSocket and print all received messages
echo "40{}" | \
websocat "ws://localhost:3000/socket.io/?EIO=4&transport=websocket&conversation_id=your-conversation-id&latest_event_id=-1"
```

### Sending a Message

```bash
# Send a message to the agent
echo '42["oh_user_action",{"type":"message","source":"user","message":"Hello, agent!"}]' | \
websocat "ws://localhost:3000/socket.io/?EIO=4&transport=websocket&conversation_id=your-conversation-id&latest_event_id=-1"
```

### Complete Example with Websocat

Here's a complete example of connecting to the WebSocket, sending a message, and receiving events:

```bash
# Start a persistent connection
websocat -v "ws://localhost:3000/socket.io/?EIO=4&transport=websocket&conversation_id=your-conversation-id&latest_event_id=-1"

# In another terminal, send a message
echo '42["oh_user_action",{"type":"message","source":"user","message":"Can you help me with my project?"}]' | \
websocat "ws://localhost:3000/socket.io/?EIO=4&transport=websocket&conversation_id=your-conversation-id&latest_event_id=-1"
```

## Event Structure

Events sent and received through the WebSocket follow a specific structure:

```typescript
interface OpenHandsEvent {
  id: string;          // Unique event ID
  source: string;      // "user" or "agent"
  timestamp: string;   // ISO timestamp
  message?: string;    // For message events
  type?: string;       // Event type (e.g., "message")
  action?: string;     // Action type (e.g., "run", "edit", "write")
  args?: any;          // Action arguments
  result?: any;        // Action result
}
```

## Best Practices

1. **Handle Reconnection**: Implement reconnection logic in your client to handle network interruptions.
2. **Track Event IDs**: Store the latest event ID you've received and use it when reconnecting to avoid duplicate events.
3. **Error Handling**: Implement proper error handling for connection errors and failed actions.
4. **Rate Limiting**: Avoid sending too many actions in a short period to prevent overloading the server.

## Troubleshooting

### Connection Issues

- Verify that the OpenHands server is running and accessible
- Check that you're providing the correct conversation ID
- Ensure your WebSocket URL is correctly formatted

### Authentication Issues

- Make sure you have the necessary authentication cookies if required
- Verify that you have permission to access the specified conversation

### Event Handling Issues

- Check that you're correctly parsing the event data
- Verify that your event handlers are properly registered

### Environment Variables Reference
Source: https://docs.openhands.dev/openhands/usage/environment-variables.md

This page provides a reference of environment variables that can be used to configure OpenHands. Environment variables provide an alternative to TOML configuration files and are particularly useful for containerized deployments, CI/CD pipelines, and cloud environments.
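As a rough mental model for the naming convention described below, here is a hypothetical sketch of how flat, prefixed environment variables could map onto nested `config.toml`-style sections. This is an illustration only, not the actual OpenHands loader (a real loader is also type-aware rather than coercing values blindly):

```python
from typing import Any

# Hypothetical sketch -- NOT the actual OpenHands loader. Prefixed
# variables land in their section ([llm], [agent], ...); everything
# else is treated as a [core] setting.
SECTION_PREFIXES = {
    "LLM_": "llm",
    "AGENT_": "agent",
    "SANDBOX_": "sandbox",
    "SECURITY_": "security",
}

def coerce_bool(value: str) -> Any:
    """Accept true/false, 1/0, or yes/no (case-insensitive) as booleans.

    Anything else is left as a string in this sketch; the real loader
    would also parse integers, floats, lists, and dicts.
    """
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    return value

def settings_from_env(env: dict[str, str]) -> dict[str, dict[str, Any]]:
    settings: dict[str, dict[str, Any]] = {"core": {}}
    for name, raw in env.items():
        for prefix, section in SECTION_PREFIXES.items():
            if name.startswith(prefix):
                key = name[len(prefix):].lower()
                settings.setdefault(section, {})[key] = coerce_bool(raw)
                break
        else:
            settings["core"][name.lower()] = coerce_bool(raw)
    return settings
```

For example, `settings_from_env({"LLM_MODEL": "gpt-4o", "DEBUG": "true"})` would place `model` under `llm` and `debug` (as `True`) under `core`, mirroring how `LLM_MODEL` corresponds to `model` in the `[llm]` section of `config.toml`.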
+ +## Environment Variable Naming Convention + +OpenHands follows a consistent naming pattern for environment variables: + +- **Core settings**: Direct uppercase mapping (e.g., `debug` → `DEBUG`) +- **LLM settings**: Prefixed with `LLM_` (e.g., `model` → `LLM_MODEL`) +- **Agent settings**: Prefixed with `AGENT_` (e.g., `enable_browsing` → `AGENT_ENABLE_BROWSING`) +- **Sandbox settings**: Prefixed with `SANDBOX_` (e.g., `timeout` → `SANDBOX_TIMEOUT`) +- **Security settings**: Prefixed with `SECURITY_` (e.g., `confirmation_mode` → `SECURITY_CONFIRMATION_MODE`) + +## Core Configuration Variables + +These variables correspond to the `[core]` section in `config.toml`: + +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `DEBUG` | boolean | `false` | Enable debug logging throughout the application | +| `DISABLE_COLOR` | boolean | `false` | Disable colored output in terminal | +| `CACHE_DIR` | string | `"/tmp/cache"` | Directory path for caching | +| `SAVE_TRAJECTORY_PATH` | string | `"./trajectories"` | Path to store conversation trajectories | +| `REPLAY_TRAJECTORY_PATH` | string | `""` | Path to load and replay a trajectory file | +| `FILE_STORE_PATH` | string | `"/tmp/file_store"` | File store directory path | +| `FILE_STORE` | string | `"memory"` | File store type (`memory`, `local`, etc.) | +| `FILE_UPLOADS_MAX_FILE_SIZE_MB` | integer | `0` | Maximum file upload size in MB (0 = no limit) | +| `FILE_UPLOADS_RESTRICT_FILE_TYPES` | boolean | `false` | Whether to restrict file upload types | +| `FILE_UPLOADS_ALLOWED_EXTENSIONS` | list | `[".*"]` | List of allowed file extensions for uploads | +| `MAX_BUDGET_PER_TASK` | float | `0.0` | Maximum budget per task (0.0 = no limit) | +| `MAX_ITERATIONS` | integer | `100` | Maximum number of iterations per task | +| `RUNTIME` | string | `"docker"` | Runtime environment (`docker`, `local`, `cli`, etc.) 
| +| `DEFAULT_AGENT` | string | `"CodeActAgent"` | Default agent class to use | +| `JWT_SECRET` | string | auto-generated | JWT secret for authentication | +| `RUN_AS_OPENHANDS` | boolean | `true` | Whether to run as the openhands user | +| `VOLUMES` | string | `""` | Volume mounts in format `host:container[:mode]` | + +## LLM Configuration Variables + +These variables correspond to the `[llm]` section in `config.toml`: + +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `LLM_MODEL` | string | `"claude-3-5-sonnet-20241022"` | LLM model to use | +| `LLM_API_KEY` | string | `""` | API key for the LLM provider | +| `LLM_BASE_URL` | string | `""` | Custom API base URL | +| `LLM_API_VERSION` | string | `""` | API version to use | +| `LLM_TEMPERATURE` | float | `0.0` | Sampling temperature | +| `LLM_TOP_P` | float | `1.0` | Top-p sampling parameter | +| `LLM_MAX_INPUT_TOKENS` | integer | `0` | Maximum input tokens (0 = no limit) | +| `LLM_MAX_OUTPUT_TOKENS` | integer | `0` | Maximum output tokens (0 = no limit) | +| `LLM_MAX_MESSAGE_CHARS` | integer | `30000` | Maximum characters that will be sent to the model in observation content | +| `LLM_TIMEOUT` | integer | `0` | API timeout in seconds (0 = no timeout) | +| `LLM_NUM_RETRIES` | integer | `8` | Number of retry attempts | +| `LLM_RETRY_MIN_WAIT` | integer | `15` | Minimum wait time between retries (seconds) | +| `LLM_RETRY_MAX_WAIT` | integer | `120` | Maximum wait time between retries (seconds) | +| `LLM_RETRY_MULTIPLIER` | float | `2.0` | Exponential backoff multiplier | +| `LLM_DROP_PARAMS` | boolean | `false` | Drop unsupported parameters without error | +| `LLM_CACHING_PROMPT` | boolean | `true` | Enable prompt caching if supported | +| `LLM_DISABLE_VISION` | boolean | `false` | Disable vision capabilities for cost reduction | +| `LLM_CUSTOM_LLM_PROVIDER` | string | `""` | Custom LLM provider name | +| `LLM_OLLAMA_BASE_URL` | string | `""` | Base 
URL for Ollama API | +| `LLM_INPUT_COST_PER_TOKEN` | float | `0.0` | Cost per input token | +| `LLM_OUTPUT_COST_PER_TOKEN` | float | `0.0` | Cost per output token | +| `LLM_REASONING_EFFORT` | string | `""` | Reasoning effort for o-series models (`low`, `medium`, `high`) | + +### AWS Configuration +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `LLM_AWS_ACCESS_KEY_ID` | string | `""` | AWS access key ID | +| `LLM_AWS_SECRET_ACCESS_KEY` | string | `""` | AWS secret access key | +| `LLM_AWS_REGION_NAME` | string | `""` | AWS region name | + +## Agent Configuration Variables + +These variables correspond to the `[agent]` section in `config.toml`: + +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `AGENT_LLM_CONFIG` | string | `""` | Name of LLM config group to use | +| `AGENT_FUNCTION_CALLING` | boolean | `true` | Enable function calling | +| `AGENT_ENABLE_BROWSING` | boolean | `false` | Enable browsing delegate | +| `AGENT_ENABLE_LLM_EDITOR` | boolean | `false` | Enable LLM-based editor | +| `AGENT_ENABLE_JUPYTER` | boolean | `false` | Enable Jupyter integration | +| `AGENT_ENABLE_HISTORY_TRUNCATION` | boolean | `true` | Enable history truncation | +| `AGENT_ENABLE_PROMPT_EXTENSIONS` | boolean | `true` | Enable skills (formerly known as microagents) (prompt extensions) | +| `AGENT_DISABLED_MICROAGENTS` | list | `[]` | List of skills to disable | + +## Sandbox Configuration Variables + +These variables correspond to the `[sandbox]` section in `config.toml`: + +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `SANDBOX_TIMEOUT` | integer | `120` | Sandbox timeout in seconds | +| `SANDBOX_USER_ID` | integer | `1000` | User ID for sandbox processes | +| `SANDBOX_BASE_CONTAINER_IMAGE` | string | `"nikolaik/python-nodejs:python3.12-nodejs22"` | Base container image | +| 
`SANDBOX_USE_HOST_NETWORK` | boolean | `false` | Use host networking | +| `SANDBOX_RUNTIME_BINDING_ADDRESS` | string | `"0.0.0.0"` | Runtime binding address | +| `SANDBOX_ENABLE_AUTO_LINT` | boolean | `false` | Enable automatic linting | +| `SANDBOX_INITIALIZE_PLUGINS` | boolean | `true` | Initialize sandbox plugins | +| `SANDBOX_RUNTIME_EXTRA_DEPS` | string | `""` | Extra dependencies to install | +| `SANDBOX_RUNTIME_STARTUP_ENV_VARS` | dict | `{}` | Environment variables for runtime | +| `SANDBOX_BROWSERGYM_EVAL_ENV` | string | `""` | BrowserGym evaluation environment | +| `SANDBOX_VOLUMES` | string | `""` | Volume mounts (replaces deprecated workspace settings) | +| `AGENT_SERVER_IMAGE_REPOSITORY` | string | `""` | Runtime container image repository (e.g., `ghcr.io/openhands/agent-server`) | +| `AGENT_SERVER_IMAGE_TAG` | string | `""` | Runtime container image tag (e.g., `1.11.4-python`) | +| `SANDBOX_KEEP_RUNTIME_ALIVE` | boolean | `false` | Keep runtime alive after session ends | +| `SANDBOX_PAUSE_CLOSED_RUNTIMES` | boolean | `false` | Pause instead of stopping closed runtimes | +| `SANDBOX_CLOSE_DELAY` | integer | `300` | Delay before closing idle runtimes (seconds) | +| `SANDBOX_RM_ALL_CONTAINERS` | boolean | `false` | Remove all containers when stopping | +| `SANDBOX_ENABLE_GPU` | boolean | `false` | Enable GPU support | +| `SANDBOX_CUDA_VISIBLE_DEVICES` | string | `""` | Specify GPU devices by ID | +| `SANDBOX_VSCODE_PORT` | integer | auto | Specific port for VSCode server | + +### Sandbox Environment Variables +Variables prefixed with `SANDBOX_ENV_` are passed through to the sandbox environment: + +| Environment Variable | Description | +|---------------------|-------------| +| `SANDBOX_ENV_*` | Any variable with this prefix is passed to the sandbox (e.g., `SANDBOX_ENV_OPENAI_API_KEY`) | + +## Security Configuration Variables + +These variables correspond to the `[security]` section in `config.toml`: + +| Environment Variable | Type | Default | 
Description | +|---------------------|------|---------|-------------| +| `SECURITY_CONFIRMATION_MODE` | boolean | `false` | Enable confirmation mode for actions | +| `SECURITY_SECURITY_ANALYZER` | string | `"llm"` | Security analyzer to use (`llm`, `invariant`) | +| `SECURITY_ENABLE_SECURITY_ANALYZER` | boolean | `true` | Enable security analysis | + +## Debug and Logging Variables + +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `DEBUG` | boolean | `false` | Enable general debug logging | +| `DEBUG_LLM` | boolean | `false` | Enable LLM-specific debug logging | +| `DEBUG_RUNTIME` | boolean | `false` | Enable runtime debug logging | +| `LOG_TO_FILE` | boolean | auto | Log to file (auto-enabled when DEBUG=true) | + +## Runtime-Specific Variables + +### Docker Runtime +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `SANDBOX_VOLUME_OVERLAYS` | string | `""` | Volume overlay configurations | + +### Remote Runtime +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `SANDBOX_API_KEY` | string | `""` | API key for remote runtime | +| `SANDBOX_REMOTE_RUNTIME_API_URL` | string | `""` | Remote runtime API URL | + +### Local Runtime +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `RUNTIME_URL` | string | `""` | Runtime URL for local runtime | +| `RUNTIME_URL_PATTERN` | string | `""` | Runtime URL pattern | +| `RUNTIME_ID` | string | `""` | Runtime identifier | +| `LOCAL_RUNTIME_MODE` | string | `""` | Enable local runtime mode (`1` to enable) | + +## Integration Variables + +### GitHub Integration +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `GITHUB_TOKEN` | string | `""` | GitHub personal access token | + +### Third-Party API Keys +| 
Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `OPENAI_API_KEY` | string | `""` | OpenAI API key | +| `ANTHROPIC_API_KEY` | string | `""` | Anthropic API key | +| `GOOGLE_API_KEY` | string | `""` | Google API key | +| `AZURE_API_KEY` | string | `""` | Azure API key | +| `TAVILY_API_KEY` | string | `""` | Tavily search API key | + +## Server Configuration Variables + +These are primarily used when running OpenHands as a server: + +| Environment Variable | Type | Default | Description | +|---------------------|------|---------|-------------| +| `FRONTEND_PORT` | integer | `3000` | Frontend server port | +| `BACKEND_PORT` | integer | `8000` | Backend server port | +| `FRONTEND_HOST` | string | `"localhost"` | Frontend host address | +| `BACKEND_HOST` | string | `"localhost"` | Backend host address | +| `WEB_HOST` | string | `"localhost"` | Web server host | +| `SERVE_FRONTEND` | boolean | `true` | Whether to serve frontend | + +## Deprecated Variables + +These variables are deprecated and should be replaced: + +| Environment Variable | Replacement | Description | +|---------------------|-------------|-------------| +| `WORKSPACE_BASE` | `SANDBOX_VOLUMES` | Use volume mounting instead | +| `WORKSPACE_MOUNT_PATH` | `SANDBOX_VOLUMES` | Use volume mounting instead | +| `WORKSPACE_MOUNT_PATH_IN_SANDBOX` | `SANDBOX_VOLUMES` | Use volume mounting instead | +| `WORKSPACE_MOUNT_REWRITE` | `SANDBOX_VOLUMES` | Use volume mounting instead | + +## Usage Examples + +### Basic Setup with OpenAI +```bash +export LLM_MODEL="gpt-4o" +export LLM_API_KEY="your-openai-api-key" +export DEBUG=true +``` + +### Docker Deployment with Custom Volumes +```bash +export RUNTIME="docker" +export SANDBOX_VOLUMES="/host/workspace:/workspace:rw,/host/data:/data:ro" +export SANDBOX_TIMEOUT=300 +``` + +### Remote Runtime Configuration +```bash +export RUNTIME="remote" +export SANDBOX_API_KEY="your-remote-api-key" +export 
SANDBOX_REMOTE_RUNTIME_API_URL="https://your-runtime-api.com" +``` + +### Security-Enhanced Setup +```bash +export SECURITY_CONFIRMATION_MODE=true +export SECURITY_SECURITY_ANALYZER="llm" +export DEBUG_RUNTIME=true +``` + +## Notes + +1. **Boolean Values**: Environment variables expecting boolean values accept `true`/`false`, `1`/`0`, or `yes`/`no` (case-insensitive). + +2. **List Values**: Lists should be provided as Python literal strings, e.g., `AGENT_DISABLED_MICROAGENTS='["skill1", "skill2"]'`. + +3. **Dictionary Values**: Dictionaries should be provided as Python literal strings, e.g., `SANDBOX_RUNTIME_STARTUP_ENV_VARS='{"KEY": "value"}'`. + +4. **Precedence**: Environment variables take precedence over TOML configuration files. + +5. **Docker Usage**: When using Docker, pass environment variables with the `-e` flag: + ```bash + docker run -e LLM_API_KEY="your-key" -e DEBUG=true openhands/openhands + ``` + +6. **Validation**: Invalid environment variable values will be logged as errors and fall back to defaults. + +### Good vs. Bad Instructions +Source: https://docs.openhands.dev/openhands/usage/essential-guidelines/good-vs-bad-instructions.md + +The quality of your instructions directly impacts the quality of OpenHands' output. This guide shows concrete examples of good and bad prompts, explains why some work better than others, and provides principles for writing effective instructions. + +## Concrete Examples of Good/Bad Prompts + +### Bug Fixing Examples + +#### Bad Example + +``` +Fix the bug in my code. +``` + +**Why it's bad:** +- No information about what the bug is +- No indication of where to look +- No description of expected vs. actual behavior +- OpenHands would have to guess what's wrong + +#### Good Example + +``` +Fix the TypeError in src/api/users.py line 45. 
+ +Error message: +TypeError: 'NoneType' object has no attribute 'get' + +Expected behavior: The get_user_preferences() function should return +default preferences when the user has no saved preferences. + +Actual behavior: It crashes with the error above when user.preferences is None. + +The fix should handle the None case gracefully and return DEFAULT_PREFERENCES. +``` + +**Why it works:** +- Specific file and line number +- Exact error message +- Clear expected vs. actual behavior +- Suggested approach for the fix + +### Feature Development Examples + +#### Bad Example + +``` +Add user authentication to my app. +``` + +**Why it's bad:** +- Scope is too large and undefined +- No details about authentication requirements +- No mention of existing code or patterns +- Could mean many different things + +#### Good Example + +``` +Add email/password login to our Express.js API. + +Requirements: +1. POST /api/auth/login endpoint +2. Accept email and password in request body +3. Validate against users in PostgreSQL database +4. Return JWT token on success, 401 on failure +5. Use bcrypt for password comparison (already in dependencies) + +Follow the existing patterns in src/api/routes.js for route structure. +Use the existing db.query() helper in src/db/index.js for database access. + +Success criteria: I can call the endpoint with valid credentials +and receive a JWT token that works with our existing auth middleware. +``` + +**Why it works:** +- Specific, scoped feature +- Clear technical requirements +- Points to existing patterns to follow +- Defines what "done" looks like + +### Code Review Examples + +#### Bad Example + +``` +Review my code. +``` + +**Why it's bad:** +- No code provided or referenced +- No indication of what to look for +- No context about the code's purpose +- No criteria for the review + +#### Good Example + +``` +Review this pull request for our payment processing module: + +Focus areas: +1. Security - we're handling credit card data +2. 
Error handling - payments must never silently fail +3. Idempotency - duplicate requests should be safe + +Context: +- This integrates with Stripe API +- It's called from our checkout flow +- We have ~10,000 transactions/day + +Please flag any issues as Critical/Major/Minor with explanations. +``` + +**Why it works:** +- Clear scope and focus areas +- Important context provided +- Business implications explained +- Requested output format specified + +### Refactoring Examples + +#### Bad Example + +``` +Make the code better. +``` + +**Why it's bad:** +- "Better" is subjective and undefined +- No specific problems identified +- No goals for the refactoring +- No constraints or requirements + +#### Good Example + +``` +Refactor the UserService class in src/services/user.js: + +Problems to address: +1. The class is 500+ lines - split into smaller, focused services +2. Database queries are mixed with business logic - separate them +3. There's code duplication in the validation methods + +Constraints: +- Keep the public API unchanged (other code depends on it) +- Maintain test coverage (run npm test after changes) +- Follow our existing service patterns in src/services/ + +Goal: Improve maintainability while keeping the same functionality. +``` + +**Why it works:** +- Specific problems identified +- Clear constraints and requirements +- Points to patterns to follow +- Measurable success criteria + +## Key Principles for Effective Instructions + +### Be Specific + +Vague instructions produce vague results. Be concrete about: + +| Instead of... | Say... 
| +|---------------|--------| +| "Fix the error" | "Fix the TypeError on line 45 of api.py" | +| "Add tests" | "Add unit tests for the calculateTotal function covering edge cases" | +| "Improve performance" | "Reduce the database queries from N+1 to a single join query" | +| "Clean up the code" | "Extract the validation logic into a separate ValidatorService class" | + +### Provide Context + +Help OpenHands understand the bigger picture: + +``` +Context to include: +- What does this code do? (purpose) +- Who uses it? (users/systems) +- Why does this matter? (business impact) +- What constraints exist? (performance, compatibility) +- What patterns should be followed? (existing conventions) +``` + +**Example with context:** + +``` +Add rate limiting to our public API endpoints. + +Context: +- This is a REST API serving mobile apps and third-party integrations +- We've been seeing abuse from web scrapers hitting us 1000+ times/minute +- Our infrastructure can handle 100 req/sec per client sustainably +- We use Redis (already available in the project) +- Our API follows the controller pattern in src/controllers/ + +Requirement: Limit each API key to 100 requests per minute with +appropriate 429 responses and Retry-After headers. +``` + +### Set Clear Goals + +Define what success looks like: + +``` +Success criteria checklist: +✓ What specific outcome do you want? +✓ How will you verify it worked? +✓ What tests should pass? +✓ What should the user experience be? +``` + +**Example with clear goals:** + +``` +Implement password reset functionality. + +Success criteria: +1. User can request reset via POST /api/auth/forgot-password +2. System sends email with secure reset link +3. Link expires after 1 hour +4. User can set new password via POST /api/auth/reset-password +5. Old sessions are invalidated after password change +6. All edge cases return appropriate error messages +7. 
Existing tests still pass, new tests cover the feature +``` + +### Include Constraints + +Specify what you can't or won't change: + +``` +Constraints to specify: +- API compatibility (can't break existing clients) +- Technology restrictions (must use existing stack) +- Performance requirements (must respond in <100ms) +- Security requirements (must not log PII) +- Time/scope limits (just this one file) +``` + +## Common Pitfalls to Avoid + +### Vague Requirements + + + + ``` + Make the dashboard faster. + ``` + + + ``` + The dashboard takes 5 seconds to load. + + Profile it and optimize to load in under 1 second. + + Likely issues: + - N+1 queries in getWidgetData() + - Uncompressed images + - Missing database indexes + + Focus on the biggest wins first. + ``` + + + +### Missing Context + + + + ``` + Add caching to the API. + ``` + + + ``` + Add caching to the product catalog API. + + Context: + - 95% of requests are for the same 1000 products + - Product data changes only via admin panel (rare) + - We already have Redis running for sessions + - Current response time is 200ms, target is <50ms + + Cache strategy: Cache product data in Redis with 5-minute TTL, + invalidate on product update. + ``` + + + +### Unrealistic Expectations + + + + ``` + Rewrite our entire backend from PHP to Go. + ``` + + + ``` + Create a Go microservice for the image processing currently in + src/php/ImageProcessor.php. + + This is the first step in our gradual migration. + The Go service should: + 1. Expose the same API endpoints + 2. Be deployable alongside the existing PHP app + 3. Include a feature flag to route traffic + + Start with just the resize and crop functions. + ``` + + + +### Incomplete Information + + + + ``` + The login is broken, fix it. + ``` + + + ``` + Users can't log in since yesterday's deployment. 
+ + Symptoms: + - Login form submits but returns 500 error + - Server logs show: "Redis connection refused" + - Redis was moved to a new host yesterday + + The issue is likely in src/config/redis.js which may + have the old host hardcoded. + + Expected: Login should work with the new Redis at redis.internal:6380 + ``` + + + +## Best Practices + +### Structure Your Instructions + +Use clear structure for complex requests: + +``` +## Task +[One sentence describing what you want] + +## Background +[Context and why this matters] + +## Requirements +1. [Specific requirement] +2. [Specific requirement] +3. [Specific requirement] + +## Constraints +- [What you can't change] +- [What must be preserved] + +## Success Criteria +- [How to verify it works] +``` + +### Provide Examples + +Show what you want through examples: + +``` +Add input validation to the user registration endpoint. + +Example of what validation errors should look like: + +{ + "error": "validation_failed", + "details": [ + {"field": "email", "message": "Invalid email format"}, + {"field": "password", "message": "Must be at least 8 characters"} + ] +} + +Validate: +- email: valid format, not already registered +- password: min 8 chars, at least 1 number +- username: 3-20 chars, alphanumeric only +``` + +### Define Success Criteria + +Be explicit about what "done" means: + +``` +This task is complete when: +1. All existing tests pass (npm test) +2. New tests cover the added functionality +3. The feature works as described in the acceptance criteria +4. Code follows our style guide (npm run lint passes) +5. Documentation is updated if needed +``` + +### Iterate and Refine + +Build on previous work: + +``` +In our last session, you added the login endpoint. + +Now add the logout functionality: +1. POST /api/auth/logout endpoint +2. Invalidate the current session token +3. Clear any server-side session data +4. 
Follow the same patterns used in login + +The login implementation is in src/api/auth/login.js for reference. +``` + +## Quick Reference + +| Element | Bad | Good | +|---------|-----|------| +| Location | "in the code" | "in src/api/users.py line 45" | +| Problem | "it's broken" | "TypeError when user.preferences is None" | +| Scope | "add authentication" | "add JWT-based login endpoint" | +| Behavior | "make it work" | "return 200 with user data on success" | +| Patterns | (none) | "follow patterns in src/services/" | +| Success | (none) | "all tests pass, endpoint returns correct data" | + + +The investment you make in writing clear instructions pays off in fewer iterations, better results, and less time debugging miscommunication. Take the extra minute to be specific. + + +### OpenHands in Your SDLC +Source: https://docs.openhands.dev/openhands/usage/essential-guidelines/sdlc-integration.md + +OpenHands can enhance every phase of your software development lifecycle (SDLC), from planning through deployment. This guide shows some example prompts that you can use when you integrate OpenHands into your development workflow. + +## Integration with Development Workflows + +### Planning Phase + +Use OpenHands during planning to accelerate technical decisions: + +**Technical specification assistance:** +``` +Create a technical specification for adding search functionality: + +Requirements from product: +- Full-text search across products and articles +- Filter by category, price range, and date +- Sub-200ms response time at 1000 QPS + +Provide: +1. Architecture options (Elasticsearch vs. PostgreSQL full-text) +2. Data model changes needed +3. API endpoint designs +4. Estimated implementation effort +5. 
Risks and mitigations +``` + +**Sprint planning support:** +``` +Review these user stories and create implementation tasks in our Linear task management software using the LINEAR_API_KEY environment variable: + +Story 1: As a user, I can reset my password via email +Story 2: As an admin, I can view user activity logs + +For each story, create: +- Technical subtasks +- Estimated effort (hours) +- Dependencies on other work +- Testing requirements +``` + +### Development Phase + +OpenHands excels during active development: + +**Feature implementation:** +- Write new features with clear specifications +- Follow existing code patterns automatically +- Generate tests alongside code +- Create documentation as you go + +**Bug fixing:** +- Analyze error logs and stack traces +- Identify root causes +- Implement fixes with regression tests +- Document the issue and solution + +**Code improvement:** +- Refactor for clarity and maintainability +- Optimize performance bottlenecks +- Update deprecated APIs +- Improve error handling + +### Testing Phase + +Automate test creation and improvement: + +``` +Add comprehensive tests for the UserService module: + +Current coverage: 45% +Target coverage: 85% + +1. Analyze uncovered code paths using the codecov module +2. Write unit tests for edge cases +3. Add integration tests for API endpoints +4. Create test data factories +5. Document test scenarios + +Each time you add new tests, re-run codecov to check the increased coverage. Continue until you have sufficient coverage, and all tests pass (by either fixing the tests, or fixing the code if your tests uncover bugs). +``` + +### Review Phase + +Accelerate code reviews: + +``` +Review this PR for our coding standards: + +Check for: +1. Security issues (SQL injection, XSS, etc.) +2. Performance concerns +3. Test coverage adequacy +4. Documentation completeness +5. Adherence to our style guide + +Provide actionable feedback with severity ratings. 
+``` + +### Deployment Phase + +Assist with deployment preparation: + +``` +Prepare for production deployment: + +1. Review all changes since last release +2. Check for breaking API changes +3. Verify database migrations are reversible +4. Update the changelog +5. Create release notes +6. Identify rollback steps if needed +``` + +## CI/CD Integration + +OpenHands can be integrated into your CI/CD pipelines through the [Software Agent SDK](/sdk/index). Rather than using hypothetical actions, you can build powerful, customized workflows using real, production-ready tools. + +### GitHub Actions Integration + +The Software Agent SDK provides composite GitHub Actions for common workflows: + +- **[Automated PR Review](/openhands/usage/use-cases/code-review)** - Automatically review pull requests with inline comments +- **[SDK GitHub Workflows Guide](/sdk/guides/github-workflows/pr-review)** - Build custom GitHub workflows with the SDK + +For example, to set up automated PR reviews, see the [Automated Code Review](/openhands/usage/use-cases/code-review) guide which uses the real `OpenHands/software-agent-sdk/.github/actions/pr-review` composite action. + +### What You Can Automate + +Using the SDK, you can create GitHub Actions workflows to: + +1. **Automatic code review** when a PR is opened +2. **Automatically update docs** weekly when new functionality is added +3. **Diagnose errors** that have appeared in monitoring software such as DataDog and automatically send analyses and improvements +4. **Manage TODO comments** and track technical debt +5. **Assign reviewers** based on code ownership patterns + +### Getting Started + +To integrate OpenHands into your CI/CD: + +1. Review the [SDK Getting Started guide](/sdk/getting-started) +2. Explore the [GitHub Workflows examples](/sdk/guides/github-workflows/pr-review) +3. Set up your `LLM_API_KEY` as a repository secret +4. 
Use the provided composite actions or build custom workflows + +See the [Use Cases](/openhands/usage/use-cases/code-review) section for complete examples of production-ready integrations. + +## Team Workflows + +### Solo Developer Workflows + +For individual developers: + +**Daily workflow:** +1. **Morning review**: Have OpenHands analyze overnight CI results +2. **Feature development**: Use OpenHands for implementation +3. **Pre-commit**: Request review before pushing +4. **Documentation**: Generate/update docs for changes + +**Best practices:** +- Set up automated reviews on all PRs +- Use OpenHands for boilerplate and repetitive tasks +- Keep AGENTS.md updated with project patterns + +### Small Team Workflows + +For teams of 2-10 developers: + +**Collaborative workflow:** +``` +Team Member A: Creates feature branch, writes initial implementation +OpenHands: Reviews code, suggests improvements +Team Member B: Reviews OpenHands suggestions, approves or modifies +OpenHands: Updates documentation, adds missing tests +Team: Merges after final human review +``` + +**Communication integration:** +- Slack notifications for OpenHands findings +- Automatic issue creation for bugs found +- Weekly summary reports + +### Enterprise Team Workflows + +For larger organizations: + +**Governance and oversight:** +- Configure approval requirements for OpenHands changes +- Set up audit logging for all AI-assisted changes +- Define scope limits for automated actions +- Establish human review requirements + +**Scale patterns:** +``` +Central Platform Team: +├── Defines OpenHands policies +├── Manages integrations +└── Monitors usage and quality + +Feature Teams: +├── Use OpenHands within policies +├── Customize for team needs +└── Report issues to platform team +``` + +## Best Practices + +### Code Review Integration + +Set up effective automated reviews: + +```yaml +# .openhands/review-config.yml +review: + focus_areas: + - security + - performance + - test_coverage + - 
documentation + + severity_levels: + block_merge: + - critical + - security + require_response: + - major + informational: + - minor + - suggestion + + ignore_patterns: + - "*.generated.*" + - "vendor/*" +``` + +### Pull Request Automation + +Automate common PR tasks: + +| Trigger | Action | +|---------|--------| +| PR opened | Auto-review, label by type | +| Tests fail | Analyze failures, suggest fixes | +| Coverage drops | Identify missing tests | +| PR approved | Update changelog, check docs | + +### Quality Gates + +Define automated quality gates: + +```yaml +quality_gates: + - name: test_coverage + threshold: 80% + action: block_merge + + - name: security_issues + threshold: 0 critical + action: block_merge + + - name: code_review_score + threshold: 7/10 + action: require_review + + - name: documentation + requirement: all_public_apis + action: warn +``` + +### Automated Testing + +Integrate OpenHands with your testing strategy: + +**Test generation triggers:** +- New code without tests +- Coverage below threshold +- Bug fix without regression test +- API changes without contract tests + +**Example workflow:** +```yaml +on: + push: + branches: [main] + +jobs: + ensure-coverage: + steps: + - name: Check coverage + run: | + COVERAGE=$(npm test -- --coverage | grep "All files" | awk '{print $10}') + if [ "$COVERAGE" -lt "80" ]; then + openhands generate-tests --target 80 + fi +``` + +## Common Integration Patterns + +### Pre-Commit Hooks + +Run OpenHands checks before commits: + +```bash +# .git/hooks/pre-commit +#!/bin/bash + +# Quick code review +openhands review --quick --staged-only + +if [ $? -ne 0 ]; then + echo "OpenHands found issues. Review and fix before committing." 
+ exit 1 +fi +``` + +### Post-Commit Actions + +Automate tasks after commits: + +```yaml +# .github/workflows/post-commit.yml +on: + push: + branches: [main] + +jobs: + update-docs: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - name: Update API docs + run: openhands update-docs --api + - name: Commit changes + run: | + git add docs/ + git commit -m "docs: auto-update API documentation" || true + git push +``` + +### Scheduled Tasks + +Run regular maintenance: + +```yaml +# Weekly dependency check +on: + schedule: + - cron: '0 9 * * 1' # Monday 9am + +jobs: + dependency-review: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - name: Check dependencies + run: | + openhands check-dependencies --security --outdated + - name: Create issues + run: openhands create-issues --from-report deps.json +``` + +### Event-Triggered Workflows + +You can build custom event-triggered workflows using the Software Agent SDK. For example, the [Incident Triage](/openhands/usage/use-cases/incident-triage) use case shows how to automatically analyze and respond to issues. + +For more event-driven automation patterns, see: +- [SDK GitHub Workflows Guide](/sdk/guides/github-workflows/pr-review) - Build custom workflows triggered by GitHub events +- [GitHub Action Integration](/openhands/usage/run-openhands/github-action) - Use the OpenHands resolver for issue triage + +### When to Use OpenHands +Source: https://docs.openhands.dev/openhands/usage/essential-guidelines/when-to-use-openhands.md + +OpenHands excels at many development tasks, but knowing when to use it—and when to handle things yourself—helps you get the best results. This guide helps you identify the right tasks for OpenHands and set yourself up for success. + +## Task Complexity Guidance + +### Simple Tasks + +**Ideal for OpenHands** — These tasks can often be completed in a single session with minimal guidance. 
+ +- Adding a new function or method +- Writing unit tests for existing code +- Fixing simple bugs with clear error messages +- Code formatting and style fixes +- Adding documentation or comments +- Simple refactoring (rename, extract method) +- Configuration changes + +**Example prompt:** +``` +Add a calculateDiscount() function to src/utils/pricing.js that takes +a price and discount percentage, returns the discounted price. +Add unit tests. +``` + +### Medium Complexity Tasks + +**Good for OpenHands** — These tasks may need more context and possibly some iteration. + +- Implementing a new API endpoint +- Adding a feature to an existing module +- Debugging issues that span multiple files +- Migrating code to a new pattern +- Writing integration tests +- Performance optimization with clear metrics +- Setting up CI/CD workflows + +**Example prompt:** +``` +Add a user profile endpoint to our API: +- GET /api/users/:id/profile +- Return user data with their recent activity +- Follow patterns in existing controllers +- Add integration tests +- Handle not-found and unauthorized cases +``` + +### Complex Tasks + +**May require iteration** — These benefit from breaking down into smaller pieces. + +- Large refactoring across many files +- Architectural changes +- Implementing complex business logic +- Multi-service integrations +- Performance optimization without clear cause +- Security audits +- Framework or major dependency upgrades + +**Recommended approach:** +``` +Break large tasks into phases: + +Phase 1: "Analyze the current authentication system and document +all touch points that need to change for OAuth2 migration." + +Phase 2: "Implement the OAuth2 provider configuration and basic +token flow, keeping existing auth working in parallel." + +Phase 3: "Migrate the user login flow to use OAuth2, maintaining +backwards compatibility." 
+``` + +## Best Use Cases + +### Ideal Scenarios + +OpenHands is **most effective** when: + +| Scenario | Why It Works | +|----------|--------------| +| Clear requirements | OpenHands can work independently | +| Well-defined scope | Less ambiguity, fewer iterations | +| Existing patterns to follow | Consistency with codebase | +| Good test coverage | Easy to verify changes | +| Isolated changes | Lower risk of side effects | + +**Perfect use cases:** + +- **Bug fixes with reproduction steps**: Clear problem, measurable solution +- **Test additions**: Existing code provides the specification +- **Documentation**: Code is the source of truth +- **Boilerplate generation**: Follows established patterns +- **Code review and analysis**: Read-only, analytical tasks + +### Good Fit Scenarios + +OpenHands works **well with some guidance** for: + +- **Feature implementation**: When requirements are documented +- **Refactoring**: When goals and constraints are clear +- **Debugging**: When you can provide logs and context +- **Code modernization**: When patterns are established +- **API development**: When specs exist + +**Tips for these scenarios:** + +1. Provide clear acceptance criteria +2. Point to examples of similar work in the codebase +3. Specify constraints and non-goals +4. 
Be ready to iterate and clarify + +### Poor Fit Scenarios + +**Consider alternatives** when: + +| Scenario | Challenge | Alternative | +|----------|-----------|-------------| +| Vague requirements | Unclear what "done" means | Define requirements first | +| Exploratory work | Need human creativity/intuition | Brainstorm first, then implement | +| Highly sensitive code | Risk tolerance is zero | Human review essential | +| Organizational knowledge | Needs tribal knowledge | Pair with domain expert | +| Visual design | Subjective aesthetic judgments | Use design tools | + +**Red flags that a task may not be suitable:** + +- "Make it look better" (subjective) +- "Figure out what's wrong" (too vague) +- "Rewrite everything" (too large) +- "Do what makes sense" (unclear requirements) +- Changes to production infrastructure without review + +## Limitations + +### Current Limitations + +Be aware of these constraints: + +- **Long-running processes**: Sessions have time limits +- **Interactive debugging**: Can't set breakpoints interactively +- **Visual verification**: Can't see rendered UI easily +- **External system access**: May need credentials configured +- **Large codebase analysis**: Memory and time constraints + +### Technical Constraints + +| Constraint | Impact | Workaround | +|------------|--------|------------| +| Session duration | Very long tasks may timeout | Break into smaller tasks | +| Context window | Can't see entire large codebase at once | Focus on relevant files | +| No persistent state | Previous sessions not remembered | Use AGENTS.md for context | +| Network access | Some external services may be blocked | Use local resources when possible | + +### Scope Boundaries + +OpenHands works within your codebase but has boundaries: + +**Can do:** +- Read and write files in the repository +- Run tests and commands +- Access configured services and APIs +- Browse documentation and reference material + +**Cannot do:** +- Access your local environment outside 
the sandbox +- Make decisions requiring business context it doesn't have +- Replace human judgment for critical decisions +- Guarantee production-safe changes without review + +## Pre-Task Checklist + +### Prerequisites + +Before starting a task, ensure: + +- [ ] Clear description of what you want +- [ ] Expected outcome is defined +- [ ] Relevant files are identified +- [ ] Dependencies are available +- [ ] Tests can be run + +### Environment Setup + +Prepare your repository: + +```markdown +## AGENTS.md Checklist + +- [ ] Build commands documented +- [ ] Test commands documented +- [ ] Code style guidelines noted +- [ ] Architecture overview included +- [ ] Common patterns described +``` + +See [Repository Setup](/openhands/usage/customization/repository) for details. + +### Repository Preparation + +Optimize for success: + +1. **Clean state**: Commit or stash uncommitted changes +2. **Working build**: Ensure the project builds +3. **Passing tests**: Start from a green state +4. **Updated dependencies**: Resolve any dependency issues +5. **Clear documentation**: Update AGENTS.md if needed + +## Post-Task Review + +### Quality Checks + +After OpenHands completes a task: + +- [ ] Review all changed files +- [ ] Understand each change made +- [ ] Check for unintended modifications +- [ ] Verify code style consistency +- [ ] Look for hardcoded values or credentials + +### Validation Steps + +1. **Run tests**: `npm test`, `pytest`, etc. +2. **Check linting**: Ensure style compliance +3. **Build the project**: Verify it still compiles +4. **Manual testing**: Test the feature yourself +5. 
**Edge cases**: Try unusual inputs + +### Learning from Results + +After each significant task: + +**What went well?** +- Note effective prompt patterns +- Document successful approaches +- Update AGENTS.md with learnings + +**What could improve?** +- Identify unclear instructions +- Note missing context +- Plan better for next time + +**Update your repository:** +```markdown +## Things OpenHands Should Know (add to AGENTS.md) + +- When adding API endpoints, always add to routes/index.js +- Our date format is ISO 8601 everywhere +- All database queries go through the repository pattern +``` + +## Decision Framework + +Use this framework to decide if a task is right for OpenHands: + +``` +Is the task well-defined? +├── No → Define it better first +└── Yes → Continue + +Do you have clear success criteria? +├── No → Define acceptance criteria +└── Yes → Continue + +Is the scope manageable (< 100 LOC)? +├── No → Break into smaller tasks +└── Yes → Continue + +Do examples exist in the codebase? +├── No → Provide examples or patterns +└── Yes → Continue + +Can you verify the result? +├── No → Add tests or verification steps +└── Yes → ✅ Good candidate for OpenHands +``` + +OpenHands can be used for most development tasks -- the developers of OpenHands write most of their code with OpenHands! + +But it can be particularly useful for certain types of tasks. For instance: + +- **Clearly Specified Tasks:** Generally, if the task has a very clear success criterion, OpenHands will do better. It is especially useful if you can define it in a way that can be verified programmatically, like making sure that all of the tests pass or test coverage gets above a certain value using a particular program. But even when you don't have something like that, you can just provide a checklist of things that need to be done. +- **Highly Repetitive Tasks:** These are tasks that need to be done over and over again, but nobody really wants to do them. 
Some good examples include code review, improving test coverage, and upgrading dependency libraries. In addition to having clear success criteria, you can create "[skills](/overview/skills)" that clearly describe your policies for how to perform these tasks, and improve the skills over time.
- **Helping Answer Questions:** OpenHands agents are generally pretty good at answering questions about code bases, so feel free to ask them when you don't understand how something works. They can explore the code base and understand it deeply before providing an answer.
- **Checking the Correctness of Library/Backend Code:** When agents work, they can run code, and they are particularly good at checking whether libraries or backend code works well.
- **Reading Logs and Understanding Errors:** Agents can read logs from GitHub or monitoring software and understand what is going wrong with your service in a live production setting. They're actually quite good at filtering through large amounts of data, especially if pushed in the correct direction.

There are also some tasks where agents struggle a little more.

- **Quality Assurance of Frontend Apps:** Agents can spin up a website and check whether it works by clicking through the buttons. But their visual understanding of frontends is still limited at the moment, and they can make mistakes if they don't understand the workflow very well.
- **Implementing Code They Cannot Test Live:** If agents are not able to actually run and test the app, such as one that connects to a live service they don't have access to, they will often fail to carry tasks all the way to the end unless they get some encouragement.

### Tutorial Library
Source: https://docs.openhands.dev/openhands/usage/get-started/tutorials.md

Welcome to the OpenHands tutorial library. These tutorials show you how to use OpenHands for common development tasks, from testing to feature development.
Each tutorial includes example prompts, expected workflows, and tips for success. + +## Categories Overview + +| Category | Best For | Complexity | +|----------|----------|------------| +| [Testing](#testing) | Adding tests, improving coverage | Simple to Medium | +| [Data Analysis](#data-analysis) | Processing data, generating reports | Simple to Medium | +| [Web Scraping](#web-scraping) | Extracting data from websites | Medium | +| [Code Review](#code-review) | Analyzing PRs, finding issues | Simple | +| [Bug Fixing](#bug-fixing) | Diagnosing and fixing errors | Medium | +| [Feature Development](#feature-development) | Building new functionality | Medium to Complex | + + +For in-depth guidance on specific use cases, see our [Use Cases](/openhands/usage/use-cases/code-review) section which includes detailed workflows for Code Review, Incident Triage, and more. + + +## Task Complexity Guidance + +Before starting, assess your task's complexity: + +**Simple tasks** (5-15 minutes): +- Single file changes +- Clear, well-defined requirements +- Existing patterns to follow + +**Medium tasks** (15-45 minutes): +- Multiple file changes +- Some discovery required +- Integration with existing code + +**Complex tasks** (45+ minutes): +- Architectural changes +- Multiple components +- Requires iteration + + +Start with simpler tutorials to build familiarity with OpenHands before tackling complex tasks. + + +## Best Use Cases + +OpenHands excels at: + +- **Repetitive tasks**: Boilerplate code, test generation +- **Pattern application**: Following established conventions +- **Analysis**: Code review, debugging, documentation +- **Exploration**: Understanding new codebases + +## Example Tutorials by Category + +### Testing + +#### Tutorial: Add Unit Tests for a Module + +**Goal**: Achieve 80%+ test coverage for a service module + +**Prompt**: +``` +Add unit tests for the UserService class in src/services/user.js. + +Current coverage: 35% +Target coverage: 80% + +Requirements: +1. 
Test all public methods +2. Cover edge cases (null inputs, empty arrays, etc.) +3. Mock external dependencies (database, API calls) +4. Follow our existing test patterns in tests/services/ +5. Use Jest as the testing framework + +Focus on these methods: +- createUser() +- updateUser() +- deleteUser() +- getUserById() +``` + +**What OpenHands does**: +1. Analyzes the UserService class +2. Identifies untested code paths +3. Creates test file with comprehensive tests +4. Mocks dependencies appropriately +5. Runs tests to verify they pass + +**Tips**: +- Provide existing test files as examples +- Specify the testing framework +- Mention any mocking conventions + +--- + +#### Tutorial: Add Integration Tests for an API + +**Goal**: Test API endpoints end-to-end + +**Prompt**: +``` +Add integration tests for the /api/products endpoints. + +Endpoints to test: +- GET /api/products (list all) +- GET /api/products/:id (get one) +- POST /api/products (create) +- PUT /api/products/:id (update) +- DELETE /api/products/:id (delete) + +Requirements: +1. Use our test database (configured in jest.config.js) +2. Set up and tear down test data properly +3. Test success cases and error cases +4. Verify response bodies and status codes +5. Follow patterns in tests/integration/ +``` + +--- + +### Data Analysis + +#### Tutorial: Create a Data Processing Script + +**Goal**: Process CSV data and generate a report + +**Prompt**: +``` +Create a Python script to analyze our sales data. + +Input: sales_data.csv with columns: date, product, quantity, price, region + +Requirements: +1. Load and validate the CSV data +2. Calculate: + - Total revenue by product + - Monthly sales trends + - Top 5 products by quantity + - Revenue by region +3. Generate a summary report (Markdown format) +4. Create visualizations (bar chart for top products, line chart for trends) +5. Save results to reports/ directory + +Use pandas for data processing and matplotlib for charts. +``` + +**What OpenHands does**: +1. 
Creates a Python script with proper structure +2. Implements data loading with validation +3. Calculates requested metrics +4. Generates formatted report +5. Creates and saves visualizations + +--- + +#### Tutorial: Database Query Analysis + +**Goal**: Analyze and optimize slow database queries + +**Prompt**: +``` +Analyze our slow query log and identify optimization opportunities. + +File: logs/slow_queries.log + +For each slow query: +1. Explain why it's slow +2. Suggest index additions if helpful +3. Rewrite the query if it can be optimized +4. Estimate the improvement + +Create a report in reports/query_optimization.md with: +- Summary of findings +- Prioritized recommendations +- SQL for suggested changes +``` + +--- + +### Web Scraping + +#### Tutorial: Build a Web Scraper + +**Goal**: Extract product data from a website + +**Prompt**: +``` +Create a web scraper to extract product information from our competitor's site. + +Target URL: https://example-store.com/products + +Extract for each product: +- Name +- Price +- Description +- Image URL +- SKU (if available) + +Requirements: +1. Use Python with BeautifulSoup or Scrapy +2. Handle pagination (site has 50 pages) +3. Respect rate limits (1 request/second) +4. Save results to products.json +5. Handle errors gracefully +6. Log progress to console + +Include a README with usage instructions. +``` + +**Tips**: +- Specify rate limiting requirements +- Mention error handling expectations +- Request logging for debugging + +--- + +### Code Review + + +For comprehensive code review guidance, see the [Code Review Use Case](/openhands/usage/use-cases/code-review) page. For automated PR reviews using GitHub Actions, see the [PR Review SDK Guide](/sdk/guides/github-workflows/pr-review). + + +#### Tutorial: Security-Focused Code Review + +**Goal**: Identify security vulnerabilities in a PR + +**Prompt**: +``` +Review this pull request for security issues: + +Focus areas: +1. 
Input validation - check all user inputs are sanitized +2. Authentication - verify auth checks are in place +3. SQL injection - check for parameterized queries +4. XSS - verify output encoding +5. Sensitive data - ensure no secrets in code + +For each issue found, provide: +- File and line number +- Severity (Critical/High/Medium/Low) +- Description of the vulnerability +- Suggested fix with code example + +Output format: Markdown suitable for PR comments +``` + +--- + +#### Tutorial: Performance Review + +**Goal**: Identify performance issues in code + +**Prompt**: +``` +Review the OrderService class for performance issues. + +File: src/services/order.js + +Check for: +1. N+1 database queries +2. Missing indexes (based on query patterns) +3. Inefficient loops or algorithms +4. Missing caching opportunities +5. Unnecessary data fetching + +For each issue: +- Explain the impact +- Show the problematic code +- Provide an optimized version +- Estimate the improvement +``` + +--- + +### Bug Fixing + + +For production incident investigation and automated error analysis, see the [Incident Triage Use Case](/openhands/usage/use-cases/incident-triage) which covers integration with monitoring tools like Datadog. + + +#### Tutorial: Fix a Crash Bug + +**Goal**: Diagnose and fix an application crash + +**Prompt**: +``` +Fix the crash in the checkout process. + +Error: +TypeError: Cannot read property 'price' of undefined + at calculateTotal (src/checkout/calculator.js:45) + at processOrder (src/checkout/processor.js:23) + +Steps to reproduce: +1. Add item to cart +2. Apply discount code "SAVE20" +3. Click checkout +4. Crash occurs + +The bug was introduced in commit abc123 (yesterday's deployment). + +Requirements: +1. Identify the root cause +2. Fix the bug +3. Add a regression test +4. Verify the fix doesn't break other functionality +``` + +**What OpenHands does**: +1. Analyzes the stack trace +2. Reviews recent changes +3. Identifies the null reference issue +4. 
Implements a defensive fix +5. Creates test to prevent regression + +--- + +#### Tutorial: Fix a Memory Leak + +**Goal**: Identify and fix a memory leak + +**Prompt**: +``` +Investigate and fix the memory leak in our Node.js application. + +Symptoms: +- Memory usage grows 100MB/hour +- After 24 hours, app becomes unresponsive +- Restarting temporarily fixes the issue + +Suspected areas: +- Event listeners in src/events/ +- Cache implementation in src/cache/ +- WebSocket connections in src/ws/ + +Analyze these areas and: +1. Identify the leak source +2. Explain why it's leaking +3. Implement a fix +4. Add monitoring to detect future leaks +``` + +--- + +### Feature Development + +#### Tutorial: Add a REST API Endpoint + +**Goal**: Create a new API endpoint with full functionality + +**Prompt**: +``` +Add a user preferences API endpoint. + +Endpoint: /api/users/:id/preferences + +Operations: +- GET: Retrieve user preferences +- PUT: Update user preferences +- PATCH: Partially update preferences + +Preferences schema: +{ + theme: "light" | "dark", + notifications: { email: boolean, push: boolean }, + language: string, + timezone: string +} + +Requirements: +1. Follow patterns in src/api/routes/ +2. Add request validation with Joi +3. Use UserPreferencesService for business logic +4. Add appropriate error handling +5. Document the endpoint in OpenAPI format +6. Add unit and integration tests +``` + +**What OpenHands does**: +1. Creates route handler following existing patterns +2. Implements validation middleware +3. Creates or updates the service layer +4. Adds error handling +5. Generates API documentation +6. Creates comprehensive tests + +--- + +#### Tutorial: Implement a Feature Flag System + +**Goal**: Add feature flags to the application + +**Prompt**: +``` +Implement a feature flag system for our application. + +Requirements: +1. Create a FeatureFlags service +2. 
Support these flag types: + - Boolean (on/off) + - Percentage (gradual rollout) + - User-based (specific user IDs) +3. Load flags from environment variables initially +4. Add a React hook: useFeatureFlag(flagName) +5. Add middleware for API routes + +Initial flags to configure: +- new_checkout: boolean, default false +- dark_mode: percentage, default 10% +- beta_features: user-based + +Include documentation and tests. +``` + +--- + +## Contributing Tutorials + +Have a great use case? Share it with the community! + +**What makes a good tutorial:** +- Solves a common problem +- Has clear, reproducible steps +- Includes example prompts +- Explains expected outcomes +- Provides tips for success + +**How to contribute:** +1. Create a detailed example following this format +2. Test it with OpenHands to verify it works +3. Submit via GitHub pull request to the docs repository +4. Include any prerequisites or setup required + + +These tutorials are starting points. The best results come from adapting them to your specific codebase, conventions, and requirements. + + +### Key Features +Source: https://docs.openhands.dev/openhands/usage/key-features.md + + + + - Displays the conversation between the user and OpenHands. + - OpenHands explains its actions in this panel. + + ![overview](/openhands/static/img/chat-panel.png) + + + - Shows the file changes performed by OpenHands. + + ![overview](/openhands/static/img/changes-tab.png) + + + - Embedded VS Code for browsing and modifying files. + - Can also be used to upload and download files. + + ![overview](/openhands/static/img/vs-tab.png) + + + - A space for OpenHands and users to run terminal commands. + + ![overview](/openhands/static/img/terminal-tab.png) + + + - Displays the web server when OpenHands runs an application. + - Users can interact with the running application. + + ![overview](/openhands/static/img/app-tab.png) + + + - Used by OpenHands to browse websites. + - The browser is non-interactive. 
  ![overview](/openhands/static/img/browser-tab.png)


### Azure
Source: https://docs.openhands.dev/openhands/usage/llms/azure-llms.md

## Azure OpenAI Configuration

When running OpenHands, you'll need to set the following environment variable using `-e` in the
docker run command:

```
LLM_API_VERSION="" # e.g. "2023-05-15"
```

Example:
```bash
docker run -it --pull=always \
    -e LLM_API_VERSION="2023-05-15" \
    ...
```

Then in the OpenHands UI Settings under the `LLM` tab:


You will need your ChatGPT deployment name which can be found on the deployments page in Azure. This is referenced as
<deployment-name> below.


1. Enable `Advanced` options.
2. Set the following:
   - `Custom Model` to azure/<deployment-name>
   - `Base URL` to your Azure API Base URL (e.g. `https://example-endpoint.openai.azure.com`)
   - `API Key` to your Azure API key

### Custom LLM Configurations
Source: https://docs.openhands.dev/openhands/usage/llms/custom-llm-configs.md

## How It Works

Named LLM configurations are defined in the `config.toml` file using sections that start with `llm.`. For example:

```toml
# Default LLM configuration
[llm]
model = "gpt-4"
api_key = "your-api-key"
temperature = 0.0

# Custom LLM configuration for a cheaper model
[llm.gpt3]
model = "gpt-3.5-turbo"
api_key = "your-api-key"
temperature = 0.2

# Another custom configuration with different parameters
[llm.high-creativity]
model = "gpt-4"
api_key = "your-api-key"
temperature = 0.8
top_p = 0.9
```

Each named configuration inherits all settings from the default `[llm]` section and can override any of those settings. You can define as many custom configurations as needed.
+ +## Using Custom Configurations + +### With Agents + +You can specify which LLM configuration an agent should use by setting the `llm_config` parameter in the agent's configuration section: + +```toml +[agent.RepoExplorerAgent] +# Use the cheaper GPT-3 configuration for this agent +llm_config = 'gpt3' + +[agent.CodeWriterAgent] +# Use the high creativity configuration for this agent +llm_config = 'high-creativity' +``` + +### Configuration Options + +Each named LLM configuration supports all the same options as the default LLM configuration. These include: + +- Model selection (`model`) +- API configuration (`api_key`, `base_url`, etc.) +- Model parameters (`temperature`, `top_p`, etc.) +- Retry settings (`num_retries`, `retry_multiplier`, etc.) +- Token limits (`max_input_tokens`, `max_output_tokens`) +- And all other LLM configuration options + +For a complete list of available options, see the LLM Configuration section in the [Configuration Options](/openhands/usage/advanced/configuration-options) documentation. + +## Use Cases + +Custom LLM configurations are particularly useful in several scenarios: + +- **Cost Optimization**: Use cheaper models for tasks that don't require high-quality responses, like repository exploration or simple file operations. +- **Task-Specific Tuning**: Configure different temperature and top_p values for tasks that require different levels of creativity or determinism. +- **Different Providers**: Use different LLM providers or API endpoints for different tasks. +- **Testing and Development**: Easily switch between different model configurations during development and testing. 
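As a concrete instance of the "Different Providers" case above, a named section can point at any OpenAI-compatible endpoint by overriding `base_url`. A hedged sketch — the model name and URL below are placeholders, not recommendations:

```toml
[llm]
model = "gpt-4"
api_key = "your-api-key"

# Illustrative: route exploration work to a self-hosted, OpenAI-compatible server.
[llm.local-explorer]
model = "openai/llama-3.1-8b-instruct"
base_url = "http://localhost:8000/v1"
api_key = "unused-for-local-server"
```

An agent opts in with `llm_config = 'local-explorer'` in its `[agent.*]` section, just like any other named configuration.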
+ +## Example: Cost Optimization + +A practical example of using custom LLM configurations to optimize costs: + +```toml +# Default configuration using GPT-4 for high-quality responses +[llm] +model = "gpt-4" +api_key = "your-api-key" +temperature = 0.0 + +# Cheaper configuration for repository exploration +[llm.repo-explorer] +model = "gpt-3.5-turbo" +temperature = 0.2 + +# Configuration for code generation +[llm.code-gen] +model = "gpt-4" +temperature = 0.0 +max_output_tokens = 2000 + +[agent.RepoExplorerAgent] +llm_config = 'repo-explorer' + +[agent.CodeWriterAgent] +llm_config = 'code-gen' +``` + +In this example: +- Repository exploration uses a cheaper model since it mainly involves understanding and navigating code +- Code generation uses GPT-4 with a higher token limit for generating larger code blocks +- The default configuration remains available for other tasks + +## Custom Configurations with Reserved Names + +OpenHands recognizes certain reserved configuration names for specific use cases. If you define the model and other settings under a reserved name, OpenHands will load and use that configuration for its dedicated purpose. As of now, one such configuration is implemented: the draft editor. + +### Draft Editor Configuration + +The `draft_editor` configuration specifies the model used for preliminary drafting of code edits, in any task that involves editing and refining code. Provide it under the section `[llm.draft_editor]`.
+ +For example, you can define in `config.toml` a draft editor like this: + +```toml +[llm.draft_editor] +model = "gpt-4" +temperature = 0.2 +top_p = 0.95 +presence_penalty = 0.0 +frequency_penalty = 0.0 +``` + +This configuration: +- Uses GPT-4 for high-quality edits and suggestions +- Sets a low temperature (0.2) to maintain consistency while allowing some flexibility +- Uses a high top_p value (0.95) to consider a wide range of token options +- Disables presence and frequency penalties to maintain focus on the specific edits needed + +Use this configuration when you want to let an LLM draft edits before making them. In general, it may be useful to: +- Review and suggest code improvements +- Refine existing content while maintaining its core meaning +- Make precise, focused changes to code or text + + +Custom LLM configurations are only available when using OpenHands in development mode, via `main.py` or `cli.py`. When running via `docker run`, please use the standard configuration options. + + +### Google Gemini/Vertex +Source: https://docs.openhands.dev/openhands/usage/llms/google-llms.md + +## Gemini - Google AI Studio Configs + +When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab: +- `LLM Provider` to `Gemini` +- `LLM Model` to the model you will be using. +If the model is not in the list, enable `Advanced` options, and enter it in `Custom Model` +(e.g. gemini/<model-name> like `gemini/gemini-2.0-flash`). 
+- `API Key` to your Gemini API key + +## VertexAI - Google Cloud Platform Configs + +To use Vertex AI through Google Cloud Platform when running OpenHands, you'll need to set the following environment +variables using `-e` in the docker run command: + +``` +GOOGLE_APPLICATION_CREDENTIALS="" +VERTEXAI_PROJECT="" +VERTEXAI_LOCATION="" +``` + +Then set the following in the OpenHands UI through the Settings under the `LLM` tab: +- `LLM Provider` to `VertexAI` +- `LLM Model` to the model you will be using. +If the model is not in the list, enable `Advanced` options, and enter it in `Custom Model` +(e.g. vertex_ai/<model-name>). + +### Groq +Source: https://docs.openhands.dev/openhands/usage/llms/groq.md + +## Configuration + +When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab: +- `LLM Provider` to `Groq` +- `LLM Model` to the model you will be using. [Visit here to see the list of +models that Groq hosts](https://console.groq.com/docs/models). If the model is not in the list, +enable `Advanced` options, and enter it in `Custom Model` (e.g. groq/<model-name> like `groq/llama3-70b-8192`). +- `API key` to your Groq API key. To find or create your Groq API Key, [see here](https://console.groq.com/keys). + +## Using Groq as an OpenAI-Compatible Endpoint + +The Groq endpoint for chat completion is [mostly OpenAI-compatible](https://console.groq.com/docs/openai). Therefore, you can access Groq models as you +would access any OpenAI-compatible endpoint. In the OpenHands UI through the Settings under the `LLM` tab: +1. Enable `Advanced` options +2. Set the following: + - `Custom Model` to the prefix `openai/` + the model you will be using (e.g. 
`openai/llama3-70b-8192`) + - `Base URL` to `https://api.groq.com/openai/v1` + - `API Key` to your Groq API key + +### LiteLLM Proxy +Source: https://docs.openhands.dev/openhands/usage/llms/litellm-proxy.md + +## Configuration + +To use LiteLLM proxy with OpenHands, you need to: + +1. Set up a LiteLLM proxy server (see [LiteLLM documentation](https://docs.litellm.ai/docs/proxy/quick_start)) +2. When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab: + * Enable `Advanced` options + * `Custom Model` to the prefix `litellm_proxy/` + the model you will be using (e.g. `litellm_proxy/anthropic.claude-3-5-sonnet-20241022-v2:0`) + * `Base URL` to your LiteLLM proxy URL (e.g. `https://your-litellm-proxy.com`) + * `API Key` to your LiteLLM proxy API key + +## Supported Models + +The supported models depend on your LiteLLM proxy configuration. OpenHands supports any model that your LiteLLM proxy +is configured to handle. + +Refer to your LiteLLM proxy configuration for the list of available models and their names. + +### Overview +Source: https://docs.openhands.dev/openhands/usage/llms/llms.md + + +This section is for users who want to connect OpenHands to different LLMs. + + + +OpenHands now delegates all LLM orchestration to the Agent SDK. The guidance on this +page focuses on how the OpenHands interfaces surface those capabilities. When in doubt, refer to the SDK documentation +for the canonical list of supported parameters. + + +## Model Recommendations + +Based on our evaluations of language models for coding tasks (using the SWE-bench dataset), we can provide some +recommendations for model selection. Our latest benchmarking results can be found in +[this spreadsheet](https://docs.google.com/spreadsheets/d/1wOUdFCMyY6Nt0AIqF705KN4JKOWgeI4wUGUP60krXXs/edit?gid=0). 
+ +Based on these findings and community feedback, these are the latest models that have been verified to work reasonably well with OpenHands: + +### Cloud / API-Based Models + +- [anthropic/claude-sonnet-4-20250514](https://www.anthropic.com/api) (recommended) +- [anthropic/claude-sonnet-4-5-20250929](https://www.anthropic.com/api) (recommended) +- [openai/gpt-5-2025-08-07](https://openai.com/api/) (recommended) +- [gemini/gemini-3-pro-preview](https://blog.google/products/gemini/gemini-3/) +- [deepseek/deepseek-chat](https://api-docs.deepseek.com/) +- [moonshot/kimi-k2-0711-preview](https://platform.moonshot.ai/docs/pricing/chat#generation-model-kimi-k2) + +If you have successfully run OpenHands with specific providers, we encourage you to open a PR to share your setup process +to help others using the same provider! + +For a full list of the providers and models available, please consult the +[litellm documentation](https://docs.litellm.ai/docs/providers). + + +OpenHands will issue many prompts to the LLM you configure. Most of these LLMs cost money, so be sure to set spending +limits and monitor usage. + + +### Local / Self-Hosted Models + +- [mistralai/devstral-small](https://openhands.dev/blog/devstral-a-new-state-of-the-art-open-model-for-coding-agents) (20 May 2025) -- also available through [OpenRouter](https://openrouter.ai/mistralai/devstral-small:free) +- [all-hands/openhands-lm-32b-v0.1](https://openhands.dev/blog/introducing-openhands-lm-32b----a-strong-open-coding-agent-model) (31 March 2025) -- also available through [OpenRouter](https://openrouter.ai/all-hands/openhands-lm-32b-v0.1) + +### Known Issues + + +Most current local and open-source models are not as capable as the cloud models listed above. When using such models, you may see long +wait times between messages, poor responses, or errors about malformed JSON. OpenHands can only be as powerful as the +models driving it. However, if you do find ones that work, please add them to the verified list above.
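The malformed-JSON errors mentioned in the note above typically come from weaker models wrapping tool-call arguments in extra prose. One common mitigation pattern, sketched here for illustration (this is not OpenHands code, and for simplicity it ignores braces inside JSON strings), is to salvage the first balanced JSON object from the raw output:

```python
import json

def salvage_json(raw: str):
    """Best-effort extraction of the first JSON object from model output.

    Weaker local models sometimes wrap tool-call arguments in prose; this
    sketch scans for a balanced {...} span and parses it. It does not handle
    braces inside JSON string values.
    """
    start = raw.find("{")
    if start == -1:
        return None
    depth = 0
    for i, ch in enumerate(raw[start:], start):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                try:
                    return json.loads(raw[start : i + 1])
                except json.JSONDecodeError:
                    return None
    return None
```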
+ + +## LLM Configuration + +The following can be set in the OpenHands UI through the Settings. Each option is serialized into the +`LLM.load_from_env()` schema before being passed to the Agent SDK: + +- `LLM Provider` +- `LLM Model` +- `API Key` +- `Base URL` (through `Advanced` settings) + +There are some settings that may be necessary for certain providers that cannot be set directly through the UI. Set them +as environment variables (or add them to your `config.toml`) so the SDK picks them up during startup: + +- `LLM_API_VERSION` +- `LLM_EMBEDDING_MODEL` +- `LLM_EMBEDDING_DEPLOYMENT_NAME` +- `LLM_DROP_PARAMS` +- `LLM_DISABLE_VISION` +- `LLM_CACHING_PROMPT` + +## LLM Provider Guides + +We have a few guides for running OpenHands with specific model providers: + +- [Azure](/openhands/usage/llms/azure-llms) +- [Google](/openhands/usage/llms/google-llms) +- [Groq](/openhands/usage/llms/groq) +- [Local LLMs with SGLang or vLLM](/openhands/usage/llms/local-llms) +- [LiteLLM Proxy](/openhands/usage/llms/litellm-proxy) +- [Moonshot AI](/openhands/usage/llms/moonshot) +- [OpenAI](/openhands/usage/llms/openai-llms) +- [OpenHands](/openhands/usage/llms/openhands-llms) +- [OpenRouter](/openhands/usage/llms/openrouter) + +These pages remain the authoritative provider references for both the Agent SDK +and the OpenHands interfaces. + +## Model Customization + +LLM providers have specific settings that can be customized to optimize their performance with OpenHands, such as: + +- **Custom Tokenizers**: For specialized models, you can add a suitable tokenizer. +- **Native Tool Calling**: Toggle native function/tool calling capabilities. + +For detailed information about model customization, see +[LLM Configuration Options](/openhands/usage/advanced/configuration-options#llm-configuration). + +### API retries and rate limits + +LLM providers typically have rate limits, sometimes very low, and may require retries. 
OpenHands will automatically +retry requests if it receives a Rate Limit Error (429 error code). + +You can customize these options as you need for the provider you're using. Check their documentation, and set the +following environment variables to control the number of retries and the time between retries: + +- `LLM_NUM_RETRIES` (Default of 4 times) +- `LLM_RETRY_MIN_WAIT` (Default of 5 seconds) +- `LLM_RETRY_MAX_WAIT` (Default of 30 seconds) +- `LLM_RETRY_MULTIPLIER` (Default of 2) + +If you are running OpenHands in development mode, you can also set these options in the `config.toml` file: + +```toml +[llm] +num_retries = 4 +retry_min_wait = 5 +retry_max_wait = 30 +retry_multiplier = 2 +``` + +### Local LLMs +Source: https://docs.openhands.dev/openhands/usage/llms/local-llms.md + +## News + +- 2025/12/12: We now recommend two powerful local models for OpenHands: [Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct) and [Devstral Small 2 (24B)](https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512). Both models deliver excellent performance on coding tasks and work great with OpenHands! + +## Quickstart: Running OpenHands with a Local LLM using LM Studio + +This guide explains how to serve a local LLM using [LM Studio](https://lmstudio.ai/) and have OpenHands connect to it. + +We recommend: +- **LM Studio** as the local model server, which handles metadata downloads automatically and offers a simple, user-friendly interface for configuration. +- **Qwen3-Coder-30B-A3B-Instruct** as the LLM for software development. This model is optimized for coding tasks and works excellently with agent-style workflows like OpenHands. + +### Hardware Requirements + +Running Qwen3-Coder-30B-A3B-Instruct requires: +- A recent GPU with at least 12GB of VRAM (tested on RTX 3060 with 12GB VRAM + 64GB RAM), or +- A Mac with Apple Silicon with at least 32GB of RAM + +### 1. 
Install LM Studio + +Download and install the LM Studio desktop app from [lmstudio.ai](https://lmstudio.ai/). + +### 2. Download the Model + +1. Make sure to set the User Interface Complexity Level to "Power User", by clicking on the appropriate label at the bottom of the window. +2. Click the "Discover" button (Magnifying Glass icon) on the left navigation bar to open the Models download page. + +![image](./screenshots/01_lm_studio_open_model_hub.png) + +3. Search for **"Qwen3-Coder-30B-A3B-Instruct"**, confirm you're downloading from the official Qwen publisher, then proceed to download. + +![image](./screenshots/02_lm_studio_download_devstral.png) + +4. Wait for the download to finish. + +### 3. Load the Model + +1. Click the "Developer" button (Console icon) on the left navigation bar to open the Developer Console. +2. Click the "Select a model to load" dropdown at the top of the application window. + +![image](./screenshots/03_lm_studio_open_load_model.png) + +3. Enable the "Manually choose model load parameters" switch. +4. Select **Qwen3-Coder-30B-A3B-Instruct** from the model list. + +![image](./screenshots/04_lm_studio_setup_devstral_part_1.png) + +5. Enable the "Show advanced settings" switch at the bottom of the Model settings flyout to show all the available settings. +6. Set "Context Length" to at least 22000 (for lower VRAM systems) or 32768 (recommended for better performance) and enable Flash Attention. +7. Click "Load Model" to start loading the model. + +![image](./screenshots/05_lm_studio_setup_devstral_part_2.png) + +### 4. Start the LLM server + +1. Enable the switch next to "Status" at the top-left of the Window. +2. Take note of the Model API Identifier shown on the sidebar on the right. + +![image](./screenshots/06_lm_studio_start_server.png) + +### 5. Start OpenHands + +1. 
Check [the installation guide](/openhands/usage/run-openhands/local-setup) and ensure all prerequisites are met before running OpenHands, then run: + +```bash +docker run -it --rm --pull=always \ + -e AGENT_SERVER_IMAGE_REPOSITORY=ghcr.io/openhands/agent-server \ + -e AGENT_SERVER_IMAGE_TAG=1.11.4-python \ + -e LOG_ALL_EVENTS=true \ + -v /var/run/docker.sock:/var/run/docker.sock \ + -v ~/.openhands:/.openhands \ + -p 3000:3000 \ + --add-host host.docker.internal:host-gateway \ + --name openhands-app \ + docker.openhands.dev/openhands/openhands:1.4 +``` + +2. Wait until the server is running (see log below): +``` +Digest: sha256:e72f9baecb458aedb9afc2cd5bc935118d1868719e55d50da73190d3a85c674f +Status: Image is up to date for docker.openhands.dev/openhands/openhands:1.4 +Starting OpenHands... +Running OpenHands as root +14:22:13 - openhands:INFO: server_config.py:50 - Using config class None +INFO: Started server process [8] +INFO: Waiting for application startup. +INFO: Application startup complete. +INFO: Uvicorn running on http://0.0.0.0:3000 (Press CTRL+C to quit) +``` + +3. Visit `http://localhost:3000` in your browser. + +### 6. Configure OpenHands to use the LLM server + +Once you open OpenHands in your browser, you'll need to configure it to use the local LLM server you just started. + +When started for the first time, OpenHands will prompt you to set up the LLM provider. + +1. Click "see advanced settings" to open the LLM Settings page. + +![image](./screenshots/07_openhands_open_advanced_settings.png) + +2. Enable the "Advanced" switch at the top of the page to show all the available settings. + +3. Set the following values: + - **Custom Model**: `openai/qwen/qwen3-coder-30b-a3b-instruct` (the Model API identifier from LM Studio, prefixed with "openai/") + - **Base URL**: `http://host.docker.internal:1234/v1` + - **API Key**: `local-llm` + +4. Click "Save Settings" to save the configuration. 
+ +![image](./screenshots/08_openhands_configure_local_llm_parameters.png) + +That's it! You can now start using OpenHands with the local LLM server. + +If you encounter any issues, let us know on [Slack](https://openhands.dev/joinslack). + +## Advanced: Alternative LLM Backends + +This section describes how to run local LLMs with OpenHands using alternative backends like Ollama, SGLang, or vLLM — without relying on LM Studio. + +### Create an OpenAI-Compatible Endpoint with Ollama + +- Install Ollama following [the official documentation](https://ollama.com/download). +- Example launch command for Qwen3-Coder-30B-A3B-Instruct: + +```bash +# ⚠️ WARNING: OpenHands requires a large context size to work properly. +# When using Ollama, set OLLAMA_CONTEXT_LENGTH to at least 22000. +# The default (4096) is way too small — not even the system prompt will fit, and the agent will not behave correctly. +OLLAMA_CONTEXT_LENGTH=32768 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_KEEP_ALIVE=-1 nohup ollama serve & +ollama pull qwen3-coder:30b +``` + +### Create an OpenAI-Compatible Endpoint with vLLM or SGLang + +First, download the model checkpoint: + +```bash +huggingface-cli download Qwen/Qwen3-Coder-30B-A3B-Instruct --local-dir Qwen/Qwen3-Coder-30B-A3B-Instruct +``` + +#### Serving the model using SGLang + +- Install SGLang following [the official documentation](https://docs.sglang.io/get_started/install.html). +- Example launch command (with at least 2 GPUs): + +```bash +SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1 python3 -m sglang.launch_server \ + --model Qwen/Qwen3-Coder-30B-A3B-Instruct \ + --served-model-name Qwen3-Coder-30B-A3B-Instruct \ + --port 8000 \ + --tp 2 --dp 1 \ + --host 0.0.0.0 \ + --api-key mykey --context-length 131072 +``` + +#### Serving the model using vLLM + +- Install vLLM following [the official documentation](https://docs.vllm.ai/en/latest/getting_started/installation.html). 
+ +- Example launch command (with at least 2 GPUs): + +```bash +vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \ + --host 0.0.0.0 --port 8000 \ + --api-key mykey \ + --tensor-parallel-size 2 \ + --served-model-name Qwen3-Coder-30B-A3B-Instruct \ + --enable-prefix-caching +``` + +If you are interested in further improved inference speed, you can also try Snowflake's version +of vLLM, [ArcticInference](https://www.snowflake.com/en/engineering-blog/fast-speculative-decoding-vllm-arctic/), +which can achieve up to 2x speedup in some cases. + +1. Install the Arctic Inference library that automatically patches vLLM: + +```bash +pip install git+https://github.com/snowflakedb/ArcticInference.git +``` + +2. Run the launch command with speculative decoding enabled: + +```bash +vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \ + --host 0.0.0.0 --port 8000 \ + --api-key mykey \ + --tensor-parallel-size 2 \ + --served-model-name Qwen3-Coder-30B-A3B-Instruct \ + --speculative-config '{"method": "suffix"}' +``` + +### Run OpenHands (Alternative Backends) + +#### Using Docker + +Run OpenHands using [the official docker run command](/openhands/usage/run-openhands/local-setup). + +#### Using Development Mode + +Use the instructions in [Development.md](https://github.com/OpenHands/OpenHands/blob/main/Development.md) to build OpenHands. + +Start OpenHands using `make run`. + +### Configure OpenHands (Alternative Backends) + +Once OpenHands is running, open the Settings page in the UI and go to the `LLM` tab. + +1. Click **"see advanced settings"** to access the full configuration panel. +2. Enable the **Advanced** toggle at the top of the page. +3. Set the following parameters, if you followed the examples above: + - **Custom Model**: `openai/<served-model-name>` + - For **Ollama**: `openai/qwen3-coder:30b` + - For **SGLang/vLLM**: `openai/Qwen3-Coder-30B-A3B-Instruct` + - **Base URL**: `http://host.docker.internal:<port>/v1` + Use port `11434` for Ollama, or `8000` for SGLang and vLLM. 
+ - **API Key**: + - For **Ollama**: any placeholder value (e.g. `dummy`, `local-llm`) + - For **SGLang** or **vLLM**: use the same key provided when starting the server (e.g. `mykey`) + +### Moonshot AI +Source: https://docs.openhands.dev/openhands/usage/llms/moonshot.md + +## Using Moonshot AI with OpenHands + +[Moonshot AI](https://platform.moonshot.ai/) offers several powerful models, including Kimi-K2, which has been verified to work well with OpenHands. + +### Setup + +1. Sign up for an account at [Moonshot AI Platform](https://platform.moonshot.ai/) +2. Generate an API key from your account settings +3. Configure OpenHands to use Moonshot AI: + +| Setting | Value | +| --- | --- | +| LLM Provider | `moonshot` | +| LLM Model | `kimi-k2-0711-preview` | +| API Key | Your Moonshot API key | + +### Recommended Models + +- `moonshot/kimi-k2-0711-preview` - Kimi-K2 is Moonshot's most powerful model with a 131K context window, function calling support, and web search capabilities. + +### OpenAI +Source: https://docs.openhands.dev/openhands/usage/llms/openai-llms.md + +## Configuration + +When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab: +* `LLM Provider` to `OpenAI` +* `LLM Model` to the model you will be using. +[Visit here to see a full list of OpenAI models that LiteLLM supports.](https://docs.litellm.ai/docs/providers/openai#openai-chat-completion-models) +If the model is not in the list, enable `Advanced` options, and enter it in `Custom Model` (e.g. openai/<model-name> like `openai/gpt-4o`). +* `API Key` to your OpenAI API key. To find or create your OpenAI Project API Key, [see here](https://platform.openai.com/api-keys). + +## Using OpenAI-Compatible Endpoints + +Just as for OpenAI Chat completions, we use LiteLLM for OpenAI-compatible endpoints. You can find their full documentation on this topic [here](https://docs.litellm.ai/docs/providers/openai_compatible). 
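Concretely, an "OpenAI-compatible" endpoint accepts the standard chat-completions request shape, and the `openai/` prefix in the Custom Model setting only selects that wire format; LiteLLM strips the prefix before sending the request. A minimal sketch of the mapping (illustrative only, not LiteLLM's implementation):

```python
def build_chat_request(custom_model: str, prompt: str) -> dict:
    """Build an OpenAI-style /v1/chat/completions payload.

    The `openai/` prefix only tells the client which wire format to use;
    the server itself expects the bare model name.
    """
    model = custom_model.removeprefix("openai/")
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_chat_request("openai/llama3-70b-8192", "Hello")
```

The same payload shape works against Groq, a LiteLLM proxy, or a local server, which is why only the model name and Base URL change between providers.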
+ +## Using an OpenAI Proxy + +If you're using an OpenAI proxy, in the OpenHands UI through the Settings under the `LLM` tab: +1. Enable `Advanced` options +2. Set the following: + - `Custom Model` to openai/<model-name> (e.g. `openai/gpt-4o` or openai/<proxy-prefix>/<model-name>) + - `Base URL` to the URL of your OpenAI proxy + - `API Key` to your OpenAI API key + +### OpenHands +Source: https://docs.openhands.dev/openhands/usage/llms/openhands-llms.md + +## Obtain Your OpenHands LLM API Key + +1. [Log in to OpenHands Cloud](/openhands/usage/cloud/openhands-cloud). +2. Go to the Settings page and navigate to the `API Keys` tab. +3. Copy your `LLM API Key`. + +![OpenHands LLM API Key](/openhands/static/img/openhands-llm-api-key.png) + +## Configuration + +When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab: +- `LLM Provider` to `OpenHands` +- `LLM Model` to the model you will be using (e.g. claude-sonnet-4-20250514 or claude-sonnet-4-5-20250929) +- `API Key` to your OpenHands LLM API key copied from above + +## Using OpenHands LLM Provider in the CLI + +1. [Run OpenHands CLI](/openhands/usage/cli/quick-start). +2. To select OpenHands as the LLM provider: + - If this is your first time running the CLI, choose `openhands` and then select the model that you would like to use. + - If you have previously run the CLI, run the `/settings` command and select to modify the `Basic` settings. Then + choose `openhands` and finally the model. + +![OpenHands Provider in CLI](/openhands/static/img/openhands-provider-cli.png) + + + +When you use OpenHands as an LLM provider in the CLI, we may collect minimal usage metadata and send it to All Hands AI. For details, see our Privacy Policy: https://openhands.dev/privacy + + +## Using OpenHands LLM Provider with the SDK + +You can use your OpenHands API key with the [OpenHands SDK](https://docs.openhands.dev/sdk) to build custom agents and automation pipelines. 
+ +### Configuration + +The SDK automatically configures the correct API endpoint when you use the `openhands/` model prefix. Simply set two environment variables: + +```bash +export LLM_API_KEY="your-openhands-api-key" +export LLM_MODEL="openhands/claude-sonnet-4-20250514" +``` + +### Example + +```python +from openhands.sdk import LLM + +# The openhands/ prefix auto-configures the base URL +llm = LLM.load_from_env() + +# Or configure directly +llm = LLM( + model="openhands/claude-sonnet-4-20250514", + api_key="your-openhands-api-key", +) +``` + +The `openhands/` prefix tells the SDK to automatically route requests to the OpenHands LLM proxy—no need to manually set a base URL. + +### Available Models + +When using the SDK, prefix any model from the pricing table below with `openhands/`: +- `openhands/claude-sonnet-4-20250514` +- `openhands/claude-sonnet-4-5-20250929` +- `openhands/claude-opus-4-20250514` +- `openhands/gpt-5-2025-08-07` +- etc. + + +If your network has firewall restrictions, ensure the `all-hands.dev` domain is allowed. The SDK connects to `llm-proxy.app.all-hands.dev`. + + +## Pricing + +Pricing follows official API provider rates. 
Below are the current pricing details for OpenHands models: + + +| Model | Input Cost (per 1M tokens) | Cached Input Cost (per 1M tokens) | Output Cost (per 1M tokens) | Max Input Tokens | Max Output Tokens | +|-------|----------------------------|-----------------------------------|------------------------------|------------------|-------------------| +| claude-sonnet-4-5-20250929 | $3.00 | $0.30 | $15.00 | 200,000 | 64,000 | +| claude-sonnet-4-20250514 | $3.00 | $0.30 | $15.00 | 1,000,000 | 64,000 | +| claude-opus-4-20250514 | $15.00 | $1.50 | $75.00 | 200,000 | 32,000 | +| claude-opus-4-1-20250805 | $15.00 | $1.50 | $75.00 | 200,000 | 32,000 | +| claude-haiku-4-5-20251001 | $1.00 | $0.10 | $5.00 | 200,000 | 64,000 | +| gpt-5-codex | $1.25 | $0.125 | $10.00 | 272,000 | 128,000 | +| gpt-5-2025-08-07 | $1.25 | $0.125 | $10.00 | 272,000 | 128,000 | +| gpt-5-mini-2025-08-07 | $0.25 | $0.025 | $2.00 | 272,000 | 128,000 | +| devstral-medium-2507 | $0.40 | N/A | $2.00 | 128,000 | 128,000 | +| devstral-small-2507 | $0.10 | N/A | $0.30 | 128,000 | 128,000 | +| o3 | $2.00 | $0.50 | $8.00 | 200,000 | 100,000 | +| o4-mini | $1.10 | $0.275 | $4.40 | 200,000 | 100,000 | +| gemini-3-pro-preview | $2.00 | $0.20 | $12.00 | 1,048,576 | 65,535 | +| kimi-k2-0711-preview | $0.60 | $0.15 | $2.50 | 131,072 | 131,072 | +| qwen3-coder-480b | $0.40 | N/A | $1.60 | N/A | N/A | + +**Note:** Prices listed reflect provider rates with no markup, sourced via LiteLLM’s model price database and provider pricing pages. Cached input tokens are charged at a reduced rate when the same content is reused across requests. Models that don't support prompt caching show "N/A" for cached input cost. + +### OpenRouter +Source: https://docs.openhands.dev/openhands/usage/llms/openrouter.md + +## Configuration + +When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab: +* `LLM Provider` to `OpenRouter` +* `LLM Model` to the model you will be using. 
+[Visit here to see a full list of OpenRouter models](https://openrouter.ai/models). +If the model is not in the list, enable `Advanced` options, and enter it in +`Custom Model` (e.g. openrouter/<model-name> like `openrouter/anthropic/claude-3.5-sonnet`). +* `API Key` to your OpenRouter API key. + +### OpenHands GitHub Action +Source: https://docs.openhands.dev/openhands/usage/run-openhands/github-action.md + +## Using the Action in the OpenHands Repository + +To use the OpenHands GitHub Action in a repository, you can: + +1. Create an issue in the repository. +2. Add the `fix-me` label to the issue or leave a comment on the issue starting with `@openhands-agent`. + +The action will automatically trigger and attempt to resolve the issue. + +## Installing the Action in a New Repository + +To install the OpenHands GitHub Action in your own repository, follow +the [README for the OpenHands Resolver](https://github.com/OpenHands/OpenHands/blob/main/openhands/resolver/README.md). + +## Usage Tips + +### Iterative resolution + +1. Create an issue in the repository. +2. Add the `fix-me` label to the issue, or leave a comment starting with `@openhands-agent`. +3. Review the attempt to resolve the issue by checking the pull request. +4. Follow up with feedback through general comments, review comments, or inline thread comments. +5. Add the `fix-me` label to the pull request, or address a specific comment by starting with `@openhands-agent`. + +### Label versus Macro + +- Label (`fix-me`): Requests OpenHands to address the **entire** issue or pull request. +- Macro (`@openhands-agent`): Requests OpenHands to consider only the issue/pull request description and **the specific comment**. + +## Advanced Settings + +### Add custom repository settings + +You can provide custom directions for OpenHands by following the [README for the resolver](https://github.com/OpenHands/OpenHands/blob/main/openhands/resolver/README.md#providing-custom-instructions). 
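The label-versus-macro rules described above amount to a small trigger check. A hypothetical sketch (the `resolver_trigger` helper is illustrative, not the resolver's actual code):

```python
def resolver_trigger(labels, comment=None, macro="@openhands-agent"):
    """Decide how the resolver is being invoked (illustrative only).

    Returns "full" when the fix-me label asks OpenHands to address the
    entire issue/PR, "comment" when a comment starts with the macro (only
    that comment plus the description is considered), or None otherwise.
    """
    if "fix-me" in labels:
        return "full"
    if comment and comment.lstrip().startswith(macro):
        return "comment"
    return None
```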
+ +### Custom configurations + +GitHub resolver will automatically check for valid [repository secrets](https://docs.github.com/en/actions/security-for-github-actions/security-guides/using-secrets-in-github-actions?tool=webui#creating-secrets-for-a-repository) or [repository variables](https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/store-information-in-variables#creating-configuration-variables-for-a-repository) to customize its behavior. +The customization options you can set are: + +| **Attribute name** | **Type** | **Purpose** | **Example** | +| -------------------------------- | -------- | --------------------------------------------------------------------------------------------------- | -------------------------------------------------- | +| `LLM_MODEL` | Variable | Set the LLM to use with OpenHands | `LLM_MODEL="anthropic/claude-3-5-sonnet-20241022"` | +| `OPENHANDS_MAX_ITER` | Variable | Set max limit for agent iterations | `OPENHANDS_MAX_ITER=10` | +| `OPENHANDS_MACRO` | Variable | Customize default macro for invoking the resolver | `OPENHANDS_MACRO=@resolveit` | +| `OPENHANDS_BASE_CONTAINER_IMAGE` | Variable | Custom Sandbox ([learn more](/openhands/usage/advanced/custom-sandbox-guide)) | `OPENHANDS_BASE_CONTAINER_IMAGE="custom_image"` | +| `TARGET_BRANCH` | Variable | Merge to branch other than `main` | `TARGET_BRANCH="dev"` | +| `TARGET_RUNNER` | Variable | Target runner to execute the agent workflow (default ubuntu-latest) | `TARGET_RUNNER="custom-runner"` | + +### Configure +Source: https://docs.openhands.dev/openhands/usage/run-openhands/gui-mode.md + +## Prerequisites + +- [OpenHands is running](/openhands/usage/run-openhands/local-setup) + +## Launching the GUI Server + +### Using the CLI Command + +You can launch the OpenHands GUI server directly from the command line using the `serve` command: + + +**Prerequisites**: You need to have the [OpenHands CLI installed](/openhands/usage/cli/installation) first, OR 
have `uv` +installed and run `uv tool install openhands --python 3.12` and `openhands serve`. Otherwise, you'll need to use Docker +directly (see the [Docker section](#using-docker-directly) below). + + +```bash +openhands serve +``` + +This command will: +- Check that Docker is installed and running +- Pull the required Docker images +- Launch the OpenHands GUI server at http://localhost:3000 +- Use the same configuration directory (`~/.openhands`) as the CLI mode + +#### Mounting Your Current Directory + +To mount your current working directory into the GUI server container, use the `--mount-cwd` flag: + +```bash +openhands serve --mount-cwd +``` + +This is useful when you want to work on files in your current directory through the GUI. The directory will be mounted at `/workspace` inside the container. + +#### Using GPU Support + +If you have NVIDIA GPUs and want to make them available to the OpenHands container, use the `--gpu` flag: + +```bash +openhands serve --gpu +``` + +This will enable GPU support via nvidia-docker, mounting all available GPUs into the container. You can combine this with other flags: + +```bash +openhands serve --gpu --mount-cwd +``` + +**Prerequisites for GPU support:** +- NVIDIA GPU drivers must be installed on your host system +- [NVIDIA Container Toolkit (nvidia-docker2)](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) must be installed and configured + +#### Requirements + +Before using the `openhands serve` command, ensure that: +- Docker is installed and running on your system +- You have internet access to pull the required Docker images +- Port 3000 is available on your system + +The CLI will automatically check these requirements and provide helpful error messages if anything is missing. + +### Using Docker Directly + +Alternatively, you can run the GUI server using Docker directly. See the [local setup guide](/openhands/usage/run-openhands/local-setup) for detailed Docker instructions. 
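To make the relationship between the `serve` flags and Docker concrete, here is a hypothetical sketch of how `--mount-cwd` and `--gpu` could translate into `docker run` arguments. The real `openhands serve` command does considerably more (image pulls, requirement checks, config mounting):

```python
import os

def serve_docker_args(mount_cwd=False, gpu=False):
    """Map serve-style flags onto docker run arguments (hypothetical sketch)."""
    args = ["docker", "run", "-p", "3000:3000"]
    if mount_cwd:
        # --mount-cwd: expose the caller's current directory at /workspace
        args += ["-v", os.getcwd() + ":/workspace"]
    if gpu:
        # --gpu: pass all NVIDIA GPUs through via the NVIDIA Container Toolkit
        args += ["--gpus", "all"]
    return args
```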
+
+## Overview
+
+### Initial Setup
+
+1. Upon first launch, you'll see a settings popup.
+2. Select an `LLM Provider` and `LLM Model` from the dropdown menus. If the required model does not exist in the list,
+   select `see advanced settings`. Then toggle `Advanced` options and enter the model name with the correct prefix in the
+   `Custom Model` text box.
+3. Enter the corresponding `API Key` for your chosen provider.
+4. Click `Save Changes` to apply the settings.
+
+### Settings
+
+You can use the Settings page at any time to:
+
+- [Set up the LLM provider and model for OpenHands](/openhands/usage/settings/llm-settings).
+- [Set up the search engine](/openhands/usage/advanced/search-engine-setup).
+- [Configure MCP servers](/openhands/usage/settings/mcp-settings).
+- [Connect to GitHub](/openhands/usage/settings/integrations-settings#github-setup),
+  [connect to GitLab](/openhands/usage/settings/integrations-settings#gitlab-setup)
+  and [connect to Bitbucket](/openhands/usage/settings/integrations-settings#bitbucket-setup).
+- Set application settings like your preferred language, notifications and other preferences.
+- [Manage custom secrets](/openhands/usage/settings/secrets-settings).
+
+### Key Features
+
+For an overview of the key features available inside a conversation, please refer to the
+[Key Features](/openhands/usage/key-features) section of the documentation.
+
+## Other Ways to Run OpenHands
+- [Run OpenHands in a scriptable headless mode.](/openhands/usage/cli/headless)
+- [Run OpenHands with a friendly CLI.](/openhands/usage/cli/terminal)
+
+### Setup
+Source: https://docs.openhands.dev/openhands/usage/run-openhands/local-setup.md
+
+## Recommended Methods for Running OpenHands on Your Local System
+
+### System Requirements
+
+- macOS with [Docker Desktop support](https://docs.docker.com/desktop/setup/install/mac-install/#system-requirements)
+- Linux
+- Windows with [WSL](https://learn.microsoft.com/en-us/windows/wsl/install) and [Docker Desktop support](https://docs.docker.com/desktop/setup/install/windows-install/#system-requirements)
+
+A system with a modern processor and a minimum of **4GB RAM** is recommended to run OpenHands.
+
+### Prerequisites
+
+
+  **Docker Desktop**
+
+  1. [Install Docker Desktop on Mac](https://docs.docker.com/desktop/setup/install/mac-install).
+  2. Open Docker Desktop, go to `Settings > Advanced` and ensure `Allow the default Docker socket to be used` is enabled.
+
+
+  Tested with Ubuntu 22.04.
+
+  **Docker Desktop**
+
+  1. [Install Docker Desktop on Linux](https://docs.docker.com/desktop/setup/install/linux/).
+
+
+  **WSL**
+
+  1. [Install WSL](https://learn.microsoft.com/en-us/windows/wsl/install).
+  2. Run `wsl --version` in PowerShell and confirm `Default Version: 2`.
+
+  **Ubuntu (Linux Distribution)**
+
+  1. Install Ubuntu: `wsl --install -d Ubuntu` in PowerShell as Administrator.
+  2. Restart your computer when prompted.
+  3. Open Ubuntu from the Start menu to complete setup.
+  4. Verify installation: `wsl --list` should show Ubuntu.
+
+  **Docker Desktop**
+
+  1. [Install Docker Desktop on Windows](https://docs.docker.com/desktop/setup/install/windows-install).
+  2. Open Docker Desktop, go to `Settings` and confirm the following:
+     - General: `Use the WSL 2 based engine` is enabled.
+ - Resources > WSL Integration: `Enable integration with my default WSL distro` is enabled. + + + The docker command below to start the app must be run inside the WSL terminal. Use `wsl -d Ubuntu` in PowerShell or search "Ubuntu" in the Start menu to access the Ubuntu terminal. + + + + + + +### Start the App + +#### Option 1: Using the CLI Launcher with uv (Recommended) + +We recommend using [uv](https://docs.astral.sh/uv/) for the best OpenHands experience. uv provides better isolation from your current project's virtual environment and is required for OpenHands' default MCP servers (like the [fetch MCP server](https://github.com/modelcontextprotocol/servers/tree/main/src/fetch)). + +**Install uv** (if you haven't already): + +See the [uv installation guide](https://docs.astral.sh/uv/getting-started/installation/) for the latest installation instructions for your platform. + +**Install OpenHands**: +```bash +uv tool install openhands --python 3.12 +``` + +**Launch OpenHands**: +```bash +# Launch the GUI server +openhands serve + +# Or with GPU support (requires nvidia-docker) +openhands serve --gpu + +# Or with current directory mounted +openhands serve --mount-cwd +``` + +This will automatically handle Docker requirements checking, image pulling, and launching the GUI server. The `--gpu` flag enables GPU support via nvidia-docker, and `--mount-cwd` mounts your current directory into the container. + +**Upgrade OpenHands**: +```bash +uv tool upgrade openhands --python 3.12 +``` + + + +If you prefer to use pip and have Python 3.12+ installed: + +```bash +# Install OpenHands +pip install openhands + +# Launch the GUI server +openhands serve +``` + +Note that you'll still need `uv` installed for the default MCP servers to work properly. 
+ + + +#### Option 2: Using Docker Directly + + + +```bash +docker run -it --rm --pull=always \ + -e AGENT_SERVER_IMAGE_REPOSITORY=ghcr.io/openhands/agent-server \ + -e AGENT_SERVER_IMAGE_TAG=1.11.4-python \ + -e LOG_ALL_EVENTS=true \ + -v /var/run/docker.sock:/var/run/docker.sock \ + -v ~/.openhands:/.openhands \ + -p 3000:3000 \ + --add-host host.docker.internal:host-gateway \ + --name openhands-app \ + docker.openhands.dev/openhands/openhands:1.4 +``` + + + +> **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location. + +You'll find OpenHands running at http://localhost:3000! + +### Setup + +After launching OpenHands, you **must** select an `LLM Provider` and `LLM Model` and enter a corresponding `API Key`. +This can be done during the initial settings popup or by selecting the `Settings` +button (gear icon) in the UI. + +If the required model does not exist in the list, in `Settings` under the `LLM` tab, you can toggle `Advanced` options +and manually enter it with the correct prefix in the `Custom Model` text box. +The `Advanced` options also allow you to specify a `Base URL` if required. + +#### Getting an API Key + +OpenHands requires an API key to access most language models. Here's how to get an API key from the recommended providers: + + + + + +1. [Log in to OpenHands Cloud](https://app.all-hands.dev). +2. Go to the Settings page and navigate to the `API Keys` tab. +3. Copy your `LLM API Key`. + +OpenHands provides access to state-of-the-art agentic coding models with competitive pricing. [Learn more about OpenHands LLM provider](/openhands/usage/llms/openhands-llms). + + + + + +1. [Create an Anthropic account](https://console.anthropic.com/). +2. [Generate an API key](https://console.anthropic.com/settings/keys). +3. [Set up billing](https://console.anthropic.com/settings/billing). + + + + + +1. 
[Create an OpenAI account](https://platform.openai.com/). +2. [Generate an API key](https://platform.openai.com/api-keys). +3. [Set up billing](https://platform.openai.com/account/billing/overview). + + + + + +1. Create a Google account if you don't already have one. +2. [Generate an API key](https://aistudio.google.com/apikey). +3. [Set up billing](https://aistudio.google.com/usage?tab=billing). + + + + + +If your local LLM server isn’t behind an authentication proxy, you can enter any value as the API key (e.g. `local-key`, `test123`) — it won’t be used. + + + + + +Consider setting usage limits to control costs. + +#### Using a Local LLM + + +Effective use of local models for agent tasks requires capable hardware, along with models specifically tuned for instruction-following and agent-style behavior. + + +To run OpenHands with a locally hosted language model instead of a cloud provider, see the [Local LLMs guide](/openhands/usage/llms/local-llms) for setup instructions. + +#### Setting Up Search Engine + +OpenHands can be configured to use a search engine to allow the agent to search the web for information when needed. + +To enable search functionality in OpenHands: + +1. Get a Tavily API key from [tavily.com](https://tavily.com/). +2. Enter the Tavily API key in the Settings page under `LLM` tab > `Search API Key (Tavily)` + +For more details, see the [Search Engine Setup](/openhands/usage/advanced/search-engine-setup) guide. + +### Versions + +The [docker command above](/openhands/usage/run-openhands/local-setup#start-the-app) pulls the most recent stable release of OpenHands. You have other options as well: +- For a specific release, replace `$VERSION` in `openhands:$VERSION` and `runtime:$VERSION`, with the version number. +For example, `0.9` will automatically point to the latest `0.9.x` release, and `0` will point to the latest `0.x.x` release. 
+- For the most up-to-date development version, replace `$VERSION` in `openhands:$VERSION` and `runtime:$VERSION`, with `main`. +This version is unstable and is recommended for testing or development purposes only. + +## Next Steps + +- [Mount your local code into the sandbox](/openhands/usage/sandboxes/docker#mounting-your-code-into-the-sandbox) to use OpenHands with your repositories +- [Run OpenHands in a scriptable headless mode.](/openhands/usage/cli/headless) +- [Run OpenHands with a friendly CLI.](/openhands/usage/cli/quick-start) +- [Run OpenHands on tagged issues with a GitHub action.](/openhands/usage/run-openhands/github-action) + +### Docker Sandbox +Source: https://docs.openhands.dev/openhands/usage/sandboxes/docker.md + +The **Docker sandbox** runs the agent server inside a Docker container. This is +the default and recommended option for most users. + + + In some self-hosted deployments, the sandbox provider is controlled via the + legacy RUNTIME environment variable. Docker is the default. + + + +## Why Docker? + +- Isolation: reduces risk when the agent runs commands. +- Reproducibility: consistent environment across machines. + +## Mounting your code into the sandbox + +If you want OpenHands to work directly on a local repository, mount it into the +sandbox. + +### Recommended: CLI launcher + +If you start OpenHands via: + +```bash +openhands serve --mount-cwd +``` + +your current directory will be mounted into the sandbox workspace. + +### Using SANDBOX_VOLUMES + +You can also configure mounts via the SANDBOX_VOLUMES environment +variable (format: host_path:container_path[:mode]): + +```bash +export SANDBOX_VOLUMES=$PWD:/workspace:rw +``` + + + Anything mounted read-write into /workspace can be modified by the + agent. + + +## Custom sandbox images + +To customize the container image (extra tools, system deps, etc.), see +[Custom Sandbox Guide](/openhands/usage/advanced/custom-sandbox-guide). 
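The `host_path:container_path[:mode]` format used by SANDBOX_VOLUMES above can be parsed mechanically. A small illustrative sketch (the helper is hypothetical, not OpenHands code; support for multiple comma-separated mounts is an assumption):

```python
def parse_sandbox_volumes(spec: str) -> list[tuple[str, str, str]]:
    """Parse a SANDBOX_VOLUMES-style value into (host, container, mode) triples."""
    mounts = []
    for entry in spec.split(","):
        parts = entry.split(":")
        if len(parts) == 2:
            host, container, mode = parts[0], parts[1], "rw"  # mode defaults to rw
        elif len(parts) == 3:
            host, container, mode = parts
        else:
            raise ValueError(f"invalid mount spec: {entry!r}")
        if mode not in ("rw", "ro"):
            raise ValueError(f"unknown mode {mode!r} in {entry!r}")
        mounts.append((host, container, mode))
    return mounts

print(parse_sandbox_volumes("/home/me/project:/workspace:rw"))
# → [('/home/me/project', '/workspace', 'rw')]
```

Note that a naive colon split like this would mis-handle Windows drive-letter paths such as `C:\code`; it is meant only to make the field order of the format concrete.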
+ +### Overview +Source: https://docs.openhands.dev/openhands/usage/sandboxes/overview.md + +A **sandbox** is the environment where OpenHands runs commands, edits files, and +starts servers while working on your task. + +In **OpenHands V1**, we use the term **sandbox** (not “runtime”) for this concept. + +## Sandbox providers + +OpenHands supports multiple sandbox “providers”, with different tradeoffs: + +- **Docker sandbox (recommended)** + - Runs the agent server inside a Docker container. + - Good isolation from your host machine. + +- **Process sandbox (unsafe, but fast)** + - Runs the agent server as a regular process on your machine. + - No container isolation. + +- **Remote sandbox** + - Runs the agent server in a remote environment. + - Used by managed deployments and some hosted setups. + +## Selecting a provider (current behavior) + +In some deployments, the provider selection is still controlled via the legacy +RUNTIME environment variable: + +- RUNTIME=docker (default) +- RUNTIME=process (aka legacy RUNTIME=local) +- RUNTIME=remote + + + The user-facing terminology in V1 is sandbox, but the configuration knob + may still be called RUNTIME while the migration is in progress. + + +## Terminology note (V0 vs V1) + +Older documentation refers to these environments as **runtimes**. +Those legacy docs are now in the Legacy (V0) section of the Web tab. + +### Process Sandbox +Source: https://docs.openhands.dev/openhands/usage/sandboxes/process.md + +The **Process sandbox** runs the agent server directly on your machine as a +regular process. + + + This mode provides **no sandbox isolation**. + + The agent can read/write files your user account can access and execute + commands on your host system. + + Only use this in controlled environments. 
+
+
+## When to use it
+
+- Local development when Docker is unavailable
+- Some CI environments
+- Debugging issues that only reproduce outside containers
+
+## Choosing process mode
+
+In some deployments, this is selected via the legacy `RUNTIME`
+environment variable:
+
+```bash
+export RUNTIME=process
+# (legacy alias)
+# export RUNTIME=local
+```
+
+If you are unsure, prefer the [Docker Sandbox](/openhands/usage/sandboxes/docker).
+
+### Remote Sandbox
+Source: https://docs.openhands.dev/openhands/usage/sandboxes/remote.md
+
+A **remote sandbox** runs the agent server in a remote execution environment
+instead of on your local machine.
+
+This is typically used by managed deployments (e.g., OpenHands Cloud) and
+advanced self-hosted setups.
+
+## Selecting remote mode
+
+In some self-hosted deployments, remote sandboxes are selected via the legacy
+`RUNTIME` environment variable:
+
+```bash
+export RUNTIME=remote
+```
+
+Remote sandboxes require additional configuration (API URL + API key). The exact
+variable names depend on your deployment, but you may see legacy names like:
+
+- `SANDBOX_REMOTE_RUNTIME_API_URL`
+- `SANDBOX_API_KEY`
+
+## Notes
+
+- Remote sandboxes may expose additional service URLs (e.g., VS Code, app ports)
+  depending on the provider.
+- Configuration and credentials vary by deployment.
+
+If you are using OpenHands Cloud, see the [Cloud UI guide](/openhands/usage/cloud/cloud-ui).
+
+### API Keys Settings
+Source: https://docs.openhands.dev/openhands/usage/settings/api-keys-settings.md
+
+
+  These settings are only available in [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud).
+
+
+## Overview
+
+Use the API Keys settings page to manage your OpenHands LLM key and create API keys for programmatic access to
+OpenHands Cloud.
+
+## OpenHands LLM Key
+
+
+You must purchase at least $10 in OpenHands Cloud credits before generating an OpenHands LLM Key. 
To purchase credits, go to [Settings > Billing](https://app.all-hands.dev/settings/billing) in OpenHands Cloud.
+
+
+You can use the API key under `OpenHands LLM Key` with [the OpenHands CLI](/openhands/usage/cli/quick-start), when
+[running OpenHands on your own](/openhands/usage/run-openhands/local-setup), or even with other AI coding agents. This will
+use credits from your OpenHands Cloud account. If you need to refresh it at any time, click the `Refresh API Key` button.
+
+## OpenHands API Key
+
+These keys can be used to programmatically interact with OpenHands Cloud. See the guide for using the
+[OpenHands Cloud API](/openhands/usage/cloud/cloud-api).
+
+### Create API Key
+
+1. Navigate to the `Settings > API Keys` page.
+2. Click `Create API Key`.
+3. Give your API key a name and click `Create`.
+
+### Delete API Key
+
+1. On the `Settings > API Keys` page, click the `Delete` button next to the API key you'd like to remove.
+2. Click `Delete` to confirm removal.
+
+### Application Settings
+Source: https://docs.openhands.dev/openhands/usage/settings/application-settings.md
+
+## Overview
+
+The Application settings page allows you to customize various application-level behaviors in OpenHands, including
+language preferences, notification settings, custom Git author configuration and more.
+
+## Setting Maximum Budget Per Conversation
+
+To limit spending, go to `Settings > Application` and set a maximum budget per conversation (in USD)
+in the `Maximum Budget Per Conversation` field. OpenHands will stop the conversation once the budget is reached, but
+you can choose to continue the conversation with a prompt.
+
+## Git Author Settings
+
+OpenHands provides the ability to customize the Git author information used when making commits and creating
+pull requests on your behalf.
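Git records a single author per commit and credits additional people through `Co-authored-by:` trailers at the end of the commit message, which is how OpenHands can stay attached to a commit as a co-author. A sketch of composing such a message (the helper function is hypothetical; `openhands <openhands@all-hands.dev>` is the project's default Git identity):

```python
def commit_message_with_coauthor(
    subject: str,
    body: str = "",
    coauthor: str = "openhands <openhands@all-hands.dev>",
) -> str:
    """Compose a commit message that credits a co-author via a Git trailer."""
    parts = [subject]
    if body:
        parts.append(body)
    # Trailers must sit in the final paragraph of the message to be recognized.
    parts.append(f"Co-authored-by: {coauthor}")
    return "\n\n".join(parts)

print(commit_message_with_coauthor("Fix typo in README"))
```

Hosting platforms such as GitHub read this trailer to display both authors on the commit.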
+ +By default, OpenHands uses the following Git author information for all commits and pull requests: + +- **Username**: `openhands` +- **Email**: `openhands@all-hands.dev` + +To override the defaults: + +1. Navigate to the `Settings > Application` page. +2. Under the `Git Settings` section, enter your preferred `Git Username` and `Git Email`. +3. Click `Save Changes` + + + When you configure a custom Git author, OpenHands will use your specified username and email as the primary author + for commits and pull requests. OpenHands will remain as a co-author. + + +### Integrations Settings +Source: https://docs.openhands.dev/openhands/usage/settings/integrations-settings.md + +## Overview + +OpenHands offers several integrations, including GitHub, GitLab, Bitbucket, and Slack, with more to come. Some +integrations, like Slack, are only available in OpenHands Cloud. Configuration may also vary depending on whether +you're using [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud) or +[running OpenHands on your own](/openhands/usage/run-openhands/local-setup). + +## OpenHands Cloud Integrations Settings + + + These settings are only available in [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud). + + +### GitHub Settings + +- `Configure GitHub Repositories` - Allows you to +[modify GitHub repository access](/openhands/usage/cloud/github-installation#modifying-repository-access) for OpenHands. + +### Slack Settings + +- `Install OpenHands Slack App` - Install [the OpenHands Slack app](/openhands/usage/cloud/slack-installation) in + your Slack workspace. Make sure your Slack workspace admin/owner has installed the OpenHands Slack app first. + +## Running on Your Own Integrations Settings + + + These settings are only available in [OpenHands Local GUI](/openhands/usage/run-openhands/local-setup). + + +### Version Control Integrations + +#### GitHub Setup + +OpenHands automatically exports a `GITHUB_TOKEN` to the shell environment if provided: + + + + + 1. 
**Generate a Personal Access Token (PAT)**:
+     - On GitHub, go to `Settings > Developer Settings > Personal Access Tokens`.
+       - **Tokens (classic)**
+         - Required scopes:
+           - `repo` (Full control of private repositories)
+       - **Fine-grained tokens**
+         - All Repositories (You can select specific repositories, but this will affect what is returned in repository search)
+         - Minimal Permissions (Select `Metadata = Read-only` for search, `Pull Requests = Read and Write` and `Content = Read and Write` for branch creation)
+  2. **Enter token in OpenHands**:
+     - Navigate to the `Settings > Integrations` page.
+     - Paste your token in the `GitHub Token` field.
+     - Click `Save Changes` to apply the changes.
+
+  If you're working with organizational repositories, additional setup may be required:
+
+  1. **Check organization requirements**:
+     - Organization admins may enforce specific token policies.
+     - Some organizations require tokens to be created with SSO enabled.
+     - Review your organization's [token policy settings](https://docs.github.com/en/organizations/managing-programmatic-access-to-your-organization/setting-a-personal-access-token-policy-for-your-organization).
+  2. **Verify organization access**:
+     - Go to your token settings on GitHub.
+     - Look for the organization under `Organization access`.
+     - If required, click `Enable SSO` next to your organization.
+     - Complete the SSO authorization process.
+
+
+  - **Token Not Recognized**:
+    - Check that the token hasn't expired.
+    - Verify the token has the required scopes.
+    - Try regenerating the token.
+
+  - **Organization Access Denied**:
+    - Check if SSO is required but not enabled.
+    - Verify organization membership.
+    - Contact your organization admin if token policies are blocking access.
+
+
+#### GitLab Setup
+
+OpenHands automatically exports a `GITLAB_TOKEN` to the shell environment if provided:
+
+
+  1. **Generate a Personal Access Token (PAT)**:
+     - On GitLab, go to `User Settings > Access Tokens`.
+     - Create a new token with the following scopes:
+       - `api` (API access)
+       - `read_user` (Read user information)
+       - `read_repository` (Read repository)
+       - `write_repository` (Write repository)
+     - Set an expiration date or leave it blank for a non-expiring token.
+  2. **Enter token in OpenHands**:
+     - Navigate to the `Settings > Integrations` page.
+     - Paste your token in the `GitLab Token` field.
+     - Click `Save Changes` to apply the changes.
+
+  3. **(Optional): Restrict agent permissions**
+     - Create another PAT using Step 1, excluding the `api` scope.
+     - In the `Settings > Secrets` page, create a new secret `GITLAB_TOKEN` and paste your lower-scope token.
+     - OpenHands will use the higher-scope token, and the agent will use the lower-scope token.
+
+
+  - **Token Not Recognized**:
+    - Check that the token hasn't expired.
+    - Verify the token has the required scopes.
+
+  - **Access Denied**:
+    - Verify project access permissions.
+    - Check if the token has the necessary scopes.
+    - For group/organization repositories, ensure you have proper access.
+
+
+#### BitBucket Setup
+
+
+1. **Generate an App password**:
+   - On Bitbucket, go to `Account Settings > App Password`.
+   - Create a new password with the following scopes:
+     - `account`: `read`
+     - `repository`: `write`
+     - `pull requests`: `write`
+     - `issues`: `write`
+   - App passwords are non-expiring tokens. OpenHands will migrate to using API tokens in the future.
+2. **Enter token in OpenHands**:
+   - Navigate to the `Settings > Integrations` page.
+   - Paste your token in the `BitBucket Token` field.
+   - Click `Save Changes` to apply the changes.
+
+
+  - **Token Not Recognized**:
+    - Check that the token hasn't expired.
+    - Verify the token has the required scopes.
+
+
+### Language Model (LLM) Settings
+Source: https://docs.openhands.dev/openhands/usage/settings/llm-settings.md
+
+## Overview
+
+The LLM settings page allows you to bring your own LLM and API key to use with OpenHands. 
This can be any model that is
+supported by litellm, but a capable model is required for OpenHands to work properly.
+[See our recommended models here](/openhands/usage/llms/llms#model-recommendations). You can also configure some
+additional LLM settings on this page.
+
+## Basic LLM Settings
+
+The most popular providers and models are available in the basic settings. Some of the providers have been verified to
+work with OpenHands, such as the [OpenHands provider](/openhands/usage/llms/openhands-llms), Anthropic, OpenAI and
+Mistral AI.
+
+1. Choose your preferred provider using the `LLM Provider` dropdown.
+2. Choose your favorite model using the `LLM Model` dropdown.
+3. Set the `API Key` for your chosen provider and model and click `Save Changes`.
+
+This sets the LLM for all new conversations. Older conversations keep their previous LLM; to switch them to the new
+one, you must first restart them.
+
+## Advanced LLM Settings
+
+Toggling the `Advanced` settings allows you to set custom models as well as some additional LLM settings. Use
+this when your preferred provider or model does not exist in the basic settings dropdowns.
+
+1. `Custom Model`: Set your custom model with the provider as the prefix. For information on how to specify the
+   custom model, follow [the specific provider docs on litellm](https://docs.litellm.ai/docs/providers). We also have
+   [some guides for popular providers](/openhands/usage/llms/llms#llm-provider-guides).
+2. `Base URL`: If your provider has a specific base URL, specify it here.
+3. `API Key`: Set the API key for your custom model.
+4. Click `Save Changes`.
+
+### Memory Condensation
+
+The memory condenser manages the language model's context by ensuring only the most important and relevant information
+is presented. Keeping the context focused improves latency and reduces token consumption, especially in long-running
+conversations.
+
+- `Enable memory condensation` - Turn on this setting to activate this feature.
+- `Memory condenser max history size` - The condenser will summarize the history after this many events. + +### Model Context Protocol (MCP) +Source: https://docs.openhands.dev/openhands/usage/settings/mcp-settings.md + +## Overview + +Model Context Protocol (MCP) is a mechanism that allows OpenHands to communicate with external tool servers. These +servers can provide additional functionality to the agent, such as specialized data processing, external API access, +or custom tools. MCP is based on the open standard defined at [modelcontextprotocol.io](https://modelcontextprotocol.io). + +## Supported MCPs + +OpenHands supports the following MCP transport protocols: + +* [Server-Sent Events (SSE)](https://modelcontextprotocol.io/specification/2024-11-05/basic/transports#http-with-sse) +* [Streamable HTTP (SHTTP)](https://modelcontextprotocol.io/specification/2025-06-18/basic/transports#streamable-http) +* [Standard Input/Output (stdio)](https://modelcontextprotocol.io/specification/2025-06-18/basic/transports#stdio) + +## How MCP Works + +When OpenHands starts, it: + +1. Reads the MCP configuration. +2. Connects to any configured SSE and SHTTP servers. +3. Starts any configured stdio servers. +4. Registers the tools provided by these servers with the agent. + +The agent can then use these tools just like any built-in tool. When the agent calls an MCP tool: + +1. OpenHands routes the call to the appropriate MCP server. +2. The server processes the request and returns a response. +3. OpenHands converts the response to an observation and presents it to the agent. + +## Configuration + +MCP configuration can be defined in: +* The OpenHands UI in the `Settings > MCP` page. +* The `config.toml` file under the `[mcp]` section if not using the UI. + +### Configuration Options + + + + SSE servers are configured using either a string URL or an object with the following properties: + + - `url` (required) + - Type: `str` + - Description: The URL of the SSE server. 
+ + - `api_key` (optional) + - Type: `str` + - Description: API key for authentication. + + + SHTTP (Streamable HTTP) servers are configured using either a string URL or an object with the following properties: + + - `url` (required) + - Type: `str` + - Description: The URL of the SHTTP server. + + - `api_key` (optional) + - Type: `str` + - Description: API key for authentication. + + - `timeout` (optional) + - Type: `int` + - Default: `60` + - Range: `1-3600` seconds (1 hour maximum) + - Description: Timeout in seconds for tool execution. This prevents tool calls from hanging indefinitely. + - **Use Cases:** + - **Short timeout (1-30s)**: For lightweight operations like status checks or simple queries. + - **Medium timeout (30-300s)**: For standard processing tasks like data analysis or API calls. + - **Long timeout (300-3600s)**: For heavy operations like file processing, complex calculations, or batch operations. + + This timeout only applies to individual tool calls, not server connection establishment. + + + + + While stdio servers are supported, [we recommend using MCP proxies](/openhands/usage/settings/mcp-settings#configuration-examples) for + better reliability and performance. + + + Stdio servers are configured using an object with the following properties: + + - `name` (required) + - Type: `str` + - Description: A unique name for the server. + + - `command` (required) + - Type: `str` + - Description: The command to run the server. + + - `args` (optional) + - Type: `list of str` + - Default: `[]` + - Description: Command-line arguments to pass to the server. + + - `env` (optional) + - Type: `dict of str to str` + - Default: `{}` + - Description: Environment variables to set for the server process. + + + +#### When to Use Direct Stdio + +Direct stdio connections may still be appropriate in these scenarios: +- **Development and testing**: Quick prototyping of MCP servers. 
+- **Simple, single-use tools**: Tools that don't require high reliability or concurrent access.
+- **Local-only environments**: When you don't want to manage additional proxy processes.
+
+### Configuration Examples
+
+
+  For stdio-based MCP servers, we recommend using MCP proxy tools like
+  [`supergateway`](https://github.com/supercorp-ai/supergateway) instead of direct stdio connections.
+  [SuperGateway](https://github.com/supercorp-ai/supergateway) is a popular MCP proxy that converts stdio MCP servers to
+  HTTP/SSE endpoints.
+
+  Start the proxy servers separately:
+  ```bash
+  # Terminal 1: Filesystem server proxy
+  supergateway --stdio "npx @modelcontextprotocol/server-filesystem /" --port 8080
+
+  # Terminal 2: Fetch server proxy
+  supergateway --stdio "uvx mcp-server-fetch" --port 8081
+  ```
+
+  Then configure OpenHands to use the HTTP endpoint:
+
+  ```toml
+  [mcp]
+  # SSE Servers - Recommended approach using proxy tools
+  sse_servers = [
+    # Basic SSE server with just a URL
+    "http://example.com:8080/mcp",
+
+    # SuperGateway proxy for fetch server
+    "http://localhost:8081/sse",
+
+    # External MCP service with authentication
+    {url="https://api.example.com/mcp/sse", api_key="your-api-key"}
+  ]
+
+  # SHTTP Servers - Modern streamable HTTP transport (recommended)
+  shttp_servers = [
+    # Basic SHTTP server with default 60s timeout
+    "https://api.example.com/mcp/shttp",
+
+    # Server with custom timeout for heavy operations
+    {
+      url = "https://files.example.com/mcp/shttp",
+      api_key = "your-api-key",
+      timeout = 1800 # 30 minutes for large file processing
+    }
+  ]
+  ```
+
+
+  This setup is not recommended for production.
+ + ```toml + [mcp] + # Direct stdio servers - use only for development/testing + stdio_servers = [ + # Basic stdio server + {name="fetch", command="uvx", args=["mcp-server-fetch"]}, + + # Stdio server with environment variables + { + name="filesystem", + command="npx", + args=["@modelcontextprotocol/server-filesystem", "/"], + env={ + "DEBUG": "true" + } + } + ] + ``` + + For production use, we recommend using proxy tools like SuperGateway. + + + +Other options include: + +- **Custom FastAPI/Express servers**: Build your own HTTP wrapper around stdio MCP servers. +- **Docker-based proxies**: Containerized solutions for better isolation. +- **Cloud-hosted MCP services**: Third-party services that provide MCP endpoints. + +### Secrets Management +Source: https://docs.openhands.dev/openhands/usage/settings/secrets-settings.md + +## Overview + +OpenHands provides a secrets manager that allows you to securely store and manage sensitive information that can be +accessed by the agent during runtime, such as API keys. These secrets are automatically exported as environment +variables in the agent's runtime environment. + +## Accessing the Secrets Manager + +Navigate to the `Settings > Secrets` page. Here, you'll see a list of all your existing custom secrets. + +## Adding a New Secret +1. Click `Add a new secret`. +2. Fill in the following fields: + - **Name**: A unique identifier for your secret (e.g., `AWS_ACCESS_KEY`). This will be the environment variable name. + - **Value**: The sensitive information you want to store. + - **Description** (optional): A brief description of what the secret is used for, which is also provided to the agent. +3. Click `Add secret` to save. + +## Editing a Secret + +1. Click the `Edit` button next to the secret you want to modify. +2. You can update the name and description of the secret. + + For security reasons, you cannot view or edit the value of an existing secret. 
If you need to change the + value, delete the secret and create a new one. + + +## Deleting a Secret + +1. Click the `Delete` button next to the secret you want to remove. +2. Select `Confirm` to delete the secret. + +## Using Secrets in the Agent + - All custom secrets are automatically exported as environment variables in the agent's runtime environment. + - You can access them in your code using standard environment variable access methods. For example, if you create a + secret named `OPENAI_API_KEY`, you can access it in your code as `process.env.OPENAI_API_KEY` in JavaScript or + `os.environ['OPENAI_API_KEY']` in Python. + +### Prompting Best Practices +Source: https://docs.openhands.dev/openhands/usage/tips/prompting-best-practices.md + +## Characteristics of Good Prompts + +Good prompts are: + +- **Concrete**: Clearly describe what functionality should be added or what error needs fixing. +- **Location-specific**: Specify the locations in the codebase that should be modified, if known. +- **Appropriately scoped**: Focus on a single feature, typically not exceeding 100 lines of code. + +## Examples + +### Good Prompt Examples + +- Add a function `calculate_average` in `utils/math_operations.py` that takes a list of numbers as input and returns their average. +- Fix the TypeError in `frontend/src/components/UserProfile.tsx` occurring on line 42. The error suggests we're trying to access a property of undefined. +- Implement input validation for the email field in the registration form. Update `frontend/src/components/RegistrationForm.tsx` to check if the email is in a valid format before submission. + +### Bad Prompt Examples + +- Make the code better. (Too vague, not concrete) +- Rewrite the entire backend to use a different framework. (Not appropriately scoped) +- There's a bug somewhere in the user authentication. Can you find and fix it? 
(Lacks specificity and location information)

## Tips for Effective Prompting

- Be as specific as possible about the desired outcome or the problem to be solved.
- Provide context, including relevant file paths and line numbers if available.
- Break large tasks into smaller, manageable prompts.
- Include relevant error messages or logs.
- Specify the programming language or framework, if not obvious.

The more precise and informative your prompt, the better OpenHands can assist you.

See [First Projects](/overview/first-projects) for more examples of helpful prompts.

### Troubleshooting
Source: https://docs.openhands.dev/openhands/usage/troubleshooting/troubleshooting.md


OpenHands only supports Windows via WSL. Please be sure to run all commands inside your WSL terminal.


### Launch docker client failed

**Description**

When running OpenHands, the following error is seen:
```
Launch docker client failed. Please make sure you have installed docker and started docker desktop/daemon.
```

**Resolution**

Try these in order:
* Confirm `docker` is running on your system. You should be able to run `docker ps` in the terminal successfully.
* If using Docker Desktop, ensure `Settings > Advanced > Allow the default Docker socket to be used` is enabled.
* Depending on your configuration you may need `Settings > Resources > Network > Enable host networking` enabled in Docker Desktop.
* Reinstall Docker Desktop.

### Permission Error

**Description**

On initial prompt, an error is seen with `Permission Denied` or `PermissionError`.

**Resolution**

* Check if the `~/.openhands` directory is owned by `root`. If so, you can:
  * Change the directory's ownership: `sudo chown <user>:<group> ~/.openhands`.
  * Or update permissions on the directory: `sudo chmod 777 ~/.openhands`
  * Or delete it if you don’t need previous data. OpenHands will recreate it. You'll need to re-enter LLM settings.
* If mounting a local directory, ensure your `WORKSPACE_BASE` has the necessary permissions for the user running
  OpenHands.

### On Linux, Getting ConnectTimeout Error

**Description**

When running on Linux, you might run into the error `ERROR:root:: timed out`.

**Resolution**

If you installed Docker from your distribution’s package repository (e.g., docker.io on Debian/Ubuntu), be aware that
these packages can sometimes be outdated or include changes that cause compatibility issues. Try reinstalling Docker
[using the official instructions](https://docs.docker.com/engine/install/) to ensure you are running a compatible version.

If that does not solve the issue, try incrementally adding the following parameters to the docker run command:
* `--network host`
* `-e SANDBOX_USE_HOST_NETWORK=true`
* `-e DOCKER_HOST_ADDR=127.0.0.1`

### Internal Server Error. Ports are not available

**Description**

When running on Windows, the error `Internal Server Error ("ports are not available: exposing port TCP
...: bind: An attempt was made to access a socket in a
way forbidden by its access permissions.")` is encountered.

**Resolution**

* Run the following command in PowerShell as Administrator to reset the NAT service and release the ports:
```
Restart-Service -Name "winnat"
```

### Unable to access VS Code tab via local IP

**Description**

When accessing OpenHands through a non-localhost URL (such as a LAN IP address), the VS Code tab shows a "Forbidden"
error, while other parts of the UI work fine.

**Resolution**

This happens because VS Code runs on a random high port that may not be exposed or accessible from other machines.
To fix this:

1.
Set a specific port for VS Code using the `SANDBOX_VSCODE_PORT` environment variable: + ```bash + docker run -it --rm \ + -e SANDBOX_VSCODE_PORT=41234 \ + -e AGENT_SERVER_IMAGE_REPOSITORY=ghcr.io/openhands/agent-server \ + -e AGENT_SERVER_IMAGE_TAG=1.11.4-python \ + -v /var/run/docker.sock:/var/run/docker.sock \ + -v ~/.openhands:/.openhands \ + -p 3000:3000 \ + -p 41234:41234 \ + --add-host host.docker.internal:host-gateway \ + --name openhands-app \ + docker.openhands.dev/openhands/openhands:latest + ``` + + > **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location. + +2. Make sure to expose the same port with `-p 41234:41234` in your Docker command. +3. If running with the development workflow, you can set this in your `config.toml` file: + ```toml + [sandbox] + vscode_port = 41234 + ``` + +### GitHub Organization Rename Issues + +**Description** + +After the GitHub organization rename from `All-Hands-AI` to `OpenHands`, you may encounter issues with git remotes, Docker images, or broken links. + +**Resolution** + +* Update your git remote URL: + ```bash + # Check current remote + git remote get-url origin + + # Update SSH remote + git remote set-url origin git@github.com:OpenHands/OpenHands.git + + # Or update HTTPS remote + git remote set-url origin https://github.com/OpenHands/OpenHands.git + ``` +* Update Docker image references from `ghcr.io/all-hands-ai/` to `ghcr.io/openhands/` +* Find and update any hardcoded references: + ```bash + git grep -i "all-hands-ai" + git grep -i "ghcr.io/all-hands-ai" + ``` + +### COBOL Modernization +Source: https://docs.openhands.dev/openhands/usage/use-cases/cobol-modernization.md + +Legacy COBOL systems power critical business operations across banking, insurance, government, and retail. OpenHands can help you understand, document, and modernize these systems while preserving their essential business logic. 
+ + +This guide is based on our blog post [Refactoring COBOL to Java with AI Agents](https://openhands.dev/blog/20251218-cobol-to-java-refactoring). + + +## The COBOL Modernization Challenge + +[COBOL](https://en.wikipedia.org/wiki/COBOL) modernization is one of the most pressing challenges facing enterprises today. Gartner estimated there were over 200 billion lines of COBOL code in existence, running 80% of the world's business systems. As of 2020, COBOL was still running background processes for 95% of credit and debit card transactions. + +The challenge is acute: [47% of organizations](https://softwaremodernizationservices.com/mainframe-modernization) struggle to fill COBOL roles, with salaries rising 25% annually. By 2027, 92% of remaining COBOL developers will have retired. Traditional modernization approaches have seen high failure rates, with COBOL's specialized nature requiring a unique skill set that makes it difficult for human teams alone. + +## Overview + +COBOL modernization is a complex undertaking. Every modernization effort is unique and requires careful planning, execution, and validation to ensure the modernized code behaves identically to the original. The migration needs to be driven by an experienced team of developers and domain experts, but even that isn't sufficient to ensure the job is done quickly or cost-effectively. This is where OpenHands comes in. + +OpenHands is a powerful agent that assists in modernizing COBOL code along every step of the process: + +1. **Understanding**: Analyze and document existing COBOL code +2. **Translation**: Convert COBOL to modern languages like Java, Python, or C# +3. **Validation**: Ensure the modernized code behaves identically to the original + +In this document, we will explore the different ways OpenHands contributes to COBOL modernization, with example prompts and techniques to use in your own efforts. 
While the examples are specific to COBOL, the principles laid out here can help with any legacy system modernization. + +## Understanding + +A significant challenge in modernization is understanding the business function of the code. Developers have practice determining the "how" of the code, even in legacy systems with unfamiliar syntax and keywords, but understanding the "why" is more important to ensure that business logic is preserved accurately. The difficulty then comes from the fact that business function is only implicitly represented in the code and requires external documentation or domain expertise to untangle. + +Fortunately, agents like OpenHands are able to understand source code _and_ process-oriented documentation, and this simultaneous view lets them link the two together in a way that makes every downstream process more transparent and predictable. Your COBOL source might already have some structure or comments that make this link clear, but if not OpenHands can help. If your COBOL source is in `/src` and your process-oriented documentation is in `/docs`, the following prompt will establish a link between the two and save it for future reference: + +``` +For each COBOL program in `/src`, identify which business functions it supports. Search through the documentation in `/docs` to find all relevant sections describing that business function, and generate a summary of how the program supports that function. + +Save the results in `business_functions.json` in the following format: + +{ + ..., + "COBIL00C.cbl": { + "function": "Bill payment -- pay account balance in full and a transaction action for the online payment", + "references": [ + "docs/billing.md#bill-payment", + "docs/transactions.md#transaction-action" + ], + }, + ... +} +``` + +OpenHands uses tools like `grep`, `sed`, and `awk` to navigate files and pull in context. 
This is natural for source code and also works well for process-oriented documentation, but in some cases exposing the latter using a _semantic search engine_ instead will yield better results. Semantic search engines can understand the meaning behind words and phrases, making it easier to find relevant information.

## Translation

With a clear picture of what each program does and why, the next step is translating the COBOL source into your target language. The example prompts in this section target Java, but the same approach works for Python, C#, or any modern language. Just adjust for language-specific idioms and data types as needed.

One thing to watch out for: COBOL keywords and data types do not always match one-to-one with their Java counterparts. For example, COBOL's decimal data type (`PIC S9(9)V9(9)`), which represents a fixed-point number with a scale of 9 digits, does not have a direct equivalent in Java. Instead, you might use `BigDecimal` with a scale of 9, but be aware of potential precision issues when converting between the two. A solid test suite will help catch these corner cases, but including such _known problems_ in the translation prompt can help prevent such errors from being introduced at all.

An example prompt is below:

```
Convert the COBOL files in `/src` to Java in `/src/java`.

Requirements:
1. Create a Java class for each COBOL program
2. Preserve the business logic and data structures (see `business_functions.json`)
3. Use appropriate Java naming conventions (camelCase for methods, PascalCase for classes)
4. Convert COBOL data types to appropriate Java types (use BigDecimal for decimal data types)
5. Implement proper error handling with try-catch blocks
6. Add JavaDoc comments explaining the purpose of each class and method
7. In JavaDoc comments, include traceability to the original COBOL source using
   the format: @source <file>:<line-range> (e.g., @source CBACT01C.cbl:73-77)
8. Create a clean, maintainable object-oriented design
9.
Each Java file should be compilable and follow Java best practices +``` + +Note the rule that introduces traceability comments to the resulting Java. These comments help agents understand the provenance of the code, but are also helpful for developers attempting to understand the migration process. They can be used, for example, to check how much COBOL code has been translated into Java or to identify areas where business logic has been distributed across multiple Java classes. + +## Validation + +Building confidence in the migrated code is crucial. Ideally, existing end-to-end tests can be reused to validate that business logic has been preserved. If you need to strengthen the testing setup, consider _golden file testing_. This involves capturing the COBOL program's outputs for a set of known inputs, then verifying the translated code produces identical results. When generating inputs, pay particular attention to decimal precision in monetary calculations (COBOL's fixed-point arithmetic doesn't always map cleanly to Java's BigDecimal) and date handling, where COBOL's conventions can diverge from modern defaults. + +Every modernization effort is unique, and developer experience is crucial to ensure the testing strategy covers your organization's requirements. Best practices still apply. A solid test suite will not only ensure the migrated code works as expected, but will also help the translation agent converge to a high-quality solution. Of course, OpenHands can help migrate tests, ensure they run and test the migrated code correctly, and even generate new tests to cover edge cases. + +## Scaling Up + +The largest challenge in scaling modernization efforts is dealing with agents' limited attention span. Asking a single agent to handle the entire migration process in one go will almost certainly lead to errors and low-quality code as the context window is filled and flushed again and again. 
One way to address this is by tying translation and validation together in an iterative refinement loop.

The idea is straightforward: one agent migrates some amount of code, and another agent critiques the migration. If the quality doesn't meet the standards of the critic, the first agent is given some actionable feedback and the process repeats. Here's what that looks like using the [OpenHands SDK](https://github.com/OpenHands/software-agent-sdk):

```python
while current_score < QUALITY_THRESHOLD and iteration < MAX_ITERATIONS:
    # Migrating agent converts COBOL to Java
    migration_conversation.send_message(migration_prompt)
    migration_conversation.run()

    # Critiquing agent evaluates the conversion
    critique_conversation.send_message(critique_prompt)
    critique_conversation.run()

    # Parse the score and decide whether to continue
    current_score = parse_critique_score(critique_file)
    iteration += 1
```

By tweaking the critic's prompt and scoring rubric, you can fine-tune the evaluation process to better align with your needs. For example, you might have code quality standards that are difficult to detect with static analysis tools or architectural patterns that are unique to your organization. The following prompt can be easily modified to support a wide range of requirements:

```
Evaluate the quality of the COBOL to Java migration in `/src`.

For each Java file, assess using the following criteria:
1. Correctness: Does the Java code preserve the original business logic (see `business_functions.json`)?
2. Code Quality: Is the code clean, readable, and following Java 17 conventions?
3. Completeness: Are all COBOL features properly converted?
4. Best Practices: Does it use proper OOP, error handling, and documentation?

For each instance of a criterion not met, deduct a point.

Then generate a report containing actionable feedback for each file. The feedback, if addressed, should improve the score.
+ +Save the results in `critique.json` in the following format: + +{ + "total_score": -12, + "files": [ + { + "cobol": "COBIL00C.cbl", + "java": "bill_payment.java", + "scores": { + "correctness": 0, + "code_quality": 0, + "completeness": -1, + "best_practices": -2 + }, + "feedback": [ + "Rename single-letter variables to meaningful names.", + "Ensure all COBOL functionality is translated -- the transaction action for the bill payment is missing.", + ], + }, + ... + ] +} +``` + +In future iterations, the migration agent should be given the file `critique.json` and be prompted to act on the feedback. + +This iterative refinement pattern works well for medium-sized projects with a moderate level of complexity. For legacy systems that span hundreds of files, however, the migration and critique processes need to be further decomposed to prevent agents from being overwhelmed. A natural way to do so is to break the system into smaller components, each with its own migration and critique processes. This process can be automated by using the OpenHands large codebase SDK, which combines agentic intelligence with static analysis tools to decompose large projects and orchestrate parallel agents in a dependency-aware manner. + +## Try It Yourself + +The full iterative refinement example is available in the OpenHands SDK: + +```bash +export LLM_API_KEY="your-api-key" +cd software-agent-sdk +uv run python examples/01_standalone_sdk/31_iterative_refinement.py +``` + +For real-world COBOL files, you can use the [AWS CardDemo application](https://github.com/aws-samples/aws-mainframe-modernization-carddemo/tree/main/app/cbl), which provides a representative mainframe application for testing modernization approaches. 
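The `parse_critique_score` helper in the refinement loop is left undefined in the snippet. A minimal sketch is below, assuming the `critique.json` schema shown earlier (a top-level `total_score` field); the exact file layout in your own pipeline may differ:

```python
import json


def parse_critique_score(critique_file: str) -> int:
    """Return the critic's total score from a critique.json report.

    Assumes the schema shown above: `total_score` is 0 for a clean
    migration, with one point deducted per unmet criterion.
    """
    with open(critique_file, encoding="utf-8") as f:
        report = json.load(f)
    return report["total_score"]
```

Because scores are non-positive under this rubric, a threshold like `QUALITY_THRESHOLD = 0` keeps the loop running until the critic reports no deductions at all.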
+ + +## Related Resources + +- [OpenHands SDK Repository](https://github.com/OpenHands/software-agent-sdk) - Build custom AI agents +- [AWS CardDemo Application](https://github.com/aws-samples/aws-mainframe-modernization-carddemo/tree/main/app/cbl) - Sample COBOL application for testing +- [Prompting Best Practices](/openhands/usage/tips/prompting-best-practices) - Write effective prompts + +### Automated Code Review +Source: https://docs.openhands.dev/openhands/usage/use-cases/code-review.md + +Automated code review helps maintain code quality, catch bugs early, and enforce coding standards consistently across your team. OpenHands provides a GitHub Actions workflow powered by the [Software Agent SDK](/sdk/index) that automatically reviews pull requests and posts inline comments directly on your PRs. + +## Overview + +The OpenHands PR Review workflow is a GitHub Actions workflow that: + +- **Triggers automatically** when PRs are opened or when you request a review +- **Analyzes code changes** in the context of your entire repository +- **Posts inline comments** directly on specific lines of code in the PR +- **Provides fast feedback** - typically within 2-3 minutes + +## How It Works + +The PR review workflow uses the OpenHands Software Agent SDK to analyze your code changes: + +1. **Trigger**: The workflow runs when: + - A new non-draft PR is opened + - A draft PR is marked as ready for review + - The `review-this` label is added to a PR + - `openhands-agent` is requested as a reviewer + +2. 
**Analysis**: The agent receives the complete PR diff and uses two skills: + - [**`/codereview`**](https://github.com/OpenHands/extensions/tree/main/skills/codereview) or [**`/codereview-roasted`**](https://github.com/OpenHands/extensions/tree/main/skills/codereview-roasted): Analyzes code for quality, security, and best practices + - [**`/github-pr-review`**](https://github.com/OpenHands/extensions/tree/main/skills/github-pr-review): Posts structured inline comments via the GitHub API + +3. **Output**: Review comments are posted directly on the PR with: + - Priority labels (🔴 Critical, 🟠 Important, 🟡 Suggestion, 🟢 Nit) + - Specific line references + - Actionable suggestions with code examples + +### Review Styles + +Choose between two review styles: + +| Style | Description | Best For | +|-------|-------------|----------| +| **Standard** ([`/codereview`](https://github.com/OpenHands/extensions/tree/main/skills/codereview)) | Pragmatic, constructive feedback focusing on code quality, security, and best practices | Day-to-day code reviews | +| **Roasted** ([`/codereview-roasted`](https://github.com/OpenHands/extensions/tree/main/skills/codereview-roasted)) | Linus Torvalds-style brutally honest review emphasizing "good taste", data structures, and simplicity | Critical code paths, learning opportunities | + +## Quick Start + + + + Create `.github/workflows/pr-review-by-openhands.yml` in your repository: + + ```yaml + name: PR Review by OpenHands + + on: + pull_request_target: + types: [opened, ready_for_review, labeled, review_requested] + + permissions: + contents: read + pull-requests: write + issues: write + + jobs: + pr-review: + if: | + (github.event.action == 'opened' && github.event.pull_request.draft == false) || + github.event.action == 'ready_for_review' || + github.event.label.name == 'review-this' || + github.event.requested_reviewer.login == 'openhands-agent' + runs-on: ubuntu-latest + steps: + - name: Run PR Review + uses: 
OpenHands/software-agent-sdk/.github/actions/pr-review@main + with: + llm-model: anthropic/claude-sonnet-4-5-20250929 + review-style: standard + llm-api-key: ${{ secrets.LLM_API_KEY }} + github-token: ${{ secrets.GITHUB_TOKEN }} + ``` + + + + Go to your repository's **Settings → Secrets and variables → Actions** and add: + - **`LLM_API_KEY`**: Your LLM API key (get one from [OpenHands LLM Provider](/openhands/usage/llms/openhands-llms)) + + + + Create a `review-this` label in your repository: + 1. Go to **Issues → Labels** + 2. Click **New label** + 3. Name: `review-this` + 4. Description: `Trigger OpenHands PR review` + + + + Open a PR and either: + - Add the `review-this` label, OR + - Request `openhands-agent` as a reviewer + + + +## Composite Action + +The workflow uses a reusable composite action from the Software Agent SDK that handles all the setup automatically: + +- Checking out the SDK at the specified version +- Setting up Python and dependencies +- Running the PR review agent +- Uploading logs as artifacts + +### Action Inputs + +| Input | Description | Required | Default | +|-------|-------------|----------|---------| +| `llm-model` | LLM model to use | Yes | - | +| `llm-base-url` | LLM base URL (for custom endpoints) | No | `''` | +| `review-style` | Review style: `standard` or `roasted` | No | `roasted` | +| `sdk-version` | Git ref for SDK (tag, branch, or commit SHA) | No | `main` | +| `sdk-repo` | SDK repository (owner/repo) | No | `OpenHands/software-agent-sdk` | +| `llm-api-key` | LLM API key | Yes | - | +| `github-token` | GitHub token for API access | Yes | - | + + +Use `sdk-version` to pin to a specific version tag (e.g., `v1.0.0`) for production stability, or use `main` to always get the latest features. 
+ + +## Customization + +### Repository-Specific Review Guidelines + +Create custom review guidelines for your repository by adding a skill file at `.agents/skills/code-review.md`: + +```markdown +--- +name: code-review +description: Custom code review guidelines for this repository +triggers: +- /codereview +--- + +# Repository Code Review Guidelines + +You are reviewing code for [Your Project Name]. Follow these guidelines: + +## Review Decisions + +### When to APPROVE +- Configuration changes following existing patterns +- Documentation-only changes +- Test-only changes without production code changes +- Simple additions following established conventions + +### When to COMMENT +- Issues that need attention (bugs, security concerns) +- Suggestions for improvement +- Questions about design decisions + +## Core Principles + +1. **[Your Principle 1]**: Description +2. **[Your Principle 2]**: Description + +## What to Check + +- **[Category 1]**: What to look for +- **[Category 2]**: What to look for + +## Repository Conventions + +- Use [your linter] for style checking +- Follow [your style guide] +- Tests should be in [your test directory] +``` + + +The skill file must use `/codereview` as the trigger to override the default review behavior. See the [software-agent-sdk's own code-review skill](https://github.com/OpenHands/software-agent-sdk/blob/main/.agents/skills/code-review.md) for a complete example. 
+ + +### Workflow Configuration + +Customize the workflow by modifying the action inputs: + +```yaml +- name: Run PR Review + uses: OpenHands/software-agent-sdk/.github/actions/pr-review@main + with: + # Change the LLM model + llm-model: anthropic/claude-sonnet-4-5-20250929 + # Use a custom LLM endpoint + llm-base-url: https://your-llm-proxy.example.com + # Switch to "roasted" style for brutally honest reviews + review-style: roasted + # Pin to a specific SDK version for stability + sdk-version: main + # Secrets + llm-api-key: ${{ secrets.LLM_API_KEY }} + github-token: ${{ secrets.GITHUB_TOKEN }} +``` + +### Trigger Customization + +Modify when reviews are triggered by editing the workflow conditions: + +```yaml +# Only trigger on label (disable auto-review on PR open) +if: github.event.label.name == 'review-this' + +# Only trigger when specific reviewer is requested +if: github.event.requested_reviewer.login == 'openhands-agent' + +# Trigger on all PRs (including drafts) +if: | + github.event.action == 'opened' || + github.event.action == 'synchronize' +``` + +## Security Considerations + +The workflow uses `pull_request_target` so the code review agent can work properly for PRs from forks. Only users with write access can trigger reviews via labels or reviewer requests. + + +**Potential Risk**: A malicious contributor could submit a PR from a fork containing code designed to exfiltrate your `LLM_API_KEY` when the review agent analyzes their code. + +To mitigate this, the PR review workflow passes API keys as [SDK secrets](/sdk/guides/secrets) rather than environment variables, which prevents the agent from directly accessing these credentials during code execution. 
+ + +## Example Reviews + +See real automated reviews in action on the OpenHands Software Agent SDK repository: + +| PR | Description | Review Highlights | +|----|-------------|-------------------| +| [#1927](https://github.com/OpenHands/software-agent-sdk/pull/1927#pullrequestreview-3767493657) | Composite GitHub Action refactor | Comprehensive review with 🔴 Critical, 🟠 Important, and 🟡 Suggestion labels | +| [#1916](https://github.com/OpenHands/software-agent-sdk/pull/1916#pullrequestreview-3758297071) | Add example for reconstructing messages | Critical issues flagged with clear explanations | +| [#1904](https://github.com/OpenHands/software-agent-sdk/pull/1904#pullrequestreview-3751821740) | Update code-review skill guidelines | APPROVED review highlighting key strengths | +| [#1889](https://github.com/OpenHands/software-agent-sdk/pull/1889#pullrequestreview-3747576245) | Fix tmux race condition | Technical review of concurrency fix with dual-lock strategy analysis | + +## Troubleshooting + + + + - Ensure the `LLM_API_KEY` secret is set correctly + - Check that the label name matches exactly (`review-this`) + - Verify the workflow file is in `.github/workflows/` + - Check the Actions tab for workflow run errors + + + + - Ensure `GITHUB_TOKEN` has `pull-requests: write` permission + - Check the workflow logs for API errors + - Verify the PR is not from a fork with restricted permissions + + + + - Large PRs may take longer to analyze + - Consider splitting large PRs into smaller ones + - Check if the LLM API is experiencing delays + + + +## Related Resources + +- [PR Review Workflow Reference](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/02_pr_review) - Full workflow example and agent script +- [Composite Action](https://github.com/OpenHands/software-agent-sdk/blob/main/.github/actions/pr-review/action.yml) - Reusable GitHub Action for PR reviews +- [Software Agent SDK](/sdk/index) - Build your own AI-powered workflows +- 
[GitHub Integration](/openhands/usage/cloud/github-installation) - Set up GitHub integration for OpenHands Cloud +- [Skills Documentation](/overview/skills) - Learn more about OpenHands skills + +### Dependency Upgrades +Source: https://docs.openhands.dev/openhands/usage/use-cases/dependency-upgrades.md + +Keeping dependencies up to date is essential for security, performance, and access to new features. OpenHands can help you identify outdated dependencies, plan upgrades, handle breaking changes, and validate that your application still works after updates. + +## Overview + +OpenHands helps with dependency management by: + +- **Analyzing dependencies**: Identifying outdated packages and their versions +- **Planning upgrades**: Creating upgrade strategies and migration guides +- **Implementing changes**: Updating code to handle breaking changes +- **Validating results**: Running tests and verifying functionality + +## Dependency Analysis Examples + +### Identifying Outdated Dependencies + +Start by understanding your current dependency state: + +``` +Analyze the dependencies in this project and create a report: + +1. List all direct dependencies with current and latest versions +2. Identify dependencies more than 2 major versions behind +3. Flag any dependencies with known security vulnerabilities +4. Highlight dependencies that are deprecated or unmaintained +5. Prioritize which updates are most important +``` + +**Example output:** + +| Package | Current | Latest | Risk | Priority | +|---------|---------|--------|------|----------| +| lodash | 4.17.15 | 4.17.21 | Security (CVE) | High | +| react | 16.8.0 | 18.2.0 | Outdated | Medium | +| express | 4.17.1 | 4.18.2 | Minor update | Low | +| moment | 2.29.1 | 2.29.4 | Deprecated | Medium | + +### Security-Related Dependency Upgrades + +Dependency upgrades are often needed to fix security vulnerabilities in your dependencies. 
If you're upgrading dependencies specifically to address security issues, see our [Vulnerability Remediation](/openhands/usage/use-cases/vulnerability-remediation) guide for comprehensive guidance on: + +- Automating vulnerability detection and remediation +- Integrating with security scanners (Snyk, Dependabot, CodeQL) +- Building automated pipelines for security fixes +- Using OpenHands agents to create pull requests automatically + +### Compatibility Checking + +Check for compatibility issues before upgrading: + +``` +Check compatibility for upgrading React from 16 to 18: + +1. Review our codebase for deprecated React patterns +2. List all components using lifecycle methods +3. Identify usage of string refs or findDOMNode +4. Check third-party library compatibility with React 18 +5. Estimate the effort required for migration +``` + +**Compatibility matrix:** + +| Dependency | React 16 | React 17 | React 18 | Action Needed | +|------------|----------|----------|----------|---------------| +| react-router | v5 ✓ | v5 ✓ | v6 required | Major upgrade | +| styled-components | v5 ✓ | v5 ✓ | v5 ✓ | None | +| material-ui | v4 ✓ | v4 ✓ | v5 required | Major upgrade | + +## Automated Upgrade Examples + +### Version Updates + +Perform straightforward version updates: + + + + ``` + Update all patch and minor versions in package.json: + + 1. Review each update for changelog notes + 2. Update package.json with new versions + 3. Update package-lock.json + 4. Run the test suite + 5. List any deprecation warnings + ``` + + + ``` + Update dependencies in requirements.txt: + + 1. Check each package for updates + 2. Update requirements.txt with compatible versions + 3. Update requirements-dev.txt similarly + 4. Run tests and verify functionality + 5. Note any deprecation warnings + ``` + + + ``` + Update dependencies in pom.xml: + + 1. Check for newer versions of each dependency + 2. Update version numbers in pom.xml + 3. Run mvn dependency:tree to check conflicts + 4. 
Run the test suite + 5. Document any API changes encountered + ``` + + + +### Breaking Change Handling + +When major versions introduce breaking changes: + +``` +Upgrade axios from v0.x to v1.x and handle breaking changes: + +1. List all breaking changes in axios 1.0 changelog +2. Find all axios usages in our codebase +3. For each breaking change: + - Show current code + - Show updated code + - Explain the change +4. Create a git commit for each logical change +5. Verify all tests pass +``` + +**Example transformation:** + +```javascript +// Before (axios 0.x) +import axios from 'axios'; +axios.defaults.baseURL = 'https://api.example.com'; +const response = await axios.get('/users', { + cancelToken: source.token +}); + +// After (axios 1.x) +import axios from 'axios'; +axios.defaults.baseURL = 'https://api.example.com'; +const controller = new AbortController(); +const response = await axios.get('/users', { + signal: controller.signal +}); +``` + +### Code Adaptation + +Adapt code to new API patterns: + +``` +Migrate our codebase from moment.js to date-fns: + +1. List all moment.js usages in our code +2. Map moment methods to date-fns equivalents +3. Update imports throughout the codebase +4. Handle any edge cases where APIs differ +5. Remove moment.js from dependencies +6. Verify all date handling still works correctly +``` + +**Migration map:** + +| moment.js | date-fns | Notes | +|-----------|----------|-------| +| `moment()` | `new Date()` | Different return type | +| `moment().format('YYYY-MM-DD')` | `format(new Date(), 'yyyy-MM-dd')` | Different format tokens | +| `moment().add(1, 'days')` | `addDays(new Date(), 1)` | Function-based API | +| `moment().startOf('month')` | `startOfMonth(new Date())` | Separate function | + +## Testing and Validation Examples + +### Automated Test Execution + +Run comprehensive tests after upgrades: + +``` +After the dependency upgrades, validate the application: + +1. Run the full test suite (unit, integration, e2e) +2. 
Check test coverage hasn't decreased +3. Run type checking (if applicable) +4. Run linting with new lint rule versions +5. Build the application for production +6. Report any failures with analysis +``` + +### Integration Testing + +Verify integrations still work: + +``` +Test our integrations after upgrading the AWS SDK: + +1. Test S3 operations (upload, download, list) +2. Test DynamoDB operations (CRUD) +3. Test Lambda invocations +4. Test SQS send/receive +5. Compare behavior to before the upgrade +6. Note any subtle differences +``` + +### Regression Detection + +Detect regressions from upgrades: + +``` +Check for regressions after upgrading the ORM: + +1. Run database operation benchmarks +2. Compare query performance before and after +3. Verify all migrations still work +4. Check for any N+1 queries introduced +5. Validate data integrity in test database +6. Document any behavioral changes +``` + +## Additional Examples + +### Security-Driven Upgrade + +``` +We have a critical security vulnerability in jsonwebtoken. + +Current: jsonwebtoken@8.5.1 +Required: jsonwebtoken@9.0.0 + +Perform the upgrade: +1. Check for breaking changes in v9 +2. Find all usages of jsonwebtoken in our code +3. Update any deprecated methods +4. Update the package version +5. Verify all JWT operations work +6. Run security tests +``` + +### Framework Major Upgrade + +``` +Upgrade our Next.js application from 12 to 14: + +Key areas to address: +1. App Router migration (pages -> app) +2. New metadata API +3. Server Components by default +4. New Image component +5. 
Route handlers replacing API routes + +For each area: +- Show current implementation +- Show new implementation +- Test the changes +``` + +### Multi-Package Coordinated Upgrade + +``` +Upgrade our React ecosystem packages together: + +Current: +- react: 17.0.2 +- react-dom: 17.0.2 +- react-router-dom: 5.3.0 +- @testing-library/react: 12.1.2 + +Target: +- react: 18.2.0 +- react-dom: 18.2.0 +- react-router-dom: 6.x +- @testing-library/react: 14.x + +Create an upgrade plan that handles all these together, +addressing breaking changes in the correct order. +``` + +## Related Resources + +- [Vulnerability Remediation](/openhands/usage/use-cases/vulnerability-remediation) - Fix security vulnerabilities +- [Security Guide](/sdk/guides/security) - Security best practices for AI agents +- [Prompting Best Practices](/openhands/usage/tips/prompting-best-practices) - Write effective prompts + +### Incident Triage +Source: https://docs.openhands.dev/openhands/usage/use-cases/incident-triage.md + +When production incidents occur, speed matters. OpenHands can help you quickly investigate issues, analyze logs and errors, identify root causes, and generate fixes—reducing your mean time to resolution (MTTR). + + +This guide is based on our blog post [Debugging Production Issues with AI Agents: Automating Datadog Error Analysis](https://openhands.dev/blog/debugging-production-issues-with-ai-agents-automating-datadog-error-analysis). + + +## Overview + +Running a production service is **hard**. Errors and bugs crop up due to product updates, infrastructure changes, or unexpected user behavior. When these issues arise, it's critical to identify and fix them quickly to minimize downtime and maintain user trust—but this is challenging, especially at scale. + +What if AI agents could handle the initial investigation automatically? 
This allows engineers to start with a detailed report of the issue, including root cause analysis and specific recommendations for fixes, dramatically speeding up the debugging process. + +OpenHands accelerates incident response by: + +- **Automated error analysis**: AI agents investigate errors and provide detailed reports +- **Root cause identification**: Connect symptoms to underlying issues in your codebase +- **Fix recommendations**: Generate specific, actionable recommendations for resolving issues +- **Integration with monitoring tools**: Work directly with platforms like Datadog + +## Automated Datadog Error Analysis + +The [OpenHands Software Agent SDK](https://github.com/OpenHands/software-agent-sdk) provides powerful capabilities for building autonomous AI agents that can integrate with monitoring platforms like Datadog. A ready-to-use [GitHub Actions workflow](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/04_datadog_debugging) demonstrates how to automate error analysis. + +### How It Works + +[Datadog](https://www.datadoghq.com/) is a popular monitoring and analytics platform that provides comprehensive error tracking capabilities. It aggregates logs, metrics, and traces from your applications, making it easier to identify and investigate issues in production. + +[Datadog's Error Tracking](https://www.datadoghq.com/error-tracking/) groups similar errors together and provides detailed insights into their occurrences, stack traces, and affected services. OpenHands can automatically analyze these errors and provide detailed investigation reports. + +### Triggering Automated Debugging + +The GitHub Actions workflow can be triggered in two ways: + +1. **Search Query**: Provide a search query (e.g., "JSONDecodeError") to find all recent errors matching that pattern. This is useful for investigating categories of errors. + +2. 
**Specific Error ID**: Provide a specific Datadog error tracking ID to deep-dive into a known issue. You can copy the error ID from Datadog's error tracking UI using the "Actions" button.
+
+### Automated Investigation Process
+
+When the workflow runs, it automatically performs the following steps:
+
+1. Get detailed info from the Datadog API
+2. Create or find an existing GitHub issue to track the error
+3. Clone all relevant repositories to get full code context
+4. Run an OpenHands agent to analyze the error and investigate the code
+5. Post the findings as a comment on the GitHub issue
+
+The agent identifies the exact file and line number where errors originate, determines root causes, and provides specific recommendations for fixes.
+
+
+The workflow posts findings to GitHub issues for human review before any code changes are made. If you want the agent to create a fix, you can follow up using the [OpenHands GitHub integration](https://docs.openhands.dev/openhands/usage/cloud/github-installation#github-integration) and say `@openhands go ahead and create a pull request to fix this issue based on your analysis`.
+
+
+## Setting Up the Workflow
+
+To set up automated Datadog debugging in your own repository:
+
+1. Copy the workflow file to `.github/workflows/` in your repository
+2. Configure the required secrets (Datadog API keys, LLM API key)
+3. Customize the default queries and repository lists for your needs
+4. Run the workflow manually or set up scheduled runs
+
+The workflow is fully customizable. You can modify the prompts to focus on specific types of analysis, adjust the agent's tools to fit your workflow, or extend it to integrate with other services beyond GitHub and Datadog.
+
+Find the [full implementation on GitHub](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/04_datadog_debugging), including the workflow YAML file, Python script, and prompt template. 
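To make the moving parts concrete, here is a minimal Python sketch of two of the steps above: building a Datadog error query and producing a stable title for the tracking issue. This is an illustration rather than the actual code from the SDK example; the endpoint shown is Datadog's Logs Search API, and `issue_title` is a hypothetical helper.

```python
# Datadog Logs Search endpoint (v2 API); the SDK example may use the
# Error Tracking API instead.
DD_SEARCH_URL = "https://api.datadoghq.com/api/v2/logs/events/search"

def build_search_request(query: str, lookback: str = "now-1h"):
    """Build headers and JSON body for a Datadog error-log search.

    The key values are placeholders; the real workflow reads them from
    GitHub Actions secrets.
    """
    headers = {
        "DD-API-KEY": "<datadog-api-key>",
        "DD-APPLICATION-KEY": "<datadog-app-key>",
        "Content-Type": "application/json",
    }
    body = {
        "filter": {"query": f"status:error {query}", "from": lookback, "to": "now"},
        "page": {"limit": 25},
    }
    return headers, body

def issue_title(error_message: str, service: str) -> str:
    # A deterministic title lets re-runs find the existing GitHub issue
    # instead of opening duplicates (step 2 above).
    return f"[auto-triage][{service}] {error_message[:80]}"
```

Pairing a deterministic title with a search over existing issues is what makes the "create or find an existing GitHub issue" step idempotent across scheduled runs.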
+ +## Manual Incident Investigation + +You can also use OpenHands directly to investigate incidents without the automated workflow. + +### Log Analysis + +OpenHands can analyze logs to identify patterns and anomalies: + +``` +Analyze these application logs for the incident that occurred at 14:32 UTC: + +1. Identify the first error or warning that appeared +2. Trace the sequence of events leading to the failure +3. Find any correlated errors across services +4. Identify the user or request that triggered the issue +5. Summarize the timeline of events +``` + +**Log analysis capabilities:** + +| Log Type | Analysis Capabilities | +|----------|----------------------| +| Application logs | Error patterns, exception traces, timing anomalies | +| Access logs | Traffic patterns, slow requests, error responses | +| System logs | Resource exhaustion, process crashes, system errors | +| Database logs | Slow queries, deadlocks, connection issues | + +### Stack Trace Analysis + +Deep dive into stack traces: + +``` +Analyze this stack trace from our production error: + +[paste full stack trace] + +1. Identify the exception type and message +2. Trace back to our code (not framework code) +3. Identify the likely cause +4. Check if this code path has changed recently +5. Suggest a fix +``` + +**Multi-language support:** + + + + ``` + Analyze this Java exception: + + java.lang.OutOfMemoryError: Java heap space + at java.util.Arrays.copyOf(Arrays.java:3210) + at java.util.ArrayList.grow(ArrayList.java:265) + at com.myapp.DataProcessor.loadAllRecords(DataProcessor.java:142) + + Identify: + 1. What operation is consuming memory? + 2. Is there a memory leak or just too much data? + 3. What's the fix? 
+ ``` + + + ``` + Analyze this Python traceback: + + Traceback (most recent call last): + File "app/api/orders.py", line 45, in create_order + order = OrderService.create(data) + File "app/services/order.py", line 89, in create + inventory.reserve(item_id, quantity) + AttributeError: 'NoneType' object has no attribute 'reserve' + + What's None and why? + ``` + + + ``` + Analyze this Node.js error: + + TypeError: Cannot read property 'map' of undefined + at processItems (/app/src/handlers/items.js:23:15) + at async handleRequest (/app/src/api/router.js:45:12) + + What's undefined and how should we handle it? + ``` + + + +### Root Cause Analysis + +Identify the underlying cause of an incident: + +``` +Perform root cause analysis for this incident: + +Symptoms: +- API response times increased 5x at 14:00 +- Error rate jumped from 0.1% to 15% +- Database CPU spiked to 100% + +Available data: +- Application metrics (Grafana dashboard attached) +- Recent deployments: v2.3.1 deployed at 13:45 +- Database slow query log (attached) + +Identify the root cause using the 5 Whys technique. +``` + +## Common Incident Patterns + +OpenHands can recognize and help diagnose these common patterns: + +- **Connection pool exhaustion**: Increasing connection errors followed by complete failure +- **Memory leaks**: Gradual memory increase leading to OOM +- **Cascading failures**: One service failure triggering others +- **Thundering herd**: Simultaneous requests overwhelming a service +- **Split brain**: Inconsistent state across distributed components + +## Quick Fix Generation + +Once the root cause is identified, generate fixes: + +``` +We've identified the root cause: a missing null check in OrderProcessor.java line 156. + +Generate a fix that: +1. Adds proper null checking +2. Logs when null is encountered +3. Returns an appropriate error response +4. Includes a unit test for the edge case +5. 
Is minimally invasive for a hotfix +``` + +## Best Practices + +### Investigation Checklist + +Use this checklist when investigating: + +1. **Scope the impact** + - How many users affected? + - What functionality is broken? + - What's the business impact? + +2. **Establish timeline** + - When did it start? + - What changed around that time? + - Is it getting worse or stable? + +3. **Gather data** + - Application logs + - Infrastructure metrics + - Recent deployments + - Configuration changes + +4. **Form hypotheses** + - List possible causes + - Rank by likelihood + - Test systematically + +5. **Implement fix** + - Choose safest fix + - Test before deploying + - Monitor after deployment + +### Common Pitfalls + + +Avoid these common incident response mistakes: + +- **Jumping to conclusions**: Gather data before assuming the cause +- **Changing multiple things**: Make one change at a time to isolate effects +- **Not documenting**: Record all actions for the post-mortem +- **Ignoring rollback**: Always have a rollback plan before deploying fixes + + + +For production incidents, always follow your organization's incident response procedures. OpenHands is a tool to assist your investigation, not a replacement for proper incident management. + + +## Related Resources + +- [OpenHands SDK Repository](https://github.com/OpenHands/software-agent-sdk) - Build custom AI agents +- [Datadog Debugging Workflow](https://github.com/OpenHands/software-agent-sdk/tree/main/examples/03_github_workflows/04_datadog_debugging) - Ready-to-use GitHub Actions workflow +- [Prompting Best Practices](/openhands/usage/tips/prompting-best-practices) - Write effective prompts + +### Spark Migrations +Source: https://docs.openhands.dev/openhands/usage/use-cases/spark-migrations.md + +Apache Spark is constantly evolving, and keeping your data pipelines up to date is essential for performance, security, and access to new features. 
OpenHands can help you analyze, migrate, and validate Spark applications.
+
+## Overview
+
+Spark version upgrades are deceptively difficult. The [Spark 3.0 migration guide](https://spark.apache.org/docs/latest/migration-guide.html) alone documents hundreds of behavioral changes, deprecated APIs, and removed features, and many of these changes are _semantic_. That means the same code compiles and runs but produces different results across different Spark versions: for example, a date parsing expression that worked correctly in Spark 2.4 may silently return different values in Spark 3.x due to the switch from the Julian calendar to the Gregorian calendar.
+
+Version upgrades are also made difficult by the scale of typical enterprise Spark codebases. When you have dozens of jobs across ETL, reporting, and ML pipelines, each with its own combination of DataFrame operations, UDFs, and configuration, manual migration stops scaling well and becomes prone to subtle regressions.
+
+Spark migration requires careful analysis, targeted code changes, and thorough validation to ensure that migrated pipelines produce identical results. Such migrations need to be driven by experienced data engineering teams that understand how your Spark pipelines interact, but even that isn't sufficient to ensure the job is done quickly or without regressions. This is where OpenHands comes in. OpenHands assists in migrating Spark applications along every step of the process:
+
+1. **Understanding**: Analyze the existing codebase to identify what needs to change and why
+2. **Migration**: Apply targeted code transformations that address API changes and behavioral differences
+3. 
**Validation**: Verify that migrated pipelines produce identical results to the originals
+
+In this document, we will explore how OpenHands contributes to Spark migrations, with example prompts and techniques to use in your own efforts. While the examples focus on Spark 2.x to 3.x upgrades, the same principles apply to cloud platform migrations, framework conversions (MapReduce, Hive, Pig to Spark), and upgrades between Spark 3.x minor versions.
+
+## Understanding
+
+Before changing any code, it helps to build a clear picture of what is affected and where the risk is concentrated. Spark migrations touch a large surface area spanning API deprecations, behavioral changes, configuration defaults, and dependency versions, and the interactions between them are hard to reason about manually.
+
+Apache releases detailed lists of changes between each major and minor version of Spark. OpenHands can utilize these lists of changes while scanning your codebase to produce a structured inventory of everything that needs attention. This inventory becomes the foundation for the migration itself, helping you prioritize work and track progress.
+
+If your Spark project is in `/src` and you're migrating from 2.4 to 3.0, the following prompt will generate this inventory:
+
+```
+Analyze the Spark application in `/src` for a migration from Spark 2.4 to Spark 3.0.
+
+Examine the migration guidelines at https://spark.apache.org/docs/latest/migration-guide.html.
+
+Then, for each source file, identify:
+
+1. Deprecated or removed API usages (e.g., `registerTempTable`, `unionAll`, `SQLContext`)
+2. Behavioral changes that could affect output (e.g., date/time parsing, CSV parsing, CAST semantics)
+3. Configuration properties that have changed defaults or been renamed
+4. 
Dependencies that need version updates + +Save the results in `migration_inventory.json` in the following format: + +{ + ..., + "src/main/scala/etl/TransformJob.scala": { + "deprecated_apis": [ + {"line": 42, "current": "df.registerTempTable(\"temp\")", "replacement": "df.createOrReplaceTempView(\"temp\")"} + ], + "behavioral_changes": [ + {"line": 78, "description": "to_date() uses proleptic Gregorian calendar in Spark 3.x; verify date handling with test data"} + ], + "config_changes": [], + "risk": "medium" + }, + ... +} +``` + +Tools like `grep` and `find` (both used by OpenHands) are helpful for identifying where APIs are used, but the real value comes from OpenHands' ability to understand the _context_ around each usage. A simple `registerTempTable` call is migrated via a rename, but a date parsing expression requires understanding how the surrounding pipeline uses the result. This contextual analysis helps developers distinguish between mechanical fixes and changes that need careful testing. + +## Migration + +With a clear inventory of what needs to change, the next step is applying the transformations. Spark migrations involve a mix of straightforward API renames and subtler behavioral adjustments, and it's important to handle them differently. + +To handle simple renames, we prompt OpenHands to use tools like `grep` and `ast-grep` instead of manually manipulating source code. This saves tokens and also simplifies future migrations, as agents can reliably re-run the tools via a script. + +The main risk in migration is that many Spark 3.x behavioral changes are _silent_. The migrated code will compile and run without errors, but may produce different results. Date and timestamp handling is the most common source of these silent failures: Spark 3.x switched to the Gregorian calendar by default, which changes how dates before 1582-10-15 are interpreted. 
CSV and JSON parsing also became stricter in Spark 3.x, rejecting malformed inputs that Spark 2.x would silently accept. + +An example prompt is below: + +``` +Migrate the Spark application in `/src` from Spark 2.4 to Spark 3.0. + +Use `migration_inventory.json` to guide the changes. + +For all low-risk changes (minor syntax changes, updated APIs, etc.), use tools like `grep` or `ast-grep`. Make sure you write the invocations to a `migration.sh` script for future use. + +Requirements: +1. Replace all deprecated APIs with their Spark 3.0 equivalents +2. For behavioral changes (especially date handling and CSV parsing), add explicit configuration to preserve Spark 2.4 behavior where needed (e.g., spark.sql.legacy.timeParserPolicy=LEGACY) +3. Update build.sbt / pom.xml dependencies to Spark 3.0 compatible versions +4. Replace RDD-based operations with DataFrame/Dataset equivalents where practical +5. Replace UDFs with built-in Spark SQL functions where a direct equivalent exists +6. Update import statements for any relocated classes +7. Preserve all existing business logic and output schemas +``` + +Note the inclusion of the _known problems_ in requirement 2. We plan to catch the silent failures associated with these systems in the validation step, but including them explicitly while migrating helps avoid them altogether. + +## Validation + +Spark migrations are particularly prone to silent regressions: jobs appear to run successfully but produce subtly different output. Jobs dealing with dates, CSVs, or using CAST semantics are all vulnerable, especially when migrating between major versions of Spark. + +The most reliable way to ensure silent regressions do not exist is by _data-level comparison_, where both the new and old pipelines are run on the same input data and their outputs directly compared. 
This catches subtle errors that unit tests might miss, especially in complex pipelines where a behavioral change in one stage propagates through downstream transformations. + +An example prompt for data-level comparison: + +``` +Validate the migrated Spark application in `/src` against the original. + +1. For each job, run both the Spark 2.4 and 3.0 versions on the test data in `/test_data` +2. Compare outputs: + - Row counts must match exactly + - Perform column-level comparison using checksums for numeric columns and exact match for string/date columns + - Flag any NULL handling differences +3. For any discrepancies, trace them back to specific migration changes using the MIGRATION comments +4. Generate a performance comparison: job duration, shuffle bytes, and peak executor memory + +Save the results in `validation_report.json` in the following format: + +{ + "jobs": [ + { + "name": "daily_etl", + "data_match": true, + "row_count": {"v2": 1000000, "v3": 1000000}, + "column_diffs": [], + "performance": { + "duration_seconds": {"v2": 340, "v3": 285}, + "shuffle_bytes": {"v2": "2.1GB", "v3": "1.8GB"} + } + }, + ... + ] +} +``` + +Note this prompt relies on existing data in `/test_data`. This can be generated by standard fuzzing tools, but in a pinch OpenHands can also help construct synthetic data that stresses the potential corner cases in the relevant systems. + +Every migration is unique, and developer experience is crucial to ensure the testing strategy covers your organization's requirements. Pay particular attention to jobs that involve date arithmetic, decimal precision in financial calculations, or custom UDFs that may depend on Spark internals. A solid validation suite not only ensures the migrated code works as expected, but also builds the organizational confidence needed to deploy the new version to production. 
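The row-count and checksum comparison described above can be sketched in plain Python. This is framework-agnostic by design; in practice you would collect the column values from the Spark 2.4 and 3.0 job outputs before comparing:

```python
import hashlib

def column_checksum(rows, column):
    """Order-insensitive checksum of one column: hash each value, XOR the digests.

    XOR ignores row order (Spark output order is not deterministic), but values
    repeated an even number of times cancel out; sort-and-hash if that matters.
    """
    acc = 0
    for row in rows:
        digest = hashlib.sha256(repr(row[column]).encode()).digest()
        acc ^= int.from_bytes(digest[:8], "big")
    return acc

def compare_outputs(v2_rows, v3_rows, columns):
    """Compare two job outputs along the lines of the validation prompt."""
    report = {
        "row_count": {"v2": len(v2_rows), "v3": len(v3_rows)},
        "column_diffs": [],
    }
    for col in columns:
        if column_checksum(v2_rows, col) != column_checksum(v3_rows, col):
            report["column_diffs"].append(col)
    report["data_match"] = (
        report["row_count"]["v2"] == report["row_count"]["v3"]
        and not report["column_diffs"]
    )
    return report
```

A mismatch in `column_diffs` narrows the regression to specific columns, which is usually enough to trace it back to one migration change (e.g., a date column pointing at calendar semantics).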
+ +## Beyond Version Upgrades + +While this document focuses on Spark version upgrades, the same Understanding → Migration → Validation workflow applies to other Spark migration scenarios: + +- **Cloud platform migrations** (e.g., EMR to Databricks, on-premises to Dataproc): The "understanding" step inventories platform-specific code (S3 paths, IAM roles, EMR bootstrap scripts), the migration step converts them to the target platform's equivalents, and validation confirms that jobs produce identical output in the new environment. +- **Framework migrations** (MapReduce, Hive, or Pig to Spark): The "understanding" step maps the existing framework's operations to Spark equivalents, the migration step performs the conversion, and validation compares outputs between the old and new frameworks. + +In each case, the key principle is the same: build a structured inventory of what needs to change, apply targeted transformations, and validate rigorously before deploying. + +## Related Resources + +- [OpenHands SDK Repository](https://github.com/OpenHands/software-agent-sdk) - Build custom AI agents +- [Spark 3.x Migration Guide](https://spark.apache.org/docs/latest/migration-guide.html) - Official Spark migration documentation +- [Prompting Best Practices](/openhands/usage/tips/prompting-best-practices) - Write effective prompts + +### Vulnerability Remediation +Source: https://docs.openhands.dev/openhands/usage/use-cases/vulnerability-remediation.md + +Security vulnerabilities are a constant challenge for software teams. Every day, new security issues are discovered—from vulnerabilities in dependencies to code security flaws detected by static analysis tools. The National Vulnerability Database (NVD) reports thousands of new vulnerabilities annually, and organizations struggle to keep up with this constant influx. + +## The Challenge + +The traditional approach to vulnerability remediation is manual and time-consuming: + +1. Scan repositories for vulnerabilities +2. 
Review each vulnerability and its impact
+3. Research the fix (usually a version upgrade)
+4. Update dependency files
+5. Test the changes
+6. Create pull requests
+7. Get reviews and merge
+
+This process can take hours per vulnerability, and with hundreds or thousands of vulnerabilities across multiple repositories, it becomes an overwhelming task. Security debt accumulates faster than teams can address it.
+
+**What if we could automate this entire process using AI agents?**
+
+## Automated Vulnerability Remediation with OpenHands
+
+The [OpenHands Software Agent SDK](https://docs.openhands.dev/sdk) provides powerful capabilities for building autonomous AI agents capable of interacting with codebases. These agents can tackle one of the most tedious tasks in software maintenance: **security vulnerability remediation**.
+
+OpenHands assists with vulnerability remediation by:
+
+- **Identifying vulnerabilities**: Analyzing code for common security issues
+- **Understanding impact**: Explaining the risk and exploitation potential
+- **Implementing fixes**: Generating secure code to address vulnerabilities
+- **Validating remediation**: Verifying fixes are effective and complete
+
+## Two Approaches to Vulnerability Fixing
+
+### 1. Point to a GitHub Repository
+
+Build a workflow where users can point to a GitHub repository, scan it for vulnerabilities, and have OpenHands AI agents automatically create pull requests with fixes—all with minimal human intervention.
+
+### 2. Upload Security Scanner Reports
+
+Enable users to upload reports from security scanners such as Snyk and other third-party tools, where OpenHands agents automatically detect the report format, identify the issues, and apply fixes.
+
+This solution goes beyond automation—it focuses on making security remediation accessible, fast, and scalable. 
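The report-detection step can be sketched in Python. The field names below are simplified: real Snyk JSON and SARIF (the format emitted by CodeQL and many other scanners) carry far more detail, so treat this as a shape illustration, not a full parser:

```python
def detect_format(report: dict) -> str:
    """Heuristic format detection. Production code should use stricter
    checks (e.g., SARIF reports declare "$schema" and "version")."""
    if "runs" in report:
        return "sarif"
    if "vulnerabilities" in report:
        return "snyk"
    return "unknown"

def normalize(report: dict) -> list:
    """Flatten a report into uniform findings to feed the agent prompt."""
    findings = []
    fmt = detect_format(report)
    if fmt == "snyk":
        for v in report["vulnerabilities"]:
            findings.append({
                "id": v["id"],
                "severity": v["severity"],
                # "from" is Snyk's dependency path array
                "location": (v.get("from") or [""])[0],
            })
    elif fmt == "sarif":
        for run in report["runs"]:
            for res in run.get("results", []):
                loc = res["locations"][0]["physicalLocation"]["artifactLocation"]["uri"]
                findings.append({
                    "id": res["ruleId"],
                    "severity": res.get("level", "warning"),
                    "location": loc,
                })
    return findings
```

Normalizing to one findings shape up front means the rest of the pipeline (prioritization, prompting, PR creation) does not need to know which scanner produced the report.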
+
+## Architecture Overview
+
+A vulnerability remediation agent can be built as a web application that orchestrates agents using the [OpenHands Software Agent SDK](https://docs.openhands.dev/sdk) and [OpenHands Cloud](https://docs.openhands.dev/openhands/usage/key-features) to perform security scans and automate remediation fixes.
+
+The key architectural components include:
+
+- **Frontend**: Communicates directly with the OpenHands Agent Server through the [TypeScript Client](https://github.com/OpenHands/typescript-client)
+- **WebSocket interface**: Enables real-time status updates on agent actions and operations
+- **LLM flexibility**: OpenHands supports multiple LLMs, minimizing dependency on any single provider
+- **Scalable execution**: The Agent Server can be hosted locally, with self-hosted models, or integrated with OpenHands Cloud
+
+This architecture allows the frontend to remain lightweight while the heavy lifting happens in the agent's execution environment.
+
+## Example: Vulnerability Fixer Application
+
+An example implementation is available at [github.com/OpenHands/vulnerability-fixer](https://github.com/OpenHands/vulnerability-fixer). This React web application demonstrates the full workflow:
+
+1. User points to a repository or uploads a security scan report
+2. Agent analyzes the vulnerabilities
+3. Agent creates fixes and pull requests automatically
+4. User reviews and merges the changes
+
+## Security Scanning Integration
+
+Use OpenHands to analyze security scanner output:
+
+```
+We ran a security scan and found these issues. Analyze each one:
+
+1. SQL Injection in src/api/users.py:45
+2. XSS in src/templates/profile.html:23
+3. Hardcoded credential in src/config/database.py:12
+4. 
Path traversal in src/handlers/files.py:67 + +For each vulnerability: +- Explain what the vulnerability is +- Show how it could be exploited +- Rate the severity (Critical/High/Medium/Low) +- Suggest a fix +``` + +## Common Vulnerability Patterns + +OpenHands can detect these common vulnerability patterns: + +| Vulnerability | Pattern | Example | +|--------------|---------|---------| +| SQL Injection | String concatenation in queries | `query = "SELECT * FROM users WHERE id=" + user_id` | +| XSS | Unescaped user input in HTML | `
<div>${user_comment}</div>
` | +| Path Traversal | Unvalidated file paths | `open(user_supplied_path)` | +| Command Injection | Shell commands with user input | `os.system("ping " + hostname)` | +| Hardcoded Secrets | Credentials in source code | `password = "admin123"` | + +## Automated Remediation + +### Applying Security Patches + +Fix identified vulnerabilities: + + + + ``` + Fix the SQL injection vulnerability in src/api/users.py: + + Current code: + query = f"SELECT * FROM users WHERE id = {user_id}" + cursor.execute(query) + + Requirements: + 1. Use parameterized queries + 2. Add input validation + 3. Maintain the same functionality + 4. Add a test case for the fix + ``` + + **Fixed code:** + ```python + # Using parameterized query + query = "SELECT * FROM users WHERE id = %s" + cursor.execute(query, (user_id,)) + ``` + + + ``` + Fix the XSS vulnerability in src/templates/profile.html: + + Current code: +
<div>${user.bio}</div>
+ + Requirements: + 1. Properly escape user content + 2. Consider Content Security Policy + 3. Handle rich text if needed + 4. Test with malicious input + ``` + + **Fixed code:** + ```html + +
<div>{{ user.bio | escape }}</div>
+ ``` +
+ + ``` + Fix the command injection in src/utils/network.py: + + Current code: + def ping_host(hostname): + os.system(f"ping -c 1 {hostname}") + + Requirements: + 1. Use safe subprocess calls + 2. Validate input format + 3. Avoid shell=True + 4. Handle errors properly + ``` + + **Fixed code:** + ```python + import subprocess + import re + + def ping_host(hostname): + # Validate hostname format + if not re.match(r'^[a-zA-Z0-9.-]+$', hostname): + raise ValueError("Invalid hostname") + + # Use subprocess without shell + result = subprocess.run( + ["ping", "-c", "1", hostname], + capture_output=True, + text=True + ) + return result.returncode == 0 + ``` + +
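Fixes like the ones above are only trustworthy once exercised with hostile input. Below is a sketch of such a check in Python, using an in-memory SQLite database as a stand-in; note that parameter placeholders vary by driver (`?` for `sqlite3`, `%s` for psycopg2):

```python
import sqlite3

def get_user(conn, user_id):
    # Parameterized query: user_id travels as data, never as SQL text
    return conn.execute(
        "SELECT id, name FROM users WHERE id = ?", (user_id,)
    ).fetchall()

def run_injection_tests():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO users VALUES (1, 'alice')")

    assert get_user(conn, 1) == [(1, "alice")]       # normal input still works
    assert get_user(conn, "1 OR 1=1") == []          # classic payload matches nothing
    assert get_user(conn, "1; DROP TABLE users; --") == []
    # The table must survive the attempted DROP
    assert conn.execute("SELECT COUNT(*) FROM users").fetchone() == (1,)
    return "ok"
```

Because the payloads are bound as parameters, they are compared as literal values rather than parsed as SQL, so the injection attempts simply match no rows.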
+ +### Code-Level Vulnerability Fixes + +Fix application-level security issues: + +``` +Fix the broken access control in our API: + +Issue: Users can access other users' data by changing the ID in the URL. + +Current code: +@app.get("/api/users/{user_id}/documents") +def get_documents(user_id: int): + return db.get_documents(user_id) + +Requirements: +1. Add authorization check +2. Verify requesting user matches or is admin +3. Return 403 for unauthorized access +4. Log access attempts +5. Add tests for authorization +``` + +**Fixed code:** + +```python +@app.get("/api/users/{user_id}/documents") +def get_documents(user_id: int, current_user: User = Depends(get_current_user)): + # Check authorization + if current_user.id != user_id and not current_user.is_admin: + logger.warning(f"Unauthorized access attempt: user {current_user.id} tried to access user {user_id}'s documents") + raise HTTPException(status_code=403, detail="Not authorized") + + return db.get_documents(user_id) +``` + +## Security Testing + +Test your fixes thoroughly: + +``` +Create security tests for the SQL injection fix: + +1. Test with normal input +2. Test with SQL injection payloads: + - ' OR '1'='1 + - '; DROP TABLE users; -- + - UNION SELECT * FROM passwords +3. Test with special characters +4. Test with null/empty input +5. Verify error handling doesn't leak information +``` + +## Automated Remediation Pipeline + +Create an end-to-end automated pipeline: + +``` +Create an automated vulnerability remediation pipeline: + +1. Parse Snyk/Dependabot/CodeQL alerts +2. Categorize by severity and type +3. For each vulnerability: + - Create a branch + - Apply the fix + - Run tests + - Create a PR with: + - Description of vulnerability + - Fix applied + - Test results +4. Request review from security team +5. 
Auto-merge low-risk fixes after tests pass +``` + +## Building Your Own Vulnerability Fixer + +The example application demonstrates that AI agents can effectively automate security maintenance at scale. Tasks that required hours of manual effort per vulnerability can now be completed in minutes with minimal human intervention. + +To build your own vulnerability remediation agent: + +1. Use the [OpenHands Software Agent SDK](https://github.com/OpenHands/software-agent-sdk) to create your agent +2. Integrate with your security scanning tools (Snyk, Dependabot, CodeQL, etc.) +3. Configure the agent to create pull requests automatically +4. Set up human review workflows for critical fixes + +As agent capabilities continue to evolve, an increasing number of repetitive and time-consuming security tasks can be automated, enabling developers to focus on higher-level design, innovation, and problem-solving rather than routine maintenance. + +## Related Resources + +- [Vulnerability Fixer Example](https://github.com/OpenHands/vulnerability-fixer) - Full implementation example +- [OpenHands SDK Documentation](https://docs.openhands.dev/sdk) - Build custom AI agents +- [Dependency Upgrades](/openhands/usage/use-cases/dependency-upgrades) - Updating vulnerable dependencies +- [Prompting Best Practices](/openhands/usage/tips/prompting-best-practices) - Write effective prompts + +### Windows Without WSL +Source: https://docs.openhands.dev/openhands/usage/windows-without-wsl.md + + + This way of running OpenHands is not officially supported. It is maintained by the community and may not work. + + +# Running OpenHands GUI on Windows Without WSL + +This guide provides step-by-step instructions for running OpenHands on a Windows machine without using WSL or Docker. + +## Prerequisites + +1. **Windows 10/11** - A modern Windows operating system +2. 
**PowerShell 7+** - While Windows PowerShell comes pre-installed on Windows 10/11, PowerShell 7+ is strongly recommended to avoid compatibility issues (see Troubleshooting section for "System.Management.Automation" errors) +3. **.NET Core Runtime** - Required for the PowerShell integration via pythonnet +4. **Python 3.12 or 3.13** - Python 3.12 or 3.13 is required (Python 3.14 is not supported due to pythonnet compatibility) +5. **Git** - For cloning the repository and version control +6. **Node.js and npm** - For running the frontend + +## Step 1: Install Required Software + +1. **Install Python 3.12 or 3.13** + - Download Python 3.12.x or 3.13.x from [python.org](https://www.python.org/downloads/) + - During installation, check "Add Python to PATH" + - Verify installation by opening PowerShell and running: + ```powershell + python --version + ``` + +2. **Install PowerShell 7** + - Download and install PowerShell 7 from the [official PowerShell GitHub repository](https://github.com/PowerShell/PowerShell/releases) + - Choose the MSI installer appropriate for your system (x64 for most modern computers) + - Run the installer with default options + - Verify installation by opening a new terminal and running: + ```powershell + pwsh --version + ``` + - Using PowerShell 7 (pwsh) instead of Windows PowerShell will help avoid "System.Management.Automation" errors + +3. **Install .NET Core Runtime** + - Download and install the .NET Core Runtime from [Microsoft's .NET download page](https://dotnet.microsoft.com/download) + - Choose the latest .NET Core Runtime (not SDK) + - Verify installation by opening PowerShell and running: + ```powershell + dotnet --info + ``` + - This step is required for the PowerShell integration via pythonnet. Without it, OpenHands will fall back to a more limited PowerShell implementation. + +4. 
**Install Git** + - Download Git from [git-scm.com](https://git-scm.com/download/win) + - Use default installation options + - Verify installation: + ```powershell + git --version + ``` + +5. **Install Node.js and npm** + - Download Node.js from [nodejs.org](https://nodejs.org/) (LTS version recommended) + - During installation, accept the default options which will install npm as well + - Verify installation: + ```powershell + node --version + npm --version + ``` + +6. **Install Poetry** + - Open PowerShell as Administrator and run: + ```powershell + (Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | python - + ``` + - Add Poetry to your PATH: + ```powershell + $env:Path += ";$env:APPDATA\Python\Scripts" + ``` + - Verify installation: + ```powershell + poetry --version + ``` + +## Step 2: Clone and Set Up OpenHands + +1. **Clone the Repository** + ```powershell + git clone https://github.com/OpenHands/OpenHands.git + cd OpenHands + ``` + +2. **Install Dependencies** + ```powershell + poetry install + ``` + + This will install all required dependencies, including: + - pythonnet - Required for Windows PowerShell integration + - All other OpenHands dependencies + +## Step 3: Run OpenHands + +1. **Build the Frontend** + ```powershell + cd frontend + npm install + npm run build + cd .. + ``` + + This will build the frontend files that the backend will serve. + +2. **Start the Backend** + ```powershell + # Make sure to use PowerShell 7 (pwsh) instead of Windows PowerShell + pwsh + $env:RUNTIME="local"; poetry run uvicorn openhands.server.listen:app --host 0.0.0.0 --port 3000 --reload --reload-exclude "./workspace" + ``` + + This will start the OpenHands app using the local runtime with PowerShell integration, available at `localhost:3000`. + + > **Note**: If you encounter a `RuntimeError: Directory './frontend/build' does not exist` error, make sure you've built the frontend first using the command above. 
+ + > **Important**: Using PowerShell 7 (pwsh) instead of Windows PowerShell is recommended to avoid "System.Management.Automation" errors. If you encounter this error, see the Troubleshooting section below. + +3. **Alternatively, Run the Frontend in Development Mode (in a separate PowerShell window)** + ```powershell + cd frontend + npm run dev + ``` + +4. **Access the OpenHands GUI** + + Open your browser and navigate to: + ``` + http://localhost:3000 + ``` + + > **Note**: If you're running the frontend in development mode (using `npm run dev`), use port 3001 instead: `http://localhost:3001` + +## Installing and Running the CLI + +To install and run the OpenHands CLI on Windows without WSL, follow these steps: + +### 1. Install uv (Python Package Manager) + +Open PowerShell as Administrator and run: + +```powershell +powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex" +``` + +### 2. Install .NET SDK (Required) + +The OpenHands CLI **requires** the .NET Core runtime for PowerShell integration. Without it, the CLI will fail to start with a `coreclr` error. Install the .NET SDK which includes the runtime: + +```powershell +winget install Microsoft.DotNet.SDK.8 +``` + +Alternatively, you can download and install the .NET SDK from the [official Microsoft website](https://dotnet.microsoft.com/download). + +After installation, restart your PowerShell session to ensure the environment variables are updated. + +### 3. 
Install and Run OpenHands + +After installing the prerequisites, install OpenHands with: + +```powershell +uv tool install openhands --python 3.12 +``` + +Then run OpenHands: + +```powershell +openhands +``` + +To upgrade OpenHands in the future: + +```powershell +uv tool upgrade openhands --python 3.12 +``` + +### Troubleshooting CLI Issues + +#### CoreCLR Error + +If you encounter an error like `Failed to load CoreCLR` or `pythonnet.load('coreclr')` when running OpenHands CLI, this indicates that the .NET Core runtime is missing or not properly configured. To fix this: + +1. Install the .NET SDK as described in step 2 above +2. Verify that your system PATH includes the .NET SDK directories +3. Restart your PowerShell session completely after installing the .NET SDK +4. Make sure you're using PowerShell 7 (pwsh) rather than Windows PowerShell + +To verify your .NET installation, run: + +```powershell +dotnet --info +``` + +This should display information about your installed .NET SDKs and runtimes. If this command fails, the .NET SDK is not properly installed or not in your PATH. + +If the issue persists after installing the .NET SDK, try installing the specific .NET Runtime version 6.0 or later from the [.NET download page](https://dotnet.microsoft.com/download). + +## Limitations on Windows + +When running OpenHands on Windows without WSL or Docker, be aware of the following limitations: + +1. **Browser Tool Not Supported**: The browser tool is not currently supported on Windows. + +2. **.NET Core Requirement**: The PowerShell integration requires .NET Core Runtime to be installed. The CLI implementation attempts to load the CoreCLR at startup with `pythonnet.load('coreclr')` and will fail with an error if .NET Core is not properly installed. + +3. **Interactive Shell Commands**: Some interactive shell commands may not work as expected. The PowerShell session implementation has limitations compared to the bash session used on Linux/macOS. + +4. 
**Path Handling**: Windows uses backslashes (`\`) in paths, which may require adjustments when working with code examples designed for Unix-like systems. + +## Troubleshooting + +### "System.Management.Automation" Not Found Error + +If you encounter an error message stating that "System.Management.Automation" was not found, this typically indicates that you have a minimal version of PowerShell installed or that the .NET components required for PowerShell integration are missing. + +> **IMPORTANT**: This error is most commonly caused by using the built-in Windows PowerShell (powershell.exe) instead of PowerShell 7 (pwsh.exe). Even if you installed PowerShell 7 during the prerequisites, you may still be using the older Windows PowerShell by default. + +To resolve this issue: + +1. **Install the latest version of PowerShell 7** from the official Microsoft repository: + - Visit [https://github.com/PowerShell/PowerShell/releases](https://github.com/PowerShell/PowerShell/releases) + - Download and install the latest MSI package for your system architecture (x64 for most systems) + - During installation, ensure you select the following options: + - "Add PowerShell to PATH environment variable" + - "Register Windows PowerShell 7 as the default shell" + - "Enable PowerShell remoting" + - The installer will place PowerShell 7 in `C:\Program Files\PowerShell\7` by default + +2. **Restart your terminal or command prompt** to ensure the new PowerShell is available + +3. **Verify the installation** by running: + ```powershell + pwsh --version + ``` + + You should see output indicating PowerShell 7.x.x + +4. **Run OpenHands using PowerShell 7** instead of Windows PowerShell: + ```powershell + pwsh + cd path\to\openhands + $env:RUNTIME="local"; poetry run uvicorn openhands.server.listen:app --host 0.0.0.0 --port 3000 --reload --reload-exclude "./workspace" + ``` + + > **Note**: Make sure you're explicitly using `pwsh` (PowerShell 7) and not `powershell` (Windows PowerShell). 
The command prompt or terminal title should say "PowerShell 7" rather than just "Windows PowerShell".
+
+5. **If the issue persists**, ensure that you have the .NET Runtime installed:
+   - Download and install the latest .NET Runtime from [Microsoft's .NET download page](https://dotnet.microsoft.com/download)
+   - Choose ".NET Runtime" (not SDK) version 6.0 or later
+   - After installation, verify it's properly installed by running:
+     ```powershell
+     dotnet --info
+     ```
+   - Restart your computer after installation
+   - Try running OpenHands again
+
+6. **Ensure that the .NET Framework is properly installed** on your system:
+   - Go to Control Panel > Programs > Programs and Features > Turn Windows features on or off
+   - Make sure ".NET Framework 4.8 Advanced Services" is enabled
+   - Click OK and restart if prompted
+
+This error occurs because OpenHands uses the pythonnet package to interact with PowerShell, which requires the System.Management.Automation assembly from the .NET framework. A minimal PowerShell installation or older Windows PowerShell (rather than PowerShell 7+) might not include all the necessary components for this integration.
+
+## OpenHands Cloud
+
+### Bitbucket Integration
+Source: https://docs.openhands.dev/openhands/usage/cloud/bitbucket-installation.md
+
+## Prerequisites
+
+- Signed in to [OpenHands Cloud](https://app.all-hands.dev) with [a Bitbucket account](/openhands/usage/cloud/openhands-cloud).
+
+## Adding Bitbucket Repository Access
+
+Upon signing into OpenHands Cloud with a Bitbucket account, OpenHands will have access to your repositories.
+
+## Working With Bitbucket Repos in OpenHands Cloud
+
+After signing in with a Bitbucket account, use the `Open Repository` section to select the appropriate repository and
+branch you'd like OpenHands to work on. Then click on `Launch` to start the conversation!
+ +![Connect Repo](/openhands/static/img/connect-repo.png) + +## IP Whitelisting + +If your Bitbucket Cloud instance has IP restrictions, you'll need to whitelist the following IP addresses to allow +OpenHands to access your repositories: + +### Core App IP +``` +34.68.58.200 +``` + +### Runtime IPs +``` +34.10.175.217 +34.136.162.246 +34.45.0.142 +34.28.69.126 +35.224.240.213 +34.70.174.52 +34.42.4.87 +35.222.133.153 +34.29.175.97 +34.60.55.59 +``` + +## Next Steps + +- [Learn about the Cloud UI](/openhands/usage/cloud/cloud-ui). +- [Use the Cloud API](/openhands/usage/cloud/cloud-api) to programmatically interact with OpenHands. + +### Cloud API +Source: https://docs.openhands.dev/openhands/usage/cloud/cloud-api.md + +For the available API endpoints, refer to the +[OpenHands API Reference](https://docs.openhands.dev/api-reference). + +## Obtaining an API Key + +To use the OpenHands Cloud API, you'll need to generate an API key: + +1. Log in to your [OpenHands Cloud](https://app.all-hands.dev) account. +2. Navigate to the [Settings > API Keys](https://app.all-hands.dev/settings/api-keys) page. +3. Click `Create API Key`. +4. Give your key a descriptive name (Example: "Development" or "Production") and select `Create`. +5. Copy the generated API key and store it securely. It will only be shown once. + +## API Usage Example (V1) + +### Starting a New Conversation + +To start a new conversation with OpenHands to perform a task, +make a POST request to the V1 app-conversations endpoint. 
+ + + + ```bash + curl -X POST "https://app.all-hands.dev/api/v1/app-conversations" \ + -H "Authorization: Bearer YOUR_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{ + "initial_message": { + "content": [{"type": "text", "text": "Check whether there is any incorrect information in the README.md file and send a PR to fix it if so."}] + }, + "selected_repository": "yourusername/your-repo" + }' + ``` + + + ```python + import requests + + api_key = "YOUR_API_KEY" + url = "https://app.all-hands.dev/api/v1/app-conversations" + + headers = { + "Authorization": f"Bearer {api_key}", + "Content-Type": "application/json" + } + + data = { + "initial_message": { + "content": [{"type": "text", "text": "Check whether there is any incorrect information in the README.md file and send a PR to fix it if so."}] + }, + "selected_repository": "yourusername/your-repo" + } + + response = requests.post(url, headers=headers, json=data) + result = response.json() + + # The response contains a start task with the conversation ID + conversation_id = result.get("app_conversation_id") or result.get("id") + print(f"Conversation Link: https://app.all-hands.dev/conversations/{conversation_id}") + print(f"Status: {result['status']}") + ``` + + + ```typescript + const apiKey = "YOUR_API_KEY"; + const url = "https://app.all-hands.dev/api/v1/app-conversations"; + + const headers = { + "Authorization": `Bearer ${apiKey}`, + "Content-Type": "application/json" + }; + + const data = { + initial_message: { + content: [{ type: "text", text: "Check whether there is any incorrect information in the README.md file and send a PR to fix it if so." 
}] + }, + selected_repository: "yourusername/your-repo" + }; + + async function startConversation() { + try { + const response = await fetch(url, { + method: "POST", + headers: headers, + body: JSON.stringify(data) + }); + + const result = await response.json(); + + // The response contains a start task with the conversation ID + const conversationId = result.app_conversation_id || result.id; + console.log(`Conversation Link: https://app.all-hands.dev/conversations/${conversationId}`); + console.log(`Status: ${result.status}`); + + return result; + } catch (error) { + console.error("Error starting conversation:", error); + } + } + + startConversation(); + ``` + + + +#### Response + +The API will return a JSON object with details about the conversation start task: + +```json +{ + "id": "550e8400-e29b-41d4-a716-446655440000", + "status": "WORKING", + "app_conversation_id": "660e8400-e29b-41d4-a716-446655440001", + "sandbox_id": "sandbox-abc123", + "created_at": "2025-01-15T10:30:00Z" +} +``` + +The `status` field indicates the current state of the conversation startup process: +- `WORKING` - Initial processing +- `WAITING_FOR_SANDBOX` - Waiting for sandbox to be ready +- `PREPARING_REPOSITORY` - Cloning and setting up the repository +- `READY` - Conversation is ready to use +- `ERROR` - An error occurred during startup + +You may receive an authentication error if: + +- You provided an invalid API key. +- You provided the wrong repository name. +- You don't have access to the repository. 
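The startup statuses above can be folded into a small client-side helper that decides when a start task has finished and, once it reaches `READY`, builds the conversation link. This is a minimal sketch: `interpret_start_task` is a hypothetical name, and the payload shape follows the example response above.

```python
# Hypothetical helper for payloads returned by POST /api/v1/app-conversations.
# Field names ("status", "app_conversation_id") follow the documented response.

PENDING = {"WORKING", "WAITING_FOR_SANDBOX", "PREPARING_REPOSITORY"}
TERMINAL = {"READY", "ERROR"}


def interpret_start_task(task: dict) -> dict:
    """Classify a start-task payload and build the conversation URL when READY."""
    status = task.get("status")
    if status not in PENDING | TERMINAL:
        raise ValueError(f"Unknown start-task status: {status!r}")
    url = None
    if status == "READY":
        # app_conversation_id is only guaranteed once the task reaches READY
        url = f"https://app.all-hands.dev/conversations/{task['app_conversation_id']}"
    return {"status": status, "done": status in TERMINAL, "url": url}
```

The same check works on each update from the streaming variant of the endpoint, since every streamed element carries the same fields.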
+ +### Streaming Conversation Start (Optional) + +For real-time updates during conversation startup, you can use the streaming endpoint: + +```bash +curl -X POST "https://app.all-hands.dev/api/v1/app-conversations/stream-start" \ + -H "Authorization: Bearer YOUR_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{ + "initial_message": { + "content": [{"type": "text", "text": "Your task description here"}] + }, + "selected_repository": "yourusername/your-repo" + }' +``` + +#### Streaming Response + +The endpoint streams a JSON array incrementally. Each element represents a status update: + +```json +[ + {"id": "550e8400-e29b-41d4-a716-446655440000", "status": "WORKING", "created_at": "2025-01-15T10:30:00Z"}, + {"id": "550e8400-e29b-41d4-a716-446655440000", "status": "WAITING_FOR_SANDBOX", "created_at": "2025-01-15T10:30:00Z"}, + {"id": "550e8400-e29b-41d4-a716-446655440000", "status": "PREPARING_REPOSITORY", "created_at": "2025-01-15T10:30:00Z"}, + {"id": "550e8400-e29b-41d4-a716-446655440000", "status": "READY", "app_conversation_id": "660e8400-e29b-41d4-a716-446655440001", "sandbox_id": "sandbox-abc123", "created_at": "2025-01-15T10:30:00Z"} +] +``` + +Each update is streamed as it occurs, allowing you to provide real-time feedback to users about the conversation startup progress. + +## Rate Limits + +If you have too many conversations running at once, older conversations will be paused to limit the number of concurrent conversations. +If you're running into issues and need a higher limit for your use case, please contact us at [contact@all-hands.dev](mailto:contact@all-hands.dev). + +--- + +## Migrating from V0 to V1 API + + + The V0 API (`/api/conversations`) is deprecated and scheduled for removal on **April 1, 2026**. + Please migrate to the V1 API (`/api/v1/app-conversations`) as soon as possible. 
+ + +### Key Differences + +| Feature | V0 API | V1 API | +|---------|--------|--------| +| Endpoint | `POST /api/conversations` | `POST /api/v1/app-conversations` | +| Message format | `initial_user_msg` (string) | `initial_message.content` (array of content objects) | +| Repository field | `repository` | `selected_repository` | +| Response | Immediate `conversation_id` | Start task with `status` and eventual `app_conversation_id` | + +### Migration Steps + +1. **Update the endpoint URL**: Change from `/api/conversations` to `/api/v1/app-conversations` + +2. **Update the request body**: + - Change `repository` to `selected_repository` + - Change `initial_user_msg` (string) to `initial_message` (object with content array): + ```json + // V0 format + { "initial_user_msg": "Your message here" } + + // V1 format + { "initial_message": { "content": [{"type": "text", "text": "Your message here"}] } } + ``` + +3. **Update response handling**: The V1 API returns a start task object. The conversation ID is in the `app_conversation_id` field (available when status is `READY`), or use the `id` field for the start task ID. + +--- + +## Legacy API (V0) - Deprecated + + + The V0 API is deprecated since version 1.0.0 and will be removed on **April 1, 2026**. + New integrations should use the V1 API documented above. 
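Since the request-body mapping in the migration steps above is mechanical, it can be captured in a small conversion helper before you look at the legacy examples below. This is a sketch: `v0_to_v1_body` is a hypothetical name, and only the two documented request fields are mapped.

```python
# Hypothetical helper: convert a V0 /api/conversations request body into the
# V1 /api/v1/app-conversations format described in the migration steps.

def v0_to_v1_body(v0_body: dict) -> dict:
    v1_body: dict = {}
    if "initial_user_msg" in v0_body:
        # V0 passed a plain string; V1 wraps it in a content array.
        v1_body["initial_message"] = {
            "content": [{"type": "text", "text": v0_body["initial_user_msg"]}]
        }
    if "repository" in v0_body:
        # The repository field was renamed in V1.
        v1_body["selected_repository"] = v0_body["repository"]
    return v1_body
```

Remember that the endpoint URL and response handling must change as well; this helper only covers the request body.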
+ + +### Starting a New Conversation (V0) + + + + ```bash + curl -X POST "https://app.all-hands.dev/api/conversations" \ + -H "Authorization: Bearer YOUR_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{ + "initial_user_msg": "Check whether there is any incorrect information in the README.md file and send a PR to fix it if so.", + "repository": "yourusername/your-repo" + }' + ``` + + + ```python + import requests + + api_key = "YOUR_API_KEY" + url = "https://app.all-hands.dev/api/conversations" + + headers = { + "Authorization": f"Bearer {api_key}", + "Content-Type": "application/json" + } + + data = { + "initial_user_msg": "Check whether there is any incorrect information in the README.md file and send a PR to fix it if so.", + "repository": "yourusername/your-repo" + } + + response = requests.post(url, headers=headers, json=data) + conversation = response.json() + + print(f"Conversation Link: https://app.all-hands.dev/conversations/{conversation['conversation_id']}") + print(f"Status: {conversation['status']}") + ``` + + + ```typescript + const apiKey = "YOUR_API_KEY"; + const url = "https://app.all-hands.dev/api/conversations"; + + const headers = { + "Authorization": `Bearer ${apiKey}`, + "Content-Type": "application/json" + }; + + const data = { + initial_user_msg: "Check whether there is any incorrect information in the README.md file and send a PR to fix it if so.", + repository: "yourusername/your-repo" + }; + + async function startConversation() { + try { + const response = await fetch(url, { + method: "POST", + headers: headers, + body: JSON.stringify(data) + }); + + const conversation = await response.json(); + + console.log(`Conversation Link: https://app.all-hands.dev/conversations/${conversation.conversation_id}`); + console.log(`Status: ${conversation.status}`); + + return conversation; + } catch (error) { + console.error("Error starting conversation:", error); + } + } + + startConversation(); + ``` + + + +#### Response (V0) + +```json +{ + 
"status": "ok", + "conversation_id": "abc1234" +} +``` + +### Cloud UI +Source: https://docs.openhands.dev/openhands/usage/cloud/cloud-ui.md + +## Landing Page + +The landing page is where you can: + +- [Select a GitHub repo](/openhands/usage/cloud/github-installation#working-with-github-repos-in-openhands-cloud), + [a GitLab repo](/openhands/usage/cloud/gitlab-installation#working-with-gitlab-repos-in-openhands-cloud) or + [a Bitbucket repo](/openhands/usage/cloud/bitbucket-installation#working-with-bitbucket-repos-in-openhands-cloud) to start working on. +- Launch an empty conversation using `New Conversation`. +- See `Suggested Tasks` for repositories that OpenHands has access to. +- See your `Recent Conversations`. + +## Settings + +Settings are divided across tabs, with each tab focusing on a specific area of configuration. + +- `User` + - Change your email address. +- `Integrations` + - [Configure GitHub repository access](/openhands/usage/cloud/github-installation#modifying-repository-access) for OpenHands. + - [Install the OpenHands Slack app](/openhands/usage/cloud/slack-installation). +- `Application` + - Set your preferred language, notifications and other preferences. + - Toggle task suggestions on GitHub. + - Toggle Solvability Analysis. + - [Set a maximum budget per conversation](/openhands/usage/settings/application-settings#setting-maximum-budget-per-conversation). + - [Configure the username and email that OpenHands uses for commits](/openhands/usage/settings/application-settings#git-author-settings). +- `LLM` + - [Choose to use another LLM or use different models from the OpenHands provider](/openhands/usage/settings/llm-settings). +- `Billing` + - Add credits for using the OpenHands provider. +- `Secrets` + - [Manage secrets](/openhands/usage/settings/secrets-settings). +- `API Keys` + - [Create API keys to work with OpenHands programmatically](/openhands/usage/cloud/cloud-api). 
+
+- `MCP`
+  - [Set up an MCP server](/openhands/usage/settings/mcp-settings).
+
+## Key Features
+
+For an overview of the key features available inside a conversation, please refer to the [Key Features](/openhands/usage/key-features)
+section of the documentation.
+
+## Next Steps
+
+- [Use OpenHands with your GitHub repositories](/openhands/usage/cloud/github-installation).
+- [Use OpenHands with your GitLab repositories](/openhands/usage/cloud/gitlab-installation).
+- [Use the Cloud API](/openhands/usage/cloud/cloud-api) to programmatically interact with OpenHands.
+
+### GitHub Integration
+Source: https://docs.openhands.dev/openhands/usage/cloud/github-installation.md
+
+## Prerequisites
+
+- Signed in to [OpenHands Cloud](https://app.all-hands.dev) with [a GitHub account](/openhands/usage/cloud/openhands-cloud).
+
+## Adding GitHub Repository Access
+
+You can grant OpenHands access to specific GitHub repositories:
+
+1. Click on `+ Add GitHub Repos` in the repository selection dropdown.
+2. Select your organization and choose the specific repositories to grant OpenHands access to.
+
+   - OpenHands requests short-lived tokens (8-hour expiration) with these permissions:
+     - Actions: Read and write
+     - Commit statuses: Read and write
+     - Contents: Read and write
+     - Issues: Read and write
+     - Metadata: Read-only
+     - Pull requests: Read and write
+     - Webhooks: Read and write
+     - Workflows: Read and write
+   - Repository access for a user is granted based on:
+     - Permission granted for the repository
+     - User's GitHub permissions (owner/collaborator)
+
+
+3. Click `Install & Authorize`.
+
+## Modifying Repository Access
+
+You can modify GitHub repository access at any time by:
+- Selecting `+ Add GitHub Repos` in the repository selection dropdown or
+- Visiting the `Settings > Integrations` page and selecting `Configure GitHub Repositories`
+
+## Working With GitHub Repos in OpenHands Cloud
+
+Once you've granted GitHub repository access, you can start working with your GitHub repository. Use the
+`Open Repository` section to select the appropriate repository and branch you'd like OpenHands to work on. Then click
+on `Launch` to start the conversation!
+
+![Connect Repo](/openhands/static/img/connect-repo.png)
+
+## Working on GitHub Issues and Pull Requests Using OpenHands
+
+To allow OpenHands to work directly from GitHub, you must
+[give OpenHands access to your repository](/openhands/usage/cloud/github-installation#modifying-repository-access). Once access is
+given, you can use OpenHands by labeling the issue or by tagging `@openhands`.
+
+### Working with Issues
+
+On your repository, label an issue with `openhands` or add a message starting with `@openhands`. OpenHands will:
+1. Comment on the issue to let you know it is working on it.
+   - You can click on the link to track the progress on OpenHands Cloud.
+2. Open a pull request if it determines that the issue has been successfully resolved.
+3. Comment on the issue with a summary of the performed tasks and a link to the PR.
+
+### Working with Pull Requests
+
+To get OpenHands to work on pull requests, mention `@openhands` in the comments to:
+- Ask questions
+- Request updates
+- Get code explanations
+
+
+The `@openhands` mention functionality in pull requests only works if the pull request is both
+*to* and *from* a repository that you have added through the interface. This is because OpenHands needs appropriate
+permissions to access both repositories.
+
+
+
+## Next Steps
+
+- [Learn about the Cloud UI](/openhands/usage/cloud/cloud-ui).
+
+- [Use the Cloud API](/openhands/usage/cloud/cloud-api) to programmatically interact with OpenHands.
+
+### GitLab Integration
+Source: https://docs.openhands.dev/openhands/usage/cloud/gitlab-installation.md
+
+## Prerequisites
+
+- Signed in to [OpenHands Cloud](https://app.all-hands.dev) with [a GitLab account](/openhands/usage/cloud/openhands-cloud).
+
+## Adding GitLab Repository Access
+
+Upon signing into OpenHands Cloud with a GitLab account, OpenHands will have access to your repositories.
+
+## Working With GitLab Repos in OpenHands Cloud
+
+After signing in with a GitLab account, use the `Open Repository` section to select the appropriate repository and
+branch you'd like OpenHands to work on. Then click on `Launch` to start the conversation!
+
+![Connect Repo](/openhands/static/img/connect-repo.png)
+
+## Using Tokens with Reduced Scopes
+
+OpenHands requests an API-scoped token during OAuth authentication. By default, this token is provided to the agent.
+To restrict the agent's permissions, [you can define a custom secret](/openhands/usage/settings/secrets-settings) `GITLAB_TOKEN`,
+which will override the default token assigned to the agent. While the high-permission API token is still requested
+and used for other components of the application (e.g. opening merge requests), the agent will not have access to it.
+
+## Working on GitLab Issues and Merge Requests Using OpenHands
+
+
+This feature works for personal projects and is available for group projects with a
+[Premium or Ultimate tier subscription](https://docs.gitlab.com/user/project/integrations/webhooks/#group-webhooks).
+
+A webhook is automatically installed within a few minutes after the owner/maintainer of the project or group logs into
+OpenHands Cloud.
+
+
+
+Giving GitLab repository access to OpenHands also allows you to work on GitLab issues and merge requests directly.
+ +### Working with Issues + +On your repository, label an issue with `openhands` or add a message starting with `@openhands`. OpenHands will: + +1. Comment on the issue to let you know it is working on it. + - You can click on the link to track the progress on OpenHands Cloud. +2. Open a merge request if it determines that the issue has been successfully resolved. +3. Comment on the issue with a summary of the performed tasks and a link to the PR. + +### Working with Merge Requests + +To get OpenHands to work on merge requests, mention `@openhands` in the comments to: + +- Ask questions +- Request updates +- Get code explanations + +## Managing GitLab Webhooks + +The GitLab webhook management feature allows you to view and manage webhooks for your GitLab projects and groups directly from the OpenHands Cloud Integrations page. + +### Accessing Webhook Management + +The webhook management table is available on the Integrations page when: + +- You are signed in to OpenHands Cloud with a GitLab account +- Your GitLab token is connected + +To access it: + +1. Navigate to the `Settings > Integrations` page +2. Find the GitLab section +3. If your GitLab token is connected, you'll see the webhook management table below the connection status + +### Viewing Webhook Status + +The webhook management table displays GitLab groups and individual projects (not associated with any groups) that are accessible to OpenHands. + +- **Resource**: The name and full path of the project or group +- **Type**: Whether it's a "project" or "group" +- **Status**: The current webhook installation status: + - **Installed**: The webhook is active and working + - **Not Installed**: No webhook is currently installed + - **Failed**: A previous installation attempt failed (error details are shown below the status) + +### Reinstalling Webhooks + +If a webhook is not installed or has failed, you can reinstall it: + +1. Find the resource in the webhook management table +2. 
Click the `Reinstall` button in the Action column +3. The button will show `Reinstalling...` while the operation is in progress +4. Once complete, the status will update to reflect the result + + + To reinstall an existing webhook, you must first delete the current webhook + from the GitLab UI before using the Reinstall button in OpenHands Cloud. + + +**Important behaviors:** + +- The Reinstall button is disabled if the webhook is already installed +- Only one reinstall operation can run at a time +- After a successful reinstall, the button remains disabled to prevent duplicate installations +- If a reinstall fails, the error message is displayed below the status badge +- The resources list automatically refreshes after a reinstall completes + +### Constraints and Limitations + +- The webhook management table only displays resources that are accessible with your connected GitLab token +- Webhook installation requires Admin or Owner permissions on the GitLab project or group + +## Next Steps + +- [Learn about the Cloud UI](/openhands/usage/cloud/cloud-ui). +- [Use the Cloud API](/openhands/usage/cloud/cloud-api) to programmatically interact with OpenHands. + +### Getting Started +Source: https://docs.openhands.dev/openhands/usage/cloud/openhands-cloud.md + +## Accessing OpenHands Cloud + +OpenHands Cloud is the hosted cloud version of OpenHands. To get started with OpenHands Cloud, +visit [app.all-hands.dev](https://app.all-hands.dev). + +You'll be prompted to connect with your GitHub, GitLab or Bitbucket account: + +1. Click `Log in with GitHub`, `Log in with GitLab` or `Log in with Bitbucket`. +2. Review the permissions requested by OpenHands and authorize the application. + - OpenHands will require certain permissions from your account. To read more about these permissions, + you can click the `Learn more` link on the authorization page. +3. Review and accept the `terms of service` and select `Continue`. 
+ +## Next Steps + +Once you've connected your account, you can: + +- [Use OpenHands with your GitHub repositories](/openhands/usage/cloud/github-installation). +- [Use OpenHands with your GitLab repositories](/openhands/usage/cloud/gitlab-installation). +- [Use OpenHands with your Bitbucket repositories](/openhands/usage/cloud/bitbucket-installation). +- [Learn about the Cloud UI](/openhands/usage/cloud/cloud-ui). +- [Install the OpenHands Slack app](/openhands/usage/cloud/slack-installation). + +### Jira Data Center Integration (Coming soon...) +Source: https://docs.openhands.dev/openhands/usage/cloud/project-management/jira-dc-integration.md + +# Jira Data Center Integration + +## Platform Configuration + +### Step 1: Create Service Account + +1. **Access User Management** + - Log in to Jira Data Center as administrator + - Go to **Administration** > **User Management** + +2. **Create User** + - Click **Create User** + - Username: `openhands-agent` + - Full Name: `OpenHands Agent` + - Email: `openhands@yourcompany.com` (replace with your preferred service account email) + - Password: Set a secure password + - Click **Create** + +3. **Assign Permissions** + - Add user to appropriate groups + - Ensure access to relevant projects + - Grant necessary project permissions + +### Step 2: Generate API Token + +1. **Personal Access Tokens** + - Log in as the service account + - Go to **Profile** > **Personal Access Tokens** + - Click **Create token** + - Name: `OpenHands Cloud Integration` + - Expiry: Set appropriate expiration (recommend 1 year) + - Click **Create** + - **Important**: Copy and store the token securely + +### Step 3: Configure Webhook + +1. 
**Create Webhook** + - Go to **Administration** > **System** > **WebHooks** + - Click **Create a WebHook** + - **Name**: `OpenHands Cloud Integration` + - **URL**: `https://app.all-hands.dev/integration/jira-dc/events` + - Set a suitable webhook secret + - **Issue related events**: Select the following: + - Issue updated + - Comment created + - **JQL Filter**: Leave empty (or customize as needed) + - Click **Create** + - **Important**: Copy and store the webhook secret securely (you'll need this for workspace integration) + +--- + +## Workspace Integration + +### Step 1: Log in to OpenHands Cloud + +1. **Navigate and Authenticate** + - Go to [OpenHands Cloud](https://app.all-hands.dev/) + - Sign in with your Git provider (GitHub, GitLab, or BitBucket) + - **Important:** Make sure you're signing in with the same Git provider account that contains the repositories you want the OpenHands agent to work on. + +### Step 2: Configure Jira Data Center Integration + +1. **Access Integration Settings** + - Navigate to **Settings** > **Integrations** + - Locate **Jira Data Center** section + +2. **Configure Workspace** + - Click **Configure** button + - Enter your workspace name and click **Connect** + - If no integration exists, you'll be prompted to enter additional credentials required for the workspace integration: + - **Webhook Secret**: The webhook secret from Step 3 above + - **Service Account Email**: The service account email from Step 1 above + - **Service Account API Key**: The personal access token from Step 2 above + - Ensure **Active** toggle is enabled + + +Workspace name is the host name of your Jira Data Center instance. + +Eg: http://jira.all-hands.dev/projects/OH/issues/OH-77 + +Here the workspace name is **jira.all-hands.dev**. + + +3. **Complete OAuth Flow** + - You'll be redirected to Jira Data Center to complete OAuth verification + - Grant the necessary permissions to verify your workspace access. 
If you have access to multiple workspaces, select the one you initially provided
   - If successful, you will be redirected back to the **Integrations** settings in the OpenHands Cloud UI

### Managing Your Integration

**Edit Configuration:**
- Click the **Edit** button next to your configured platform
- Update any necessary credentials or settings
- Click **Update** to apply changes
- You will need to repeat the OAuth flow as before
- **Important:** Only the original user who created the integration can see the edit view

**Unlink Workspace:**
- In the edit view, click **Unlink** next to the workspace name
- This will deactivate your workspace link
- **Important:** If the original user who configured the integration chooses to unlink their integration, any users currently linked to that integration will also be unlinked, and the workspace integration will be deactivated. The integration can only be reactivated by the original user.

### Screenshots



![workspace-link.png](/openhands/static/img/jira-dc-user-link.png)



![workspace-link.png](/openhands/static/img/jira-dc-admin-configure.png)



![workspace-link.png](/openhands/static/img/jira-dc-user-unlink.png)



![workspace-link.png](/openhands/static/img/jira-dc-admin-edit.png)



### Jira Cloud Integration
Source: https://docs.openhands.dev/openhands/usage/cloud/project-management/jira-integration.md

# Jira Cloud Integration

## Platform Configuration

### Step 1: Create Service Account

1. **Navigate to User Management**
   - Go to [Atlassian Admin](https://admin.atlassian.com/)
   - Select your organization
   - Go to **Directory** > **Users**

2. **Create OpenHands Service Account**
   - Click **Service accounts**
   - Click **Create a service account**
   - Name: `OpenHands Agent`
   - Click **Next**
   - Select **User** role for Jira app
   - Click **Create**

### Step 2: Generate API Token

1. 
**Access Service Account Configuration**
   - Locate the service account created in the step above and click on it
   - Click **Create API token**
   - Set the expiry to 365 days (the maximum allowed value)
   - Click **Next**
   - On the **Select token scopes** screen, filter by the following values:
     - App: Jira
     - Scope type: Classic
     - Scope actions: Write, Read
   - Select the `read:me`, `read:jira-work`, and `write:jira-work` scopes
   - Click **Next**
   - Review and create the API token
   - **Important**: Copy and securely store the token immediately

### Step 3: Configure Webhook

1. **Navigate to Webhook Settings**
   - Go to **Jira Settings** > **System** > **WebHooks**
   - Click **Create a WebHook**

2. **Configure Webhook**
   - **Name**: `OpenHands Cloud Integration`
   - **Status**: Enabled
   - **URL**: `https://app.all-hands.dev/integration/jira/events`
   - **Issue related events**: Select the following:
     - Issue updated
     - Comment created
   - **JQL Filter**: Leave empty (or customize as needed)
   - Click **Create**
   - **Important**: Copy and store the webhook secret securely (you'll need this for workspace integration)

---

## Workspace Integration

### Step 1: Log in to OpenHands Cloud

1. **Navigate and Authenticate**
   - Go to [OpenHands Cloud](https://app.all-hands.dev/)
   - Sign in with your Git provider (GitHub, GitLab, or BitBucket)
   - **Important:** Make sure you're signing in with the same Git provider account that contains the repositories you want the OpenHands agent to work on.

### Step 2: Configure Jira Integration

1. **Access Integration Settings**
   - Navigate to **Settings** > **Integrations**
   - Locate **Jira Cloud** section

2. 
**Configure Workspace**
   - Click **Configure** button
   - Enter your workspace name and click **Connect**
   - **Important:** Make sure you enter the full workspace name, eg: **yourcompany.atlassian.net**
   - If no integration exists, you'll be prompted to enter additional credentials required for the workspace integration:
     - **Webhook Secret**: The webhook secret from Step 3 above
     - **Service Account Email**: The service account email from Step 1 above
     - **Service Account API Key**: The API token from Step 2 above
   - Ensure **Active** toggle is enabled


Workspace name is the host name when accessing a resource in Jira Cloud.

Eg: https://all-hands.atlassian.net/browse/OH-55

Here the workspace name is **all-hands.atlassian.net**.


3. **Complete OAuth Flow**
   - You'll be redirected to Jira Cloud to complete OAuth verification
   - Grant the necessary permissions to verify your workspace access.
   - If successful, you will be redirected back to the **Integrations** settings in the OpenHands Cloud UI

### Managing Your Integration

**Edit Configuration:**
- Click the **Edit** button next to your configured platform
- Update any necessary credentials or settings
- Click **Update** to apply changes
- You will need to repeat the OAuth flow as before
- **Important:** Only the original user who created the integration can see the edit view

**Unlink Workspace:**
- In the edit view, click **Unlink** next to the workspace name
- This will deactivate your workspace link
- **Important:** If the original user who configured the integration chooses to unlink their integration, any users currently linked to that workspace integration will also be unlinked, and the workspace integration will be deactivated. The integration can only be reactivated by the original user.
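Once the integration is active, the agent is triggered by issue events such as a comment mentioning `@openhands`. As an illustration only (not part of the OpenHands setup), the sketch below builds the Atlassian Document Format (ADF) body that Jira Cloud's v3 comment endpoint expects; the workspace host `all-hands.atlassian.net` and issue key `OH-55` are placeholders taken from the example above, and `owner/repository` stands in for your real repo:

```python
import json

JIRA_HOST = "all-hands.atlassian.net"   # placeholder workspace host
ISSUE_KEY = "OH-55"                     # placeholder issue key


def openhands_comment(text: str) -> dict:
    """Build the ADF body expected by Jira Cloud's v3 comment endpoint
    (POST /rest/api/3/issue/{issueIdOrKey}/comment)."""
    return {
        "body": {
            "type": "doc",
            "version": 1,
            "content": [
                {"type": "paragraph",
                 "content": [{"type": "text", "text": text}]},
            ],
        }
    }


payload = openhands_comment(
    "@openhands Please implement this ticket (repo: owner/repository)"
)
url = f"https://{JIRA_HOST}/rest/api/3/issue/{ISSUE_KEY}/comment"
# POST `payload` as JSON to `url`, authenticating with the service
# account email and API token from Steps 1-2 above.
print(url)
print(json.dumps(payload, indent=2))
```

Posting such a comment fires the "Comment created" webhook configured in Step 3, which is what notifies OpenHands Cloud.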
+ +### Screenshots + + + +![workspace-link.png](/openhands/static/img/jira-user-link.png) + + + +![workspace-link.png](/openhands/static/img/jira-admin-configure.png) + + + +![workspace-link.png](/openhands/static/img/jira-user-unlink.png) + + + +![workspace-link.png](/openhands/static/img/jira-admin-edit.png) + + + +### Linear Integration (Coming soon...) +Source: https://docs.openhands.dev/openhands/usage/cloud/project-management/linear-integration.md + +# Linear Integration + +## Platform Configuration + +### Step 1: Create Service Account + +1. **Access Team Settings** + - Log in to Linear as a team admin + - Go to **Settings** > **Members** + +2. **Invite Service Account** + - Click **Invite members** + - Email: `openhands@yourcompany.com` (replace with your preferred service account email) + - Role: **Member** (with appropriate team access) + - Send invitation + +3. **Complete Setup** + - Accept invitation from the service account email + - Complete profile setup + - Ensure access to relevant teams/workspaces + +### Step 2: Generate API Key + +1. **Access API Settings** + - Log in as the service account + - Go to **Settings** > **Security & access** + +2. **Create Personal API Key** + - Click **Create new key** + - Name: `OpenHands Cloud Integration` + - Scopes: Select the following: + - `Read` - Read access to issues and comments + - `Create comments` - Ability to create or update comments + - Select the teams you want to provide access to, or allow access for all teams you have permissions for + - Click **Create** + - **Important**: Copy and store the API key securely + +### Step 3: Configure Webhook + +1. **Access Webhook Settings** + - Go to **Settings** > **API** > **Webhooks** + - Click **New webhook** + +2. 
**Configure Webhook** + - **Label**: `OpenHands Cloud Integration` + - **URL**: `https://app.all-hands.dev/integration/linear/events` + - **Resource types**: Select: + - `Comment` - For comment events + - `Issue` - For issue updates (label changes) + - Select the teams you want to provide access to, or allow access for all public teams + - Click **Create webhook** + - **Important**: Copy and store the webhook secret securely (you'll need this for workspace integration) + +--- + +## Workspace Integration + +### Step 1: Log in to OpenHands Cloud + +1. **Navigate and Authenticate** + - Go to [OpenHands Cloud](https://app.all-hands.dev/) + - Sign in with your Git provider (GitHub, GitLab, or BitBucket) + - **Important:** Make sure you're signing in with the same Git provider account that contains the repositories you want the OpenHands agent to work on. + +### Step 2: Configure Linear Integration + +1. **Access Integration Settings** + - Navigate to **Settings** > **Integrations** + - Locate **Linear** section + +2. **Configure Workspace** + - Click **Configure** button + - Enter your workspace name and click **Connect** + - If no integration exists, you'll be prompted to enter additional credentials required for the workspace integration: + - **Webhook Secret**: The webhook secret from Step 3 above + - **Service Account Email**: The service account email from Step 1 above + - **Service Account API Key**: The API key from Step 2 above + - Ensure **Active** toggle is enabled + + +Workspace name is the identifier after the host name when accessing a resource in Linear. + +Eg: https://linear.app/allhands/issue/OH-37 + +Here the workspace name is **allhands**. + + +3. **Complete OAuth Flow** + - You'll be redirected to Linear to complete OAuth verification + - Grant the necessary permissions to verify your workspace access. 
If you have access to multiple workspaces, select the one you initially provided
   - If successful, you will be redirected back to the **Integrations** settings in the OpenHands Cloud UI

### Managing Your Integration

**Edit Configuration:**
- Click the **Edit** button next to your configured platform
- Update any necessary credentials or settings
- Click **Update** to apply changes
- You will need to repeat the OAuth flow as before
- **Important:** Only the original user who created the integration can see the edit view

**Unlink Workspace:**
- In the edit view, click **Unlink** next to the workspace name
- This will deactivate your workspace link
- **Important:** If the original user who configured the integration chooses to unlink their integration, any users currently linked to that integration will also be unlinked, and the workspace integration will be deactivated. The integration can only be reactivated by the original user.

### Screenshots



![workspace-link.png](/openhands/static/img/linear-user-link.png)



![workspace-link.png](/openhands/static/img/linear-admin-configure.png)



![workspace-link.png](/openhands/static/img/linear-admin-edit.png)



### Project Management Tool Integrations (Coming soon...)
Source: https://docs.openhands.dev/openhands/usage/cloud/project-management/overview.md

# Project Management Tool Integrations

## Overview

OpenHands Cloud integrates with project management platforms (Jira Cloud, Jira Data Center, and Linear) to enable AI-powered task delegation. Users can invoke the OpenHands agent by:
- Adding `@openhands` in ticket comments
- Adding the `openhands` label to tickets

## Prerequisites

Integration requires two levels of setup:
1. 
**Platform Configuration** - Administrative setup of service accounts and webhooks on your project management platform (see individual platform documentation below) +2. **Workspace Integration** - Self-service configuration through the OpenHands Cloud UI to link your OpenHands account to the target workspace + +### Platform-Specific Setup Guides: +- [Jira Cloud Integration (Coming soon...)](./jira-integration.md) +- [Jira Data Center Integration (Coming soon...)](./jira-dc-integration.md) +- [Linear Integration (Coming soon...)](./linear-integration.md) + +## Usage + +Once both the platform configuration and workspace integration are completed, users can trigger the OpenHands agent within their project management platforms using two methods: + +### Method 1: Comment Mention +Add a comment to any issue with `@openhands` followed by your task description: +``` +@openhands Please implement the user authentication feature described in this ticket +``` + +### Method 2: Label-based Delegation +Add the label `openhands` to any issue. The OpenHands agent will automatically process the issue based on its description and requirements. + +### Git Repository Detection + +The OpenHands agent needs to identify which Git repository to work with when processing your issues. Here's how to ensure proper repository detection: + +#### Specifying the Target Repository + +**Required:** Include the target Git repository in your issue description or comment to ensure the agent works with the correct codebase. + +**Supported Repository Formats:** +- Full HTTPS URL: `https://github.com/owner/repository.git` +- GitHub URL without .git: `https://github.com/owner/repository` +- Owner/repository format: `owner/repository` + +#### Platform-Specific Behavior + +**Linear Integration:** When GitHub integration is enabled for your Linear workspace with issue sync activated, the target repository is automatically detected from the linked GitHub issue. 
Manual specification is not required in this configuration.

**Jira Integrations:** Always include the repository information in your issue description or `@openhands` comment to ensure proper repository detection.

## Troubleshooting

### Platform Configuration Issues
- **Webhook not triggering**: Verify the webhook URL is correct and the proper event types are selected (Comment, Issue updated)
- **API authentication failing**: Check API key/token validity and ensure the required scopes are granted. If your current API token has expired, make sure to update it in the respective integration settings
- **Permission errors**: Ensure the service account has access to the relevant projects/teams and appropriate permissions

### Workspace Integration Issues
- **Workspace linking requests credentials**: If there are no active workspace integrations for the workspace you specified, it needs to be configured first. Contact the administrator of the platform you want to integrate with (eg: Jira, Linear)
- **Integration not found**: Verify the workspace name matches exactly and that platform configuration was completed first
- **OAuth flow fails**: Make sure you're authorizing with the correct account, one that has the proper workspace access

### General Issues
- **Agent not responding**: Check webhook logs in your platform settings and verify service account status
- **Authentication errors**: Verify Git provider permissions and OpenHands Cloud access
- **Agent fails to identify git repo**: Ensure you're signing in with the same Git provider account that contains the repositories you want OpenHands to work on
- **Partial functionality**: Ensure both platform configuration and workspace integration are properly completed

### Getting Help
For additional support, contact OpenHands Cloud support with:
- Your integration platform (Linear, Jira Cloud, or Jira Data Center)
- Workspace name
- Error logs from webhook/integration attempts
- Screenshots of configuration 
settings (without sensitive credentials)

### Slack Integration
Source: https://docs.openhands.dev/openhands/usage/cloud/slack-installation.md




OpenHands utilizes a large language model (LLM), which may generate responses that are inaccurate or incomplete.
While we strive for accuracy, OpenHands' outputs are not guaranteed to be correct, and we encourage users to
validate critical information independently.


## Prerequisites

- Access to OpenHands Cloud.

## Installation Steps




  **This step is for Slack admins/owners**

  1. Make sure you have permissions to install Apps to your workspace.
  2. Click the **Add to Slack** button below to install the OpenHands Slack App.
  3. In the top right corner, select the workspace to install the OpenHands Slack app.
  4. Review permissions and click allow.




  **Make sure your Slack workspace admin/owner has installed the OpenHands Slack App first.**

  Every user in the Slack workspace (including admins/owners) must link their OpenHands Cloud account to the OpenHands Slack App. To do this:
  1. Visit the [Settings > Integrations](https://app.all-hands.dev/settings/integrations) page in OpenHands Cloud.
  2. Click `Install OpenHands Slack App`.
  3. In the top right corner, select the workspace to install the OpenHands Slack app.
  4. Review permissions and click allow.

  Depending on the workspace settings, you may need approval from your Slack admin to authorize the Slack App.




## Working With the Slack App

To start a new conversation, you can mention `@openhands` in a new message or a thread inside any Slack channel.

Once a conversation is started, all thread messages underneath it will be follow-up messages to OpenHands.

To send follow-up messages for the same conversation, mention `@openhands` in a thread reply to the original message.
You must be the user who started the conversation. 
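The threading rules above can be sketched as a small routing function. This is an illustrative simplification, not the Slack app's actual implementation; the message ids and user names are hypothetical, and treating thread replies from other users as ignored is an assumption (the docs only state that follow-ups must come from the conversation starter):

```python
def route_mention(text, thread_root, sender, conversations):
    """Sketch of the documented routing rules.

    `conversations` maps a thread's root message id to the user who
    started that conversation.
    """
    if "@openhands" not in text:
        return "ignored"            # the app only reacts to mentions
    if thread_root is None or thread_root not in conversations:
        return "new-conversation"   # mention outside a known thread
    if sender == conversations[thread_root]:
        return "follow-up"          # thread reply from the original user
    return "ignored"                # assumption: other users' replies are ignored


convs = {"msg-1": "alice"}
print(route_mention("@openhands fix the tests", None, "alice", convs))       # new-conversation
print(route_mention("@openhands also update docs", "msg-1", "alice", convs)) # follow-up
print(route_mention("@openhands what about me?", "msg-1", "bob", convs))     # ignored
```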
+ +## Example conversation + +### Start a new conversation, and select repo + +Conversation is started by mentioning `@openhands`. + +![slack-create-conversation.png](/openhands/static/img/slack-create-conversation.png) + +### See agent response and send follow up messages + +Initial request is followed up by mentioning `@openhands` in a thread reply. + +![slack-results-and-follow-up.png](/openhands/static/img/slack-results-and-follow-up.png) + +## Pro tip + +You can mention a repo name when starting a new conversation in the following formats + +1. "My-Repo" repo (e.g `@openhands in the openhands repo ...`) +2. "OpenHands/OpenHands" (e.g `@openhands in OpenHands/OpenHands ...`) + +The repo match is case insensitive. If a repo name match is made, it will kick off the conversation. +If the repo name partially matches against multiple repos, you'll be asked to select a repo from the filtered list. + +![slack-pro-tip.png](/openhands/static/img/slack-pro-tip.png) + +## OpenHands Overview + +### Community +Source: https://docs.openhands.dev/overview/community.md + +# The OpenHands Community + +OpenHands is a community of engineers, academics, and enthusiasts reimagining software development for an AI-powered world. + +## Mission + +It's very clear that AI is changing software development. We want the developer community to drive that change organically, through open source. + +So we're not just building friendly interfaces for AI-driven development. We're publishing _building blocks_ that empower developers to create new experiences, tailored to your own habits, needs, and imagination. + +## Ethos + +We have two core values: **high openness** and **high agency**. While we don't expect everyone in the community to embody these values, we want to establish them as norms. + +### High Openness + +We welcome anyone and everyone into our community by default. You don't have to be a software developer to help us build. You don't have to be pro-AI to help us learn. 
+ +Our plans, our work, our successes, and our failures are all public record. We want the world to see not just the fruits of our work, but the whole process of growing it. + +We welcome thoughtful criticism, whether it's a comment on a PR or feedback on the community as a whole. + +### High Agency + +Everyone should feel empowered to contribute to OpenHands. Whether it's by making a PR, hosting an event, sharing feedback, or just asking a question, don't hold back! + +OpenHands gives everyone the building blocks to create state-of-the-art developer experiences. We experiment constantly and love building new things. + +Coding, development practices, and communities are changing rapidly. We won't hesitate to change direction and make big bets. + +## Relationship to All Hands + +OpenHands is supported by the for-profit organization [All Hands AI, Inc](https://www.all-hands.dev/). + +All Hands was founded by three of the first major contributors to OpenHands: + +- Xingyao Wang, a UIUC PhD candidate who got OpenHands to the top of the SWE-bench leaderboards +- Graham Neubig, a CMU Professor who rallied the academic community around OpenHands +- Robert Brennan, a software engineer who architected the user-facing features of OpenHands + +All Hands is an important part of the OpenHands ecosystem. We've raised over $20M—mainly to hire developers and researchers who can work on OpenHands full-time, and to provide them with expensive infrastructure. ([Join us!](https://allhandsai.applytojob.com/apply/)) + +But we see OpenHands as much larger, and ultimately more important, than All Hands. When our financial responsibility to investors is at odds with our social responsibility to the community—as it inevitably will be, from time to time—we promise to navigate that conflict thoughtfully and transparently. + +At some point, we may transfer custody of OpenHands to an open source foundation. 
But for now, the [Benevolent Dictator approach](http://www.catb.org/~esr/writings/cathedral-bazaar/homesteading/ar01s16.html) helps us move forward with speed and intention. If we ever forget the "benevolent" part, please: fork us. + +### Contributing +Source: https://docs.openhands.dev/overview/contributing.md + +# Contributing to OpenHands + +Welcome to the OpenHands community! We're building the future of AI-powered software development, and we'd love for you to be part of this journey. + +## Our Vision: Free as in Freedom + +The OpenHands community is built around the belief that **AI and AI agents are going to fundamentally change the way we build software**, and if this is true, we should do everything we can to make sure that the benefits provided by such powerful technology are **accessible to everyone**. + +We believe in the power of open source to democratize access to cutting-edge AI technology. Just as the internet transformed how we share information, we envision a world where AI-powered development tools are available to every developer, regardless of their background or resources. + +If this resonates with you, we'd love to have you join us in our quest! + +## What Can You Build? + +There are countless ways to contribute to OpenHands. Whether you're a seasoned developer, a researcher, a designer, or someone just getting started, there's a place for you in our community. + +### Frontend & UI/UX +Make OpenHands more beautiful and user-friendly: +- **React & TypeScript Development** - Improve the web interface +- **UI/UX Design** - Enhance user experience and accessibility +- **Mobile Responsiveness** - Make OpenHands work great on all devices +- **Component Libraries** - Build reusable UI components + +*Small fixes are always welcome! 
For bigger changes, join our **#eng-ui-ux** channel in [Slack](https://openhands.dev/joinslack) first.* + +### Agent Development +Help make our AI agents smarter and more capable: +- **Prompt Engineering** - Improve how agents understand and respond +- **New Agent Types** - Create specialized agents for different tasks +- **Agent Evaluation** - Develop better ways to measure agent performance +- **Multi-Agent Systems** - Enable agents to work together + +*We use [SWE-bench](https://www.swebench.com/) to evaluate our agents. Join our [Slack](https://openhands.dev/joinslack) to learn more.* + +### Backend & Infrastructure +Build the foundation that powers OpenHands: +- **Python Development** - Core functionality and APIs +- **Runtime Systems** - Docker containers and sandboxes +- **Cloud Integrations** - Support for different cloud providers +- **Performance Optimization** - Make everything faster and more efficient + +### Testing & Quality Assurance +Help us maintain high quality: +- **Unit Testing** - Write tests for new features +- **Integration Testing** - Ensure components work together +- **Bug Hunting** - Find and report issues +- **Performance Testing** - Identify bottlenecks and optimization opportunities + +### Documentation & Education +Help others learn and contribute: +- **Technical Documentation** - API docs, guides, and tutorials +- **Video Tutorials** - Create learning content +- **Translation** - Make OpenHands accessible in more languages +- **Community Support** - Help other users and contributors + +### Research & Innovation +Push the boundaries of what's possible: +- **Academic Research** - Publish papers using OpenHands +- **Benchmarking** - Develop new evaluation methods +- **Experimental Features** - Try cutting-edge AI techniques +- **Data Analysis** - Study how developers use AI tools + +## 🚀 Getting Started + +Ready to contribute? Here's your path to making an impact: + +### 1. 
Quick Wins +Start with these easy contributions: +- **Use OpenHands** and [report issues](https://github.com/OpenHands/OpenHands/issues) you encounter +- **Give feedback** using the thumbs-up/thumbs-down buttons after each session +- **Star our repository** on [GitHub](https://github.com/OpenHands/OpenHands) +- **Share OpenHands** with other developers + +### 2. Set Up Your Development Environment +Follow our setup guide: +- **Requirements**: Linux/Mac/WSL, Docker, Python 3.12, Node.js 22+, Poetry 1.8+ +- **Quick setup**: `make build` to get everything ready +- **Configuration**: `make setup-config` to configure your LLM +- **Run locally**: `make run` to start the application + +*Full details in our [Development Guide](https://github.com/OpenHands/OpenHands/blob/main/Development.md)* + +### 3. Find Your First Issue +Look for beginner-friendly opportunities: +- Browse [good first issues](https://github.com/OpenHands/OpenHands/labels/good%20first%20issue) +- Check our [project boards](https://github.com/OpenHands/OpenHands/projects) for organized tasks +- Ask in [Slack](https://openhands.dev/joinslack) what needs help + +### 4. Join the Community +Connect with other contributors in our [Slack Community](https://openhands.dev/joinslack). You can connect with OpenHands contributors, maintainers, and more! 

## 📋 How to Contribute Code

### Understanding the Codebase
Get familiar with our architecture:
- **[Frontend](https://github.com/OpenHands/OpenHands/tree/main/frontend/README.md)** - React application
- **[Backend](https://github.com/OpenHands/OpenHands/tree/main/openhands/README.md)** - Python core
- **[Agents](https://github.com/OpenHands/OpenHands/tree/main/openhands/agenthub/README.md)** - AI agent implementations
- **[Runtime](https://github.com/OpenHands/OpenHands/tree/main/openhands/runtime/README.md)** - Execution environments
- **[Evaluation](https://github.com/OpenHands/benchmarks)** - Testing and benchmarks

### Pull Request Process
We welcome all pull requests! Here's how we evaluate them:

#### Small Improvements
- Quick review and approval for obvious improvements
- Make sure CI tests pass
- Include a clear description of changes

#### Core Agent Changes
We're more careful with agent changes since they affect user experience:
- **Accuracy** - Does it make the agent better at solving problems?
- **Efficiency** - Does it improve speed or reduce resource usage?
- **Code Quality** - Is the code maintainable and well-tested?

*Discuss major changes in [GitHub issues](https://github.com/OpenHands/OpenHands/issues) or [Slack](https://openhands.dev/joinslack) first!*

### Pull Request Guidelines
We recommend the following for smooth reviews, but they're not required. The more closely you follow these guidelines, the faster your PR is likely to be reviewed and the fewer revisions it will need. 
+ +**Title Format:** +- `feat: Add new agent capability` +- `fix: Resolve memory leak in runtime` +- `docs: Update installation guide` +- `style: Fix code formatting` +- `refactor: Simplify authentication logic` +- `test: Add unit tests for parser` + +**Description:** +- Explain what the PR does and why +- Link to related issues +- Include screenshots for UI changes +- Add changelog entry for user-facing changes + +## License + +OpenHands is released under the **MIT License**, which means: + +### You Can: +- **Use** OpenHands for any purpose, including commercial projects +- **Modify** the code to fit your needs +- **Share** your modifications +- **Distribute** or sell copies of OpenHands + +### You Must: +- **Include** the original copyright notice and license text +- **Preserve** the license in any substantial portions you use + +### No Warranty: +- OpenHands is provided "as is" without warranty +- Contributors are not liable for any damages + +*Full license text: [LICENSE](https://github.com/OpenHands/OpenHands/blob/main/LICENSE)* + +**Special Note:** Content in the `enterprise/` directory has a separate license. See `enterprise/LICENSE` for details. + +## Ready to make your first contribution? + +1. **⭐ Star** our [GitHub repository](https://github.com/OpenHands/OpenHands) +2. **🔧 Set up** your development environment using our [Development Guide](https://github.com/OpenHands/OpenHands/blob/main/Development.md) +3. **💬 Join** our [Slack community](https://openhands.dev/joinslack) to meet other contributors +4. **🎯 Find** a [good first issue](https://github.com/OpenHands/OpenHands/labels/good%20first%20issue) to work on +5. **📝 Read** our [Code of Conduct](https://github.com/OpenHands/OpenHands/blob/main/CODE_OF_CONDUCT.md) + +## Need Help? 
+ +Don't hesitate to ask for help: +- **Slack**: [Join our community](https://openhands.dev/joinslack) for real-time support +- **GitHub Issues**: [Open an issue](https://github.com/OpenHands/OpenHands/issues) for bugs or feature requests +- **Email**: Contact us at [contact@openhands.dev](mailto:contact@openhands.dev) + +--- + +Thank you for considering contributing to OpenHands! Together, we're building tools that will democratize AI-powered software development and make it accessible to developers everywhere. Every contribution, no matter how small, helps us move closer to that vision. + +Welcome to the community! 🎉 + +### FAQs +Source: https://docs.openhands.dev/overview/faqs.md + +## Getting Started + +### I'm new to OpenHands. Where should I start? + +1. **Quick start**: Use [OpenHands Cloud](/openhands/usage/cloud/openhands-cloud) to get started quickly with + [GitHub](/openhands/usage/cloud/github-installation), [GitLab](/openhands/usage/cloud/gitlab-installation), + [Bitbucket](/openhands/usage/cloud/bitbucket-installation), + and [Slack](/openhands/usage/cloud/slack-installation) integrations. +2. **Run on your own**: If you prefer to run it on your own hardware, follow our [Getting Started guide](/openhands/usage/run-openhands/local-setup). +3. **First steps**: Read over the [first projects guidelines](/overview/first-projects) and + [prompting best practices](/openhands/usage/tips/prompting-best-practices) to learn the basics. + +### Can I use OpenHands for production workloads? + +OpenHands is meant to be run by a single user on their local workstation. It is not appropriate for multi-tenant +deployments where multiple users share the same instance. There is no built-in authentication, isolation, or scalability. + +If you're interested in running OpenHands in a multi-tenant environment, please [contact us](https://docs.google.com/forms/d/e/1FAIpQLSet3VbGaz8z32gW9Wm-Grl4jpt5WgMXPgJ4EDPVmCETCBpJtQ/viewform) about our enterprise deployment options. 
+ + +Using OpenHands for work? We'd love to chat! Fill out +[this short form](https://docs.google.com/forms/d/e/1FAIpQLSet3VbGaz8z32gW9Wm-Grl4jpt5WgMXPgJ4EDPVmCETCBpJtQ/viewform) +to join our Design Partner program, where you'll get early access to commercial features and the opportunity to provide +input on our product roadmap. + + +## Safety and Security + +### It's doing stuff without asking, is that safe? + +**Generally yes, but with important considerations.** OpenHands runs all code in a secure, isolated Docker container +(called a "sandbox") that is separate from your host system. However, the safety depends on your configuration: + +**What's protected:** +- Your host system files and programs (unless you mount them using [this feature](/openhands/usage/sandboxes/docker#connecting-to-your-filesystem)) +- Host system resources +- Other containers and processes + +**Potential risks to consider:** +- The agent can access the internet from within the container. +- If you provide credentials (API keys, tokens), the agent can use them. +- Mounted files and directories can be modified or deleted. +- Network requests can be made to external services. + +For detailed security information, see our [Runtime Architecture](/openhands/usage/architecture/runtime), +[Security Configuration](/openhands/usage/advanced/configuration-options#security-configuration), +and [Hardened Docker Installation](/openhands/usage/sandboxes/docker#hardened-docker-installation) documentation. + +## File Storage and Access + +### Where are my files stored? + +Your files are stored in different locations depending on how you've configured OpenHands: + +**Default behavior (no file mounting):** +- Files created by the agent are stored inside the runtime Docker container. +- These files are temporary and will be lost when the container is removed. +- The agent works in the `/workspace` directory inside the runtime container. 
+
+**When you mount your local filesystem (following [this](/openhands/usage/sandboxes/docker#connecting-to-your-filesystem)):**
+- Your local files are mounted into the container's `/workspace` directory.
+- Changes made by the agent are reflected in your local filesystem.
+- Files persist after the container is stopped.
+
+  Be careful when mounting your filesystem - the agent can modify or delete any files in the mounted directory.
+
+## Development Tools and Environment
+
+### How do I get the dev tools I need?
+
+OpenHands comes with a basic runtime environment that includes Python and Node.js.
+It also has the ability to install any tools it needs, so usually it's sufficient to ask it to set up its environment.
+
+If you would like to set things up more systematically, you can:
+- **Use setup.sh**: Add a [setup.sh](/openhands/usage/customization/repository#setup-script) file to
+  your repository, which will be run every time the agent starts.
+- **Use a custom sandbox**: Use a [custom docker image](/openhands/usage/advanced/custom-sandbox-guide) to initialize the sandbox.
+
+### Something's not working. Where can I get help?
+
+1. **Search existing issues**: Check our [GitHub issues](https://github.com/OpenHands/OpenHands/issues) to see if
+   others have encountered the same problem.
+2. **Join our community**: Get help from other users and developers:
+   - [Slack community](https://openhands.dev/joinslack)
+3. **Check our troubleshooting guide**: Common issues and solutions are documented in
+   [Troubleshooting](/openhands/usage/troubleshooting/troubleshooting).
+4. **Report bugs**: If you've found a bug, please [create an issue](https://github.com/OpenHands/OpenHands/issues/new)
+   and fill in as much detail as possible.
+
+### First Projects
+Source: https://docs.openhands.dev/overview/first-projects.md
+
+Like any tool, OpenHands works best when you know how to use it effectively.
Whether you're experimenting with a small
+script or making changes in a large codebase, this guide will show how to apply OpenHands in different scenarios.
+
+Let’s walk through a natural progression of using OpenHands:
+- Try a simple prompt.
+- Build a project from scratch.
+- Add features to existing code.
+- Refactor code.
+- Debug and fix bugs.
+
+## First Steps: Hello World
+
+Start with a small task to get familiar with how OpenHands responds to prompts.
+
+Click `New Conversation` and try prompting:
+> Write a bash script hello.sh that prints "hello world!"
+
+OpenHands will generate the script, set the correct permissions, and even run it for you.
+
+Now try making small changes:
+
+> Modify hello.sh so that it accepts a name as the first argument, but defaults to "world".
+
+You can experiment in any language. For example:
+
+> Convert hello.sh to a Ruby script, and run it.
+
+  Start small and iterate. This helps you understand how OpenHands interprets and responds to different prompts.
+
+## Build Something from Scratch
+
+Agents excel at "greenfield" tasks, where they don’t need context about existing code.
+Begin with a simple task and iterate from there. Be specific about what you want and the tech stack.
+
+Click `New Conversation` and give it a clear goal:
+
+> Build a frontend-only TODO app in React. All state should be stored in localStorage.
+
+Once the basics are working, build on it just like you would in a real project:
+
+> Allow adding an optional due date to each task.
+
+You can also ask OpenHands to help with version control:
+
+> Commit the changes and push them to a new branch called "feature/due-dates".
+
+  Break your goals into small, manageable tasks, and push your changes often. This makes it easier to recover
+  if something goes off track.
+
+## Expand Existing Code
+
+Want to add new functionality to an existing repo? OpenHands can do that too.
+
+If you're running OpenHands on your own, first add a
+[GitHub token](/openhands/usage/settings/integrations-settings#github-setup),
+[GitLab token](/openhands/usage/settings/integrations-settings#gitlab-setup) or
+[Bitbucket token](/openhands/usage/settings/integrations-settings#bitbucket-setup).
+
+Choose your repository and branch via `Open Repository`, and press `Launch`.
+
+Examples of adding new functionality:
+
+> Add a GitHub action that lints the code in this repository.
+
+> Modify ./backend/api/routes.js to add a new route that returns a list of all tasks.
+
+> Add a new React component to the ./frontend/components directory to display a list of Widgets.
+> It should use the existing Widget component.
+
+  OpenHands can explore the codebase, but giving it context upfront makes it faster and less expensive.
+
+## Refactor Code
+
+OpenHands is great at refactoring code in small chunks. Rather than rearchitecting the entire codebase, it's most
+effective on focused refactoring tasks. Start by launching a conversation with
+your repo and branch. Then guide it:
+
+> Rename all the single-letter variables in ./app.go.
+
+> Split the `build_and_deploy_widgets` function into two functions, `build_widgets` and `deploy_widgets` in widget.php.
+
+> Break ./api/routes.js into separate files for each route.
+
+  Focus on small, meaningful improvements instead of full rewrites.
+
+## Debug and Fix Bugs
+
+OpenHands can help debug and fix issues, but it’s most effective when you’ve narrowed things down.
+
+Give it a clear description of the problem and the file(s) involved:
+
+> The email field in the `/subscribe` endpoint is rejecting .io domains. Fix this.
+
+> The `search_widgets` function in ./app.py is doing a case-sensitive search. Make it case-insensitive.
+
+For bug fixing, test-driven development can be really useful.
You can ask OpenHands to write a new test and iterate
+until the bug is fixed:
+
+> The `hello` function crashes on the empty string. Write a test that reproduces this bug, then fix the code so it passes.
+
+  Be as specific as possible. Include expected behavior, file names, and examples to speed things up.
+
+## Using OpenHands Effectively
+
+OpenHands can assist with nearly any coding task, but it takes some practice to get the best results.
+Keep these tips in mind:
+* Keep your tasks small.
+* Be clear and specific.
+* Provide relevant context.
+* Commit and push frequently.
+
+See [Prompting Best Practices](/openhands/usage/tips/prompting-best-practices) for more tips on how to get the most
+out of OpenHands.
+
+### Introduction
+Source: https://docs.openhands.dev/overview/introduction.md
+
+🙌 Welcome to OpenHands, a [community](/overview/community) focused on AI-driven development. We'd love for you to [join us on Slack](https://openhands.dev/joinslack).
+
+There are a few ways to work with OpenHands:
+
+## OpenHands Software Agent SDK
+The SDK is a composable Python library that contains all of our agentic tech. It's the engine that powers everything else below.
+
+Define agents in code, then run them locally, or scale to thousands of agents in the cloud.
+
+[Check out the docs](https://docs.openhands.dev/sdk) or [view the source](https://github.com/All-Hands-AI/agent-sdk/)
+
+## OpenHands CLI
+The CLI is the easiest way to start using OpenHands. The experience will be familiar to anyone who has worked
+with tools like Claude Code or Codex. You can power it with Claude, GPT, or any other LLM.
+
+[Check out the docs](https://docs.openhands.dev/openhands/usage/run-openhands/cli-mode) or [view the source](https://github.com/OpenHands/OpenHands-CLI)
+
+## OpenHands Local GUI
+Use the Local GUI for running agents on your laptop. It comes with a REST API and a single-page React application.
+The experience will be familiar to anyone who has used Devin or Jules.
+
+[Check out the docs](https://docs.openhands.dev/openhands/usage/run-openhands/local-setup) or view the source in this repo.
+
+## OpenHands Cloud
+This is a commercial deployment of the OpenHands GUI, running on hosted infrastructure.
+
+You can try it for free by [signing in with your GitHub account](https://app.all-hands.dev).
+
+OpenHands Cloud comes with source-available features and integrations:
+- Deeper integrations with GitHub, GitLab, and Bitbucket
+- Integrations with Slack, Jira, and Linear
+- Multi-user support
+- RBAC and permissions
+- Collaboration features (e.g., conversation sharing)
+- Usage reporting
+- Budgeting enforcement
+
+## OpenHands Enterprise
+Large enterprises can work with us to self-host OpenHands Cloud in their own VPC, via Kubernetes.
+OpenHands Enterprise can also work with the CLI and SDK above.
+
+OpenHands Enterprise is source-available--you can see all the source code here in the enterprise/ directory,
+but you'll need to purchase a license if you want to run it for more than one month.
+
+Enterprise contracts also come with extended support and access to our research team.
+
+Learn more at [openhands.dev/enterprise](https://openhands.dev/enterprise)
+
+## Everything Else
+
+Check out our [Product Roadmap](https://github.com/orgs/openhands/projects/1), and feel free to
+[open up an issue](https://github.com/OpenHands/OpenHands/issues) if there's something you'd like to see!
+
+You might also be interested in our [evaluation infrastructure](https://github.com/OpenHands/benchmarks), our [chrome extension](https://github.com/OpenHands/openhands-chrome-extension/), or our [Theory-of-Mind module](https://github.com/OpenHands/ToM-SWE).
+
+All our work is available under the MIT license, except for the `enterprise/` directory in this repository (see the [enterprise license](https://github.com/OpenHands/OpenHands/blob/main/enterprise/LICENSE) for details).
+The core `openhands` and `agent-server` Docker images are fully MIT-licensed as well. + +If you need help with anything, or just want to chat, [come find us on Slack](https://openhands.dev/joinslack). + +### Model Context Protocol (MCP) +Source: https://docs.openhands.dev/overview/model-context-protocol.md + +Model Context Protocol (MCP) is an open standard that allows OpenHands to communicate with external tool servers, extending the agent's capabilities with custom tools, specialized data processing, external API access, and more. MCP is based on the open standard defined at [modelcontextprotocol.io](https://modelcontextprotocol.io). + +## How MCP Works + +When OpenHands starts, it: + +1. Reads the MCP configuration +2. Connects to configured servers (SSE, SHTTP, or stdio) +3. Registers tools provided by these servers with the agent +4. Routes tool calls to appropriate MCP servers during execution + +## MCP Support Matrix + +| Platform | Support Level | Configuration Method | Documentation | +|----------|---------------|---------------------|---------------| +| **CLI** | ✅ Full Support | `~/.openhands/mcp.json` file | [CLI MCP Servers](/openhands/usage/cli/mcp-servers) | +| **SDK** | ✅ Full Support | Programmatic configuration | [SDK MCP Guide](/sdk/guides/mcp) | +| **Local GUI** | ✅ Full Support | Settings UI + config files | [Local GUI](/openhands/usage/run-openhands/local-setup) | +| **OpenHands Cloud** | ✅ Full Support | Cloud UI settings | [Cloud GUI](/openhands/usage/cloud/cloud-ui) | + +## Platform-Specific Differences + + + + - Configuration via `~/.openhands/mcp.json` file + - Real-time status monitoring with `/mcp` command + - Supports all MCP transport protocols (SSE, SHTTP, stdio) + - Manual configuration required + + + - Programmatic configuration in code + - Full control over MCP server lifecycle + - Dynamic server registration and management + - Integration with custom tool systems + + + - Visual configuration through Settings UI + - File-based 
configuration backup + - Real-time server status display + - Supports all transport protocols + + + - Cloud-based configuration management + - Managed MCP server hosting options + - Team-wide configuration sharing + - Enterprise security features + + + +## Getting Started with MCP + +- **For detailed configuration**: See [MCP Settings](/openhands/usage/settings/mcp-settings) +- **For SDK integration**: See [SDK MCP Guide](/sdk/guides/mcp) +- **For architecture details**: See [MCP Architecture](/sdk/arch/mcp) + +### Quick Start +Source: https://docs.openhands.dev/overview/quickstart.md + +Get started with OpenHands in minutes. Choose the option that works best for you. + + + + **Recommended** + + The fastest way to get started. No setup required—just sign in and start coding. + + - Free usage of MiniMax M2.5 for a limited time + - No installation needed + - Managed infrastructure + + + Use OpenHands from your terminal. Perfect for automation and scripting. + + - IDE integrations available + - Headless mode for CI/CD + - Lightweight installation + + + Run OpenHands locally with a web-based interface. Bring your own LLM and API key. + + - Full control over your environment + - Works offline + - Docker-based setup + + + +### Overview +Source: https://docs.openhands.dev/overview/skills.md + +Skills are specialized prompts that enhance OpenHands with domain-specific knowledge, expert guidance, and automated task handling. They provide consistent practices across projects and can be triggered automatically based on keywords or context. + + +OpenHands supports an **extended version** of the [AgentSkills standard](https://agentskills.io/specification) with optional keyword triggers for automatic activation. See the [SDK Skills Guide](/sdk/guides/skill) for details on the SKILL.md format. + + +## Official Skill Registry + +The official global skill registry is maintained at [github.com/OpenHands/extensions](https://github.com/OpenHands/extensions). 
This repository contains community-shared skills that can be used by all OpenHands agents. You can browse available skills, contribute your own, and learn from examples created by the community. + +## How Skills Work + +Skills inject additional context and rules into the agent's behavior. + +At a high level, OpenHands supports two loading models: + +- **Always-on context** (e.g., `AGENTS.md`) that is injected into the system prompt at conversation start. +- **On-demand skills** that are either: + - **triggered by the user** (keyword matches), or + - **invoked by the agent** (the agent decides to look up the full skill content). + +## Permanent agent context (recommended) + +For repository-wide, always-on instructions, prefer a root-level `AGENTS.md` file. + +We also support model-specific variants: +- `GEMINI.md` for Gemini +- `CLAUDE.md` for Claude + +## Triggered and optional skills + +To add optional skills that are loaded on demand: + +- **AgentSkills standard (recommended for progressive disclosure)**: create one directory per skill and add a `SKILL.md` file. +- **Legacy/OpenHands format (simple)**: put markdown files in `.agents/skills/*.md` at the repository root. + + +Loaded skills take up space in the context window. On-demand skills help keep the system prompt smaller because the agent sees a summary first and reads the full content only when needed. 
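+
+As an illustrative sketch (not the normative spec), here is what a minimal `SKILL.md` might contain for a skill like the `rot13-encryption` directory shown in the example repository structure. The `name` and `description` frontmatter fields follow the AgentSkills specification; the body text and script path are placeholders:
+
+```markdown
+---
+name: rot13-encryption
+description: Encode or decode text with the ROT13 substitution cipher using the bundled script.
+---
+
+# ROT13 Encryption
+
+Run `scripts/rot13.sh "<text>"` to encode or decode text (ROT13 is its own inverse).
+See `references/README.md` for background on the cipher.
+```
+
+With this layout, the agent initially sees only the `name` and `description` summary and reads the full body (and any bundled scripts) on demand, which is what keeps the system prompt small.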
+
+### Example Repository Structure
+
+```
+some-repository/
+├── AGENTS.md                      # Permanent repository guidelines (recommended)
+└── .agents/
+    └── skills/
+        ├── rot13-encryption/      # AgentSkills standard (progressive disclosure)
+        │   ├── SKILL.md
+        │   ├── scripts/
+        │   │   └── rot13.sh
+        │   └── references/
+        │       └── README.md
+        ├── another-agentskill/    # AgentSkills standard (progressive disclosure)
+        │   ├── SKILL.md
+        │   └── scripts/
+        │       └── placeholder.sh
+        └── legacy_trigger_this.md # Legacy/OpenHands format (keyword-triggered)
+```
+
+## Skill Loading Precedence
+
+For project-level skills, paths are relative to the repository root (so `.agents/skills/` is a subdirectory of the project directory).
+For user-level skills, paths are relative to your home directory (`~/`).
+
+When multiple skills share the same name, OpenHands keeps the first match in this order:
+
+1. `.agents/skills/` (recommended)
+2. `.openhands/skills/` (deprecated)
+3. `.openhands/microagents/` (deprecated)
+
+Project-specific skills take precedence over user skills.
+
+## Skill Types
+
+Currently supported skill types:
+
+- **[Permanent Context](/overview/skills/repo)**: Repository-wide guidelines and best practices. We recommend `AGENTS.md` (and optionally `GEMINI.md` / `CLAUDE.md`).
+- **[Keyword-Triggered Skills](/overview/skills/keyword)**: Guidelines activated by specific keywords in user prompts.
+- **[Organization Skills](/overview/skills/org)**: Team or organization-wide standards.
+- **[Global Skills](/overview/skills/public)**: Community-shared skills and templates.
+
+### Skills Frontmatter Requirements
+
+Each skill file may include frontmatter that provides additional information.
In some cases, this frontmatter is required: + +| Skill Type | Required | +|-------------|----------| +| General Skills | No | +| Keyword-Triggered Skills | Yes | + +## Skills Support Matrix + +| Platform | Support Level | Configuration Method | Implementation | Documentation | +|----------|---------------|---------------------|----------------|---------------| +| **CLI** | ✅ Full Support | `~/.agents/skills/` (user-level) and `.agents/skills/` (repo-level) | File-based markdown | [Skills Overview](/overview/skills) | +| **SDK** | ✅ Full Support | Programmatic `Skill` objects | Code-based configuration | [SDK Skills Guide](/sdk/guides/skill) | +| **Local GUI** | ✅ Full Support | `.agents/skills/` + UI | File-based with UI management | [Local Setup](/openhands/usage/run-openhands/local-setup) | +| **OpenHands Cloud** | ✅ Full Support | Cloud UI + repository integration | Managed skill library | [Cloud UI](/openhands/usage/cloud/cloud-ui) | + +## Platform-Specific Differences + + + + - File-based configuration in two locations: + - `~/.agents/skills/` - User-level skills (all conversations). 
+ - `.agents/skills/` - Repository-level skills (current directory) + - Markdown format for skill definitions + - Manual file management required + - Supports both general and keyword-triggered skills + + + - Programmatic `Skill` objects in code + - Dynamic skill creation and management + - Integration with custom workflows + - Full control over skill lifecycle + + + - Visual skill management through UI + - File-based storage with GUI editing + - Real-time skill status display + - Drag-and-drop skill organization + + + - Cloud-based skill library management + - Team-wide skill sharing and templates + - Organization-level skill policies + - Integrated skill marketplace + + + +## Learn More + +- **For SDK integration**: See [SDK Skills Guide](/sdk/guides/skill) +- **For architecture details**: See [Skills Architecture](/sdk/arch/skill) +- **For specific skill types**: See [Repository Skills](/overview/skills/repo), [Keyword Skills](/overview/skills/keyword), [Organization Skills](/overview/skills/org), and [Global Skills](/overview/skills/public) + +### Keyword-Triggered Skills +Source: https://docs.openhands.dev/overview/skills/keyword.md + +## Usage + +These skills are only loaded when a prompt includes one of the trigger words. + +## Frontmatter Syntax + +Frontmatter is required for keyword-triggered skills. It must be placed at the top of the file, +above the guidelines. + +Enclose the frontmatter in triple dashes (---) and include the following fields: + +| Field | Description | Required | Default | +|------------|--------------------------------------------------|----------|------------------| +| `triggers` | A list of keywords that activate the skill. | Yes | None | + + +## Example + +Keyword-triggered skill file example located at `.agents/skills/yummy.md`: +``` +--- +triggers: +- yummyhappy +- happyyummy +--- + +The user has said the magic word. Respond with "That was delicious!" 
+```
+
+[See examples of keyword-triggered skills in the official OpenHands Skills Registry](https://github.com/OpenHands/extensions)
+
+### Organization and User Skills
+Source: https://docs.openhands.dev/overview/skills/org.md
+
+## Usage
+
+These skills can be [any type of skill](/overview/skills#skill-types) and will be loaded
+accordingly. However, they are applied to all repositories belonging to the organization or user.
+
+Create a `.agents` repository under the organization or user, add a `skills` directory to it, and place the
+skills in that directory.
+
+For GitLab organizations, use `openhands-config` as the repository name instead of `.agents`, since GitLab doesn't support repository names starting with non-alphanumeric characters.
+
+## Example
+
+General skill file example for organization `Great-Co` located inside the `.agents` repository:
+`skills/org-skill.md`:
+```
+* Use type hints and error boundaries; validate inputs at system boundaries and fail with meaningful error messages.
+* Document interfaces and public APIs; use implementation comments only for non-obvious logic.
+* Follow the same naming convention for variables, classes, constants, etc. already used in each repository.
+```
+
+For GitLab organizations, the same skill would be located inside the `openhands-config` repository.
+
+## User Skills When Running OpenHands on Your Own
+
+  This works with CLI, headless and development modes. It does not work out of the box when running OpenHands using the docker command.
+
+When running OpenHands on your own, you can place skills in the `~/.agents/skills` folder on your local
+system and OpenHands will always load them for all your conversations. Repo-level overrides live in `.agents/skills`.
+
+### Global Skills
+Source: https://docs.openhands.dev/overview/skills/public.md
+
+## Global Skill Registry
+
+The official global skill registry is hosted at [github.com/OpenHands/extensions](https://github.com/OpenHands/extensions).
This repository contains community-shared skills that can be used by all OpenHands users.
+
+## Contributing a Global Skill
+
+You can create global skills and share them with the community by opening a pull request to the official skill registry.
+
+See the [OpenHands Skill Registry](https://github.com/OpenHands/extensions) for specific instructions on how to contribute a global skill.
+
+### Global Skills Best Practices
+
+- **Clear Scope**: Keep the skill focused on a specific domain or task.
+- **Explicit Instructions**: Provide clear, unambiguous guidelines.
+- **Useful Examples**: Include practical examples of common use cases.
+- **Safety First**: Include necessary warnings and constraints.
+- **Integration Awareness**: Consider how the skill interacts with other components.
+
+### Steps to Contribute a Global Skill
+
+#### 1. Plan the Global Skill
+
+Before creating a global skill, consider:
+
+- What specific problem or use case will it address?
+- What unique capabilities or knowledge should it have?
+- What trigger words make sense for activating it?
+- What constraints or guidelines should it follow?
+
+#### 2. Create File
+
+Create a new Markdown file with a descriptive name in the official skill registry:
+[github.com/OpenHands/extensions](https://github.com/OpenHands/extensions)
+
+#### 3. Testing the Global Skill
+
+- Test the skill with various prompts.
+- Verify trigger words activate the skill correctly.
+- Ensure instructions are clear and comprehensive.
+- Check for potential conflicts and overlaps with existing skills.
+
+#### 4. Submission Process
+
+Submit a pull request with:
+
+- The new skill file.
+- Updated documentation if needed.
+- Description of the skill's purpose and capabilities.
+
+### General Skills
+Source: https://docs.openhands.dev/overview/skills/repo.md
+
+## Usage
+
+These skills are always loaded as part of the context.
+
+## Frontmatter Syntax
+
+The frontmatter for this type of skill is optional.
Frontmatter should be enclosed in triple dashes (---) and may include the following fields:
+
+| Field     | Description                             | Required | Default        |
+|-----------|-----------------------------------------|----------|----------------|
+| `agent`   | The agent this skill applies to         | No       | 'CodeActAgent' |
+
+## Creating an `AGENTS.md` File
+
+To create an effective `AGENTS.md` file, you can ask OpenHands to analyze your repository with a prompt like:
+
+```
+Please browse the repository, look at the documentation and relevant code, and understand the purpose of this repository.
+
+Specifically, I want you to create an `AGENTS.md` file at the repository root. This file should contain succinct information that summarizes:
+1. The purpose of this repository
+2. The general setup of this repo
+3. A brief description of the structure of this repo
+
+Read all the GitHub workflows under .github/ of the repository (if this folder exists) to understand the CI checks (e.g., linter, pre-commit), and include those in the `AGENTS.md` file.
+```
+
+This approach helps OpenHands capture repository context efficiently, reducing the need for repeated searches during conversations and ensuring more accurate solutions.
+
+## Example Content
+
+An `AGENTS.md` file should include:
+
+```
+# Repository Purpose
+This project is a TODO application that allows users to track TODO items.
+
+# Setup Instructions
+To set it up, you can run `npm run build`.
+
+# Repository Structure
+- `/src`: Core application code
+- `/tests`: Test suite
+- `/docs`: Documentation
+- `/.github`: CI/CD workflows
+
+# CI/CD Workflows
+- `lint.yml`: Runs ESLint on all JavaScript files
+- `test.yml`: Runs the test suite on pull requests
+
+# Development Guidelines
+Always make sure the tests are passing before committing changes. You can run the tests by running `npm run test`.
+```
+
+[See more examples of general skills at the OpenHands Skills registry.](https://github.com/OpenHands/extensions)
diff --git a/llms.txt b/llms.txt
new file mode 100644
index 00000000..a8ea9e56
--- /dev/null
+++ b/llms.txt
@@ -0,0 +1,182 @@
+# OpenHands Docs
+
+> LLM-friendly index of OpenHands documentation (V1). Legacy V0 docs pages are intentionally excluded.
+
+The sections below intentionally separate OpenHands product documentation (Web App Server / Cloud / CLI)
+from the OpenHands Software Agent SDK.
+
+## OpenHands Software Agent SDK
+
+- [Agent](https://docs.openhands.dev/sdk/arch/agent.md): High-level architecture of the reasoning-action loop
+- [Agent Server Package](https://docs.openhands.dev/sdk/arch/agent-server.md): HTTP API server for remote agent execution with workspace isolation, container orchestration, and multi-user support.
+- [Agent Skills & Context](https://docs.openhands.dev/sdk/guides/skill.md): Skills add specialized behaviors, domain knowledge, and context-aware triggers to your agent through structured prompts.
+- [API-based Sandbox](https://docs.openhands.dev/sdk/guides/agent-server/api-sandbox.md): Connect to hosted API-based agent server for fully managed infrastructure.
+- [Apptainer Sandbox](https://docs.openhands.dev/sdk/guides/agent-server/apptainer-sandbox.md): Run agent server in rootless Apptainer containers for HPC and shared computing environments.
+- [Ask Agent Questions](https://docs.openhands.dev/sdk/guides/convo-ask-agent.md): Get sidebar replies from the agent during conversation execution without interrupting the main flow.
+- [Assign Reviews](https://docs.openhands.dev/sdk/guides/github-workflows/assign-reviews.md): Automate PR management with intelligent reviewer assignment and workflow notifications using OpenHands Agent
+- [Browser Session Recording](https://docs.openhands.dev/sdk/guides/browser-session-recording.md): Record and replay your agent's browser sessions using rrweb.
+- [Browser Use](https://docs.openhands.dev/sdk/guides/agent-browser-use.md): Enable web browsing and interaction capabilities for your agent. +- [Condenser](https://docs.openhands.dev/sdk/arch/condenser.md): High-level architecture of the conversation history compression system +- [Context Condenser](https://docs.openhands.dev/sdk/guides/context-condenser.md): Manage agent memory by condensing conversation history to save tokens. +- [Conversation](https://docs.openhands.dev/sdk/arch/conversation.md): High-level architecture of the conversation orchestration system +- [Conversation with Async](https://docs.openhands.dev/sdk/guides/convo-async.md): Use async/await for concurrent agent operations and non-blocking execution. +- [Creating Custom Agent](https://docs.openhands.dev/sdk/guides/agent-custom.md): Learn how to design specialized agents with custom tool sets +- [Critic (Experimental)](https://docs.openhands.dev/sdk/guides/critic.md): Real-time evaluation of agent actions using an LLM-based critic model, with built-in iterative refinement. +- [Custom Tools](https://docs.openhands.dev/sdk/guides/custom-tools.md): Tools define what agents can do. The SDK includes built-in tools for common operations and supports creating custom tools for specialized needs. +- [Custom Tools with Remote Agent Server](https://docs.openhands.dev/sdk/guides/agent-server/custom-tools.md): Learn how to use custom tools with a remote agent server by building a custom base image that includes your tool implementations. +- [Custom Visualizer](https://docs.openhands.dev/sdk/guides/convo-custom-visualizer.md): Customize conversation visualization by creating custom visualizers or configuring the default visualizer. +- [Design Principles](https://docs.openhands.dev/sdk/arch/design.md): Core architectural principles guiding the OpenHands Software Agent SDK's development. 
+- [Docker Sandbox](https://docs.openhands.dev/sdk/guides/agent-server/docker-sandbox.md): Run agent server in isolated Docker containers for security and reproducibility. +- [Events](https://docs.openhands.dev/sdk/arch/events.md): High-level architecture of the typed event framework +- [Exception Handling](https://docs.openhands.dev/sdk/guides/llm-error-handling.md): Provider‑agnostic exceptions raised by the SDK and recommended patterns for handling them. +- [FAQ](https://docs.openhands.dev/sdk/faq.md): Frequently asked questions about the OpenHands SDK +- [Getting Started](https://docs.openhands.dev/sdk/getting-started.md): Install the OpenHands SDK and build AI agents that write software. +- [Hello World](https://docs.openhands.dev/sdk/guides/hello-world.md): The simplest possible OpenHands agent - configure an LLM, create an agent, and complete a task. +- [Hooks](https://docs.openhands.dev/sdk/guides/hooks.md): Use lifecycle hooks to observe, log, and customize agent execution. +- [Image Input](https://docs.openhands.dev/sdk/guides/llm-image-input.md): Send images to multimodal agents for vision-based tasks and analysis. +- [Interactive Terminal](https://docs.openhands.dev/sdk/guides/agent-interactive-terminal.md): Enable agents to interact with terminal applications like ipython, python REPL, and other interactive CLI tools. +- [Iterative Refinement](https://docs.openhands.dev/sdk/guides/iterative-refinement.md): Implement iterative refinement workflows where agents refine their work based on critique feedback until quality thresholds are met. +- [LLM](https://docs.openhands.dev/sdk/arch/llm.md): High-level architecture of the provider-agnostic language model interface +- [LLM Fallback Strategy](https://docs.openhands.dev/sdk/guides/llm-fallback.md): Automatically try alternate LLMs when the primary model fails with a transient error. 
+- [LLM Profile Store](https://docs.openhands.dev/sdk/guides/llm-profile-store.md): Save, load, and manage reusable LLM configurations so you never repeat setup code again. +- [LLM Registry](https://docs.openhands.dev/sdk/guides/llm-registry.md): Dynamically select and configure language models using the LLM registry. +- [LLM Streaming](https://docs.openhands.dev/sdk/guides/llm-streaming.md): Stream LLM responses token-by-token for real-time display and interactive user experiences. +- [LLM Subscriptions](https://docs.openhands.dev/sdk/guides/llm-subscriptions.md): Use your ChatGPT Plus/Pro subscription to access Codex models without consuming API credits. +- [Local Agent Server](https://docs.openhands.dev/sdk/guides/agent-server/local-server.md): Run agents through a local HTTP server with RemoteConversation for client-server architecture. +- [MCP Integration](https://docs.openhands.dev/sdk/arch/mcp.md): High-level architecture of Model Context Protocol support +- [Metrics Tracking](https://docs.openhands.dev/sdk/guides/metrics.md): Track token usage, costs, and latency metrics for your agents. +- [Model Context Protocol](https://docs.openhands.dev/sdk/guides/mcp.md): Model Context Protocol (MCP) enables dynamic tool integration from external servers. Agents can discover and use MCP-provided tools automatically. +- [Model Routing](https://docs.openhands.dev/sdk/guides/llm-routing.md): Route agent's LLM requests to different models. +- [Observability & Tracing](https://docs.openhands.dev/sdk/guides/observability.md): Enable OpenTelemetry tracing to monitor and debug your agent's execution with tools like Laminar, Honeycomb, or any OTLP-compatible backend. +- [OpenHands Cloud Workspace](https://docs.openhands.dev/sdk/guides/agent-server/cloud-workspace.md): Connect to OpenHands Cloud for fully managed sandbox environments. 
+- [openhands.sdk.agent](https://docs.openhands.dev/sdk/api-reference/openhands.sdk.agent.md): API reference for openhands.sdk.agent module +- [openhands.sdk.conversation](https://docs.openhands.dev/sdk/api-reference/openhands.sdk.conversation.md): API reference for openhands.sdk.conversation module +- [openhands.sdk.event](https://docs.openhands.dev/sdk/api-reference/openhands.sdk.event.md): API reference for openhands.sdk.event module +- [openhands.sdk.llm](https://docs.openhands.dev/sdk/api-reference/openhands.sdk.llm.md): API reference for openhands.sdk.llm module +- [openhands.sdk.security](https://docs.openhands.dev/sdk/api-reference/openhands.sdk.security.md): API reference for openhands.sdk.security module +- [openhands.sdk.tool](https://docs.openhands.dev/sdk/api-reference/openhands.sdk.tool.md): API reference for openhands.sdk.tool module +- [openhands.sdk.utils](https://docs.openhands.dev/sdk/api-reference/openhands.sdk.utils.md): API reference for openhands.sdk.utils module +- [openhands.sdk.workspace](https://docs.openhands.dev/sdk/api-reference/openhands.sdk.workspace.md): API reference for openhands.sdk.workspace module +- [Overview](https://docs.openhands.dev/sdk/arch/overview.md): Understanding the OpenHands Software Agent SDK's package structure, component interactions, and execution models. +- [Overview](https://docs.openhands.dev/sdk/guides/agent-server/overview.md): Run agents on remote servers with isolated workspaces for production deployments. +- [Pause and Resume](https://docs.openhands.dev/sdk/guides/convo-pause-and-resume.md): Pause agent execution, perform operations, and resume without losing state. +- [Persistence](https://docs.openhands.dev/sdk/guides/convo-persistence.md): Save and restore conversation state for multi-session workflows. +- [Plugins](https://docs.openhands.dev/sdk/guides/plugins.md): Plugins bundle skills, hooks, MCP servers, agents, and commands into reusable packages that extend agent capabilities. 
+- [PR Review](https://docs.openhands.dev/sdk/guides/github-workflows/pr-review.md): Use OpenHands Agent to generate meaningful pull request review +- [Reasoning](https://docs.openhands.dev/sdk/guides/llm-reasoning.md): Access model reasoning traces from Anthropic extended thinking and OpenAI responses API. +- [SDK Package](https://docs.openhands.dev/sdk/arch/sdk.md): Core framework components for building agents - the reasoning loop, state management, and extensibility system. +- [Secret Registry](https://docs.openhands.dev/sdk/guides/secrets.md): Provide environment variables and secrets to agent workspace securely. +- [Security](https://docs.openhands.dev/sdk/arch/security.md): High-level architecture of action security analysis and validation +- [Security & Action Confirmation](https://docs.openhands.dev/sdk/guides/security.md): Control agent action execution through confirmation policy and security analyzer. +- [Send Message While Running](https://docs.openhands.dev/sdk/guides/convo-send-message-while-running.md): Interrupt running agents to provide additional context or corrections. +- [Skill](https://docs.openhands.dev/sdk/arch/skill.md): High-level architecture of the reusable prompt system +- [Software Agent SDK](https://docs.openhands.dev/sdk.md): Build AI agents that write software. A clean, modular SDK with production-ready tools. +- [Stuck Detector](https://docs.openhands.dev/sdk/guides/agent-stuck-detector.md): Detect and handle stuck agents automatically with timeout mechanisms. +- [Sub-Agent Delegation](https://docs.openhands.dev/sdk/guides/agent-delegation.md): Enable parallel task execution by delegating work to multiple sub-agents that run independently and return consolidated results. +- [Theory of Mind (TOM) Agent](https://docs.openhands.dev/sdk/guides/agent-tom-agent.md): Enable your agent to understand user intent and preferences through Theory of Mind capabilities, providing personalized guidance based on user modeling. 
+- [TODO Management](https://docs.openhands.dev/sdk/guides/github-workflows/todo-management.md): Implement TODOs using OpenHands Agent +- [Tool System & MCP](https://docs.openhands.dev/sdk/arch/tool-system.md): High-level architecture of the action-observation tool framework +- [Workspace](https://docs.openhands.dev/sdk/arch/workspace.md): High-level architecture of the execution environment abstraction + +## OpenHands CLI + +- [Command Reference](https://docs.openhands.dev/openhands/usage/cli/command-reference.md): Complete reference for all OpenHands CLI commands and options +- [Critic (Experimental)](https://docs.openhands.dev/openhands/usage/cli/critic.md): Automatic task success prediction for OpenHands LLM Provider users +- [GUI Server](https://docs.openhands.dev/openhands/usage/cli/gui-server.md): Launch the full OpenHands web GUI using Docker +- [Headless Mode](https://docs.openhands.dev/openhands/usage/cli/headless.md): Run OpenHands without UI for scripting, automation, and CI/CD pipelines +- [IDE Integration Overview](https://docs.openhands.dev/openhands/usage/cli/ide/overview.md): Use OpenHands directly in your favorite code editor through the Agent Client Protocol +- [Installation](https://docs.openhands.dev/openhands/usage/cli/installation.md): Install the OpenHands CLI on your system +- [JetBrains IDEs](https://docs.openhands.dev/openhands/usage/cli/ide/jetbrains.md): Configure OpenHands with IntelliJ IDEA, PyCharm, WebStorm, and other JetBrains IDEs +- [MCP Servers](https://docs.openhands.dev/openhands/usage/cli/mcp-servers.md): Manage Model Context Protocol servers to extend OpenHands capabilities +- [OpenHands Cloud](https://docs.openhands.dev/openhands/usage/cli/cloud.md): Create and manage OpenHands Cloud conversations from the CLI +- [Quick Start](https://docs.openhands.dev/openhands/usage/cli/quick-start.md): Get started with OpenHands CLI in minutes +- [Resume Conversations](https://docs.openhands.dev/openhands/usage/cli/resume.md): How to 
resume previous conversations in the OpenHands CLI +- [Terminal (CLI)](https://docs.openhands.dev/openhands/usage/cli/terminal.md): Use OpenHands interactively in your terminal with the command-line interface +- [Toad Terminal](https://docs.openhands.dev/openhands/usage/cli/ide/toad.md): Use OpenHands with the Toad universal terminal interface for AI agents +- [VS Code](https://docs.openhands.dev/openhands/usage/cli/ide/vscode.md): Use OpenHands in Visual Studio Code with the VSCode ACP community extension +- [Web Interface](https://docs.openhands.dev/openhands/usage/cli/web-interface.md): Access the OpenHands CLI through your web browser +- [Zed IDE](https://docs.openhands.dev/openhands/usage/cli/ide/zed.md): Configure OpenHands with the Zed code editor through the Agent Client Protocol + +## OpenHands Web App Server + +- [About OpenHands](https://docs.openhands.dev/openhands/usage/about.md) +- [API Keys Settings](https://docs.openhands.dev/openhands/usage/settings/api-keys-settings.md): View your OpenHands LLM key and create API keys to work with OpenHands programmatically. +- [Application Settings](https://docs.openhands.dev/openhands/usage/settings/application-settings.md): Configure application-level settings for OpenHands. +- [Automated Code Review](https://docs.openhands.dev/openhands/usage/use-cases/code-review.md): Set up automated PR reviews using OpenHands and the Software Agent SDK +- [Azure](https://docs.openhands.dev/openhands/usage/llms/azure-llms.md): OpenHands uses LiteLLM to make calls to Azure's chat models. You can find their documentation on using Azure as a provider [here](https://docs.litellm.ai/docs/providers/azure). 
+- [Backend Architecture](https://docs.openhands.dev/openhands/usage/architecture/backend.md) +- [COBOL Modernization](https://docs.openhands.dev/openhands/usage/use-cases/cobol-modernization.md): Modernizing legacy COBOL systems with OpenHands +- [Configuration Options](https://docs.openhands.dev/openhands/usage/advanced/configuration-options.md): How to configure OpenHands V1 (Web UI, env vars, and sandbox settings). +- [Configure](https://docs.openhands.dev/openhands/usage/run-openhands/gui-mode.md): High level overview of configuring the OpenHands Web interface. +- [Custom LLM Configurations](https://docs.openhands.dev/openhands/usage/llms/custom-llm-configs.md): OpenHands supports defining multiple named LLM configurations in your `config.toml` file. This feature allows you to use different LLM configurations for different purposes, such as using a cheaper model for tasks that don't require high-quality responses, or using different models with different parameters for specific agents. +- [Custom Sandbox](https://docs.openhands.dev/openhands/usage/advanced/custom-sandbox-guide.md): This guide is for users that would like to use their own custom Docker image for the runtime. +- [Debugging](https://docs.openhands.dev/openhands/usage/developers/debugging.md) +- [Dependency Upgrades](https://docs.openhands.dev/openhands/usage/use-cases/dependency-upgrades.md): Automating dependency updates and upgrades with OpenHands +- [Development Overview](https://docs.openhands.dev/openhands/usage/developers/development-overview.md): This guide provides an overview of the key documentation resources available in the OpenHands repository. Whether you're looking to contribute, understand the architecture, or work on specific components, these resources will help you navigate the codebase effectively. +- [Docker Sandbox](https://docs.openhands.dev/openhands/usage/sandboxes/docker.md): The recommended sandbox provider for running OpenHands locally. 
+- [Environment Variables Reference](https://docs.openhands.dev/openhands/usage/environment-variables.md): Complete reference of all environment variables supported by OpenHands +- [Evaluation Harness](https://docs.openhands.dev/openhands/usage/developers/evaluation-harness.md) +- [Good vs. Bad Instructions](https://docs.openhands.dev/openhands/usage/essential-guidelines/good-vs-bad-instructions.md): Learn how to write effective instructions for OpenHands +- [Google Gemini/Vertex](https://docs.openhands.dev/openhands/usage/llms/google-llms.md): OpenHands uses LiteLLM to make calls to Google's chat models. You can find their documentation on using Google as a provider -> [Gemini - Google AI Studio](https://docs.litellm.ai/docs/providers/gemini), [VertexAI - Google Cloud Platform](https://docs.litellm.ai/docs/providers/vertex) +- [Groq](https://docs.openhands.dev/openhands/usage/llms/groq.md): OpenHands uses LiteLLM to make calls to chat models on Groq. You can find their documentation on using Groq as a provider [here](https://docs.litellm.ai/docs/providers/groq). +- [Incident Triage](https://docs.openhands.dev/openhands/usage/use-cases/incident-triage.md): Using OpenHands to investigate and resolve production incidents +- [Integrations Settings](https://docs.openhands.dev/openhands/usage/settings/integrations-settings.md): How to setup and modify the various integrations in OpenHands. +- [Key Features](https://docs.openhands.dev/openhands/usage/key-features.md) +- [Language Model (LLM) Settings](https://docs.openhands.dev/openhands/usage/settings/llm-settings.md): This page goes over how to set the LLM to use in OpenHands. As well as some additional LLM settings. +- [LiteLLM Proxy](https://docs.openhands.dev/openhands/usage/llms/litellm-proxy.md): OpenHands supports using the [LiteLLM proxy](https://docs.litellm.ai/docs/proxy/quick_start) to access various LLM providers. 
+- [Local LLMs](https://docs.openhands.dev/openhands/usage/llms/local-llms.md): When using a Local LLM, OpenHands may have limited functionality. It is highly recommended that you use GPUs to serve local models for optimal experience. +- [Main Agent and Capabilities](https://docs.openhands.dev/openhands/usage/agents.md) +- [Model Context Protocol (MCP)](https://docs.openhands.dev/openhands/usage/settings/mcp-settings.md): This page outlines how to configure and use the Model Context Protocol (MCP) in OpenHands, allowing you +- [Moonshot AI](https://docs.openhands.dev/openhands/usage/llms/moonshot.md): How to use Moonshot AI models with OpenHands +- [OpenAI](https://docs.openhands.dev/openhands/usage/llms/openai-llms.md): OpenHands uses LiteLLM to make calls to OpenAI's chat models. You can find their documentation on using OpenAI as a provider [here](https://docs.litellm.ai/docs/providers/openai). +- [OpenHands](https://docs.openhands.dev/openhands/usage/llms/openhands-llms.md): OpenHands LLM provider with access to state-of-the-art (SOTA) agentic coding models. +- [OpenHands GitHub Action](https://docs.openhands.dev/openhands/usage/run-openhands/github-action.md): This guide explains how to use the OpenHands GitHub Action in your own projects. +- [OpenHands in Your SDLC](https://docs.openhands.dev/openhands/usage/essential-guidelines/sdlc-integration.md): How OpenHands fits into your software development lifecycle +- [OpenRouter](https://docs.openhands.dev/openhands/usage/llms/openrouter.md): OpenHands uses LiteLLM to make calls to chat models on OpenRouter. You can find their documentation on using OpenRouter as a provider [here](https://docs.litellm.ai/docs/providers/openrouter). +- [Overview](https://docs.openhands.dev/openhands/usage/llms/llms.md): OpenHands can connect to any LLM supported by LiteLLM. However, it requires a powerful model to work. 
+- [Overview](https://docs.openhands.dev/openhands/usage/sandboxes/overview.md): Where OpenHands runs code in V1: Docker sandbox, Process, or Remote. +- [Process Sandbox](https://docs.openhands.dev/openhands/usage/sandboxes/process.md): Run the agent server as a local process without container isolation. +- [Prompting Best Practices](https://docs.openhands.dev/openhands/usage/tips/prompting-best-practices.md): When working with OpenHands AI software developer, providing clear and effective prompts is key to getting accurate and useful responses. This guide outlines best practices for crafting effective prompts. +- [Remote Sandbox](https://docs.openhands.dev/openhands/usage/sandboxes/remote.md): Run conversations in a remote sandbox environment. +- [Repository Customization](https://docs.openhands.dev/openhands/usage/customization/repository.md): You can customize how OpenHands interacts with your repository by creating a `.openhands` directory at the root level. +- [REST API (V1)](https://docs.openhands.dev/openhands/usage/api/v1.md): Overview of the current V1 REST endpoints used by the Web app. +- [Runtime Architecture](https://docs.openhands.dev/openhands/usage/architecture/runtime.md) +- [Search Engine Setup](https://docs.openhands.dev/openhands/usage/advanced/search-engine-setup.md): Configure OpenHands to use Tavily as a search engine. +- [Secrets Management](https://docs.openhands.dev/openhands/usage/settings/secrets-settings.md): How to manage secrets in OpenHands. +- [Setup](https://docs.openhands.dev/openhands/usage/run-openhands/local-setup.md): Getting started with running OpenHands on your own. 
+- [Spark Migrations](https://docs.openhands.dev/openhands/usage/use-cases/spark-migrations.md): Migrating Apache Spark applications with OpenHands +- [Troubleshooting](https://docs.openhands.dev/openhands/usage/troubleshooting/troubleshooting.md) +- [Tutorial Library](https://docs.openhands.dev/openhands/usage/get-started/tutorials.md): Centralized hub for OpenHands tutorials and examples +- [Vulnerability Remediation](https://docs.openhands.dev/openhands/usage/use-cases/vulnerability-remediation.md): Using OpenHands to identify and fix security vulnerabilities in your codebase +- [WebSocket Connection](https://docs.openhands.dev/openhands/usage/developers/websocket-connection.md) +- [When to Use OpenHands](https://docs.openhands.dev/openhands/usage/essential-guidelines/when-to-use-openhands.md): Guidance on when OpenHands is the right tool for your task +- [Windows Without WSL](https://docs.openhands.dev/openhands/usage/windows-without-wsl.md): Running OpenHands GUI on Windows without using WSL or Docker + +## OpenHands Cloud + +- [Bitbucket Integration](https://docs.openhands.dev/openhands/usage/cloud/bitbucket-installation.md): This guide walks you through the process of installing OpenHands Cloud for your Bitbucket repositories. Once +- [Cloud API](https://docs.openhands.dev/openhands/usage/cloud/cloud-api.md): OpenHands Cloud provides a REST API that allows you to programmatically interact with OpenHands. +- [Cloud UI](https://docs.openhands.dev/openhands/usage/cloud/cloud-ui.md): The Cloud UI provides a web interface for interacting with OpenHands. This page provides references on +- [Getting Started](https://docs.openhands.dev/openhands/usage/cloud/openhands-cloud.md): Getting started with OpenHands Cloud. +- [GitHub Integration](https://docs.openhands.dev/openhands/usage/cloud/github-installation.md): This guide walks you through the process of installing OpenHands Cloud for your GitHub repositories. 
Once +- [GitLab Integration](https://docs.openhands.dev/openhands/usage/cloud/gitlab-installation.md) +- [Jira Cloud Integration](https://docs.openhands.dev/openhands/usage/cloud/project-management/jira-integration.md): Complete guide for setting up Jira Cloud integration with OpenHands Cloud, including service account creation, API token generation, webhook configuration, and workspace integration setup. +- [Jira Data Center Integration (Coming soon...)](https://docs.openhands.dev/openhands/usage/cloud/project-management/jira-dc-integration.md): Complete guide for setting up Jira Data Center integration with OpenHands Cloud, including service account creation, personal access token generation, webhook configuration, and workspace integration setup. +- [Linear Integration (Coming soon...)](https://docs.openhands.dev/openhands/usage/cloud/project-management/linear-integration.md): Complete guide for setting up Linear integration with OpenHands Cloud, including service account creation, API key generation, webhook configuration, and workspace integration setup. +- [Project Management Tool Integrations (Coming soon...)](https://docs.openhands.dev/openhands/usage/cloud/project-management/overview.md): Overview of OpenHands Cloud integrations with project management platforms including Jira Cloud, Jira Data Center, and Linear. Learn about setup requirements, usage methods, and troubleshooting. +- [Slack Integration](https://docs.openhands.dev/openhands/usage/cloud/slack-installation.md): This guide walks you through installing the OpenHands Slack app. + +## OpenHands Overview + +- [Community](https://docs.openhands.dev/overview/community.md): Learn about the OpenHands community, mission, and values +- [Contributing](https://docs.openhands.dev/overview/contributing.md): Join us in building OpenHands and the future of AI. Learn how to contribute to make a meaningful impact. +- [FAQs](https://docs.openhands.dev/overview/faqs.md): Frequently asked questions about OpenHands. 
+- [First Projects](https://docs.openhands.dev/overview/first-projects.md): So you've [run OpenHands](/overview/quickstart). Now what? +- [General Skills](https://docs.openhands.dev/overview/skills/repo.md): General guidelines for OpenHands to work more effectively with the repository. +- [Global Skills](https://docs.openhands.dev/overview/skills/public.md): Global skills are [keyword-triggered skills](/overview/skills/keyword) that apply to all OpenHands users. The official global skill registry is maintained at [github.com/OpenHands/extensions](https://github.com/OpenHands/extensions). +- [Introduction](https://docs.openhands.dev/overview/introduction.md): Welcome to OpenHands, a community focused on AI-driven development +- [Keyword-Triggered Skills](https://docs.openhands.dev/overview/skills/keyword.md): Keyword-triggered skills provide OpenHands with specific instructions that are activated when certain keywords appear in the prompt. This is useful for tailoring behavior based on particular tools, languages, or frameworks. +- [Model Context Protocol (MCP)](https://docs.openhands.dev/overview/model-context-protocol.md): Model Context Protocol support across OpenHands platforms +- [Organization and User Skills](https://docs.openhands.dev/overview/skills/org.md): Organizations and users can define skills that apply to all repositories belonging to the organization or user. +- [Overview](https://docs.openhands.dev/overview/skills.md): Skills are specialized prompts that enhance OpenHands with domain-specific knowledge, expert guidance, and automated task handling. +- [Quick Start](https://docs.openhands.dev/overview/quickstart.md): Choose how you want to run OpenHands diff --git a/scripts/generate-llms-files.py b/scripts/generate-llms-files.py new file mode 100755 index 00000000..543456af --- /dev/null +++ b/scripts/generate-llms-files.py @@ -0,0 +1,302 @@ +#!/usr/bin/env python3 + +"""Generate custom `llms.txt` + `llms-full.txt` for the OpenHands docs site. 
+ +Why this exists +--------------- +Mintlify automatically generates and hosts `/llms.txt` and `/llms-full.txt` for +Mintlify-backed documentation sites. + +For OpenHands, we want those files to provide **V1-only** context to LLMs while we +still keep some legacy V0 pages available for humans. In particular, we want to +exclude: + +- The legacy docs subtree under `openhands/usage/v0/` +- Any page whose filename starts with `V0*` + +Mintlify supports overriding the auto-generated files by committing `llms.txt` +(and/or `llms-full.txt`) to the repository root. + +References: +- Mintlify docs: https://www.mintlify.com/docs/ai/llmstxt +- llms.txt proposal: https://llmstxt.org/ + +How to use +---------- +Run from the repository root (this repo's `docs/` directory): + + ./scripts/generate-llms-files.py + +This will rewrite `./llms.txt` and `./llms-full.txt`. + +Design notes +------------ +- We only parse `title` and `description` from MDX frontmatter. +- We intentionally group OpenHands pages into sections that clearly distinguish: + - OpenHands CLI + - OpenHands Web App Server (incl.
"Local GUI") + - OpenHands Cloud + - OpenHands Software Agent SDK + +""" + + +from __future__ import annotations + +import re +from dataclasses import dataclass +from pathlib import Path + +ROOT = Path(__file__).resolve().parents[1] +BASE_URL = "https://docs.openhands.dev" + +EXCLUDED_DIRS = {".git", ".github", ".agents", "tests", "openapi", "logo"} + + +@dataclass(frozen=True) +class DocPage: + rel_path: Path + route: str + title: str + description: str | None + body: str + + +_FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL) + + +def _strip_quotes(val: str) -> str: + val = val.strip() + if (val.startswith('"') and val.endswith('"')) or ( + val.startswith("'") and val.endswith("'") + ): + return val[1:-1] + return val + + +def parse_frontmatter(text: str) -> tuple[dict[str, str], str]: + m = _FRONTMATTER_RE.match(text) + if not m: + return {}, text + + fm_text = m.group(1) + body = text[m.end() :] + + fm: dict[str, str] = {} + for line in fm_text.splitlines(): + line = line.strip() + if not line or line.startswith("#"): + continue + if ":" not in line: + continue + k, v = line.split(":", 1) + k = k.strip() + v = v.strip() + if not k: + continue + fm[k] = _strip_quotes(v) + + return fm, body + + +def rel_to_route(rel_path: Path) -> str: + p = rel_path.as_posix() + if p.endswith(".mdx"): + p = p[: -len(".mdx")] + + if p.endswith("/index"): + p = p[: -len("/index")] + + return "/" + p.lstrip("/") + + +def is_v0_page(rel_path: Path) -> bool: + s = rel_path.as_posix() + if "/openhands/usage/v0/" in s: + return True + if rel_path.name.startswith("V0"): + return True + return False + + +def iter_doc_pages() -> list[DocPage]: + pages: list[DocPage] = [] + + for mdx_path in sorted(ROOT.rglob("*.mdx")): + rel_path = mdx_path.relative_to(ROOT) + + if any(part in EXCLUDED_DIRS for part in rel_path.parts): + continue + if is_v0_page(rel_path): + continue + + raw = mdx_path.read_text(encoding="utf-8") + fm, body = parse_frontmatter(raw) + + title = fm.get("title") 
+ if not title: + continue + + description = fm.get("description") + route = rel_to_route(rel_path) + + pages.append( + DocPage( + rel_path=rel_path, + route=route, + title=title, + description=description, + body=body.strip(), + ) + ) + + return pages + + +LLMS_SECTION_ORDER = [ + "OpenHands Software Agent SDK", + "OpenHands CLI", + "OpenHands Web App Server", + "OpenHands Cloud", + "OpenHands Overview", + "Other", +] + + +def section_name(page: DocPage) -> str: + """Map a page to an `llms.txt` section. + + This is deliberately opinionated. The goal is to make it obvious to an LLM + what content is about: + + - the OpenHands CLI + - the OpenHands Web App + server (what the nav historically called "Local GUI") + - OpenHands Cloud + - the OpenHands Software Agent SDK + + """ + + route = page.route + + if route.startswith("/sdk"): + return "OpenHands Software Agent SDK" + + if route.startswith("/openhands/usage/cli"): + return "OpenHands CLI" + + if route.startswith("/openhands/usage/cloud"): + return "OpenHands Cloud" + + if route.startswith("/openhands/usage"): + return "OpenHands Web App Server" + + if route.startswith("/overview"): + return "OpenHands Overview" + + return "Other" + + +def _section_sort_key(section: str) -> tuple[int, str]: + """Stable ordering for llms sections, with a sane fallback.""" + + try: + return (LLMS_SECTION_ORDER.index(section), "") + except ValueError: + return (len(LLMS_SECTION_ORDER), section.lower()) + + +def build_llms_txt(pages: list[DocPage]) -> str: + """Generate `llms.txt`. 
+ + The format follows the llms.txt proposal: + - One H1 + - A short blockquote summary + - Optional non-heading text + - H2 sections containing bullet lists of links + + """ + + grouped: dict[str, list[DocPage]] = {} + for page in pages: + grouped.setdefault(section_name(page), []).append(page) + + for section_pages in grouped.values(): + section_pages.sort(key=lambda p: (p.title.lower(), p.route)) + + lines: list[str] = [ + "# OpenHands Docs", + "", + "> LLM-friendly index of OpenHands documentation (V1). Legacy V0 docs pages are intentionally excluded.", + "", + "The sections below intentionally separate OpenHands product documentation (Web App Server / Cloud / CLI)", + "from the OpenHands Software Agent SDK.", + "", + ] + + for section in sorted(grouped.keys(), key=_section_sort_key): + lines.append(f"## {section}") + lines.append("") + + for page in grouped[section]: + url = f"{BASE_URL}{page.route}.md" + line = f"- [{page.title}]({url})" + if page.description: + line += f": {page.description}" + lines.append(line) + + lines.append("") + + return "\n".join(lines).rstrip() + "\n" + + +def build_llms_full_txt(pages: list[DocPage]) -> str: + """Generate `llms-full.txt`. + + This is meant to be copy/pasteable context for AI tools. + + Unlike `llms.txt`, there is no strict spec for `llms-full.txt`, but we keep a + single H1, then use H2/H3 headings to make the document navigable. + + """ + + grouped: dict[str, list[DocPage]] = {} + for page in pages: + grouped.setdefault(section_name(page), []).append(page) + + for section_pages in grouped.values(): + section_pages.sort(key=lambda p: p.route) + + lines: list[str] = [ + "# OpenHands Docs", + "", + "> Consolidated documentation context for LLMs (V1-only). 
Legacy V0 docs pages are intentionally excluded.", + "", + ] + + for section in sorted(grouped.keys(), key=_section_sort_key): + lines.append(f"## {section}") + lines.append("") + + for page in grouped[section]: + lines.append(f"### {page.title}") + lines.append(f"Source: {BASE_URL}{page.route}.md") + lines.append("") + if page.body: + lines.append(page.body) + lines.append("") + + return "\n".join(lines).rstrip() + "\n" + + +def main() -> None: + pages = iter_doc_pages() + + llms_txt = build_llms_txt(pages) + llms_full = build_llms_full_txt(pages) + + (ROOT / "llms.txt").write_text(llms_txt, encoding="utf-8") + (ROOT / "llms-full.txt").write_text(llms_full, encoding="utf-8") + + +if __name__ == "__main__": + main()