Merged
2 changes: 1 addition & 1 deletion documents/docs/mcp/action.md
@@ -99,7 +99,7 @@ UFO² provides several built-in action servers for different automation scenario

| Server | Platform | Description | Documentation |
|--------|----------|-------------|---------------|
| **[CommandLineExecutor](servers/command_line_executor.md)** | Windows | Execute shell commands and launch applications | [Full Details →](servers/command_line_executor.md) |
| **[CommandLineExecutor](servers/command_line_executor.md)** | Windows | Launch applications via direct execution (no shell) | [Full Details →](servers/command_line_executor.md) |
| **[BashExecutor](servers/bash_executor.md)** | Linux | Execute Linux commands via HTTP server | [Full Details →](servers/bash_executor.md) |

### Office Automation Servers (COM API)
6 changes: 3 additions & 3 deletions documents/docs/mcp/local_servers.md
@@ -13,7 +13,7 @@ UFO² includes several built-in local MCP servers organized by functionality. Th
| **UICollector** | Data Collection | Windows UI observation | **[→ Full Docs](servers/ui_collector.md)** |
| **HostUIExecutor** | Action | Desktop-level UI automation | **[→ Full Docs](servers/host_ui_executor.md)** |
| **AppUIExecutor** | Action | Application-level UI automation | **[→ Full Docs](servers/app_ui_executor.md)** |
| **CommandLineExecutor** | Action | Shell command execution | **[→ Full Docs](servers/command_line_executor.md)** |
| **CommandLineExecutor** | Action | Application launching (no shell) | **[→ Full Docs](servers/command_line_executor.md)** |
| **WordCOMExecutor** | Action | Microsoft Word COM API | **[→ Full Docs](servers/word_com_executor.md)** |
| **ExcelCOMExecutor** | Action | Microsoft Excel COM API | **[→ Full Docs](servers/excel_com_executor.md)** |
| **PowerPointCOMExecutor** | Action | Microsoft PowerPoint COM API | **[→ Full Docs](servers/ppt_com_executor.md)** |
@@ -58,10 +58,10 @@ UFO² includes several built-in local MCP servers organized by functionality. Th

### CommandLineExecutor

**Type**: Action (LLM-selectable, shell execution)
**Type**: Action (LLM-selectable, application launching)
**Platform**: Cross-platform
**Agent**: HostAgent, AppAgent
**Tool**: `run_shell` - Execute shell commands
**Tool**: `run_shell` - Launch applications (executes with `shell=False` to prevent shell injection)

**[→ See complete CommandLineExecutor documentation](servers/command_line_executor.md)** for security guidelines and examples.

82 changes: 37 additions & 45 deletions documents/docs/mcp/servers/command_line_executor.md
@@ -46,21 +46,21 @@ await computer.run_actions([
)
])

# Launch application with arguments
# Launch PowerPoint with a file
await computer.run_actions([
MCPToolCall(
tool_key="action::run_shell",
tool_name="run_shell",
parameters={"bash_command": "python script.py --arg value"}
parameters={"bash_command": "powerpnt \"Desktop\\test.pptx\""}
)
])

# Create directory (Windows)
# Launch File Explorer
await computer.run_actions([
MCPToolCall(
tool_key="action::run_shell",
tool_name="run_shell",
parameters={"bash_command": "mkdir C:\\temp\\newfolder"}
parameters={"bash_command": "explorer.exe"}
)
])
```
@@ -81,19 +81,22 @@ ToolError("Failed to launch application: {error_details}")

#### Implementation Details

- Uses `subprocess.Popen` with `shell=True`
- Commands are parsed into an argument list via `shlex.split()`
- Uses `subprocess.Popen` with `shell=False` to prevent shell injection
- Shell metacharacters (`|`, `&`, `;`, `` ` ``, `$()`, etc.) are **not** interpreted
- Shell built-in commands (e.g., `start`, `dir`, `cd`) are **not** available — only executable binaries can be launched
- Waits 5 seconds after launch for application to start
- Non-blocking: Returns immediately after launch

!!!danger "Security Warning"
**Arbitrary command execution risk!** Always validate commands before execution.
!!!info "Security Note"
Commands are executed **without a shell** (`shell=False`). This means:

Dangerous examples:
- `rm -rf /` (Linux)
- `del /F /S /Q C:\*` (Windows)
- `shutdown /s /t 0`
- Shell injection via metacharacters is not possible
- Only direct executable binaries can be invoked
- Shell built-ins (`start`, `dir`, `cd`, `copy`, etc.) will **not** work
- Command chaining (`&&`, `||`, `|`, `;`) has no effect

**Best Practice**: Implement command whitelist or validation.
**Best Practice**: Use an allow-list to restrict which executables may be launched.
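
The effect of tokenizing with `shlex.split()` instead of handing the string to a shell can be seen in a small standalone sketch. It uses only the Python standard library. The helper name `to_argv` is hypothetical and not part of UFO², and the sketch assumes `shlex.split()` runs in its default POSIX mode, which the documentation above does not state.

```python
import shlex
from typing import List


def to_argv(command: str) -> List[str]:
    """Split a command string into an argument list without invoking a shell.

    Hypothetical helper for illustration; assumes shlex's default POSIX mode.
    """
    return shlex.split(command)


if __name__ == "__main__":
    # Quoted arguments stay together as a single token.
    print(to_argv('notepad.exe "my file.txt"'))
    # -> ['notepad.exe', 'my file.txt']

    # Shell operators become ordinary string arguments; nothing is chained.
    print(to_argv("calc.exe && del important.txt"))
    # -> ['calc.exe', '&&', 'del', 'important.txt']

    # In POSIX mode an unquoted backslash is consumed as an escape character...
    print(to_argv(r"explorer.exe C:\temp"))
    # -> ['explorer.exe', 'C:temp']

    # ...while a backslash inside double quotes is preserved.
    print(to_argv(r'explorer.exe "C:\temp"'))
    # -> ['explorer.exe', 'C:\\temp']
```

If the server does use POSIX-mode splitting, Windows paths containing backslashes would need to be wrapped in double quotes, as the PowerPoint example earlier in this file does.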

## Configuration

@@ -119,17 +122,22 @@ AppAgent:

### 1. Validate Commands

Since `run_shell` executes commands with `shell=False`, shell injection is already mitigated. However, it is still recommended to restrict which executables can be launched:

```python
def safe_run_shell(command: str):
"""Whitelist-based command validation"""
"""Allow-list-based command validation"""
import shlex
allowed_commands = [
"notepad.exe",
"calc.exe",
"mspaint.exe",
"code", # VS Code
"notepad.exe", "notepad",
"calc.exe", "calc",
"mspaint.exe", "mspaint",
"code", "code.exe",
"explorer", "explorer.exe",
]

cmd_base = command.split()[0]
tokens = shlex.split(command)
cmd_base = tokens[0].lower()
if cmd_base not in allowed_commands:
raise ValueError(f"Command not allowed: {cmd_base}")

@@ -235,61 +243,45 @@ await computer.run_actions([
)
])

# Launch browser with URL
# Launch browser
await computer.run_actions([
MCPToolCall(
tool_key="action::run_shell",
parameters={"bash_command": "start https://www.example.com"}
parameters={"bash_command": "msedge.exe https://www.example.com"}
)
])
```

### 2. File Operations
### 2. Open Files with Applications

```python
# Create directory
# Open a document in Word
await computer.run_actions([
MCPToolCall(
tool_key="action::run_shell",
parameters={"bash_command": "mkdir C:\\temp\\workspace"}
parameters={"bash_command": "winword.exe report.docx"}
)
])

# Copy file
# Open a spreadsheet in Excel
await computer.run_actions([
MCPToolCall(
tool_key="action::run_shell",
parameters={"bash_command": "copy source.txt dest.txt"}
parameters={"bash_command": "excel.exe data.xlsx"}
)
])
```

### 3. Script Execution

```python
# Run Python script
await computer.run_actions([
MCPToolCall(
tool_key="action::run_shell",
parameters={"bash_command": "python automation_script.py --mode batch"}
)
])

# Run PowerShell script
await computer.run_actions([
MCPToolCall(
tool_key="action::run_shell",
parameters={"bash_command": "powershell -File script.ps1"}
)
])
```
!!!note
Shell built-in commands like `start`, `copy`, `mkdir`, and `dir` are **not available** because commands run without a shell. Only direct executable binaries (`.exe`) can be invoked.

## Limitations

- **No output capture**: Command output (stdout/stderr) is not returned
- **No exit code**: Cannot determine if command succeeded
- **Async execution**: No way to know when command completes
- **Security risk**: Arbitrary command execution
- **No shell built-ins**: Commands like `start`, `dir`, `copy`, `cd` are not available (runs with `shell=False`)
- **No shell features**: Piping (`|`), redirection (`>`), chaining (`&&`) are not supported

**Tip:** For Linux systems with output capture and better control, use **BashExecutor** server instead.
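
Because only real executables can be launched, a caller may want to check up front that the first token of a command resolves to a binary on `PATH`. The following is a minimal sketch using only the standard library. The function name `is_launchable` is hypothetical and not part of UFO², and the check is a pre-flight heuristic rather than a description of what the server does.

```python
import shlex
import shutil


def is_launchable(command: str) -> bool:
    """Return True if the first token of `command` resolves to an executable.

    Hypothetical pre-flight check; shell built-ins such as `cd` have no
    binary on disk on most systems and therefore typically fail this test.
    """
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes or a trailing escape character.
        return False
    if not tokens:
        return False
    # shutil.which searches PATH (and PATHEXT on Windows) for the program.
    return shutil.which(tokens[0]) is not None
```

Such a check could run alongside an allow-list: the allow-list restricts what may be launched, while the `PATH` lookup catches typos and shell built-ins before the launch is attempted.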

2 changes: 1 addition & 1 deletion galaxy/galaxy.py
@@ -370,7 +370,7 @@ def find_free_port(start_port=8000, max_attempts=10):
# Configure and run uvicorn server
config = uvicorn.Config(
app,
host="0.0.0.0",
host="127.0.0.1",
port=port,
log_level="info",
access_log=False,
25 changes: 25 additions & 0 deletions galaxy/webui/dependencies.py
@@ -9,8 +9,11 @@
"""

import logging
import secrets
from typing import TYPE_CHECKING, Optional

from fastapi import Header, HTTPException

from galaxy.webui.websocket_observer import WebSocketObserver

if TYPE_CHECKING:
@@ -35,6 +38,9 @@ def __init__(self) -> None:
"""Initialize the application state with default values."""
self.logger: logging.Logger = logging.getLogger(__name__)

# API key for authenticating HTTP and WebSocket requests
self._api_key: Optional[str] = None

# WebSocket observer for broadcasting events to clients
self._websocket_observer: Optional[WebSocketObserver] = None

@@ -45,6 +51,16 @@ def __init__(self) -> None:
# Counter for generating unique task names in Web UI mode
self._request_counter: int = 0

@property
def api_key(self) -> Optional[str]:
"""Get the API key."""
return self._api_key

@api_key.setter
def api_key(self, key: str) -> None:
"""Set the API key."""
self._api_key = key

@property
def websocket_observer(self) -> Optional[WebSocketObserver]:
"""
@@ -145,3 +161,12 @@ def get_app_state() -> AppState:
:return: Application state instance
"""
return app_state


async def verify_api_key(
x_api_key: str = Header(..., alias="X-API-Key"),
) -> None:
"""FastAPI dependency that validates the X-API-Key header."""
key = app_state.api_key
if not key or not secrets.compare_digest(x_api_key, key):
raise HTTPException(status_code=401, detail="Invalid API key")
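
One detail of the comparison above may be worth checking: `secrets.compare_digest` accepts `str` arguments only when they are ASCII, and raises `TypeError` otherwise. The sketch below is a standalone illustration using only the standard library, not the project's code. The helper name `is_valid_key` is hypothetical, and encoding both values to bytes is one possible way to keep the constant-time comparison while tolerating non-ASCII input.

```python
import secrets
from typing import Optional


def is_valid_key(presented: Optional[str], expected: Optional[str]) -> bool:
    """Constant-time API key check that tolerates non-ASCII input.

    Hypothetical helper: a missing or empty key on either side is rejected,
    mirroring the `not key or not ...` logic in the dependency above.
    """
    if not presented or not expected:
        return False
    # compare_digest on bytes never raises for non-ASCII content.
    return secrets.compare_digest(
        presented.encode("utf-8"), expected.encode("utf-8")
    )
```

With this shape, a malformed header value would produce an ordinary rejection instead of an unhandled exception, and the same helper could serve both the HTTP dependency and the WebSocket token check.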
3 changes: 2 additions & 1 deletion galaxy/webui/frontend/src/services/websocket.ts
@@ -48,7 +48,8 @@ export class WebSocketClient {
if (!url) {
const protocol = window.location.protocol === 'https:' ? 'wss:' : 'ws:';
const host = window.location.host;
this.url = `${protocol}//${host}/ws`;
const apiKey = (window as any).__GALAXY_API_KEY__ || '';
this.url = `${protocol}//${host}/ws?token=${encodeURIComponent(apiKey)}`;
} else {
this.url = url;
}
6 changes: 3 additions & 3 deletions galaxy/webui/routers/devices.py
@@ -10,9 +10,9 @@
import logging
from typing import Dict, Any

from fastapi import APIRouter, HTTPException
from fastapi import APIRouter, Depends, HTTPException

from galaxy.webui.dependencies import get_app_state
from galaxy.webui.dependencies import get_app_state, verify_api_key
from galaxy.webui.models.requests import DeviceAddRequest
from galaxy.webui.models.responses import DeviceAddResponse
from galaxy.webui.services import ConfigService, DeviceService
@@ -21,7 +21,7 @@
logger = logging.getLogger(__name__)


@router.post("/devices", response_model=DeviceAddResponse)
@router.post("/devices", response_model=DeviceAddResponse, dependencies=[Depends(verify_api_key)])
async def add_device(device: DeviceAddRequest) -> Dict[str, Any]:
"""
Add a new device to the Galaxy configuration.
6 changes: 3 additions & 3 deletions galaxy/webui/routers/health.py
@@ -9,15 +9,15 @@

from typing import Dict, Any

from fastapi import APIRouter
from fastapi import APIRouter, Depends

from galaxy.webui.dependencies import get_app_state
from galaxy.webui.dependencies import get_app_state, verify_api_key
from galaxy.webui.models.responses import HealthResponse

router = APIRouter(tags=["health"])


@router.get("/health", response_model=HealthResponse)
@router.get("/health", response_model=HealthResponse, dependencies=[Depends(verify_api_key)])
async def health_check() -> Dict[str, Any]:
"""
Health check endpoint.
43 changes: 36 additions & 7 deletions galaxy/webui/routers/websocket.py
@@ -9,35 +9,64 @@
"""

import logging
import secrets

from fastapi import APIRouter, WebSocket, WebSocketDisconnect
from fastapi import APIRouter, Query, WebSocket, WebSocketDisconnect
from starlette.websockets import WebSocketState

from galaxy.webui.dependencies import get_app_state
from galaxy.webui.handlers import WebSocketMessageHandler

router = APIRouter(tags=["websocket"])
logger = logging.getLogger(__name__)

WS_1008_POLICY_VIOLATION = 1008


@router.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket) -> None:
async def websocket_endpoint(
websocket: WebSocket,
token: str = Query(default=None),
) -> None:
"""
WebSocket endpoint for real-time event streaming.

Requires a valid ``token`` query parameter that matches the server API key.

This endpoint establishes a persistent connection with clients to:
- Send welcome messages and initial state (device snapshots)
- Receive and process client messages (requests, commands)
- Broadcast Galaxy events to all connected clients in real-time

The connection lifecycle:
1. Accept the WebSocket connection
2. Register with the WebSocket observer for event broadcasting
3. Send welcome message and initial device snapshot
4. Process incoming messages until disconnection
5. Cleanup and remove from observer on disconnect
1. Validate the token query parameter
2. Accept the WebSocket connection
3. Register with the WebSocket observer for event broadcasting
4. Send welcome message and initial device snapshot
5. Process incoming messages until disconnection
6. Cleanup and remove from observer on disconnect

:param websocket: The WebSocket connection from the client
:param token: API key passed as a query parameter
"""
# Validate token before accepting the connection
app_state = get_app_state()
expected_key = app_state.api_key
if (
not expected_key
or not token
or not secrets.compare_digest(token, expected_key)
):
await websocket.close(
code=WS_1008_POLICY_VIOLATION,
reason="Invalid or missing token",
)
logger.warning(
"WebSocket connection rejected (invalid token) from %s",
websocket.client,
)
return

await websocket.accept()
logger.info(f"WebSocket connection established from {websocket.client}")
