1 change: 0 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -191,4 +191,3 @@ data/

user_data_dir
.claude/
CLAUDE.md
110 changes: 110 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,110 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Workflow Use is a browser automation tool that records browser interactions and replays them as deterministic workflows, with LLM-powered self-healing when steps fail. It's a fork of `browser-use/workflow-use` with local LLM support and improved self-healing.

**Three-tier architecture:** Chrome Extension (recorder) → Python Backend (execution) → React UI (visualization)

## Build & Run Commands

### Python Backend (workflows/)
```bash
cd workflows
uv sync # Install dependencies (Python 3.11+, uses hatchling)
source .venv/bin/activate
playwright install chromium # Browser engine
cp .env.example .env # Configure LLM provider
python cli.py --help # CLI entry point
python cli.py launch-gui # Starts FastAPI (8000) + Vite (5173) + opens browser
```

### Chrome Extension (extension/)
```bash
cd extension
npm install && npm run build # Build to .output/chrome-mv3/
npm run dev # Watch mode
```

### React UI (ui/)
```bash
cd ui
npm install # postinstall auto-generates types from backend OpenAPI
npm run dev # Dev server on port 5173
npm run type-gen-update # Regenerate TS types from running backend
```

### Lint (all components)
```bash
./lint.sh # Runs ruff + ESLint + tsc for all three
# Or individually:
cd workflows && uv run ruff check && uv run ruff format --check
cd ui && npm run lint
cd extension && npm run lint
```

### Tests
```bash
cd workflows && uv run pytest tests/ # Run all tests
uv run pytest tests/test_wait_times.py # Single test file
```

## Code Style

Python: Ruff with **tabs for indentation**, single quotes, 130 char line length. Rules: ASYNC, E, F, FAST, I, PLE (see pyproject.toml `[tool.ruff]`).

## Architecture

### Communication Flow
```
Extension content.ts → background.ts → HTTP POST :7331/event → RecordingService
@cubic-dev-ai cubic-dev-ai bot Mar 17, 2026

P2: Architecture flow in CLAUDE.md is outdated: recording endpoint/service no longer matches actual runtime path.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At CLAUDE.md, line 62:

<comment>Architecture flow in CLAUDE.md is outdated: recording endpoint/service no longer matches actual runtime path.</comment>

<file context>
@@ -0,0 +1,110 @@
+
+### Communication Flow
+```
+Extension content.ts → background.ts → HTTP POST :7331/event → RecordingService
+UI (React :5173) → REST API :8000 → backend/routers.py → WorkflowService
+Workflow execution: Workflow.run() → _execute_step() → controller actions → Playwright
</file context>
Suggested change
Extension content.ts → background.ts → HTTP POST :7331/event → RecordingService
Extension content.ts → background.ts → HTTP POST :8000/api/recorder/event → backend/recorder_router.py

UI (React :5173) → REST API :8000 → backend/routers.py → WorkflowService
Workflow execution: Workflow.run() → _execute_step() → controller actions → Playwright
```

### Key Modules (workflows/workflow_use/)

**`llm/provider.py`** — LLM factory. Reads `.env` for `LLM_PROVIDER` (local/browser_use), `LLM_BASE_URL`, `LLM_MODEL`, `LLM_HEALING_MODEL`. Returns `ChatOpenAI` for local (LM Studio/Ollama) or `ChatBrowserUse` for cloud. Entry point: `get_llm(purpose)`.

**`workflow/service.py`** — `Workflow` class. Main execution orchestrator. Loads from YAML, resolves variables, runs steps sequentially. On failure, delegates to `StepHealer` for vision-based repair. Key params: `enable_self_healing`, `debug` (captures screenshots).

**`healing/step_healer.py`** — Vision-based step repair. On failure: screenshot → LLM diagnosis → corrected selectors → retry. Tracks results in `logs/healing/healing_results.tsv`. Uses snapshot/revert pattern (saves YAML before healing, reverts if fix fails). Exponential backoff between retries.

**`healing/service.py`** — `HealingService`. Generates workflows from prompts by running a browser agent, capturing element mappings, then converting to deterministic steps. Two paths: LLM-based (`create_workflow_definition`) or deterministic (`_create_workflow_deterministically`). Post-processes with pattern-based variable identification.

**`schema/views.py`** — Pydantic models for workflow steps. `SelectorWorkflowSteps` has `target_text` (primary, semantic), `selectorStrategies` (fallback list), and legacy `cssSelector`/`xpath`. Step types: navigation, click, input, select_change, key_press, scroll, extract_page_content, agent.

**`recorder/service.py`** — Spawns Chromium with extension loaded, runs FastAPI on :7331 to receive events from extension. Extension captures clicks/inputs/keypresses with semantic text, XPath, CSS selectors.

**`controller/service.py`** — Maps workflow step types to browser-use actions. `controller/utils.py` has `get_best_element_handle()` with stable selector fallback generation.

**`workflow/element_finder.py`** — Multi-strategy element finding. Priority: semantic text strategies (exact, role, aria_label, placeholder, fuzzy) → XPath JavaScript evaluation. Returns element index or XPath string.

**`mcp/service.py`** — Exposes workflows as MCP tools via FastMCP. Dynamically generates function signatures from workflow input_schema.

**`storage/service.py`** — File-based CRUD. Workflows stored as `.workflow.yaml` in `storage/workflows/`. Metadata index at `storage/metadata.json`.

### Extension (extension/src/)
- `content.ts` — Injected into pages. Captures clicks, inputs, keypresses with semantic label extraction (aria-label, label[for], sibling text, etc.). Uses RRWeb for DOM recording.
- `background.ts` — Service worker. Stores events per tab, converts to semantic format, POSTs to Python server at :7331.
- `sidepanel/` — React UI for recording controls and live event viewer.

### CLI (workflows/cli.py)
A 95 KB Typer app. Key commands: `create-workflow`, `run-workflow`, `run-workflow-no-ai`, `run-workflow-csv`, `generate-workflow`, `launch-gui`, `mcp-server`. `get_llm()` is initialized at module level: if the LLM server is down, the CLI still loads, but LLM-dependent commands will fail.

## Fork-Specific Details

- **Origin**: `github.com/Stefz29/workflow-use` (fork), **Upstream**: `github.com/browser-use/workflow-use`
- **LLM Provider**: Defaults to local LM Studio at `localhost:1234` instead of Browser Use cloud API
- **Self-Healing**: Added `StepHealer` with autoresearch-mlx patterns (snapshot/revert, TSV results tracking, exponential backoff, fail-fast checks)
- **Spec**: Full project roadmap in `~/Downloads/workflow-use-fork-instructions.md`

## Ports
| Service | Port |
|---------|------|
| FastAPI backend | 8000 |
| Recording server | 7331 |
| Vite UI dev | 5173 |
| LM Studio (local LLM) | 1234 |
139 changes: 139 additions & 0 deletions docs/PHASE2B_BROWSER_EXECUTION.md
@@ -0,0 +1,139 @@
# Phase 2b: In-Browser Workflow Execution

## Status: STARTING
**Date:** 2026-03-17

## Problem
Workflow execution via Playwright launches a SEPARATE Chromium browser with no cookies/sessions.
This defeats the purpose for sites requiring auth (LinkedIn Recruiter, etc.) and triggers bot detection.

## Solution
The Chrome extension itself replays workflow steps in the SAME browser where they were recorded.
No Playwright needed. Same cookies, same sessions, same everything.

## Architecture

```
YOUR Chrome (same browser, same cookies, same session)
┌─────────────────────────────────────────────────────────┐
│ background.ts (ExecutionEngine - orchestrator) │
│ - Manages step queue & state machine │
│ - Survives page navigations (service worker) │
│ - Sends one step at a time to content script │
│ - Re-injects content script after navigation │
│ - Captures screenshots for self-healing │
│ │
│ content-executor.ts (NEW - executes steps on page) │
│ - Finds elements: target_text → CSS → XPath │
│ - Executes: click(), fill(), keypress via real DOM │
│ - Reports success/failure back to background │
│ - Highlights element being acted on │
│ │
│ sidepanel (execution progress UI) │
│ - "Run in Browser" button │
│ - Step-by-step progress with status indicators │
└─────────────────────────────────────────────────────────┘
│ HTTP (only when healing needed)
┌─────────────────────────────────────────────────────────┐
│ Python Backend (:8000) │
│ POST /api/ext-execute/heal │
│ - Receives screenshot + failed step │
│ - LLM diagnoses what changed on the page │
│ - Returns corrected selectors │
└─────────────────────────────────────────────────────────┘
```

## Tab/Window Tracking

The user may have multiple browsers open (2 Chrome, Firefox, Safari).
- The extension ONLY sees its own Chrome instance
- During recording, every event captures `tabId` and `windowId`
- At replay: if the original tab still exists, use it; otherwise, if the original window still exists, find a matching tab in it; otherwise, ask the user.
- `windowId` distinguishes between multiple Chrome windows
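
The fallback order above could be sketched as a pure function. This is a minimal sketch under stated assumptions, not the extension's actual code: `TabInfo`, `RecordedTarget`, `TabResolution`, and `resolveTargetTab` are hypothetical names, the recording is assumed to also store the tab's URL, and "matching tab" is assumed to mean "same window, same URL". In the extension, the list of open tabs would presumably come from `chrome.tabs.query`.

```typescript
// Hypothetical shapes; only tabId and windowId are named in the design above.
interface TabInfo {
  id: number;
  windowId: number;
  url: string;
}

interface RecordedTarget {
  tabId: number;
  windowId: number;
  url: string; // assumption: the recording also stores the tab's URL
}

type TabResolution =
  | { kind: "original_tab"; tabId: number }
  | { kind: "matching_tab"; tabId: number }
  | { kind: "ask_user" };

// Applies the fallback order: original tab, then a matching tab in the
// original window, then defer to the user.
function resolveTargetTab(recorded: RecordedTarget, openTabs: TabInfo[]): TabResolution {
  for (const tab of openTabs) {
    if (tab.id === recorded.tabId) {
      return { kind: "original_tab", tabId: tab.id };
    }
  }
  for (const tab of openTabs) {
    // "Matching" is assumed here to mean same window and same URL.
    if (tab.windowId === recorded.windowId && tab.url === recorded.url) {
      return { kind: "matching_tab", tabId: tab.id };
    }
  }
  return { kind: "ask_user" };
}
```

Chrome tab IDs are only unique within a browser session, so the first branch will usually miss after a browser restart; the later fallbacks exist for that case.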

## Execution Flow

### Step-by-Step:
1. User clicks "Run in Browser" in sidepanel
2. Background resolves target tab (recorded tabId/windowId or active tab)
3. For each step in workflow:
   a. **navigation**: `chrome.tabs.update(tabId, {url})` → wait for load → inject executor
   b. **click/input/key_press**: send step to content-executor → wait for result
   c. **scroll**: send to content-executor → `window.scrollTo()`
4. After each step that might cause navigation:
   - Monitor `chrome.tabs.onUpdated` for URL change
   - Wait for `status: "complete"` → re-inject executor → `EXECUTOR_READY` handshake
5. On step failure: capture screenshot → send to backend heal endpoint → retry with corrected selectors
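
The per-step dispatch in step 3 could look roughly like the routing function below. It is only an illustration: `WorkflowStep`, `StepRoute`, `routeStep`, and `mayCauseNavigation` are hypothetical names, and which step types count as "might cause navigation" is an assumption, not something the plan specifies.

```typescript
// Step types taken from the flow above; the routing labels are hypothetical.
type StepType = "navigation" | "click" | "input" | "key_press" | "scroll";

interface WorkflowStep {
  type: StepType;
  url?: string; // only meaningful for navigation steps
}

type StepRoute =
  | { via: "background"; action: "update_tab_url"; url: string }
  | { via: "content_executor"; action: "execute_step" };

// Decides who executes a step: the background worker navigates the tab itself,
// everything else is forwarded to the content executor on the page.
function routeStep(step: WorkflowStep): StepRoute {
  if (step.type === "navigation") {
    if (!step.url) {
      throw new Error("navigation step is missing a url");
    }
    return { via: "background", action: "update_tab_url", url: step.url };
  }
  return { via: "content_executor", action: "execute_step" };
}

// Assumption: clicks and key presses (e.g. Enter in a form) may trigger a
// page load, so the background should watch for navigation after them.
function mayCauseNavigation(step: WorkflowStep): boolean {
  return step.type === "navigation" || step.type === "click" || step.type === "key_press";
}
```

Keeping the routing decision separate from the `chrome.tabs` and messaging calls makes it easy to unit-test without a browser.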

### State Machine:
```
IDLE → LOADING → EXECUTING → WAITING_FOR_NAV → HEALING → COMPLETED/FAILED
```
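
The arrow diagram reads as a straight line, but the execution flow suggests that `WAITING_FOR_NAV` and `HEALING` return to `EXECUTING`. A transition table makes that explicit. The sketch below is one possible reading: the state names come from the diagram, while the event names and the exact set of allowed transitions are assumptions.

```typescript
type EngineState =
  | "IDLE" | "LOADING" | "EXECUTING" | "WAITING_FOR_NAV"
  | "HEALING" | "COMPLETED" | "FAILED";

// Event names are hypothetical; the design only names the states.
type EngineEvent =
  | "RUN_REQUESTED" | "WORKFLOW_LOADED" | "NAVIGATION_STARTED"
  | "EXECUTOR_READY" | "STEP_FAILED" | "HEAL_SUCCEEDED"
  | "HEAL_EXHAUSTED" | "ALL_STEPS_DONE";

// Allowed transitions per state. COMPLETED and FAILED are terminal.
const TRANSITIONS: Record<EngineState, Partial<Record<EngineEvent, EngineState>>> = {
  IDLE: { RUN_REQUESTED: "LOADING" },
  LOADING: { WORKFLOW_LOADED: "EXECUTING" },
  EXECUTING: {
    NAVIGATION_STARTED: "WAITING_FOR_NAV",
    STEP_FAILED: "HEALING",
    ALL_STEPS_DONE: "COMPLETED",
  },
  WAITING_FOR_NAV: { EXECUTOR_READY: "EXECUTING" },
  HEALING: { HEAL_SUCCEEDED: "EXECUTING", HEAL_EXHAUSTED: "FAILED" },
  COMPLETED: {},
  FAILED: {},
};

// Returns the next state, or throws if the event is not allowed in this state.
function nextState(state: EngineState, event: EngineEvent): EngineState {
  const target = TRANSITIONS[state][event];
  if (target === undefined) {
    throw new Error("Illegal transition: " + state + " on " + event);
  }
  return target;
}
```

Rejecting unknown transitions loudly helps surface ordering bugs, for example an `EXECUTOR_READY` message arriving while the engine is still `LOADING`.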

## Content Executor: Element Finding

Priority order (same as recording, but in reverse):
1. **target_text (semantic)**: Scan interactive elements, match by textContent/aria-label/placeholder/label
2. **cssSelector**: `document.querySelector(cssSelector)`
3. **xpath**: `document.evaluate(xpath)`
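
A minimal sketch of this priority chain, with the actual lookups injected so the ordering logic can be tested without a page. `StepSelectors`, `ElementLookups`, `FoundElement`, and `findByPriority` are hypothetical names; in a content script, `byCss` would likely wrap `document.querySelector` and `byXPath` would wrap `document.evaluate`.

```typescript
// Selector fields as named in the workflow schema; all optional in this sketch.
interface StepSelectors {
  target_text?: string;
  cssSelector?: string;
  xpath?: string;
}

// Lookups are injected so the priority logic stays independent of the DOM.
interface ElementLookups<E> {
  byText(text: string): E | null;
  byCss(selector: string): E | null;
  byXPath(expression: string): E | null;
}

interface FoundElement<E> {
  element: E;
  strategy: "target_text" | "cssSelector" | "xpath";
}

// Tries each strategy in priority order and reports which one matched.
function findByPriority<E>(step: StepSelectors, lookups: ElementLookups<E>): FoundElement<E> | null {
  if (step.target_text) {
    const el = lookups.byText(step.target_text);
    if (el !== null) return { element: el, strategy: "target_text" };
  }
  if (step.cssSelector) {
    const el = lookups.byCss(step.cssSelector);
    if (el !== null) return { element: el, strategy: "cssSelector" };
  }
  if (step.xpath) {
    const el = lookups.byXPath(step.xpath);
    if (el !== null) return { element: el, strategy: "xpath" };
  }
  return null; // nothing matched: the caller reports a step failure
}
```

Returning the winning strategy alongside the element gives the background worker something concrete to log, and could help the healing step see which selectors have gone stale.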

### Action Execution (Real DOM Events):
- **click**: `element.focus()` → `element.click()` (or full MouseEvent sequence for React/Vue)
- **input**: Native value setter → InputEvent → change event
- **key_press**: `KeyboardEvent('keydown')` followed by `KeyboardEvent('keyup')`
- **scroll**: `window.scrollTo(x, y)`

## Navigation Handling (Critical)

When a click causes page navigation:
1. Content script on old page DIES (Chrome destroys it)
2. Background service worker SURVIVES — it monitors `chrome.tabs.onUpdated`
3. New page finishes loading → background re-injects content-executor.ts
4. Content script sends `EXECUTOR_READY` → background sends next step
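
The handshake can be modelled as a pure handler that maps an incoming executor message to the background's next command. The message names `EXECUTE_STEP`, `STEP_RESULT`, and `EXECUTOR_READY` come from this plan; the payload fields, `HEAL_STEP`, `FINISH`, `RunProgress`, and `handleExecutorMessage` are hypothetical.

```typescript
// Messages the content executor sends to the background (payloads assumed).
type ExecutorMessage =
  | { type: "EXECUTOR_READY" }
  | { type: "STEP_RESULT"; stepIndex: number; success: boolean; error?: string };

// Commands the background decides on in response (HEAL_STEP/FINISH are hypothetical).
type BackgroundCommand =
  | { type: "EXECUTE_STEP"; stepIndex: number }
  | { type: "HEAL_STEP"; stepIndex: number; error: string }
  | { type: "FINISH" };

interface RunProgress {
  nextStepIndex: number; // first step that has not succeeded yet
  totalSteps: number;
}

function handleExecutorMessage(
  progress: RunProgress,
  msg: ExecutorMessage
): { progress: RunProgress; command: BackgroundCommand } {
  let next = progress.nextStepIndex;
  if (msg.type === "STEP_RESULT") {
    if (!msg.success) {
      const error = msg.error || "unknown error";
      return { progress: progress, command: { type: "HEAL_STEP", stepIndex: msg.stepIndex, error: error } };
    }
    next = msg.stepIndex + 1;
  }
  // EXECUTOR_READY: a fresh content script is listening, so (re)send the
  // next pending step. Assumes the STEP_RESULT of the navigating step
  // arrived before the old content script was destroyed.
  const updated: RunProgress = { nextStepIndex: next, totalSteps: progress.totalSteps };
  if (next >= updated.totalSteps) {
    return { progress: updated, command: { type: "FINISH" } };
  }
  return { progress: updated, command: { type: "EXECUTE_STEP", stepIndex: next } };
}
```

If a click destroys the page before its `STEP_RESULT` is delivered, this sketch would re-send the same step on `EXECUTOR_READY`; a real implementation would need to decide how to treat that in-flight step.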

## Self-Healing Integration

When a step fails:
1. Content-executor reports failure with error to background
2. Background captures screenshot via `chrome.tabs.captureVisibleTab()`
3. Background sends screenshot + step context to `POST /api/ext-execute/heal`
4. Backend runs StepHealer LLM diagnosis → returns corrected selectors
5. Background sends corrected step to content-executor for retry
6. Max 3 retries before marking step as failed
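
The retry loop in steps 1 to 6 might be structured as below. This is a sketch rather than the planned implementation: `HealingDeps`, `StepAttemptResult`, and `runWithHealing` are hypothetical names, and the functions are synchronous only for brevity. In the extension, `execute` would be a message round-trip to the content executor, and `heal` would wrap the screenshot capture plus the call to the heal endpoint, both asynchronous.

```typescript
interface StepAttemptResult {
  success: boolean;
  error?: string;
}

// Side effects are injected; both would be async in a real extension.
interface HealingDeps<S> {
  execute(step: S): StepAttemptResult;     // run the step on the page
  heal(step: S, error: string): S | null;  // corrected step, or null if no fix
}

const MAX_HEALING_RETRIES = 3;

// One initial attempt, then up to MAX_HEALING_RETRIES healed retries.
function runWithHealing<S>(
  step: S,
  deps: HealingDeps<S>
): { success: boolean; retries: number; finalStep: S } {
  let current = step;
  let result = deps.execute(current);
  let retries = 0;
  while (!result.success && retries < MAX_HEALING_RETRIES) {
    const healed = deps.heal(current, result.error || "unknown error");
    if (healed === null) {
      break; // the backend could not propose a fix
    }
    current = healed;
    retries += 1;
    result = deps.execute(current);
  }
  return { success: result.success, retries: retries, finalStep: current };
}
```

Returning `finalStep` lets the caller persist the corrected selectors once a healed retry succeeds.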

## Implementation Order

### Phase 1: Content Executor
- [ ] Create `content-executor.ts` with element finding + step execution
- [ ] Register in wxt.config.ts
- [ ] Message protocol: EXECUTE_STEP / STEP_RESULT / EXECUTOR_READY

### Phase 2: Background ExecutionEngine
- [ ] Add ExecutionEngine class to background.ts
- [ ] Step queue, state machine, tab tracking
- [ ] Navigation detection + content script re-injection
- [ ] Screenshot capture for healing

### Phase 3: Backend Healing API
- [ ] Create ext_execution_router.py
- [ ] Adapt StepHealer to accept pre-captured screenshots
- [ ] Wire up in api.py

### Phase 4: Sidepanel UI
- [ ] Execution progress view
- [ ] "Run in Browser" button in dashboard
- [ ] Tab/window selector

### Phase 5: Edge Cases
- [ ] Dynamic content waits (MutationObserver)
- [ ] iframes
- [ ] New tabs/popups during execution
- [ ] Service worker sleep prevention (MV3 30s idle timeout)

## Key Files
- `extension/src/entrypoints/content.ts` — Has reusable functions: extractSemanticInfo(), getXPath(), getEnhancedCSSSelector()
- `extension/src/entrypoints/background.ts` — Orchestrator to extend
- `extension/src/lib/workflow-types.ts` — Step type definitions
- `workflows/workflow_use/healing/step_healer.py` — Self-healing to adapt
- `workflows/backend/recorder_router.py` — Pattern for new execution router