browser-use · Stefz29 · Mar 15, 2026 · Mar 17, 2026
diff --git a/.gitignore b/.gitignore
@@ -191,4 +191,3 @@ data/
 
 user_data_dir
 .claude/
-CLAUDE.md
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -0,0 +1,110 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+Workflow Use is a browser automation tool that records browser interactions and replays them as deterministic workflows, with LLM-powered self-healing when steps fail. It's a fork of `browser-use/workflow-use` with local LLM support and improved self-healing.
+
+**Three-tier architecture:** Chrome Extension (recorder) → Python Backend (execution) → React UI (visualization)
+
+## Build & Run Commands
+
+### Python Backend (workflows/)
+```bash
+cd workflows
+uv sync                          # Install dependencies (Python 3.11+, uses hatchling)
+source .venv/bin/activate
+playwright install chromium       # Browser engine
+cp .env.example .env             # Configure LLM provider
+python cli.py --help             # CLI entry point
+python cli.py launch-gui         # Starts FastAPI (8000) + Vite (5173) + opens browser
+```
+
+### Chrome Extension (extension/)
+```bash
+cd extension
+npm install && npm run build     # Build to .output/chrome-mv3/
+npm run dev                      # Watch mode
+```
+
+### React UI (ui/)
+```bash
+cd ui
+npm install                      # postinstall auto-generates types from backend OpenAPI
+npm run dev                      # Dev server on port 5173
+npm run type-gen-update          # Regenerate TS types from running backend
+```
+
+### Lint (all components)
+```bash
+./lint.sh                        # Runs ruff + ESLint + tsc for all three components
+# Or individually:
+(cd workflows && uv run ruff check && uv run ruff format --check)
+(cd ui && npm run lint)
+(cd extension && npm run lint)
+```
+
+### Tests
+```bash
+cd workflows && uv run pytest tests/       # Run all tests
+uv run pytest tests/test_wait_times.py     # Single test file
+```
+
+## Code Style
+
+Python: Ruff with **tabs for indentation**, single quotes, 130 char line length. Rules: ASYNC, E, F, FAST, I, PLE (see pyproject.toml `[tool.ruff]`).
+
+## Architecture
+
+### Communication Flow
+```
+Extension content.ts → background.ts → HTTP POST :7331/event → RecordingService
+Extension content.ts → background.ts → HTTP POST :8000/api/recorder/event → backend/recorder_router.py
+UI (React :5173) → REST API :8000 → backend/routers.py → WorkflowService
+Workflow execution: Workflow.run() → _execute_step() → controller actions → Playwright
+```
+
+### Key Modules (workflows/workflow_use/)
+
+**`llm/provider.py`** — LLM factory. Reads `.env` for `LLM_PROVIDER` (local/browser_use), `LLM_BASE_URL`, `LLM_MODEL`, `LLM_HEALING_MODEL`. Returns `ChatOpenAI` for local (LM Studio/Ollama) or `ChatBrowserUse` for cloud. Entry point: `get_llm(purpose)`.
+
+**`workflow/service.py`** — `Workflow` class. Main execution orchestrator. Loads from YAML, resolves variables, runs steps sequentially. On failure, delegates to `StepHealer` for vision-based repair. Key params: `enable_self_healing`, `debug` (captures screenshots).
+
+**`healing/step_healer.py`** — Vision-based step repair. On failure: screenshot → LLM diagnosis → corrected selectors → retry. Tracks results in `logs/healing/healing_results.tsv`. Uses snapshot/revert pattern (saves YAML before healing, reverts if fix fails). Exponential backoff between retries.
+
+**`healing/service.py`** — `HealingService`. Generates workflows from prompts by running a browser agent, capturing element mappings, then converting to deterministic steps. Two paths: LLM-based (`create_workflow_definition`) or deterministic (`_create_workflow_deterministically`). Post-processes with pattern-based variable identification.
+
+**`schema/views.py`** — Pydantic models for workflow steps. `SelectorWorkflowSteps` has `target_text` (primary, semantic), `selectorStrategies` (fallback list), and legacy `cssSelector`/`xpath`. Step types: navigation, click, input, select_change, key_press, scroll, extract_page_content, agent.
+
+**`recorder/service.py`** — Spawns Chromium with extension loaded, runs FastAPI on :7331 to receive events from extension. Extension captures clicks/inputs/keypresses with semantic text, XPath, CSS selectors.
+
+**`controller/service.py`** — Maps workflow step types to browser-use actions. `controller/utils.py` has `get_best_element_handle()` with stable selector fallback generation.
+
+**`workflow/element_finder.py`** — Multi-strategy element finding. Priority: semantic text strategies (exact, role, aria_label, placeholder, fuzzy) → XPath JavaScript evaluation. Returns element index or XPath string.
+
+**`mcp/service.py`** — Exposes workflows as MCP tools via FastMCP. Dynamically generates function signatures from workflow input_schema.
+
+**`storage/service.py`** — File-based CRUD. Workflows stored as `.workflow.yaml` in `storage/workflows/`. Metadata index at `storage/metadata.json`.
+
+### Extension (extension/src/)
+- `content.ts` — Injected into pages. Captures clicks, inputs, keypresses with semantic label extraction (aria-label, label[for], sibling text, etc.). Uses RRWeb for DOM recording.
+- `background.ts` — Service worker. Stores events per tab, converts to semantic format, POSTs to Python server at :7331.
+- `sidepanel/` — React UI for recording controls and live event viewer.
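
A DOM-free sketch of the label fallback described for `content.ts`. It assumes the listed sources are tried in the order given; the real priority, and whatever other sources "etc." covers, are not specified here. `LabelSources` and `extractLabel` are hypothetical names; in the extension each field would be read from the element (for example via `getAttribute("aria-label")`).

```typescript
// Hypothetical snapshot of the label sources content.ts can read around an element.
interface LabelSources {
  ariaLabel?: string | null; // aria-label attribute
  labelFor?: string | null; // text of <label for="...">
  siblingText?: string | null; // text of an adjacent node
}

// Assumed priority: the order in which the sources are listed in the text.
const LABEL_PRIORITY: (keyof LabelSources)[] = ["ariaLabel", "labelFor", "siblingText"];

// Returns the first non-empty source, whitespace-collapsed and truncated.
function extractLabel(
  sources: LabelSources,
  maxLength = 80,
): { label: string; source: keyof LabelSources } | null {
  for (const source of LABEL_PRIORITY) {
    const text = (sources[source] ?? "").replace(/\s+/g, " ").trim();
    if (text) return { label: text.slice(0, maxLength), source };
  }
  return null;
}
```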
+
+### CLI (workflows/cli.py)
+95KB typer app. Key commands: `create-workflow`, `run-workflow`, `run-workflow-no-ai`, `run-workflow-csv`, `generate-workflow`, `launch-gui`, `mcp-server`. Module-level `get_llm()` initialization — if LLM server is down, CLI still loads but LLM-dependent commands will fail.
+
+## Fork-Specific Details
+
+- **Origin**: `github.com/Stefz29/workflow-use` (fork), **Upstream**: `github.com/browser-use/workflow-use`
+- **LLM Provider**: Defaults to local LM Studio at `localhost:1234` instead of Browser Use cloud API
+- **Self-Healing**: Added `StepHealer` with autoresearch-mlx patterns (snapshot/revert, TSV results tracking, exponential backoff, fail-fast checks)
+- **Spec**: Full project roadmap in `~/Downloads/workflow-use-fork-instructions.md`
+
+## Ports
+
+| Service | Port |
+|---------|------|
+| FastAPI backend | 8000 |
+| Recording server | 7331 |
+| Vite UI dev | 5173 |
+| LM Studio (local LLM) | 1234 |
diff --git a/docs/PHASE2B_BROWSER_EXECUTION.md b/docs/PHASE2B_BROWSER_EXECUTION.md
@@ -0,0 +1,139 @@
+# Phase 2b: In-Browser Workflow Execution
+
+## Status: STARTING
+**Date:** 2026-03-17
+
+## Problem
+Workflow execution via Playwright launches a SEPARATE Chromium browser with no cookies/sessions.
+That breaks workflows on sites requiring auth (LinkedIn Recruiter, etc.) and makes bot detection more likely.
+
+## Solution
+The Chrome extension itself replays workflow steps in the SAME browser where they were recorded.
+No Playwright needed. Same cookies, same sessions, same everything.
+
+## Architecture
+
+```
+YOUR Chrome (same browser, same cookies, same session)
+┌─────────────────────────────────────────────────────────┐
+│  background.ts (ExecutionEngine - orchestrator)          │
+│  - Manages step queue & state machine                    │
+│  - Survives page navigations (service worker)            │
+│  - Sends one step at a time to content script            │
+│  - Re-injects content script after navigation            │
+│  - Captures screenshots for self-healing                 │
+│                                                          │
+│  content-executor.ts (NEW - executes steps on page)      │
+│  - Finds elements: target_text → CSS → XPath             │
+│  - Executes: click(), fill(), keypress via real DOM      │
+│  - Reports success/failure back to background            │
+│  - Highlights element being acted on                     │
+│                                                          │
+│  sidepanel (execution progress UI)                       │
+│  - "Run in Browser" button                               │
+│  - Step-by-step progress with status indicators          │
+└─────────────────────────────────────────────────────────┘
+         │ HTTP (only when healing needed)
+         ▼
+┌─────────────────────────────────────────────────────────┐
+│  Python Backend (:8000)                                  │
+│  POST /api/ext-execute/heal                              │
+│  - Receives screenshot + failed step                     │
+│  - LLM diagnoses what changed on the page                │
+│  - Returns corrected selectors                           │
+└─────────────────────────────────────────────────────────┘
+```
+
+## Tab/Window Tracking
+
+The user may have multiple browsers open (e.g., two Chrome instances plus Firefox and Safari).
+- The extension ONLY sees its own Chrome instance
+- During recording, every event captures `tabId` and `windowId`
+- At replay: if the original tab still exists, use it; otherwise, if the original window still exists, find a matching tab in it; otherwise, ask the user.
+- `windowId` distinguishes between multiple Chrome windows
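
A minimal sketch of the resolution order above. It assumes the recording also stores the starting page's URL and that a "matching tab" means one on the same origin inside the recorded window; both are assumptions, since the text does not define matching. `RecordedContext`, `OpenTab`, and `resolveTargetTab` are hypothetical names. In the extension, `openTabs` would come from `chrome.tabs.query({})`, whose `Tab` objects carry `id`, `windowId`, and `url` (`url` is only populated with the `"tabs"` permission or matching host permissions).

```typescript
// Hypothetical shapes. tabId/windowId are recorded per event (per the text);
// `url` and the same-origin matching rule are assumptions of this sketch.
interface RecordedContext {
  tabId: number;
  windowId: number;
  url: string;
}

interface OpenTab {
  id: number;
  windowId: number;
  url: string;
}

type TabResolution =
  | { kind: "use-tab"; tabId: number; reason: "original-tab" | "same-window-match" }
  | { kind: "ask-user" };

function originOf(url: string): string | null {
  try {
    const origin = new URL(url).origin;
    return origin === "null" ? null : origin; // opaque origins (e.g. about:blank) never match
  } catch {
    return null; // unparsable or empty URL
  }
}

function resolveTargetTab(recorded: RecordedContext, openTabs: OpenTab[]): TabResolution {
  // 1. The original tab is still open: reuse it.
  if (openTabs.some((tab) => tab.id === recorded.tabId)) {
    return { kind: "use-tab", tabId: recorded.tabId, reason: "original-tab" };
  }
  // 2. The original window is still open: look for a tab on the same origin in it.
  const wanted = originOf(recorded.url);
  const match = openTabs.find(
    (tab) => tab.windowId === recorded.windowId && wanted !== null && originOf(tab.url) === wanted,
  );
  if (match) {
    return { kind: "use-tab", tabId: match.id, reason: "same-window-match" };
  }
  // 3. Nothing suitable: let the user choose in the sidepanel.
  return { kind: "ask-user" };
}
```

Returning `ask-user` rather than silently falling back to the active tab keeps a workflow from running against an unrelated page.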
+
+## Execution Flow
+
+### Step-by-Step:
+1. User clicks "Run in Browser" in sidepanel
+2. Background resolves target tab (recorded tabId/windowId or active tab)
+3. For each step in workflow:
+   a. **navigation**: `chrome.tabs.update(tabId, {url})` → wait for load → inject executor
+   b. **click/input/key_press**: send step to content-executor → wait for result
+   c. **scroll**: send to content-executor → `window.scrollTo()`
+4. After each step that might cause navigation:
+   - Monitor `chrome.tabs.onUpdated` for URL change
+   - Wait for `status: "complete"` → re-inject executor → `EXECUTOR_READY` handshake
+5. On step failure: capture screenshot → send to backend heal endpoint → retry with corrected selectors
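
The per-step routing in the flow above can be sketched as a pure function, which keeps the Chrome API calls at the edges. The `StepPlan` shape and the `mayNavigate` flag are illustrative assumptions: here click and key_press are treated as possibly navigating (a link or an Enter key can load a new page), while input and scroll are not.

```typescript
// Step types routed by the flow above; other schema types are out of scope here.
type StepType = "navigation" | "click" | "input" | "key_press" | "scroll";

interface WorkflowStep {
  type: StepType;
  url?: string; // only used by navigation steps
}

type StepPlan =
  | { route: "tabs-update"; url: string; mayNavigate: true }
  | { route: "content-executor"; mayNavigate: boolean };

function planStep(step: WorkflowStep): StepPlan {
  switch (step.type) {
    case "navigation":
      if (!step.url) throw new Error("navigation step without url");
      // Background calls chrome.tabs.update(tabId, { url }) and waits for "complete".
      return { route: "tabs-update", url: step.url, mayNavigate: true };
    case "click":
    case "key_press":
      // May follow a link or submit a form, so background watches chrome.tabs.onUpdated.
      return { route: "content-executor", mayNavigate: true };
    case "input":
    case "scroll":
      return { route: "content-executor", mayNavigate: false };
    default: {
      const unreachable: never = step.type;
      throw new Error(`unsupported step type: ${String(unreachable)}`);
    }
  }
}
```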
+
+### State Machine:
+```
+IDLE → LOADING → EXECUTING → COMPLETED
+EXECUTING ⇄ WAITING_FOR_NAV;  EXECUTING ⇄ HEALING;  HEALING → FAILED (retries exhausted)
+```
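
One way to make the state machine explicit is a transition table that rejects events the current state does not expect. The event names are hypothetical, and the table assumes, as the execution flow describes, that a navigation wait and a healing attempt that returns a corrected step both hand control back to `EXECUTING`.

```typescript
type ExecState = "IDLE" | "LOADING" | "EXECUTING" | "WAITING_FOR_NAV" | "HEALING" | "COMPLETED" | "FAILED";

// Hypothetical event names.
type ExecEvent =
  | "START" // user clicked "Run in Browser"
  | "TAB_READY" // target tab resolved, executor injected
  | "STEP_OK" // step succeeded, more steps remain
  | "STEP_OK_NAVIGATING" // step succeeded and started a navigation
  | "EXECUTOR_READY" // re-injected executor completed the handshake
  | "ALL_STEPS_DONE"
  | "STEP_FAILED"
  | "HEAL_OK" // corrected step received, retry it
  | "HEAL_EXHAUSTED"; // retry budget used up

const TRANSITIONS: Record<ExecState, Partial<Record<ExecEvent, ExecState>>> = {
  IDLE: { START: "LOADING" },
  LOADING: { TAB_READY: "EXECUTING" },
  EXECUTING: {
    STEP_OK: "EXECUTING",
    STEP_OK_NAVIGATING: "WAITING_FOR_NAV",
    ALL_STEPS_DONE: "COMPLETED",
    STEP_FAILED: "HEALING",
  },
  WAITING_FOR_NAV: { EXECUTOR_READY: "EXECUTING" },
  HEALING: { HEAL_OK: "EXECUTING", HEAL_EXHAUSTED: "FAILED" },
  COMPLETED: {}, // terminal
  FAILED: {}, // terminal
};

function transition(state: ExecState, event: ExecEvent): ExecState {
  const next = TRANSITIONS[state][event];
  if (next === undefined) {
    throw new Error(`invalid event ${event} in state ${state}`);
  }
  return next;
}
```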
+
+## Content Executor: Element Finding
+
+Priority order (most semantic identifier first, then the selectors captured at recording time):
+1. **target_text (semantic)**: Scan interactive elements, match by textContent/aria-label/placeholder/label
+2. **cssSelector**: `document.querySelector(cssSelector)`
+3. **xpath**: `document.evaluate(xpath, document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue`
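
The lookup order can be sketched without a live DOM by hiding the page behind a small adapter. This sketch rests on several assumptions: `PageAdapter`, `Candidate`, and `findElement` are hypothetical names, element handles are stood in for by string ids, and text matching is exact after whitespace and case normalization (no fuzzy matching). In a content script, `querySelector` would wrap `document.querySelector` and `evaluateXPath` would wrap `document.evaluate(...)` with `XPathResult.FIRST_ORDERED_NODE_TYPE`.

```typescript
// DOM-free description of an interactive element (hypothetical shape).
interface Candidate {
  id: string; // stands in for an Element reference
  textContent?: string;
  ariaLabel?: string;
  placeholder?: string;
  label?: string; // text of an associated <label>
}

interface StepTarget {
  target_text?: string;
  cssSelector?: string;
  xpath?: string;
}

// Page access is injected so the lookup logic stays testable.
interface PageAdapter {
  interactiveElements(): Candidate[];
  querySelector(css: string): string | null;
  evaluateXPath(xpath: string): string | null;
}

type Strategy = "target_text" | "cssSelector" | "xpath";

const normalize = (s: string | undefined): string => (s ?? "").replace(/\s+/g, " ").trim().toLowerCase();

function matchByText(text: string, candidates: Candidate[]): string | null {
  const wanted = normalize(text);
  if (!wanted) return null;
  for (const c of candidates) {
    const fields = [c.textContent, c.ariaLabel, c.placeholder, c.label];
    if (fields.some((f) => normalize(f) === wanted)) return c.id;
  }
  return null;
}

function findElement(target: StepTarget, page: PageAdapter): { id: string; strategy: Strategy } | null {
  if (target.target_text) {
    const id = matchByText(target.target_text, page.interactiveElements());
    if (id) return { id, strategy: "target_text" };
  }
  if (target.cssSelector) {
    try {
      const id = page.querySelector(target.cssSelector);
      if (id) return { id, strategy: "cssSelector" };
    } catch {
      // document.querySelector throws on invalid selectors; fall through to XPath.
    }
  }
  if (target.xpath) {
    const id = page.evaluateXPath(target.xpath);
    if (id) return { id, strategy: "xpath" };
  }
  return null;
}
```

Returning the winning `strategy` alongside the element lets the executor report which fallback was needed, which is useful input for healing.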
+
+### Action Execution (Real DOM Events):
+- **click**: `element.focus()` → `element.click()` (or full MouseEvent sequence for React/Vue)
+- **input**: Native value setter → InputEvent → change event
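+- **key_press**: KeyboardEvent('keydown') + ('keyup'); synthetic events are untrusted (`isTrusted === false`), so default actions such as typing a character or submitting a form on Enter do not happen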
+- **scroll**: `window.scrollTo(x, y)`
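
A sketch of the event sequences, with dispatch injected so the ordering can be checked outside a browser. The helper names and the exact `init` fields are assumptions; in a content script, `dispatch` would construct a `PointerEvent`, `MouseEvent`, `InputEvent`, `Event`, or `KeyboardEvent` and call `element.dispatchEvent`.

```typescript
// Injected so the ordering can be tested without a DOM. In a content script this
// would build the matching event object and call element.dispatchEvent(event).
type Dispatch = (type: string, init: Record<string, unknown>) => void;

// Order in which a real left click fires pointer and mouse events; some React/Vue
// components react to pointerdown/mousedown rather than to click.
const CLICK_SEQUENCE = ["pointerdown", "mousedown", "pointerup", "mouseup", "click"];

function simulateClick(dispatch: Dispatch): void {
  for (const type of CLICK_SEQUENCE) {
    dispatch(type, { bubbles: true, cancelable: true, composed: true, button: 0 });
  }
}

// setNativeValue should call the prototype setter, for example
//   Object.getOwnPropertyDescriptor(HTMLInputElement.prototype, "value")?.set?.call(el, value)
// React overrides `value` on the element instance to track changes; the prototype
// setter bypasses that override, so the following input event is seen as a change.
function simulateInput(setNativeValue: (value: string) => void, dispatch: Dispatch, value: string): void {
  setNativeValue(value);
  dispatch("input", { bubbles: true, inputType: "insertText", data: value });
  dispatch("change", { bubbles: true });
}

function simulateKeyPress(dispatch: Dispatch, key: string): void {
  dispatch("keydown", { bubbles: true, cancelable: true, key });
  dispatch("keyup", { bubbles: true, cancelable: true, key });
}
```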
+
+## Navigation Handling (Critical)
+
+When a click causes page navigation:
+1. Content script on old page DIES (Chrome destroys it)
+2. Background service worker SURVIVES — it monitors `chrome.tabs.onUpdated`
+3. New page finishes loading → background re-injects content-executor.ts
+4. Content script sends `EXECUTOR_READY` → background sends next step
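
The re-injection handshake can be modeled as a small coordinator that ignores events from other tabs and only sends the next step once the new page has loaded and the executor has announced itself. `NavigationCoordinator` is a hypothetical name; the `inject` and `sendNextStep` callbacks stand in for `chrome.scripting.executeScript` and `chrome.tabs.sendMessage`, and `onTabUpdated` mirrors the `(tabId, changeInfo)` arguments of `chrome.tabs.onUpdated`.

```typescript
type NavPhase = "idle" | "loading" | "injected";

class NavigationCoordinator {
  private phase: NavPhase = "idle";
  private readonly tabId: number;
  private readonly inject: (tabId: number) => void;
  private readonly sendNextStep: (tabId: number) => void;

  constructor(tabId: number, inject: (tabId: number) => void, sendNextStep: (tabId: number) => void) {
    this.tabId = tabId;
    this.inject = inject; // e.g. chrome.scripting.executeScript({ target: { tabId }, files: [...] })
    this.sendNextStep = sendNextStep; // e.g. chrome.tabs.sendMessage(tabId, { type: "EXECUTE_STEP", ... })
  }

  // Wire to chrome.tabs.onUpdated.addListener((tabId, changeInfo) => ...).
  onTabUpdated(tabId: number, changeInfo: { status?: string; url?: string }): void {
    if (tabId !== this.tabId) return; // other tabs are irrelevant
    if (changeInfo.status === "loading") {
      this.phase = "loading"; // the old content script is gone
    } else if (changeInfo.status === "complete" && this.phase === "loading") {
      this.phase = "injected";
      this.inject(tabId);
    }
  }

  // Wire to chrome.runtime.onMessage for { type: "EXECUTOR_READY" } sent from the tab.
  onExecutorReady(tabId: number): void {
    if (tabId !== this.tabId || this.phase !== "injected") return;
    this.phase = "idle";
    this.sendNextStep(tabId);
  }
}
```

Requiring a `loading` update before acting on `complete` means a stray `complete` event with no navigation behind it does not trigger a redundant injection.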
+
+## Self-Healing Integration
+
+When a step fails:
+1. Content-executor reports failure with error to background
+2. Background captures a screenshot via `chrome.tabs.captureVisibleTab()` (this captures the active tab of a window, so the target tab must be in the foreground)
+3. Background POSTs screenshot + step context to `POST /api/ext-execute/heal`
+4. Backend runs StepHealer LLM diagnosis → returns corrected selectors
+5. Background sends corrected step to content-executor for retry
+6. Max 3 retries before marking step as failed
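
The retry loop can be sketched as follows. The three-retry cap comes from the list above; exponential backoff between attempts is an assumption borrowed from the Python `StepHealer`, and the 1 s base and 8 s cap are arbitrary. `HealingHooks`, `runStepWithHealing`, and `backoffMs` are hypothetical names: `execute` would send `EXECUTE_STEP` to the content executor, `heal` would capture the screenshot and POST it to `/api/ext-execute/heal`, and `sleep` could be `(ms) => new Promise((resolve) => setTimeout(resolve, ms))`.

```typescript
interface StepOutcome {
  ok: boolean;
  error?: string;
}

// Hypothetical async hooks, injected so the loop can be tested without Chrome.
interface HealingHooks<Step> {
  execute(step: Step): Promise<StepOutcome>;
  heal(step: Step, error: string): Promise<Step>; // returns a corrected step
  sleep(ms: number): Promise<void>;
}

const MAX_RETRIES = 3;

// Exponential backoff: 1000, 2000, 4000, then capped at 8000 ms (assumed values).
function backoffMs(attempt: number, baseMs = 1000, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

async function runStepWithHealing<Step>(
  step: Step,
  hooks: HealingHooks<Step>,
): Promise<{ ok: boolean; attempts: number; step: Step }> {
  let current = step;
  let outcome = await hooks.execute(current);
  let retries = 0;
  while (!outcome.ok && retries < MAX_RETRIES) {
    await hooks.sleep(backoffMs(retries));
    current = await hooks.heal(current, outcome.error ?? "unknown error");
    outcome = await hooks.execute(current);
    retries += 1;
  }
  // attempts counts the initial execution plus every healed retry.
  return { ok: outcome.ok, attempts: retries + 1, step: current };
}
```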
+
+## Implementation Order
+
+### Phase 1: Content Executor
+- [ ] Create `content-executor.ts` with element finding + step execution
+- [ ] Register in wxt.config.ts
+- [ ] Message protocol: EXECUTE_STEP / STEP_RESULT / EXECUTOR_READY
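
A possible shape for the three messages, written as a discriminated union with a runtime guard, since data arriving through `chrome.runtime.onMessage` is not type-checked. Only the three `type` values come from the plan above; the other field names (`runId`, `stepIndex`, `strategy`, and so on) are hypothetical.

```typescript
interface ExecuteStepMessage {
  type: "EXECUTE_STEP";
  runId: string; // lets the background drop stale results from an earlier run
  stepIndex: number;
  step: { type: string; target_text?: string; cssSelector?: string; xpath?: string; value?: string };
}

interface StepResultMessage {
  type: "STEP_RESULT";
  runId: string;
  stepIndex: number;
  ok: boolean;
  strategy?: "target_text" | "cssSelector" | "xpath"; // which lookup succeeded
  error?: string;
}

interface ExecutorReadyMessage {
  type: "EXECUTOR_READY";
  url: string;
}

type ExecutorMessage = ExecuteStepMessage | StepResultMessage | ExecutorReadyMessage;

// Validate the required fields before trusting the `type` discriminant.
function parseExecutorMessage(raw: unknown): ExecutorMessage | null {
  if (typeof raw !== "object" || raw === null) return null;
  const msg = raw as Record<string, unknown>;
  switch (msg.type) {
    case "EXECUTE_STEP":
      return typeof msg.runId === "string" && typeof msg.stepIndex === "number" && typeof msg.step === "object" && msg.step !== null
        ? (msg as unknown as ExecuteStepMessage)
        : null;
    case "STEP_RESULT":
      return typeof msg.runId === "string" && typeof msg.stepIndex === "number" && typeof msg.ok === "boolean"
        ? (msg as unknown as StepResultMessage)
        : null;
    case "EXECUTOR_READY":
      return typeof msg.url === "string" ? (msg as unknown as ExecutorReadyMessage) : null;
    default:
      return null;
  }
}
```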
+
+### Phase 2: Background ExecutionEngine
+- [ ] Add ExecutionEngine class to background.ts
+- [ ] Step queue, state machine, tab tracking
+- [ ] Navigation detection + content script re-injection
+- [ ] Screenshot capture for healing
+
+### Phase 3: Backend Healing API
+- [ ] Create ext_execution_router.py
+- [ ] Adapt StepHealer to accept pre-captured screenshots
+- [ ] Wire up in api.py
+
+### Phase 4: Sidepanel UI
+- [ ] Execution progress view
+- [ ] "Run in Browser" button in dashboard
+- [ ] Tab/window selector
+
+### Phase 5: Edge Cases
+- [ ] Dynamic content waits (MutationObserver)
+- [ ] iframes
+- [ ] New tabs/popups during execution
+- [ ] Service worker sleep prevention (MV3 30s idle timeout)
+
+## Key Files
+- `extension/src/entrypoints/content.ts` — Has reusable functions: extractSemanticInfo(), getXPath(), getEnhancedCSSSelector()
+- `extension/src/entrypoints/background.ts` — Orchestrator to extend
+- `extension/src/lib/workflow-types.ts` — Step type definitions
+- `workflows/workflow_use/healing/step_healer.py` — Self-healing to adapt
+- `workflows/backend/recorder_router.py` — Pattern for new execution router