Add browser execution, recording, and LLM self-healing #149
Open: Stefz29 wants to merge 4 commits into browser-use:main from Stefz29:feat/local-llm-provider
Commits:
- f30e274 Add configurable local LLM provider, replacing hardcoded ChatBrowserUse (Stefz29)
- 59e9f85 Add LLM-powered self-healing for failed workflow steps (Stefz29)
- 7cd7757 Upgrade self-healing with autoresearch-mlx patterns (Stefz29)
- f03985c Add Chrome extension in-browser execution and independent recording (Stefz29)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -191,4 +191,3 @@ data/ | |
|
|
||
| user_data_dir | ||
| .claude/ | ||
| CLAUDE.md | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,110 @@ | ||
| # CLAUDE.md | ||
|
|
||
| This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. | ||
|
|
||
| ## Project Overview | ||
|
|
||
| Workflow Use is a browser automation tool that records browser interactions and replays them as deterministic workflows, with LLM-powered self-healing when steps fail. It's a fork of `browser-use/workflow-use` with local LLM support and improved self-healing. | ||
|
|
||
| **Three-tier architecture:** Chrome Extension (recorder) → Python Backend (execution) → React UI (visualization) | ||
|
|
||
| ## Build & Run Commands | ||
|
|
||
| ### Python Backend (workflows/) | ||
| ```bash | ||
| cd workflows | ||
| uv sync # Install dependencies (Python 3.11+, uses hatchling) | ||
| source .venv/bin/activate | ||
| playwright install chromium # Browser engine | ||
| cp .env.example .env # Configure LLM provider | ||
| python cli.py --help # CLI entry point | ||
| python cli.py launch-gui # Starts FastAPI (8000) + Vite (5173) + opens browser | ||
| ``` | ||
|
|
||
| ### Chrome Extension (extension/) | ||
| ```bash | ||
| cd extension | ||
| npm install && npm run build # Build to .output/chrome-mv3/ | ||
| npm run dev # Watch mode | ||
| ``` | ||
|
|
||
| ### React UI (ui/) | ||
| ```bash | ||
| cd ui | ||
| npm install # postinstall auto-generates types from backend OpenAPI | ||
| npm run dev # Dev server on port 5173 | ||
| npm run type-gen-update # Regenerate TS types from running backend | ||
| ``` | ||
|
|
||
| ### Lint (all components) | ||
| ```bash | ||
| ./lint.sh # Runs ruff + ESLint + tsc for all three | ||
| # Or individually: | ||
| cd workflows && uv run ruff check && uv run ruff format --check | ||
| cd ui && npm run lint | ||
| cd extension && npm run lint | ||
| ``` | ||
|
|
||
| ### Tests | ||
| ```bash | ||
| cd workflows && uv run pytest tests/ # Run all tests | ||
| uv run pytest tests/test_wait_times.py # Single test file | ||
| ``` | ||
|
|
||
| ## Code Style | ||
|
|
||
| Python: Ruff with **tabs for indentation**, single quotes, 130 char line length. Rules: ASYNC, E, F, FAST, I, PLE (see pyproject.toml `[tool.ruff]`). | ||
|
|
||
| ## Architecture | ||
|
|
||
| ### Communication Flow | ||
| ``` | ||
| Extension content.ts → background.ts → HTTP POST :7331/event → RecordingService | ||
| UI (React :5173) → REST API :8000 → backend/routers.py → WorkflowService | ||
| Workflow execution: Workflow.run() → _execute_step() → controller actions → Playwright | ||
| ``` | ||
|
|
||
| ### Key Modules (workflows/workflow_use/) | ||
|
|
||
| **`llm/provider.py`** — LLM factory. Reads `.env` for `LLM_PROVIDER` (local/browser_use), `LLM_BASE_URL`, `LLM_MODEL`, `LLM_HEALING_MODEL`. Returns `ChatOpenAI` for local (LM Studio/Ollama) or `ChatBrowserUse` for cloud. Entry point: `get_llm(purpose)`. | ||
|
|
||
| **`workflow/service.py`** — `Workflow` class. Main execution orchestrator. Loads from YAML, resolves variables, runs steps sequentially. On failure, delegates to `StepHealer` for vision-based repair. Key params: `enable_self_healing`, `debug` (captures screenshots). | ||
|
|
||
| **`healing/step_healer.py`** — Vision-based step repair. On failure: screenshot → LLM diagnosis → corrected selectors → retry. Tracks results in `logs/healing/healing_results.tsv`. Uses snapshot/revert pattern (saves YAML before healing, reverts if fix fails). Exponential backoff between retries. | ||
|
|
||
| **`healing/service.py`** — `HealingService`. Generates workflows from prompts by running a browser agent, capturing element mappings, then converting to deterministic steps. Two paths: LLM-based (`create_workflow_definition`) or deterministic (`_create_workflow_deterministically`). Post-processes with pattern-based variable identification. | ||
|
|
||
| **`schema/views.py`** — Pydantic models for workflow steps. `SelectorWorkflowSteps` has `target_text` (primary, semantic), `selectorStrategies` (fallback list), and legacy `cssSelector`/`xpath`. Step types: navigation, click, input, select_change, key_press, scroll, extract_page_content, agent. | ||
|
|
||
| **`recorder/service.py`** — Spawns Chromium with extension loaded, runs FastAPI on :7331 to receive events from extension. Extension captures clicks/inputs/keypresses with semantic text, XPath, CSS selectors. | ||
|
|
||
| **`controller/service.py`** — Maps workflow step types to browser-use actions. `controller/utils.py` has `get_best_element_handle()` with stable selector fallback generation. | ||
|
|
||
| **`workflow/element_finder.py`** — Multi-strategy element finding. Priority: semantic text strategies (exact, role, aria_label, placeholder, fuzzy) → XPath JavaScript evaluation. Returns element index or XPath string. | ||
|
|
||
| **`mcp/service.py`** — Exposes workflows as MCP tools via FastMCP. Dynamically generates function signatures from workflow input_schema. | ||
|
|
||
| **`storage/service.py`** — File-based CRUD. Workflows stored as `.workflow.yaml` in `storage/workflows/`. Metadata index at `storage/metadata.json`. | ||
|
|
||
| ### Extension (extension/src/) | ||
| - `content.ts` — Injected into pages. Captures clicks, inputs, keypresses with semantic label extraction (aria-label, label[for], sibling text, etc.). Uses RRWeb for DOM recording. | ||
| - `background.ts` — Service worker. Stores events per tab, converts to semantic format, POSTs to Python server at :7331. | ||
| - `sidepanel/` — React UI for recording controls and live event viewer. | ||
|
|
||
| ### CLI (workflows/cli.py) | ||
| 95KB Typer app. Key commands: `create-workflow`, `run-workflow`, `run-workflow-no-ai`, `run-workflow-csv`, `generate-workflow`, `launch-gui`, `mcp-server`. `get_llm()` is initialized at module level: if the LLM server is down, the CLI still loads, but LLM-dependent commands fail. | ||
|
|
||
| ## Fork-Specific Details | ||
|
|
||
| - **Origin**: `github.com/Stefz29/workflow-use` (fork), **Upstream**: `github.com/browser-use/workflow-use` | ||
| - **LLM Provider**: Defaults to local LM Studio at `localhost:1234` instead of Browser Use cloud API | ||
| - **Self-Healing**: Added `StepHealer` with autoresearch-mlx patterns (snapshot/revert, TSV results tracking, exponential backoff, fail-fast checks) | ||
| - **Spec**: Full project roadmap in `~/Downloads/workflow-use-fork-instructions.md` | ||
|
|
||
| ## Ports | ||
| | Service | Port | | ||
| |---------|------| | ||
| | FastAPI backend | 8000 | | ||
| | Recording server | 7331 | | ||
| | Vite UI dev | 5173 | | ||
| | LM Studio (local LLM) | 1234 | | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,139 @@ | ||
| # Phase 2b: In-Browser Workflow Execution | ||
|
|
||
| ## Status: STARTING | ||
| **Date:** 2026-03-17 | ||
|
|
||
| ## Problem | ||
| Workflow execution via Playwright launches a SEPARATE Chromium browser with no cookies/sessions. | ||
| This defeats the purpose for sites requiring auth (LinkedIn Recruiter, etc.) and triggers bot detection. | ||
|
|
||
| ## Solution | ||
| The Chrome extension itself replays workflow steps in the SAME browser where they were recorded. | ||
| No Playwright needed. Same cookies, same sessions, same everything. | ||
|
|
||
| ## Architecture | ||
|
|
||
| ``` | ||
| YOUR Chrome (same browser, same cookies, same session) | ||
| ┌─────────────────────────────────────────────────────────┐ | ||
| │ background.ts (ExecutionEngine - orchestrator) │ | ||
| │ - Manages step queue & state machine │ | ||
| │ - Survives page navigations (service worker) │ | ||
| │ - Sends one step at a time to content script │ | ||
| │ - Re-injects content script after navigation │ | ||
| │ - Captures screenshots for self-healing │ | ||
| │ │ | ||
| │ content-executor.ts (NEW - executes steps on page) │ | ||
| │ - Finds elements: target_text → CSS → XPath │ | ||
| │ - Executes: click(), fill(), keypress via real DOM │ | ||
| │ - Reports success/failure back to background │ | ||
| │ - Highlights element being acted on │ | ||
| │ │ | ||
| │ sidepanel (execution progress UI) │ | ||
| │ - "Run in Browser" button │ | ||
| │ - Step-by-step progress with status indicators │ | ||
| └─────────────────────────────────────────────────────────┘ | ||
| │ HTTP (only when healing needed) | ||
| ▼ | ||
| ┌─────────────────────────────────────────────────────────┐ | ||
| │ Python Backend (:8000) │ | ||
| │ POST /api/ext-execute/heal │ | ||
| │ - Receives screenshot + failed step │ | ||
| │ - LLM diagnoses what changed on the page │ | ||
| │ - Returns corrected selectors │ | ||
| └─────────────────────────────────────────────────────────┘ | ||
| ``` | ||
|
|
||
| ## Tab/Window Tracking | ||
|
|
||
| The user may have several browsers open (for example two Chrome instances, Firefox, and Safari). | ||
| - The extension ONLY sees its own Chrome instance | ||
| - During recording, every event captures `tabId` and `windowId` | ||
| - At replay: if the original tab still exists, use it; otherwise, if the original window still exists, find a matching tab in it; otherwise ask the user. | ||
| - `windowId` distinguishes between multiple Chrome windows | ||
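The replay-time lookup described above can be sketched as a pure function over the list of currently open tabs. This is only an illustration: `TabInfo`, `RecordedTarget`, `TabResolution`, `hostOf`, and `resolveTargetTab` are hypothetical names, and the plan does not define what a "matching" tab is, so matching on the URL host is an assumption made here.

```typescript
// Hypothetical shapes: only tabId and windowId are named in the plan.
interface TabInfo {
  tabId: number;
  windowId: number;
  url: string;
}

interface RecordedTarget {
  tabId: number;
  windowId: number;
  url: string; // assumption: a URL stored at recording time, used for matching
}

type TabResolution =
  | { kind: "original_tab"; tab: TabInfo }
  | { kind: "matching_tab"; tab: TabInfo }
  | { kind: "ask_user" };

// Extract the lowercased host from a URL string, or null if it has no scheme://host part.
function hostOf(url: string): string | null {
  const match = /^[a-z][a-z0-9+.-]*:\/\/([^/?#]+)/i.exec(url);
  return match && match[1] ? match[1].toLowerCase() : null;
}

function resolveTargetTab(recorded: RecordedTarget, openTabs: TabInfo[]): TabResolution {
  // 1. The original tab still exists: reuse it.
  const original = openTabs.find((t) => t.tabId === recorded.tabId);
  if (original) return { kind: "original_tab", tab: original };

  // 2. The original window still exists: look for a tab on the same host (assumed rule).
  const recordedHost = hostOf(recorded.url);
  const candidate = openTabs.find(
    (t) => t.windowId === recorded.windowId && recordedHost !== null && hostOf(t.url) === recordedHost,
  );
  if (candidate) return { kind: "matching_tab", tab: candidate };

  // 3. Nothing usable: defer to the user.
  return { kind: "ask_user" };
}
```

Because tab IDs are only unique within a browser session, a real implementation would probably also want to confirm that a tab found by ID is still on the expected site before reusing it.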
|
|
||
| ## Execution Flow | ||
|
|
||
| ### Step-by-Step: | ||
| 1. User clicks "Run in Browser" in sidepanel | ||
| 2. Background resolves target tab (recorded tabId/windowId or active tab) | ||
| 3. For each step in workflow: | ||
| a. **navigation**: `chrome.tabs.update(tabId, {url})` → wait for load → inject executor | ||
| b. **click/input/key_press**: send step to content-executor → wait for result | ||
| c. **scroll**: send to content-executor → `window.scrollTo()` | ||
| 4. After each step that might cause navigation: | ||
| - Monitor `chrome.tabs.onUpdated` for URL change | ||
| - Wait for `status: "complete"` → re-inject executor → `EXECUTOR_READY` handshake | ||
| 5. On step failure: capture screenshot → send to backend heal endpoint → retry with corrected selectors | ||
|
|
||
| ### State Machine: | ||
| ``` | ||
| IDLE → LOADING → EXECUTING → WAITING_FOR_NAV → HEALING → COMPLETED/FAILED | ||
| ``` | ||
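The diagram lists the states but not which transitions are legal. A minimal sketch of an explicit transition table follows; the edges in `TRANSITIONS` are assumptions inferred from the execution flow (for example, that `HEALING` returns to `EXECUTING` for a retry), and `ExecState`, `canTransition`, and `transition` are hypothetical names.

```typescript
type ExecState =
  | "IDLE"
  | "LOADING"
  | "EXECUTING"
  | "WAITING_FOR_NAV"
  | "HEALING"
  | "COMPLETED"
  | "FAILED";

// Assumed edges: the plan names the states only.
const TRANSITIONS: Record<ExecState, ExecState[]> = {
  IDLE: ["LOADING"],
  LOADING: ["EXECUTING", "FAILED"],
  EXECUTING: ["EXECUTING", "WAITING_FOR_NAV", "HEALING", "COMPLETED", "FAILED"],
  WAITING_FOR_NAV: ["EXECUTING", "FAILED"],
  HEALING: ["EXECUTING", "FAILED"],
  COMPLETED: [],
  FAILED: [],
};

function canTransition(from: ExecState, to: ExecState): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// Returns the new state, or throws if the move is not in the table.
function transition(from: ExecState, to: ExecState): ExecState {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal transition: ${from} -> ${to}`);
  }
  return to;
}
```

Keeping the table as data makes it cheap to reject impossible moves (such as leaving `COMPLETED`) in one place instead of scattering checks through the engine.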
|
|
||
| ## Content Executor: Element Finding | ||
|
|
||
| Priority order (first match wins; these are the same selectors captured during recording): | ||
| 1. **target_text (semantic)**: Scan interactive elements, match by textContent/aria-label/placeholder/label | ||
| 2. **cssSelector**: `document.querySelector(cssSelector)` | ||
| 3. **xpath**: `document.evaluate(xpath)` | ||
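The fallback order above can be expressed as a small strategy chain. In this sketch the actual lookups are injected as functions, so the ordering logic can be exercised without a DOM; in the extension they would presumably wrap `document.querySelector`, `document.evaluate`, and the semantic text scan. `StepSelectors`, `Finder`, and `findWithFallback` are hypothetical names.

```typescript
// Selector fields as named in the plan; all optional because a step may lack some.
interface StepSelectors {
  target_text?: string;
  cssSelector?: string;
  xpath?: string;
}

type Strategy = "target_text" | "cssSelector" | "xpath";

// A finder receives the recorded selector value and returns a match or null.
type Finder<E> = (value: string) => E | null;

interface FindResult<E> {
  element: E;
  strategy: Strategy; // which strategy produced the match, useful for logging
}

const PRIORITY: Strategy[] = ["target_text", "cssSelector", "xpath"];

function findWithFallback<E>(
  step: StepSelectors,
  finders: Record<Strategy, Finder<E>>,
): FindResult<E> | null {
  for (const strategy of PRIORITY) {
    const value = step[strategy];
    if (!value) continue; // selector was not recorded for this step
    const element = finders[strategy](value);
    if (element !== null) return { element, strategy };
  }
  return null; // every available strategy failed
}
```

Returning the winning strategy alongside the element lets the caller report when a step only succeeded through a fallback, which may be a hint that the primary selector has drifted.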
|
|
||
| ### Action Execution (Real DOM Events): | ||
| - **click**: `element.focus()` → `element.click()` (or full MouseEvent sequence for React/Vue) | ||
| - **input**: Native value setter → InputEvent → change event | ||
| - **key_press**: `KeyboardEvent('keydown')` followed by `KeyboardEvent('keyup')` | ||
| - **scroll**: `window.scrollTo(x, y)` | ||
|
|
||
| ## Navigation Handling (Critical) | ||
|
|
||
| When a click causes page navigation: | ||
| 1. Content script on old page DIES (Chrome destroys it) | ||
| 2. Background service worker SURVIVES — it monitors `chrome.tabs.onUpdated` | ||
| 3. New page finishes loading → background re-injects content-executor.ts | ||
| 4. Content script sends `EXECUTOR_READY` → background sends next step | ||
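The handshake above amounts to gating step dispatch on executor readiness. A minimal sketch of that gate, with the browser events modeled as plain method calls, could look like the following. `Step` and `StepDispatcher` are hypothetical, and wiring the methods to `chrome.tabs.onUpdated` and the runtime message listener is left out.

```typescript
interface Step {
  type: string;
}

class StepDispatcher {
  private queue: Step[];
  private send: (step: Step) => void;
  private executorReady = false;

  constructor(steps: Step[], send: (step: Step) => void) {
    this.queue = steps.slice();
    this.send = send;
  }

  // Call when a navigation is detected: the old content script is gone.
  onNavigationStarted(): void {
    this.executorReady = false;
  }

  // Call when the (re-)injected content script sends EXECUTOR_READY.
  onExecutorReady(): void {
    this.executorReady = true;
    this.dispatchNext();
  }

  // Call when a step succeeded; the next step is held back if a navigation is pending.
  onStepSucceeded(): void {
    this.dispatchNext();
  }

  remaining(): number {
    return this.queue.length;
  }

  private dispatchNext(): void {
    if (!this.executorReady) return; // wait for the handshake
    const next = this.queue.shift();
    if (next) this.send(next);
  }
}
```

This sketch ignores the case where a click's result message is lost because the page unloaded first; a real engine would likely need a timeout or a rule for treating the navigation itself as success.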
|
|
||
| ## Self-Healing Integration | ||
|
|
||
| When a step fails: | ||
| 1. Content-executor reports failure with error to background | ||
| 2. Background captures screenshot via `chrome.tabs.captureVisibleTab()` | ||
| 3. Background POSTs screenshot + step context to `POST /api/ext-execute/heal` | ||
| 4. Backend runs StepHealer LLM diagnosis → returns corrected selectors | ||
| 5. Background sends corrected step to content-executor for retry | ||
| 6. Max 3 retries before marking step as failed | ||
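The retry loop in steps 1 to 6 can be sketched with the browser and backend calls injected as dependencies. Everything named here (`WorkflowStep`, `StepOutcome`, `HealingDeps`, `runStepWithHealing`) is hypothetical; in the extension, `captureScreenshot` would presumably wrap `chrome.tabs.captureVisibleTab()` and `heal` would POST to the heal endpoint, but the request and response shapes are not specified in the plan.

```typescript
interface WorkflowStep {
  type: string;
  target_text?: string;
  cssSelector?: string;
  xpath?: string;
}

type StepOutcome = { status: "ok" } | { status: "error"; error: string };

interface HealingDeps {
  execute: (step: WorkflowStep) => Promise<StepOutcome>; // send step to the content executor
  captureScreenshot: () => Promise<string>; // assumed to return an image data URL
  heal: (step: WorkflowStep, error: string, screenshot: string) => Promise<WorkflowStep>;
}

interface HealingResult {
  ok: boolean;
  retries: number; // number of healing attempts made
  step: WorkflowStep; // the last version of the step that was tried
  error?: string;
}

const MAX_RETRIES = 3; // from the plan: max 3 retries before marking the step failed

async function runStepWithHealing(step: WorkflowStep, deps: HealingDeps): Promise<HealingResult> {
  let current = step;
  for (let retries = 0; ; retries++) {
    const outcome = await deps.execute(current);
    if (outcome.status === "ok") return { ok: true, retries, step: current };
    if (retries >= MAX_RETRIES) {
      return { ok: false, retries, step: current, error: outcome.error };
    }
    // Failure with retries left: ask the backend for corrected selectors.
    const screenshot = await deps.captureScreenshot();
    current = await deps.heal(current, outcome.error, screenshot);
  }
}
```

With this shape a step that never heals is executed four times in total: the initial attempt plus three healed retries.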
|
|
||
| ## Implementation Order | ||
|
|
||
| ### Phase 1: Content Executor | ||
| - [ ] Create `content-executor.ts` with element finding + step execution | ||
| - [ ] Register in wxt.config.ts | ||
| - [ ] Message protocol: EXECUTE_STEP / STEP_RESULT / EXECUTOR_READY | ||
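One possible shape for the three messages named above is a discriminated union with a runtime guard. Only the message names come from the plan; the payload fields (`stepIndex`, `step`, `success`, `error`) and the names `ExecutorMessage`, `WorkflowStepPayload`, and `isExecutorMessage` are assumptions for illustration.

```typescript
interface WorkflowStepPayload {
  type: string;
  target_text?: string;
  cssSelector?: string;
  xpath?: string;
}

type ExecutorMessage =
  // background -> content script
  | { type: "EXECUTE_STEP"; stepIndex: number; step: WorkflowStepPayload }
  // content script -> background
  | { type: "STEP_RESULT"; stepIndex: number; success: boolean; error?: string }
  // content script -> background, sent once after (re-)injection
  | { type: "EXECUTOR_READY" };

// Runtime check for messages arriving over the extension message channel,
// where the static type is unknown.
function isExecutorMessage(value: unknown): value is ExecutorMessage {
  if (typeof value !== "object" || value === null) return false;
  const msg = value as { [key: string]: unknown };
  switch (msg["type"]) {
    case "EXECUTE_STEP":
      return typeof msg["stepIndex"] === "number" && typeof msg["step"] === "object" && msg["step"] !== null;
    case "STEP_RESULT":
      return typeof msg["stepIndex"] === "number" && typeof msg["success"] === "boolean";
    case "EXECUTOR_READY":
      return true;
    default:
      return false;
  }
}
```

Carrying a `stepIndex` in both directions is a design choice rather than a requirement: it lets the background script discard a late `STEP_RESULT` that belongs to a step it has already given up on.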
|
|
||
| ### Phase 2: Background ExecutionEngine | ||
| - [ ] Add ExecutionEngine class to background.ts | ||
| - [ ] Step queue, state machine, tab tracking | ||
| - [ ] Navigation detection + content script re-injection | ||
| - [ ] Screenshot capture for healing | ||
|
|
||
| ### Phase 3: Backend Healing API | ||
| - [ ] Create ext_execution_router.py | ||
| - [ ] Adapt StepHealer to accept pre-captured screenshots | ||
| - [ ] Wire up in api.py | ||
|
|
||
| ### Phase 4: Sidepanel UI | ||
| - [ ] Execution progress view | ||
| - [ ] "Run in Browser" button in dashboard | ||
| - [ ] Tab/window selector | ||
|
|
||
| ### Phase 5: Edge Cases | ||
| - [ ] Dynamic content waits (MutationObserver) | ||
| - [ ] iframes | ||
| - [ ] New tabs/popups during execution | ||
| - [ ] Service worker sleep prevention (MV3 30s idle timeout) | ||
|
|
||
| ## Key Files | ||
| - `extension/src/entrypoints/content.ts` — Has reusable functions: extractSemanticInfo(), getXPath(), getEnhancedCSSSelector() | ||
| - `extension/src/entrypoints/background.ts` — Orchestrator to extend | ||
| - `extension/src/lib/workflow-types.ts` — Step type definitions | ||
| - `workflows/workflow_use/healing/step_healer.py` — Self-healing to adapt | ||
| - `workflows/backend/recorder_router.py` — Pattern for new execution router |
P2: Architecture flow in CLAUDE.md is outdated: recording endpoint/service no longer matches actual runtime path.