Replace Playwright with Kernel native API in OpenAI CUA templates #124

Open
rgarcia wants to merge 2 commits into main from rgarcia/cua-native-kernel-api

rgarcia commented Feb 25, 2026

Summary

  • Replace Playwright (over CDP) with Kernel's native computer control API in both TypeScript and Python OpenAI CUA templates
  • Add batch_computer_actions function tool that executes multiple browser actions in a single API call, reducing latency
  • Add local test scripts (test.local.ts / test_local.py) that create remote Kernel browsers for testing without deploying a Kernel app

Details

New KernelComputer class (TS + Python) wraps the Kernel SDK for all computer actions:

  • captureScreenshot, clickMouse, typeText, pressKey, scroll, moveMouse, dragMouse
  • batch endpoint for batched actions
  • playwright.execute for navigation (goto, back, forward, getCurrentUrl)
  • CUA key name to X11 keysym translation map (ported from Go reference implementation)
  • Button normalization (CUA model sends numeric button values 1/2/3 in batch calls)
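The key-name translation listed above can be sketched roughly as follows. This is an illustration only, not the template's code: `CUA_TO_X11` holds a small hypothetical subset of entries (the keysym names on the right are standard X11 names), and `translate_keys` is an assumed helper name.

```python
from typing import List

# Hypothetical subset of a CUA-key-name -> X11 keysym map. The map in the
# template is larger and its exact entries may differ.
CUA_TO_X11 = {
    "ENTER": "Return",
    "ESC": "Escape",
    "BACKSPACE": "BackSpace",
    "CTRL": "Control_L",
    "ALT": "Alt_L",
    "SHIFT": "Shift_L",
    "ARROWLEFT": "Left",
    "ARROWRIGHT": "Right",
}


def translate_keys(keys: List[str]) -> List[str]:
    """Map CUA key names to X11 keysyms; unknown keys pass through unchanged."""
    return [CUA_TO_X11.get(key.upper(), key) for key in keys]
```

The lookup is case-insensitive so that `"ctrl"` and `"CTRL"` resolve to the same keysym, while plain characters such as `"a"` are forwarded as-is.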

Batch tool: System instructions guide the model to prefer batch_computer_actions for predictable sequences (e.g., click + type + enter).
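As a rough sketch of what batching a predictable sequence might look like, the snippet below converts a click + type + enter sequence into a single list of batch actions. The `click_mouse` and `press_key` shapes follow the diff previews in this PR; the `type_text` shape and the helper names `translate_action` and `build_batch` are assumptions made for illustration.

```python
from typing import Any, Dict, List


def translate_action(action: Dict[str, Any]) -> Dict[str, Any]:
    """Translate one CUA-style action into a batch action dict."""
    kind = action.get("type", "")
    if kind == "click":
        return {
            "type": "click_mouse",
            "click_mouse": {
                "x": action.get("x", 0),
                "y": action.get("y", 0),
                "button": "left",
            },
        }
    if kind == "type":
        # Assumed shape, chosen to mirror click_mouse / press_key.
        return {"type": "type_text", "type_text": {"text": action.get("text", "")}}
    if kind == "keypress":
        return {"type": "press_key", "press_key": {"keys": list(action.get("keys", []))}}
    raise ValueError(f"Unsupported action type in this sketch: {kind!r}")


def build_batch(actions: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Build the payload for one batched call instead of one call per action."""
    return [translate_action(a) for a in actions]


batch = build_batch([
    {"type": "click", "x": 400, "y": 120},
    {"type": "type", "text": "mechanical keyboard"},
    {"type": "keypress", "keys": ["Return"]},
])
```

Sending the three actions together is what saves the round trips: the model issues one tool call and the backend executes the sequence in order.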

Removed dependencies: playwright-core, sharp (TS), playwright (Python). Bumped @onkernel/sdk to ^0.38.0 and kernel to >=0.38.0.

Test plan

  • TypeScript test.local.ts E2E: created remote Kernel browser, ran CUA agent (eBay search task), batch tool used successfully, browser cleaned up
  • Python test_local.py E2E: same test, batch tool used on first action (type + enter), agent completed successfully
  • TypeScript compiles cleanly (tsc --noEmit)

Made with Cursor


Note

Medium Risk
Moderate risk because it replaces the core browser-control implementation and tool-calling flow (including new batched actions) across both Python and TypeScript templates, plus dependency upgrades. Impact is limited to sample templates, but regressions could break local/deployed runs and logging/output formatting.

Overview
Migrates both the Python and TypeScript OpenAI CUA templates from Playwright-over-CDP to Kernel’s native computer control API, introducing new KernelComputer wrappers that implement screenshot/mouse/keyboard/scroll/drag via Kernel endpoints.

Adds a batch_computer_actions function tool and model instructions to encourage batching predictable action sequences, plus new event-based logging (text and jsonl) that emits prompt/reasoning/text deltas, action descriptions, screenshots, and backend SDK timing.
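The two logging modes could be sketched along these lines. The event names come from the description above, but the field names (`ts`, `event`, `data`) and the `format_event` helper are hypothetical and not taken from the templates.

```python
import json
from typing import Any, Dict, Optional


def format_event(
    event: str,
    data: Dict[str, Any],
    mode: str = "text",
    ts: Optional[float] = None,
) -> str:
    """Render one log event as a JSON line or as a short text line."""
    if mode == "jsonl":
        # One self-contained JSON object per line, easy to parse downstream.
        return json.dumps({"ts": ts, "event": event, "data": data}, sort_keys=True)
    details = ", ".join(f"{key}={value}" for key, value in data.items())
    return f"[{event}] {details}"
```

For example, `format_event("action", {"description": "click(1, 2)"})` yields a human-readable line, while `mode="jsonl"` yields a machine-readable record of the same event.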

Updates docs and env examples to require KERNEL_API_KEY, adds local runners (run_local.py, run_local.ts) and makes app entrypoints runnable locally, and removes Playwright/sharp/pillow-related code while bumping Kernel SDK dependencies to >=0.38.0 / ^0.38.0.

Written by Cursor Bugbot for commit dcb16c7. This will update automatically on new commits.

Both TypeScript and Python OpenAI CUA templates now use Kernel's native
computer control API (screenshot, click, type, scroll, batch, etc.)
instead of Playwright over CDP. This enables the batch_computer_actions
tool which executes multiple actions in a single API call for lower
latency.

Key changes:
- New KernelComputer class wrapping Kernel SDK for all computer actions
- Added batch_computer_actions function tool with system instructions
- Navigation (goto/back/forward) via Kernel's playwright.execute endpoint
- Local test scripts create remote Kernel browsers without app deployment
- Removed playwright-core, sharp (TS) and playwright (Python) dependencies
- Bumped @onkernel/sdk to ^0.38.0 and kernel to >=0.38.0

Made-with: Cursor
}

const currentUrl = await this.computer.getCurrentUrl();
utils.checkBlocklistedUrl(currentUrl);

TypeScript URL blocklist check return value silently ignored

Medium Severity

checkBlocklistedUrl returns a boolean, but agent.ts discards the return value, making the URL blocklist entirely non-functional. The Python counterpart correctly raises a ValueError to halt execution. Previously, Playwright's route-level route.abort() handler provided actual network-level blocking, but that was removed in this PR, leaving no working URL blocking in the TypeScript template.
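The difference between a check that returns a boolean and one that raises can be sketched as below. This is a minimal illustration of the raising style attributed to the Python template, not its actual code: `BLOCKED_DOMAINS` contains a placeholder domain and both function names are assumed.

```python
from urllib.parse import urlparse

# Placeholder blocklist for illustration only.
BLOCKED_DOMAINS = ["blocked.example"]


def is_blocklisted(url: str) -> bool:
    """Return True if the URL's host is a blocked domain or a subdomain of one."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith(f".{d}") for d in BLOCKED_DOMAINS)


def check_blocklisted_url(url: str) -> None:
    """Raise instead of returning, so a caller cannot silently ignore a hit."""
    if is_blocklisted(url):
        raise ValueError(f"Blocked URL: {url}")
```

A caller that writes `check_blocklisted_url(url)` and ignores the result is still protected, because the exception halts execution. The boolean-returning variant offers no such guarantee.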


from .kernel_computer import KernelComputer

computers_config = {
    "local-playwright": LocalPlaywrightBrowser,

Updated config.py is now dead code

Low Severity

config.py was updated in this PR to reference KernelComputer, but computers/__init__.py no longer imports or exports computers_config. No other file references it either, making this entire file dead code.


        return "left"
    if isinstance(button, int):
        return {1: "left", 2: "middle", 3: "right"}.get(button, "left")
    return str(button)

Missing handling for special click button values

Medium Severity

The CUA model can send click actions with button set to "back", "forward", or "wheel". The deleted Playwright code explicitly handled these by routing to self.back(), self.forward(), or mouse.wheel(). The new _normalize_button/normalizeButton functions pass these strings through unchanged to the Kernel click_mouse API, which only accepts "left", "right", or "middle" — causing an API error when the model uses these button types.


cursor bot commented Feb 25, 2026

Bugbot Autofix prepared fixes for 3 of the 3 bugs found in the latest run.

  • ✅ Fixed: TypeScript URL blocklist check return value silently ignored
    • Changed checkBlocklistedUrl from returning a boolean to throwing an Error when a blocked URL is detected, matching the Python counterpart's ValueError behavior.
  • ✅ Fixed: Updated config.py is now dead code
    • Deleted the dead config.py file since computers_config is not imported or used anywhere in the codebase.
  • ✅ Fixed: Missing handling for special click button values
    • Added handling for 'back', 'forward', and 'wheel' button values in both Python and TypeScript KernelComputer.click() methods (routing to back/forward/scroll) and in batch translation functions (using Alt+Left/Right keypresses and scroll actions).

Push these changes by commenting:

@cursor push a9e2870223
Preview (a9e2870223)
diff --git a/pkg/templates/python/openai-computer-use/computers/config.py b/pkg/templates/python/openai-computer-use/computers/config.py
deleted file mode 100644
--- a/pkg/templates/python/openai-computer-use/computers/config.py
+++ /dev/null
@@ -1,5 +1,0 @@
-from .kernel_computer import KernelComputer
-
-computers_config = {
-    "kernel": KernelComputer,
-}
\ No newline at end of file

diff --git a/pkg/templates/python/openai-computer-use/computers/kernel_computer.py b/pkg/templates/python/openai-computer-use/computers/kernel_computer.py
--- a/pkg/templates/python/openai-computer-use/computers/kernel_computer.py
+++ b/pkg/templates/python/openai-computer-use/computers/kernel_computer.py
@@ -74,12 +74,22 @@
 def _translate_cua_action(action: Dict[str, Any]) -> Dict[str, Any]:
     action_type = action.get("type", "")
     if action_type == "click":
+        button = action.get("button")
+        if button == "back":
+            return {"type": "press_key", "press_key": {"keys": ["Alt_L", "Left"]}}
+        if button == "forward":
+            return {"type": "press_key", "press_key": {"keys": ["Alt_L", "Right"]}}
+        if button == "wheel":
+            return {
+                "type": "scroll",
+                "scroll": {"x": action.get("x", 0), "y": action.get("y", 0), "delta_x": 0, "delta_y": 0},
+            }
         return {
             "type": "click_mouse",
             "click_mouse": {
                 "x": action.get("x", 0),
                 "y": action.get("y", 0),
-                "button": _normalize_button(action.get("button")),
+                "button": _normalize_button(button),
             },
         }
     elif action_type == "double_click":
@@ -134,6 +144,15 @@
         return base64.b64encode(resp.read()).decode("utf-8")
 
     def click(self, x: int, y: int, button="left") -> None:
+        if button == "back":
+            self.back()
+            return
+        if button == "forward":
+            self.forward()
+            return
+        if button == "wheel":
+            self.scroll(x, y, 0, 0)
+            return
         self.client.browsers.computer.click_mouse(self.session_id, x=x, y=y, button=_normalize_button(button))
 
     def double_click(self, x: int, y: int) -> None:

diff --git a/pkg/templates/typescript/openai-computer-use/lib/kernel-computer.ts b/pkg/templates/typescript/openai-computer-use/lib/kernel-computer.ts
--- a/pkg/templates/typescript/openai-computer-use/lib/kernel-computer.ts
+++ b/pkg/templates/typescript/openai-computer-use/lib/kernel-computer.ts
@@ -105,11 +105,18 @@
 
 function translateCuaAction(action: CuaAction): BatchAction {
   switch (action.type) {
-    case 'click':
+    case 'click': {
+      if (action.button === 'back')
+        return { type: 'press_key', press_key: { keys: ['Alt_L', 'Left'] } };
+      if (action.button === 'forward')
+        return { type: 'press_key', press_key: { keys: ['Alt_L', 'Right'] } };
+      if (action.button === 'wheel')
+        return { type: 'scroll', scroll: { x: action.x ?? 0, y: action.y ?? 0, delta_x: 0, delta_y: 0 } };
       return {
         type: 'click_mouse',
         click_mouse: { x: action.x ?? 0, y: action.y ?? 0, button: normalizeButton(action.button) },
       };
+    }
     case 'double_click':
       return {
         type: 'click_mouse',
@@ -168,6 +175,9 @@
   }
 
   async click(x: number, y: number, button: string | number = 'left'): Promise<void> {
+    if (button === 'back') { await this.back(); return; }
+    if (button === 'forward') { await this.forward(); return; }
+    if (button === 'wheel') { await this.scroll(x, y, 0, 0); return; }
     await this.client.browsers.computer.clickMouse(this.sessionId, {
       x,
       y,

diff --git a/pkg/templates/typescript/openai-computer-use/lib/utils.ts b/pkg/templates/typescript/openai-computer-use/lib/utils.ts
--- a/pkg/templates/typescript/openai-computer-use/lib/utils.ts
+++ b/pkg/templates/typescript/openai-computer-use/lib/utils.ts
@@ -40,12 +40,14 @@
   }
 }
 
-export function checkBlocklistedUrl(url: string): boolean {
+export function checkBlocklistedUrl(url: string): void {
   try {
     const host = new URL(url).hostname;
-    return BLOCKED_DOMAINS.some((d) => host === d || host.endsWith(`.${d}`));
-  } catch {
-    return false;
+    if (BLOCKED_DOMAINS.some((d) => host === d || host.endsWith(`.${d}`))) {
+      throw new Error(`Blocked URL: ${url}`);
+    }
+  } catch (e) {
+    if (e instanceof Error && e.message.startsWith('Blocked URL:')) throw e;
   }
 }

This adds CUA-style backend/action event rendering (with JSONL mode support), aligns dotenv/local-run behavior across TypeScript and Python templates, and renames local entry scripts to run_local for clearer usage.

Made-with: Cursor

cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: TS agent omits current_url from computer_call_output
    • The TypeScript agent now conditionally fetches the browser URL, runs the blocklist check, and adds current_url to computer_call_output.output for browser environments.
  • ✅ Fixed: Duplicated _describe_action functions across Python files
    • The Python agent now imports and uses _describe_action and _describe_batch_actions from kernel_computer.py instead of duplicating that logic locally.

Push these changes by commenting:

@cursor push 90332375b9
Preview (90332375b9)
diff --git a/pkg/templates/python/openai-computer-use/agent/agent.py b/pkg/templates/python/openai-computer-use/agent/agent.py
--- a/pkg/templates/python/openai-computer-use/agent/agent.py
+++ b/pkg/templates/python/openai-computer-use/agent/agent.py
@@ -1,7 +1,11 @@
 import json
 import time
 from typing import Any, Callable
-from computers.kernel_computer import KernelComputer
+from computers.kernel_computer import (
+    KernelComputer,
+    _describe_action,
+    _describe_batch_actions,
+)
 from utils import (
     create_response,
     show_image,
@@ -168,47 +172,6 @@
                 parts.append(text)
         return " ".join(parts) if parts else None
 
-    def _describe_action(self, action_type: str, action_args: dict[str, Any]) -> str:
-        if action_type == "click":
-            x = int(action_args.get("x", 0))
-            y = int(action_args.get("y", 0))
-            button = action_args.get("button", "left")
-            if button in ("", "left"):
-                return f"click({x}, {y})"
-            return f"click({x}, {y}, {button})"
-        if action_type == "double_click":
-            return f"double_click({int(action_args.get('x', 0))}, {int(action_args.get('y', 0))})"
-        if action_type == "type":
-            text = str(action_args.get("text", ""))
-            if len(text) > 60:
-                text = f"{text[:57]}..."
-            return f"type({text!r})"
-        if action_type == "keypress":
-            keys = action_args.get("keys", [])
-            return f"keypress({keys})"
-        if action_type == "scroll":
-            return (
-                f"scroll({int(action_args.get('x', 0))}, {int(action_args.get('y', 0))}, "
-                f"dx={int(action_args.get('scroll_x', 0))}, dy={int(action_args.get('scroll_y', 0))})"
-            )
-        if action_type == "move":
-            return f"move({int(action_args.get('x', 0))}, {int(action_args.get('y', 0))})"
-        if action_type == "drag":
-            return "drag(...)"
-        if action_type == "wait":
-            return f"wait({int(action_args.get('ms', 1000))}ms)"
-        if action_type == "screenshot":
-            return "screenshot()"
-        return action_type
-
-    def _describe_batch_actions(self, actions: list[dict[str, Any]]) -> str:
-        pieces: list[str] = []
-        for action in actions:
-            action_type = str(action.get("type", "unknown"))
-            action_args = {k: v for k, v in action.items() if k != "type"}
-            pieces.append(self._describe_action(action_type, action_args))
-        return "batch[" + " -> ".join(pieces) + "]"
-
     def _execute_computer_action(self, action_type, action_args):
         if action_type == "click":
             self.computer.click(**action_args)
@@ -256,7 +219,7 @@
                     typed_actions = [a for a in actions if isinstance(a, dict)]
                     payload = {
                         "action_type": "batch",
-                        "description": self._describe_batch_actions(typed_actions),
+                        "description": _describe_batch_actions(typed_actions),
                         "action": {"type": "batch", "actions": typed_actions},
                     }
                     if elapsed_ms is not None:
@@ -299,7 +262,7 @@
             elapsed_ms = self._current_model_elapsed_ms()
             payload = {
                 "action_type": action_type,
-                "description": self._describe_action(action_type, action_args),
+                "description": _describe_action(action_type, action_args),
                 "action": action,
             }
             if elapsed_ms is not None:

diff --git a/pkg/templates/python/openai-computer-use/computers/kernel_computer.py b/pkg/templates/python/openai-computer-use/computers/kernel_computer.py
--- a/pkg/templates/python/openai-computer-use/computers/kernel_computer.py
+++ b/pkg/templates/python/openai-computer-use/computers/kernel_computer.py
@@ -148,6 +148,8 @@
         return "drag(...)"
     if action_type == "wait":
         return f"wait({int(action_args.get('ms', 1000))}ms)"
+    if action_type == "screenshot":
+        return "screenshot()"
     return action_type
 
 

diff --git a/pkg/templates/typescript/openai-computer-use/lib/agent.ts b/pkg/templates/typescript/openai-computer-use/lib/agent.ts
--- a/pkg/templates/typescript/openai-computer-use/lib/agent.ts
+++ b/pkg/templates/typescript/openai-computer-use/lib/agent.ts
@@ -189,9 +189,6 @@
         if (!this.ackCb(msg)) throw new Error(`Safety check failed: ${msg}`);
       }
 
-      const currentUrl = await this.computer.getCurrentUrl();
-      utils.checkBlocklistedUrl(currentUrl);
-
       const out: Omit<ResponseComputerToolCallOutputItem, 'id'> = {
         type: 'computer_call_output',
         call_id: cc.call_id,
@@ -201,6 +198,11 @@
           image_url: `data:image/png;base64,${screenshot}`,
         },
       };
+      if (this.computer.getEnvironment() === 'browser') {
+        const currentUrl = await this.computer.getCurrentUrl();
+        utils.checkBlocklistedUrl(currentUrl);
+        (out.output as { current_url?: string }).current_url = currentUrl;
+      }
       return [out as ResponseItem];
     }

    image_url: `data:image/png;base64,${screenshot}`,
  },
};
return [out as ResponseItem];

TS agent omits current_url from computer_call_output

Medium Severity

The computer_call_output for browser environments is missing the current_url field. The Python agent correctly includes it via call_output["output"]["current_url"] = current_url, which is part of the OpenAI CUA protocol for browser environments. Without this field, the model may lose track of the browser's current page across turns, potentially degrading navigation accuracy.
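A minimal sketch of attaching `current_url` to the call output, in the style the comment attributes to the Python agent. The helper name `build_call_output` is hypothetical, and the inner `"type": "computer_screenshot"` value is an assumption about the output shape rather than something shown in this PR.

```python
from typing import Any, Dict, Optional


def build_call_output(
    call_id: str,
    screenshot_b64: str,
    current_url: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a computer_call_output item, adding current_url for browsers."""
    call_output: Dict[str, Any] = {
        "type": "computer_call_output",
        "call_id": call_id,
        "output": {
            "type": "computer_screenshot",  # assumed output type
            "image_url": f"data:image/png;base64,{screenshot_b64}",
        },
    }
    if current_url is not None:
        # Only browser environments have a meaningful URL to report.
        call_output["output"]["current_url"] = current_url
    return call_output
```

Keeping the field optional lets the same builder serve non-browser environments, where no URL is available.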


    action_type = str(action.get("type", "unknown"))
    action_args = {k: v for k, v in action.items() if k != "type"}
    pieces.append(self._describe_action(action_type, action_args))
return "batch[" + " -> ".join(pieces) + "]"

Duplicated _describe_action functions across Python files

Low Severity

_describe_action and _describe_batch_actions are fully duplicated — once as module-level functions in kernel_computer.py and again as instance methods in agent.py. The TypeScript version correctly defines these once in log-events.ts and imports them in both agent.ts and kernel-computer.ts. The Python agent could import the existing functions from kernel_computer.py instead of re-implementing them.
