Replace Playwright with Kernel native API in OpenAI CUA templates #124
Both TypeScript and Python OpenAI CUA templates now use Kernel's native computer control API (screenshot, click, type, scroll, batch, etc.) instead of Playwright over CDP. This enables the batch_computer_actions tool which executes multiple actions in a single API call for lower latency.
Key changes:
- New KernelComputer class wrapping Kernel SDK for all computer actions
- Added batch_computer_actions function tool with system instructions
- Navigation (goto/back/forward) via Kernel's playwright.execute endpoint
- Local test scripts create remote Kernel browsers without app deployment
- Removed playwright-core, sharp (TS) and playwright (Python) dependencies
- Bumped @onkernel/sdk to ^0.38.0 and kernel to >=0.38.0
Made-with: Cursor
    }

    const currentUrl = await this.computer.getCurrentUrl();
    utils.checkBlocklistedUrl(currentUrl);
TypeScript URL blocklist check return value silently ignored
Medium Severity
checkBlocklistedUrl returns a boolean, but agent.ts discards the return value, making the URL blocklist entirely non-functional. The Python counterpart correctly raises a ValueError to halt execution. Previously, Playwright's route-level route.abort() handler provided actual network-level blocking, but that was removed in this PR, leaving no working URL blocking in the TypeScript template.
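The difference between the two behaviours can be sketched in Python as follows. This is a minimal illustration, not the template's code: the `BLOCKED_DOMAINS` entries and the `is_blocklisted` helper are hypothetical names introduced here. A boolean predicate is easy to call and ignore, whereas a check that raises cannot be silently discarded by the caller.

```python
from urllib.parse import urlparse

# Hypothetical blocklist for illustration only; the templates define their own.
BLOCKED_DOMAINS = ["malicious.example", "phishing.example"]


def is_blocklisted(url: str) -> bool:
    """Return True when the host is a blocked domain or a subdomain of one."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith(f".{d}") for d in BLOCKED_DOMAINS)


def check_blocklisted_url(url: str) -> None:
    """Raise instead of returning a flag, so the result cannot be ignored."""
    if is_blocklisted(url):
        raise ValueError(f"Blocked URL: {url}")
```

Calling `is_blocklisted(url)` as a bare statement has no effect, which is the failure mode described above; `check_blocklisted_url(url)` halts the agent loop unless the caller explicitly catches the error.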
from .kernel_computer import KernelComputer

computers_config = {
    "local-playwright": LocalPlaywrightBrowser,

        return "left"
    if isinstance(button, int):
        return {1: "left", 2: "middle", 3: "right"}.get(button, "left")
    return str(button)
Missing handling for special click button values
Medium Severity
The CUA model can send click actions with button set to "back", "forward", or "wheel". The deleted Playwright code explicitly handled these by routing to self.back(), self.forward(), or mouse.wheel(). The new _normalize_button/normalizeButton functions pass these strings through unchanged to the Kernel click_mouse API, which only accepts "left", "right", or "middle" — causing an API error when the model uses these button types.
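One way to picture the missing routing is the sketch below. It does not call the Kernel SDK; `route_click` and the tuples it returns are hypothetical stand-ins for "navigate back", "navigate forward", "scroll" and "click". Falling back to "left" for unrecognised strings is an assumption of this sketch; the template's `_normalize_button` passes such strings through.

```python
from typing import Any, Dict, Tuple, Union

VALID_BUTTONS = {"left", "middle", "right"}


def normalize_button(button: Union[str, int, None]) -> str:
    """Map the model's button value onto left/middle/right, defaulting to left."""
    if button is None or button == "":
        return "left"
    if isinstance(button, int):
        return {1: "left", 2: "middle", 3: "right"}.get(button, "left")
    # Assumption of this sketch: unknown strings fall back to a left click.
    return button if button in VALID_BUTTONS else "left"


def route_click(action: Dict[str, Any]) -> Tuple[Any, ...]:
    """Decide which operation a CUA click action should become."""
    x, y = action.get("x", 0), action.get("y", 0)
    button = action.get("button")
    if button == "back":
        return ("back",)
    if button == "forward":
        return ("forward",)
    if button == "wheel":
        return ("scroll", x, y)
    return ("click", x, y, normalize_button(button))
```

Handling the special values before normalisation means only "left", "middle" or "right" ever reach the click call.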
Bugbot Autofix prepared fixes for 3 of the 3 bugs found in the latest run.
Or push these changes by commenting:
Preview (a9e2870223)
diff --git a/pkg/templates/python/openai-computer-use/computers/config.py b/pkg/templates/python/openai-computer-use/computers/config.py
deleted file mode 100644
--- a/pkg/templates/python/openai-computer-use/computers/config.py
+++ /dev/null
@@ -1,5 +1,0 @@
-from .kernel_computer import KernelComputer
-
-computers_config = {
- "kernel": KernelComputer,
-}
\ No newline at end of file
diff --git a/pkg/templates/python/openai-computer-use/computers/kernel_computer.py b/pkg/templates/python/openai-computer-use/computers/kernel_computer.py
--- a/pkg/templates/python/openai-computer-use/computers/kernel_computer.py
+++ b/pkg/templates/python/openai-computer-use/computers/kernel_computer.py
@@ -74,12 +74,22 @@
 def _translate_cua_action(action: Dict[str, Any]) -> Dict[str, Any]:
     action_type = action.get("type", "")
     if action_type == "click":
+        button = action.get("button")
+        if button == "back":
+            return {"type": "press_key", "press_key": {"keys": ["Alt_L", "Left"]}}
+        if button == "forward":
+            return {"type": "press_key", "press_key": {"keys": ["Alt_L", "Right"]}}
+        if button == "wheel":
+            return {
+                "type": "scroll",
+                "scroll": {"x": action.get("x", 0), "y": action.get("y", 0), "delta_x": 0, "delta_y": 0},
+            }
         return {
             "type": "click_mouse",
             "click_mouse": {
                 "x": action.get("x", 0),
                 "y": action.get("y", 0),
-                "button": _normalize_button(action.get("button")),
+                "button": _normalize_button(button),
             },
         }
     elif action_type == "double_click":
@@ -134,6 +144,15 @@
         return base64.b64encode(resp.read()).decode("utf-8")

     def click(self, x: int, y: int, button="left") -> None:
+        if button == "back":
+            self.back()
+            return
+        if button == "forward":
+            self.forward()
+            return
+        if button == "wheel":
+            self.scroll(x, y, 0, 0)
+            return
         self.client.browsers.computer.click_mouse(self.session_id, x=x, y=y, button=_normalize_button(button))

     def double_click(self, x: int, y: int) -> None:
diff --git a/pkg/templates/typescript/openai-computer-use/lib/kernel-computer.ts b/pkg/templates/typescript/openai-computer-use/lib/kernel-computer.ts
--- a/pkg/templates/typescript/openai-computer-use/lib/kernel-computer.ts
+++ b/pkg/templates/typescript/openai-computer-use/lib/kernel-computer.ts
@@ -105,11 +105,18 @@
 function translateCuaAction(action: CuaAction): BatchAction {
   switch (action.type) {
-    case 'click':
+    case 'click': {
+      if (action.button === 'back')
+        return { type: 'press_key', press_key: { keys: ['Alt_L', 'Left'] } };
+      if (action.button === 'forward')
+        return { type: 'press_key', press_key: { keys: ['Alt_L', 'Right'] } };
+      if (action.button === 'wheel')
+        return { type: 'scroll', scroll: { x: action.x ?? 0, y: action.y ?? 0, delta_x: 0, delta_y: 0 } };
       return {
         type: 'click_mouse',
         click_mouse: { x: action.x ?? 0, y: action.y ?? 0, button: normalizeButton(action.button) },
       };
+    }
     case 'double_click':
       return {
         type: 'click_mouse',
@@ -168,6 +175,9 @@
   }

   async click(x: number, y: number, button: string | number = 'left'): Promise<void> {
+    if (button === 'back') { await this.back(); return; }
+    if (button === 'forward') { await this.forward(); return; }
+    if (button === 'wheel') { await this.scroll(x, y, 0, 0); return; }
     await this.client.browsers.computer.clickMouse(this.sessionId, {
       x,
       y,
diff --git a/pkg/templates/typescript/openai-computer-use/lib/utils.ts b/pkg/templates/typescript/openai-computer-use/lib/utils.ts
--- a/pkg/templates/typescript/openai-computer-use/lib/utils.ts
+++ b/pkg/templates/typescript/openai-computer-use/lib/utils.ts
@@ -40,12 +40,14 @@
   }
 }
-export function checkBlocklistedUrl(url: string): boolean {
+export function checkBlocklistedUrl(url: string): void {
   try {
     const host = new URL(url).hostname;
-    return BLOCKED_DOMAINS.some((d) => host === d || host.endsWith(`.${d}`));
-  } catch {
-    return false;
+    if (BLOCKED_DOMAINS.some((d) => host === d || host.endsWith(`.${d}`))) {
+      throw new Error(`Blocked URL: ${url}`);
+    }
+  } catch (e) {
+    if (e instanceof Error && e.message.startsWith('Blocked URL:')) throw e;
   }
 }
This adds CUA-style backend/action event rendering (with JSONL mode support), aligns dotenv/local-run behavior across TypeScript and Python templates, and renames local entry scripts to run_local for clearer usage. Made-with: Cursor
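As a rough illustration of what text and JSONL rendering of the same event could look like, here is a minimal sketch. The function name `format_event`, the event name, and the field names are hypothetical; the templates' actual event schema is not shown in this conversation.

```python
import json
from typing import Any, Dict


def format_event(event: str, data: Dict[str, Any], mode: str = "text") -> str:
    """Render one log event either as a human-readable line or as a JSONL record."""
    if mode == "jsonl":
        # One JSON object per line, so logs can be parsed by line-oriented tools.
        return json.dumps({"event": event, **data}, sort_keys=True)
    if mode == "text":
        details = " ".join(f"{key}={value}" for key, value in sorted(data.items()))
        return f"[{event}] {details}".rstrip()
    raise ValueError(f"Unknown log mode: {mode}")
```

For example, `format_event("action", {"description": "click(1, 2)", "elapsed_ms": 12})` yields a single readable line, while passing `mode="jsonl"` yields the same data as one JSON object.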
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix prepared fixes for both issues found in the latest run.
- ✅ Fixed: TS agent omits current_url from computer_call_output
  - The TypeScript agent now conditionally fetches the browser URL, runs the blocklist check, and adds current_url to computer_call_output.output for browser environments.
- ✅ Fixed: Duplicated _describe_action functions across Python files
  - The Python agent now imports and uses _describe_action and _describe_batch_actions from kernel_computer.py instead of duplicating that logic locally.
Or push these changes by commenting:
@cursor push 90332375b9
Preview (90332375b9)
diff --git a/pkg/templates/python/openai-computer-use/agent/agent.py b/pkg/templates/python/openai-computer-use/agent/agent.py
--- a/pkg/templates/python/openai-computer-use/agent/agent.py
+++ b/pkg/templates/python/openai-computer-use/agent/agent.py
@@ -1,7 +1,11 @@
 import json
 import time
 from typing import Any, Callable
-from computers.kernel_computer import KernelComputer
+from computers.kernel_computer import (
+    KernelComputer,
+    _describe_action,
+    _describe_batch_actions,
+)
 from utils import (
     create_response,
     show_image,
@@ -168,47 +172,6 @@
                 parts.append(text)
         return " ".join(parts) if parts else None

-    def _describe_action(self, action_type: str, action_args: dict[str, Any]) -> str:
-        if action_type == "click":
-            x = int(action_args.get("x", 0))
-            y = int(action_args.get("y", 0))
-            button = action_args.get("button", "left")
-            if button in ("", "left"):
-                return f"click({x}, {y})"
-            return f"click({x}, {y}, {button})"
-        if action_type == "double_click":
-            return f"double_click({int(action_args.get('x', 0))}, {int(action_args.get('y', 0))})"
-        if action_type == "type":
-            text = str(action_args.get("text", ""))
-            if len(text) > 60:
-                text = f"{text[:57]}..."
-            return f"type({text!r})"
-        if action_type == "keypress":
-            keys = action_args.get("keys", [])
-            return f"keypress({keys})"
-        if action_type == "scroll":
-            return (
-                f"scroll({int(action_args.get('x', 0))}, {int(action_args.get('y', 0))}, "
-                f"dx={int(action_args.get('scroll_x', 0))}, dy={int(action_args.get('scroll_y', 0))})"
-            )
-        if action_type == "move":
-            return f"move({int(action_args.get('x', 0))}, {int(action_args.get('y', 0))})"
-        if action_type == "drag":
-            return "drag(...)"
-        if action_type == "wait":
-            return f"wait({int(action_args.get('ms', 1000))}ms)"
-        if action_type == "screenshot":
-            return "screenshot()"
-        return action_type
-
-    def _describe_batch_actions(self, actions: list[dict[str, Any]]) -> str:
-        pieces: list[str] = []
-        for action in actions:
-            action_type = str(action.get("type", "unknown"))
-            action_args = {k: v for k, v in action.items() if k != "type"}
-            pieces.append(self._describe_action(action_type, action_args))
-        return "batch[" + " -> ".join(pieces) + "]"
-
     def _execute_computer_action(self, action_type, action_args):
         if action_type == "click":
             self.computer.click(**action_args)
@@ -256,7 +219,7 @@
         typed_actions = [a for a in actions if isinstance(a, dict)]
         payload = {
             "action_type": "batch",
-            "description": self._describe_batch_actions(typed_actions),
+            "description": _describe_batch_actions(typed_actions),
             "action": {"type": "batch", "actions": typed_actions},
         }
         if elapsed_ms is not None:
@@ -299,7 +262,7 @@
         elapsed_ms = self._current_model_elapsed_ms()
         payload = {
             "action_type": action_type,
-            "description": self._describe_action(action_type, action_args),
+            "description": _describe_action(action_type, action_args),
             "action": action,
         }
         if elapsed_ms is not None:
diff --git a/pkg/templates/python/openai-computer-use/computers/kernel_computer.py b/pkg/templates/python/openai-computer-use/computers/kernel_computer.py
--- a/pkg/templates/python/openai-computer-use/computers/kernel_computer.py
+++ b/pkg/templates/python/openai-computer-use/computers/kernel_computer.py
@@ -148,6 +148,8 @@
         return "drag(...)"
     if action_type == "wait":
         return f"wait({int(action_args.get('ms', 1000))}ms)"
+    if action_type == "screenshot":
+        return "screenshot()"
     return action_type
diff --git a/pkg/templates/typescript/openai-computer-use/lib/agent.ts b/pkg/templates/typescript/openai-computer-use/lib/agent.ts
--- a/pkg/templates/typescript/openai-computer-use/lib/agent.ts
+++ b/pkg/templates/typescript/openai-computer-use/lib/agent.ts
@@ -189,9 +189,6 @@
         if (!this.ackCb(msg)) throw new Error(`Safety check failed: ${msg}`);
       }

-      const currentUrl = await this.computer.getCurrentUrl();
-      utils.checkBlocklistedUrl(currentUrl);
-
       const out: Omit<ResponseComputerToolCallOutputItem, 'id'> = {
         type: 'computer_call_output',
         call_id: cc.call_id,
@@ -201,6 +198,11 @@
           image_url: `data:image/png;base64,${screenshot}`,
         },
       };
+      if (this.computer.getEnvironment() === 'browser') {
+        const currentUrl = await this.computer.getCurrentUrl();
+        utils.checkBlocklistedUrl(currentUrl);
+        (out.output as { current_url?: string }).current_url = currentUrl;
+      }
       return [out as ResponseItem];
     }

          image_url: `data:image/png;base64,${screenshot}`,
        },
      };
      return [out as ResponseItem];
TS agent omits current_url from computer_call_output
Medium Severity
The computer_call_output for browser environments is missing the current_url field. The Python agent correctly includes it via call_output["output"]["current_url"] = current_url, which is part of the OpenAI CUA protocol for browser environments. Without this field, the model may lose track of the browser's current page across turns, potentially degrading navigation accuracy.
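A minimal sketch of the shape being described, in Python: the helper name `build_computer_call_output` and its parameters are hypothetical, and the `"computer_screenshot"` output type is an assumption based on the OpenAI computer-use documentation rather than something shown in this PR. The point is only that `current_url` is attached to `output` when the environment is a browser.

```python
from typing import Any, Dict, List, Optional


def build_computer_call_output(
    call_id: str,
    screenshot_b64: str,
    environment: str,
    current_url: Optional[str] = None,
    acknowledged_safety_checks: Optional[List[Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Assemble a computer_call_output item, adding current_url for browsers."""
    item: Dict[str, Any] = {
        "type": "computer_call_output",
        "call_id": call_id,
        "acknowledged_safety_checks": acknowledged_safety_checks or [],
        "output": {
            # Assumed output type; check the API reference for the version in use.
            "type": "computer_screenshot",
            "image_url": f"data:image/png;base64,{screenshot_b64}",
        },
    }
    if environment == "browser" and current_url:
        item["output"]["current_url"] = current_url
    return item
```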
            action_type = str(action.get("type", "unknown"))
            action_args = {k: v for k, v in action.items() if k != "type"}
            pieces.append(self._describe_action(action_type, action_args))
        return "batch[" + " -> ".join(pieces) + "]"
Duplicated _describe_action functions across Python files
Low Severity
_describe_action and _describe_batch_actions are fully duplicated — once as module-level functions in kernel_computer.py and again as instance methods in agent.py. The TypeScript version correctly defines these once in log-events.ts and imports them in both agent.ts and kernel-computer.ts. The Python agent could import the existing functions from kernel_computer.py instead of re-implementing them.
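The single-definition approach can be sketched as below. This is a trimmed adaptation of the duplicated logic shown in the diff, covering only `click` and `type`, with public names (`describe_action`, `describe_batch_actions`) chosen for this sketch; any other module would import these two functions rather than re-implement them.

```python
from typing import Any, Dict, List


def describe_action(action_type: str, action_args: Dict[str, Any]) -> str:
    """Return a short human-readable description of one action."""
    if action_type == "click":
        x, y = int(action_args.get("x", 0)), int(action_args.get("y", 0))
        button = action_args.get("button", "left")
        if button in ("", "left"):
            return f"click({x}, {y})"
        return f"click({x}, {y}, {button})"
    if action_type == "type":
        text = str(action_args.get("text", ""))
        if len(text) > 60:
            text = f"{text[:57]}..."  # truncate long input for log readability
        return f"type({text!r})"
    return action_type


def describe_batch_actions(actions: List[Dict[str, Any]]) -> str:
    """Describe a batch by chaining the per-action descriptions."""
    pieces: List[str] = []
    for action in actions:
        action_type = str(action.get("type", "unknown"))
        action_args = {k: v for k, v in action.items() if k != "type"}
        pieces.append(describe_action(action_type, action_args))
    return "batch[" + " -> ".join(pieces) + "]"
```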



Summary
- batch_computer_actions function tool that executes multiple browser actions in a single API call, reducing latency
- Local test scripts (test.local.ts/test_local.py) that create remote Kernel browsers for testing without deploying a Kernel app
Details
New KernelComputer class (TS + Python) wraps the Kernel SDK for all computer actions:
- captureScreenshot, clickMouse, typeText, pressKey, scroll, moveMouse, dragMouse
- batch endpoint for batched actions
- playwright.execute for navigation (goto, back, forward, getCurrentUrl)
- 1/2/3 in batch calls)
Batch tool: System instructions guide the model to prefer batch_computer_actions for predictable sequences (e.g., click + type + enter).
Removed dependencies: playwright-core, sharp (TS), playwright (Python). Bumped @onkernel/sdk to ^0.38.0 and kernel to >=0.38.0.
Test plan
- test.local.ts E2E: created remote Kernel browser, ran CUA agent (eBay search task), batch tool used successfully, browser cleaned up
- test_local.py E2E: same test, batch tool used on first action (type + enter), agent completed successfully
- tsc --noEmit)
Made with Cursor
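The batching idea in the description above can be sketched as a pure translation step: a list of CUA actions becomes one list of batch entries that would be sent in a single request. Only the `click_mouse`, `press_key` and `scroll` entry shapes are taken from the diffs in this conversation; the function names are hypothetical, other action types are deliberately left unsupported, and no Kernel SDK call is made here.

```python
from typing import Any, Dict, List


def translate_action(action: Dict[str, Any]) -> Dict[str, Any]:
    """Translate one CUA action into a batch entry (shapes as seen in the diffs)."""
    kind = action.get("type")
    x, y = action.get("x", 0), action.get("y", 0)
    if kind == "click":
        button = action.get("button") or "left"
        return {"type": "click_mouse", "click_mouse": {"x": x, "y": y, "button": button}}
    if kind == "keypress":
        return {"type": "press_key", "press_key": {"keys": list(action.get("keys", []))}}
    if kind == "scroll":
        return {
            "type": "scroll",
            "scroll": {
                "x": x,
                "y": y,
                "delta_x": action.get("scroll_x", 0),
                "delta_y": action.get("scroll_y", 0),
            },
        }
    # Other action types are out of scope for this sketch.
    raise ValueError(f"Unsupported action type in sketch: {kind}")


def build_batch(actions: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Build the payload for one batched call instead of one call per action."""
    return [translate_action(a) for a in actions]
```

Sending the result of `build_batch` once replaces a round trip per action, which is where the latency reduction comes from.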
Note
Medium Risk
Moderate risk because it replaces the core browser-control implementation and tool-calling flow (including new batched actions) across both Python and TypeScript templates, plus dependency upgrades. Impact is limited to sample templates, but regressions could break local/deployed runs and logging/output formatting.
Overview
Migrates both the Python and TypeScript OpenAI CUA templates from Playwright-over-CDP to Kernel’s native computer control API, introducing new KernelComputer wrappers that implement screenshot/mouse/keyboard/scroll/drag via Kernel endpoints.
Adds a batch_computer_actions function tool and model instructions to encourage batching predictable action sequences, plus new event-based logging (text and jsonl) that emits prompt/reasoning/text deltas, action descriptions, screenshots, and backend SDK timing.
Updates docs and env examples to require KERNEL_API_KEY, adds local runners (run_local.py, run_local.ts) and makes app entrypoints runnable locally, and removes Playwright/sharp/pillow-related code while bumping Kernel SDK dependencies to >=0.38.0/^0.38.0.
Written by Cursor Bugbot for commit dcb16c7. This will update automatically on new commits.