feat(agent): run MCP tools as scripts by Twixes · Pull Request #2771 · PostHog/code

Twixes · 2026-06-19T07:27:01Z

Problem

The agent calls connected MCP tools one at a time. Anything that fans out (read 100 issues, comment on the stale ones) becomes 100 sequential round-trips, each paying full tool-call overhead and burning context. There was no way to express "loop over these results and act on each" in a single step.

Changes

Adds a capability that lets the agent write one JavaScript script that calls connected MCP tools as async functions:

const issues = await tools.linear.listIssues({ teamId })
const stale = issues.filter((x) => x.stale)
for (const i of stale) {
  await tools.linear.createComment({ issueId: i.id, body: "bump" })
}
return { bumped: stale.length }

Two local tools sit alongside the existing signed-git tools:

list_mcp_tools returns .d.ts-style signatures for every tools.<server>.<tool>(args), generated from each tool's MCP input schema.
run_mcp_script takes { script, timeoutMs? } and returns { result, logs, error? }.
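
To illustrate what "generated from each tool's MCP input schema" could look like, here is a minimal sketch that renders a small JSON Schema subset into a signature line. The names `schemaToType` and `renderSignature` are hypothetical and are not taken from the PR's signatures.ts; the `Promise<unknown>` return type is also an assumption.

```typescript
// Hypothetical sketch, not the PR's signatures.ts. Covers only a small JSON Schema subset.
type JsonSchema = {
  type?: string;
  properties?: Record<string, JsonSchema>;
  required?: string[];
  items?: JsonSchema;
  enum?: unknown[];
};

function schemaToType(schema: JsonSchema): string {
  if (schema.enum) return schema.enum.map((v) => JSON.stringify(v)).join(" | ");
  switch (schema.type) {
    case "string":
      return "string";
    case "number":
    case "integer":
      return "number";
    case "boolean":
      return "boolean";
    case "array": {
      const inner = schemaToType(schema.items ?? {});
      // Parenthesize unions so the [] applies to the whole union.
      return inner.includes(" | ") ? `(${inner})[]` : `${inner}[]`;
    }
    case "object": {
      const required = new Set(schema.required ?? []);
      // Property names are assumed to be valid identifiers in this sketch.
      const fields = Object.entries(schema.properties ?? {}).map(
        ([key, value]) => `${key}${required.has(key) ? "" : "?"}: ${schemaToType(value)}`,
      );
      return fields.length > 0 ? `{ ${fields.join("; ")} }` : "Record<string, unknown>";
    }
    default:
      return "unknown";
  }
}

function renderSignature(server: string, tool: string, inputSchema: JsonSchema): string {
  return `tools.${server}.${tool}(args: ${schemaToType(inputSchema)}): Promise<unknown>`;
}

const signature = renderSignature("linear", "createComment", {
  type: "object",
  properties: { issueId: { type: "string" }, body: { type: "string" } },
  required: ["issueId"],
});
console.log(signature);
// prints "tools.linear.createComment(args: { issueId: string; body?: string }): Promise<unknown>"
```

Optional properties get a `?` suffix, so the model can tell from the signature alone which arguments it must supply.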

New module packages/agent/src/mcp-scripting/:

client-pool.ts opens and caches MCP Clients from the session config map, with listTools / callTool and a scriptableServerNames helper.
proxy.ts builds the lazy tools.<server>.<tool>(args) proxy and rejects on tool isError.
runner.ts runs the script in a constrained node:vm sandbox with a wall-clock timeout and captured console.
signatures.ts renders JSON Schema into TS-style signatures.
tools.ts holds the two local-tool definitions.
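
The lazy proxy described for proxy.ts can be sketched with two nested `Proxy` objects. This is an illustration under assumptions, not the PR's code: `buildToolsProxySketch`, the `CallTool` callback, and the choice to return the raw tool result are all hypothetical; only the `isError` flag and the `tools.<server>.<tool>(args)` shape come from the description above.

```typescript
// Hypothetical sketch of a lazy tools.<server>.<tool>(args) proxy; not the PR's proxy.ts.
type ToolResult = { isError?: boolean; content: Array<{ type: string; text?: string }> };
type CallTool = (server: string, tool: string, args: unknown) => Promise<ToolResult>;
type ServerTools = Record<string, (args?: unknown) => Promise<ToolResult>>;
type ToolsProxy = Record<string, ServerTools>;

function buildToolsProxySketch(servers: string[], callTool: CallTool): ToolsProxy {
  const allowed = new Set(servers);
  return new Proxy({} as ToolsProxy, {
    get(_target, server) {
      // Unknown servers resolve to undefined rather than a callable.
      if (typeof server !== "string" || !allowed.has(server)) return undefined;
      const serverName: string = server;
      // Tool names are resolved only when a tool is actually accessed.
      return new Proxy({} as ServerTools, {
        get(_inner, tool) {
          // "then" is excluded so that awaiting tools.<server> does not treat it as a thenable.
          if (typeof tool !== "string" || tool === "then") return undefined;
          const toolName: string = tool;
          return async (args: unknown = {}): Promise<ToolResult> => {
            const result = await callTool(serverName, toolName, args);
            if (result.isError) {
              const message = result.content.map((c) => c.text ?? "").join(" ").trim();
              throw new Error(`${serverName}.${toolName} failed: ${message}`);
            }
            return result;
          };
        },
      });
    },
  });
}

// A fake transport standing in for a real MCP client pool.
const fakeCallTool: CallTool = async (_server, tool, args) =>
  tool === "boom"
    ? { isError: true, content: [{ type: "text", text: "nope" }] }
    : { content: [{ type: "text", text: JSON.stringify(args) }] };

const tools = buildToolsProxySketch(["linear"], fakeCallTool);
```

Turning `isError` into a rejection means a script can use ordinary `try` / `catch` around tool calls instead of inspecting every result.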

Wiring: local-tools/registry.ts carries the scriptable MCP server map on LocalToolCtx, local-tools/index.ts registers the tools, and both the Claude and Codex adapters thread the external MCP server map through.

Build, not adopt

Evaluated Cloudflare Code Mode, @utcp/code-mode / code-mode-mcp, and mcpac. Each runs as a separate process or MCP server with its own config and credentials, and several layer in a second abstraction (UTCP). None reuse the in-process McpServerConfig map with already-resolved credentials, which is the entire integration here. Building it is a thin layer over the MCP SDK Client we already depend on, with no new runtime dependencies (only @modelcontextprotocol/sdk and zod, both already present). Full rationale in the module README.

Auth and sandbox

No new auth path. The proxy dials the exact McpServerConfig map the agent's own MCP tools use, so credentials are inherited verbatim (stdio env, http/sse headers). In-process sdk servers are excluded since they have no dialable transport. A script can only reach servers the session was already authorized for; nothing the model writes can set or escalate credentials.

The node:vm sandbox runs with an explicit-allowlist global set (tools, captured console, pure helpers) and denies require, import, process, global / globalThis, Buffer, fetch, and filesystem access. codeGeneration is disabled so eval / new Function throw. This is deliberately not a hard boundary against adversarial in-process code, which is documented as a known limitation: the script author is the same agent that already calls these tools, and cloud runs sandbox the whole agent for real isolation.
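
A minimal sketch of the allowlist approach, assuming Node's built-in `node:vm` module. The function name `runSandboxedSketch` and the shape of the captured console are hypothetical, and this sketch does not reproduce every denial listed above (for example, it does nothing special about `globalThis`). It only shows that a fresh context lacks `process`, `require`, and `fetch`, and that `codeGeneration: { strings: false }` makes `new Function` throw.

```typescript
import * as vm from "node:vm";

// Hypothetical sketch; not the PR's runner.ts.
async function runSandboxedSketch(
  source: string,
  globals: Record<string, unknown>,
  timeoutMs: number,
): Promise<{ result: unknown; logs: string[] }> {
  const logs: string[] = [];
  const capturedConsole = {
    log: (...parts: unknown[]): void => {
      logs.push(parts.map(String).join(" "));
    },
  };
  // Only names placed on this object (plus JS built-ins) are visible to the script.
  const context = vm.createContext(
    { ...globals, console: capturedConsole },
    { codeGeneration: { strings: false, wasm: false } }, // eval / new Function throw
  );
  // Wrapping in an async function lets the script use await and return.
  const script = new vm.Script(`(async () => {\n${source}\n})()`);
  const result: unknown = await script.runInContext(context, { timeout: timeoutMs });
  return { result, logs };
}
```

Consistent with the limitation stated above, this is not a hard boundary: any host function passed in as a global remains reachable from the script.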

How did you test this?

Automated tests only (authored by an agent):

27 new tests across mcp-scripting.test.ts (20) and client-pool.integration.test.ts (7), the latter running end-to-end against a real stdio MCP server fixture. They cover proxy generation, script execution, looping/batching, timeout enforcement, error surfacing (script throw and tool isError), blocked sandbox escapes (require / process / global / Buffer / fetch / new Function), signature rendering, tool gating, stdio env propagation, and reporting of unreachable servers.
Full packages/agent suite: 761 passing across 58 files.
Typecheck and build both clean.

Automatic notifications

Publish to changelog?
Alert Sales and Marketing teams?

🤖 Agent context

Autonomy: Human-driven (agent-assisted). Michael Matloka directed the design and scope; an agent implemented and tested it. The capability is gated behind explicit local tools and reuses existing session credentials, so it adds no new auth surface.

Lets the agent write one JS script that calls connected MCP tools as async functions instead of one tool-call at a time. Adds:

- McpClientPool: opens MCP clients from the session's McpServerConfig map, inheriting auth (stdio env, http/sse headers) verbatim
- buildToolsProxy: lazy tools.<server>.<tool>(args) proxy
- runScript: constrained node:vm sandbox with wall-clock timeout, captured console, and no ambient fs/net/process authority
- renderToolsetSignatures: JSON Schema to TS-style signatures

greptile-apps · 2026-06-19T07:32:47Z

Prompt To Fix All With AI

Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
packages/agent/src/mcp-scripting/signatures.ts:104-106
`*/` sequence in tool description breaks the JSDoc comment early. Any MCP tool whose description contains `*/` (e.g. `"Computes a*/b"`) will produce output like `/** Computes a*/b */`, which closes the JSDoc at the first `*/` and leaves trailing garbage text. The model then sees malformed TypeScript that could silently misrepresent the available signatures.

```suggestion
function oneLine(text: string): string {
  return text.replace(/\s+/g, " ").trim().replace(/\*\//g, "* /");
}
```

### Issue 2 of 2
packages/agent/src/mcp-scripting/runner.ts:96-103
**Double-budget timing: total wall-clock time up to 2× `timeoutMs`**

The `run` IIFE calls `script.runInContext(context, { timeout: timeoutMs })` synchronously. This synchronous phase completes — and blocks — entirely *before* `withTimeout(run, timeoutMs)` starts its wall-clock timer. So a script with a synchronous CPU-bound loop (capped at `timeoutMs`) followed by async tool calls (capped at another `timeoutMs`) can run for nearly 2× the configured budget. At the `MAX_TIMEOUT_MS` of 120 s, that ceiling is 240 s. The `RunScriptOptions` docs and the tool's parameter description both promise a single "wall-clock budget," but the actual guarantee is two independent, sequential budgets.

Reviews (1): Last reviewed commit: "feat(agent): expose run_mcp_script / lis..."

greptile-apps · 2026-06-19T07:32:51Z

+function oneLine(text: string): string {
+  return text.replace(/\s+/g, " ").trim();
+}


*/ sequence in tool description breaks the JSDoc comment early. Any MCP tool whose description contains */ (e.g. "Computes a*/b") will produce output like /** Computes a*/b */, which closes the JSDoc at the first */ and leaves trailing garbage text. The model then sees malformed TypeScript that could silently misrepresent the available signatures.

Suggested change

-function oneLine(text: string): string {
-  return text.replace(/\s+/g, " ").trim();
-}
+function oneLine(text: string): string {
+  return text.replace(/\s+/g, " ").trim().replace(/\*\//g, "* /");
+}

Good catch — fixed in f3a43fa. oneLine now rewrites */ to * / so a tool description can't close the generated JSDoc block early. Added a test asserting the rendered line stays well-formed.
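
A check of the kind described in the reply might look like the sketch below. `oneLine` is the version from the suggestion; `renderDocLine` is a hypothetical helper added only to show the effect on a generated JSDoc line, and the actual test in the PR may differ.

```typescript
// oneLine as suggested in the review; renderDocLine is a hypothetical helper for illustration.
function oneLine(text: string): string {
  return text.replace(/\s+/g, " ").trim().replace(/\*\//g, "* /");
}

function renderDocLine(description: string): string {
  return `/** ${oneLine(description)} */`;
}

const rendered = renderDocLine("Computes\n  a*/b");
console.log(rendered); // prints "/** Computes a* /b */"

// The only comment terminator left is the one the renderer appends itself.
const firstTerminator = rendered.indexOf("*/");
console.log(firstTerminator === rendered.length - 2); // prints "true"
```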

greptile-apps · 2026-06-19T07:32:52Z

+  const run = (async (): Promise<unknown> => {
+    // `timeout` here guards synchronous spin; async work is bounded by the race.
+    const completion = script.runInContext(context, { timeout: timeoutMs });
+    return await completion;
+  })();
+
+  try {
+    const result = await withTimeout(run, timeoutMs);


Double-budget timing: total wall-clock time up to 2× timeoutMs

The run IIFE calls script.runInContext(context, { timeout: timeoutMs }) synchronously. This synchronous phase completes — and blocks — entirely before withTimeout(run, timeoutMs) starts its wall-clock timer. So a script with a synchronous CPU-bound loop (capped at timeoutMs) followed by async tool calls (capped at another timeoutMs) can run for nearly 2× the configured budget. At the MAX_TIMEOUT_MS of 120 s, that ceiling is 240 s. The RunScriptOptions docs and the tool's parameter description both promise a single "wall-clock budget," but the actual guarantee is two independent, sequential budgets.

Fixed in f3a43fa. runScript now derives a single absolute deadline up front; the synchronous runInContext timeout is capped to the time remaining and the async race keys off the same deadline, so total wall-clock time stays within one timeoutMs budget instead of two sequential ones. Added a sync-then-async test that asserts the combined run trips the single deadline.
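
The single-deadline approach described in this reply can be sketched as follows, assuming Node's `node:vm` module. `runWithDeadlineSketch` and its error message are hypothetical and not the code from f3a43fa; the sketch only shows one absolute deadline feeding both the synchronous `timeout` option and the asynchronous race.

```typescript
import * as vm from "node:vm";

// Hypothetical sketch of one shared deadline; not the PR's runner.ts.
async function runWithDeadlineSketch(
  source: string,
  context: vm.Context,
  timeoutMs: number,
): Promise<unknown> {
  const deadline = Date.now() + timeoutMs;
  const script = new vm.Script(`(async () => {\n${source}\n})()`);

  // Synchronous phase: capped to the time remaining before the deadline.
  const syncBudget = Math.max(1, deadline - Date.now());
  const completion: Promise<unknown> = script.runInContext(context, { timeout: syncBudget });

  // Async phase: the race timer is keyed off the same deadline, not a fresh budget.
  const asyncBudget = Math.max(1, deadline - Date.now());
  let timer: ReturnType<typeof setTimeout> | undefined;
  const expiry = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`Script exceeded ${timeoutMs}ms`)), asyncBudget);
  });
  try {
    return await Promise.race([completion, expiry]);
  } finally {
    clearTimeout(timer);
  }
}
```

Because the async timer starts with whatever the synchronous phase left over, a script that spins and then awaits can no longer spend close to 2× `timeoutMs`.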

Registers the scripting tools in the local-tools registry and threads the session's external MCP server map into LocalToolCtx from both the Claude (claude-agent.ts) and Codex (codex-agent.ts) adapters, so a script dials the same servers with inherited auth. Tools self-disable when no external MCP servers are connected.

github-actions · 2026-06-19T07:38:53Z

React Doctor found no issues in the changed files. 🎉

Reviewed by React Doctor for commit f3a43fa.

- runScript now enforces timeoutMs as one shared wall-clock deadline across the synchronous and async phases (previously up to 2x the budget)
- signature rendering neutralizes */ in tool descriptions so a description can't close the generated JSDoc block early

Twixes self-assigned this Jun 19, 2026

greptile-apps Bot reviewed Jun 19, 2026

Twixes force-pushed the feat/mcp-tools-as-scripts branch from 4c0a6f6 to 49f7c85 Compare June 19, 2026 07:38

Twixes wants to merge 3 commits into main from feat/mcp-tools-as-scripts
