
feat(chat,mcp): Cloudflare Code Mode for the AI chat and MCP server #111

Open
Makisuo wants to merge 10 commits into main from feat/code-mode-chat-mcp



Makisuo commented Jun 22, 2026

Owner

What

Applies Cloudflare's Code Mode to both the AI chat (apps/chat-flue) and the MCP server (apps/api). Instead of handing the model ~51 tools one at a time, it can write a JS snippet against a generated typed maple.* API that runs in a Cloudflare Dynamic Worker isolate (network blocked); each maple.<tool>(input) RPCs back to the existing tools. Multi-step investigations ("find the worst service → fetch a sample trace → correlate") collapse from N model round-trips into one.
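To make the round-trip saving concrete, here is a minimal sketch of the forwarding idea. It assumes a hypothetical `dispatch` function standing in for the RPC back to the host tools; the tool names (`list_services`, `find_traces`), their shapes, and the fake data are invented for illustration and are not taken from the PR.

```ts
// Hypothetical result shapes, invented for this sketch.
interface ServiceSketch { name: string; errorRate: number }
interface TraceSketch { traceId: string; service: string }

// Stand-in for the generated typed `maple.*` API (only two invented tools).
interface MapleApiSketch {
  list_services(input: Record<string, never>): Promise<ServiceSketch[]>;
  find_traces(input: { service: string }): Promise<TraceSketch[]>;
}

// Assumption: every call is forwarded as (toolName, input) to the host.
type DispatchSketch = (tool: string, input: unknown) => Promise<unknown>;

// Build a `maple` object whose properties all forward to `dispatch`.
function makeMapleProxy(dispatch: DispatchSketch): MapleApiSketch {
  const handler: ProxyHandler<object> = {
    get(_target, prop) {
      // Avoid looking like a thenable if the proxy is ever awaited.
      if (prop === "then") return undefined;
      return (input: unknown) => dispatch(String(prop), input);
    },
  };
  return new Proxy({}, handler) as unknown as MapleApiSketch;
}

// Fake host tools that record the order in which they were called.
const dispatchedCalls: string[] = [];
const fakeDispatch: DispatchSketch = async (tool, input) => {
  dispatchedCalls.push(tool);
  if (tool === "list_services") {
    return [
      { name: "checkout", errorRate: 0.12 },
      { name: "search", errorRate: 0.01 },
    ];
  }
  if (tool === "find_traces") {
    const { service } = input as { service: string };
    return [{ traceId: "t-1", service }];
  }
  throw new Error(`unknown tool: ${tool}`);
};

// One "snippet": two dependent tool calls with no model turn in between.
async function investigateSketch(maple: MapleApiSketch): Promise<TraceSketch[]> {
  const services = await maple.list_services({});
  const worst = services.reduce((a, b) => (b.errorRate > a.errorRate ? b : a));
  return maple.find_traces({ service: worst.name });
}
```

Calling `investigateSketch(makeMapleProxy(fakeDispatch))` performs both tool calls inside one function, which is the shape of work that would otherwise take one model round-trip per call.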

Ships hybrid behind MAPLE_CODE_MODE — the direct tools stay available as a fallback.

How

  • packages/codemode/ — new source-only shared package. Pure root barrel (buildApiDeclaration JSON-schema→TS, buildHarnessModule, formatRunResult) + a ./sandbox subpath (isolating the cloudflare:workers import) with runCodeInSandbox + MapleSupervisor (RpcTarget). The harness splices the model's JS into an async IIFE module — no runtime eval.
  • Chat (apps/chat-flue/src/lib/codemode/) — a run_code Flue tool injected only when MAPLE_CODE_MODE=1 and the LOADER binding is present. Dispatch reuses the approval-gated tool execs, so mutating maple.* calls return proposals, collected into a proposed_batch envelope the web renders as one approval card each (parseToolProposalBatch). /api/chat/apply is unchanged.
  • MCP (apps/api/src/mcp/tools/run-code.ts) — a run_code tool dispatching to registry handlers under the captured request runtime (FiberSet.makeRuntimePromise + Effect.scoped), preserving org scoping. Mutating tools are blocked inside code (call them directly so the host's approval applies).
  • Deploy: WorkerLoader() binding added to both alchemy.run.ts, gated on MAPLE_CODE_MODE.
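As a rough illustration of the JSON-schema→TS step, the sketch below renders a tiny schema subset into a `declare const maple` block. The schema model, the helper names (`schemaToTs`, `sketchApiDeclaration`), and the example tool are assumptions made for this sketch; the real `buildApiDeclaration` is not shown in this thread and may differ.

```ts
// Minimal JSON-schema subset understood by this sketch (an assumption).
interface SchemaSketch {
  type: "string" | "number" | "boolean" | "object";
  properties?: Record<string, SchemaSketch>;
  required?: string[];
}

interface ToolSketch {
  name: string;
  input: SchemaSketch;
}

// Render one schema node as a TypeScript type expression.
function schemaToTs(schema: SchemaSketch): string {
  if (schema.type !== "object") return schema.type;
  const required = new Set(schema.required ?? []);
  const fields = Object.entries(schema.properties ?? {}).map(
    ([key, value]) => `${key}${required.has(key) ? "" : "?"}: ${schemaToTs(value)}`,
  );
  return `{ ${fields.join("; ")} }`;
}

// Render the tool list as one declaration the prompt could include.
function sketchApiDeclaration(tools: ToolSketch[]): string {
  const methods = tools.map(
    (t) => `  ${t.name}(input: ${schemaToTs(t.input)}): Promise<unknown>;`,
  );
  return ["declare const maple: {", ...methods, "};"].join("\n");
}

// Hypothetical tool definition, for illustration only.
const declarationSketch = sketchApiDeclaration([
  {
    name: "find_traces",
    input: {
      type: "object",
      properties: { service: { type: "string" }, limit: { type: "number" } },
      required: ["service"],
    },
  },
]);
```

Here `declarationSketch` holds a three-line declaration in which `service` is required and `limit` is optional, which is the kind of text a model can write code against.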

Reviewer notes

  • ⚠️ Worker Loader is a Cloudflare beta. The worker_loader binding only deploys when MAPLE_CODE_MODE is set, and the agent no-ops Code Mode without the LOADER binding (direct tools only) — so default behavior is unchanged and accounts without beta access are unaffected. Live isolate execution can't be verified locally (flue dev/portless has no Worker Loader); it needs a deployed stage with beta access. If beta is unavailable, the planned fallback is a QuickJS-WASM backend behind the same runCodeInSandbox seam (no rewrite of api-gen/approval/prompt layers).
  • Footguns handled: the sandbox is dynamically imported so cloudflare:workers stays out of the Node-imported tool registry; the run-code↔registry require cycle is broken via a type-only import + dynamic mapleToolDefinitions fetch.
  • The API run_code tool is always registered (runtime-gated) so it shows in external MCP clients' tool lists even when the flag is off, returning a clean "not enabled" — deliberate, since the deploy-time flag isn't readable at module load in the Worker.

Verification

  • Typecheck: 24/24 packages green.
  • Tests: codemode 23 · chat-flue 51 · api MCP 103 + run_code 7 · web 7. The harness runs end-to-end in Node (data-URL import with a fake env.MAPLE).
  • chat-flue Flue build succeeds with Code Mode + the cloudflare:workers sandbox bundled; both alchemy.run.ts files bundle; the real system prompt renders correctly.

Unrelated apps/landing/* working-tree changes were intentionally left out of this PR.

🤖 Generated with Claude Code

Instead of handing the model 51 tools one at a time, it can now write a JS
snippet against a generated typed `maple.*` API that runs in a Cloudflare
Dynamic Worker isolate (network blocked); each `maple.<tool>(input)` RPCs back
to the existing tools. Multi-step investigations collapse into one round-trip.

- packages/codemode: source-only shared package (pure root + ./sandbox subpath
  that imports cloudflare:workers). JSON-schema→TS API gen, the sandbox harness
  (splices user JS into an async IIFE module, no eval), proposed_batch
  formatting, and runCodeInSandbox + MapleSupervisor (RpcTarget).
- chat-flue: a `run_code` Flue tool, injected only when MAPLE_CODE_MODE=1 and the
  LOADER binding is present (hybrid — the 51 direct tools stay). Dispatch reuses
  the approval-gated tool execs, so mutating maple.* calls become proposals,
  collected into a proposed_batch the web renders as one approval card each.
- apps/api MCP: a `run_code` tool dispatching to registry handlers under the
  captured request runtime (FiberSet.makeRuntimePromise + Effect.scoped),
  preserving org scoping. Mutating tools are blocked inside code.
- Deploy: WorkerLoader() binding added to both alchemy.run.ts, gated on the flag.

Worker Loader is a Cloudflare beta, so the binding only deploys when
MAPLE_CODE_MODE is set and the agent no-ops Code Mode without it (direct tools
only) — default behavior is unchanged. Verified via unit tests (the harness runs
end-to-end in Node), the Flue build, prompt rendering, and a 24-package
typecheck; live isolate execution needs a deployed stage with beta access.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@pullfrog pullfrog Bot left a comment


Important

Inside the MCP sandbox, the hardcoded MUTATING_TOOL_NAMES set is the only thing standing between a model-written snippet and an ungated mutation — and the test suite never enforces that every mutating tool in the registry is in that set. The current set is complete, but the design fails open: a future mutating tool added to the registry but forgotten in the set will execute its real side effect inside run_code. Worth closing before this ships behind the flag.

Reviewed changes — initial review of Cloudflare Code Mode: a sandboxed run_code tool for both the MCP server and the AI chat that lets the model orchestrate existing Maple tools from a JS snippet running in a network-isolated Worker Loader isolate.

  • New @maple/codemode package — pure root barrel (buildApiDeclaration JSON-schema→TS, buildHarnessModule, formatRunResult/formatRunOutput) plus a ./sandbox subpath isolating cloudflare:workers (runCodeInSandbox, MapleSupervisor extends RpcTarget). The harness splices model code into an async IIFE — no runtime eval.
  • MCP run_code (apps/api/src/mcp/tools/run-code.ts) — dispatches read-only tool handlers under the captured request runtime (FiberSet.makeRuntimePromise + Effect.scoped), preserving org scoping; mutating tools blocked via MUTATING_TOOL_NAMES. Always registered, runtime-gated on MAPLE_CODE_MODE=1 + the LOADER binding.
  • Chat run_code (apps/chat-flue/src/lib/codemode/) — reuses the approval-gated execute, so mutating maple.* calls return proposal markers collected into a proposed_batch envelope; the web renders one ApprovalCard per proposal (parseToolProposalBatch, keyed ${toolCallId}#${i}).
  • Prompt + deploy wiring: formatCodeModeBlock appends the generated maple.* API to the system prompt; WorkerLoader() added to both alchemy.run.ts files, gated on MAPLE_CODE_MODE.
  • Incidental bun.lock churn — an accepts/send/negotiator dependency relock unrelated to Code Mode is folded into this PR.

⚠️ Mutating-tool gating inside the MCP sandbox fails open

The run_code MCP tool blocks mutating tools by checking a hand-maintained MUTATING_TOOL_NAMES set. That set is the single point of enforcement: any name not in it runs its real handler under the captured request runtime, side effects and all. The set is correct today, but nothing prevents it from silently falling behind the registry.

  • The api-side test (mutating.test.ts) only asserts MUTATING_TOOL_NAMES ⊆ registry. It never asserts the safety-critical inverse — that every mutating tool in the registry is in the set.
  • There are two independent copies of the set (apps/api/src/mcp/tools/mutating.ts and apps/chat-flue/src/lib/approval.ts) kept aligned only by a "keep in sync" comment, with no shared source of truth and no equality test.

Before Code Mode, this set guarded only /chat/apply, which fails closed (an unknown tool is simply rejected). Code Mode makes the same set fail open on the MCP path, which is the behavioral change worth a guard.
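The difference between the two directions can be sketched in a few lines. Everything below is a hypothetical stand-in: the tool names, the registry list, and `gateSketch` are invented to illustrate the failure mode and are not the code in `run-code.ts`.

```ts
// Hand-maintained set of gated names (illustrative contents only).
const gatedNamesSketch = new Set(["create_dashboard", "delete_dashboard"]);

// Registry with one mutating tool that never made it into the set.
const registrySketch = [
  "list_dashboards",
  "create_dashboard",
  "delete_dashboard",
  "archive_dashboard",
];

// The gate as the review describes it: only names in the set are refused.
function gateSketch(toolName: string): "blocked" | "executed" {
  return gatedNamesSketch.has(toolName) ? "blocked" : "executed";
}

// The direction the existing test checks: every gated name is a real tool.
const setWithinRegistry = Array.from(gatedNamesSketch).every((n) =>
  registrySketch.includes(n),
);

// The forgotten tool passes straight through the gate.
const forgottenToolOutcome = gateSketch("archive_dashboard");
```

In this sketch `setWithinRegistry` is `true` while `forgottenToolOutcome` is `"executed"`, so the subset test stays green even though a mutating tool runs ungated.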

Technical details
# Mutating-tool gating inside the MCP sandbox fails open

## Affected sites
- `apps/api/src/mcp/tools/run-code.ts:35`: `resolveCodeModeCall` blocks via `MUTATING_TOOL_NAMES.has(name)`; this is the only gate before `invoke(definition, decoded)` runs the real handler.
- `apps/api/src/mcp/tools/mutating.test.ts:6-11` — asserts only `set ⊆ registry`, not `registry-mutating ⊆ set`.
- `apps/api/src/mcp/tools/mutating.ts` and `apps/chat-flue/src/lib/approval.ts` — two hardcoded copies, "keep in sync" comment only.

## Required outcome
- A regression test (or a structural derivation) that guarantees every mutating tool registered in `registry.ts` is present in `MUTATING_TOOL_NAMES`, so adding a mutating tool without gating it fails CI rather than silently executing inside `run_code`.
- A single source of truth (or a cross-file equality test) for the api and chat-flue copies so they cannot drift.

## Suggested approach
- Cheapest: mark mutating-ness at registration (e.g. a `mutating: true` flag on `MapleToolDefinition`) and derive both the set and the gate from that, eliminating the hand-maintained list entirely.
- If the lists stay hand-maintained: add a test that fails when a registry tool whose name matches the mutating naming pattern (create_/update_/delete_/transition_/…) is absent from the set, and a test asserting the two copies are deep-equal.

## Open questions for the human
- Is the long-term plan to keep two copies, or collapse to one shared export from `@maple/domain`/a shared module both apps import?


Comment thread apps/api/src/mcp/tools/run-code.ts
Comment thread packages/codemode/src/harness.ts Outdated
Comment thread packages/codemode/src/format.ts
CI Knip failure: packages/codemode imports `cloudflare:workers` (in the
./sandbox driver), which Knip reports as an unlisted `cloudflare` dependency
(unlisted = error). Add the workspace to knip.json with the same
`ignoreDependencies: ["cloudflare"]` the other Worker-importing workspaces use
(apps/api, apps/chat-flue, lib/effect-cloudflare).

Also stop exporting RUN_CODE_TOOL_NAME (used only internally) to clear the new
unused-export warning.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…atch cap

- run_code self-recursion: a code-mode snippet could call maple.run_code(...)
  on the MCP path (run_code is in mapleToolDefinitions and isn't mutating),
  nesting a sandbox inside the running one. resolveCodeModeCall now refuses the
  RUN_CODE_TOOL_NAME (shared const), and buildCodeModeApi skips it too (defense
  in depth for the chat path).
- Harness break-out: the model's snippet was spliced inline into the harness
  fetch scope, so code ending in `})();` could escape and tamper with the
  output cap / log capture. It now runs as its own module (user.js exporting an
  async fn of (maple, console)); a break-out just fails to parse → crashed run,
  and it can't reach __logs/__cap/env. Keeps the no-eval property.
- Unbounded proposed_batch: formatRunResult now caps proposals at
  MAX_PROPOSALS_PER_RUN (25) with a dropped-count note, bounding the envelope
  (and the number of approval cards) regardless of the run.
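A minimal sketch of the proposal cap, assuming a hypothetical helper named `capProposals` and an invented note wording; only the limit of 25 comes from the commit message above.

```ts
// The limit named in the commit message (MAX_PROPOSALS_PER_RUN).
const MAX_PROPOSALS_SKETCH = 25;

interface CappedProposals<T> {
  proposals: T[];
  note?: string;
}

// Keep at most `max` proposals and report how many were dropped.
function capProposals<T>(
  proposals: T[],
  max: number = MAX_PROPOSALS_SKETCH,
): CappedProposals<T> {
  if (proposals.length <= max) return { proposals };
  const dropped = proposals.length - max;
  return {
    proposals: proposals.slice(0, max),
    note: `${dropped} proposal(s) dropped: over the per-run cap of ${max}.`,
  };
}

// Thirty invented proposals in, twenty-five out plus a note.
const cappedSketch = capProposals(Array.from({ length: 30 }, (_, i) => `proposal-${i}`));
```

Because the cap is applied when the result is formatted, the envelope size is bounded no matter how many mutating calls the snippet makes.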

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@pullfrog pullfrog Bot left a comment


✅ No new issues found.

Reviewed changes — re-review of ca1ebf63, which addresses three of the four findings from the prior pullfrog review (e88d9a83). The MCP-self-call, harness-breakout, and unbounded-envelope concerns are all resolved cleanly; no new issues introduced.

  • run_code self-call blocked on the MCP path: resolveCodeModeCall now short-circuits when name === RUN_CODE_TOOL_NAME (a shared const promoted into packages/codemode/src/types.ts), returning a Blocked error before the dispatch lookup, and buildCodeModeApi skips it as defense-in-depth. A snippet calling maple.run_code(...) can no longer nest a sandbox on either path.
  • Harness breakout closed via module isolation — the model's snippet is now its own module (user.js exporting async (maple, console)) imported by main.js, rather than spliced into the fetch scope. A })(); break-out fails to parse, the worker fails to instantiate, and runCodeInSandbox's try/catch surfaces it as crashed: true — so the snippet can no longer reach __logs/__cap/env to subvert the output cap. buildHarnessModule was renamed to buildSandboxModules.
  • Proposal envelope bounded: formatRunResult caps proposals at MAX_PROPOSALS_PER_RUN (25) with a dropped-count note, bounding both the returned envelope size and the number of approval cards rendered by the web client.
  • Tests — added the self-call rejection test (run-code.test.ts), the proposal-cap test (format.test.ts), and a break-out-isolation test plus a rewritten harness test that writes both modules to a temp dir and imports main.js to exercise the real composition.
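The module-isolation idea can be sketched as plain string composition. The function name `sketchSandboxModules` and the exact contents of both modules are assumptions for illustration; the thread only states that `user.js` exports an async function of `(maple, console)` and that `main.js` imports it.

```ts
interface SandboxModulesSketch {
  userJs: string;
  mainJs: string;
}

// Place the model's code in its own module so it cannot close or reopen
// any scope that belongs to the harness.
function sketchSandboxModules(userCode: string): SandboxModulesSketch {
  const userJs = `export default async function (maple, console) {\n${userCode}\n}\n`;

  // The harness never contains user text; it only imports the user module.
  // The logging and response shape here are invented for this sketch.
  const mainJs = [
    'import run from "./user.js";',
    "",
    "export default {",
    "  async fetch(_request, env) {",
    "    const logs = [];",
    '    const sandboxConsole = { log: (...args) => logs.push(args.join(" ")) };',
    "    const result = await run(env.MAPLE, sandboxConsole);",
    "    return Response.json({ result, logs });",
    "  },",
    "};",
  ].join("\n");

  return { userJs, mainJs };
}

// A break-out attempt stays confined to user.js, where it is a syntax error.
const breakoutSketch = sketchSandboxModules("})(); globalThis.tampered = true; (async () => {");
```

With this layout a snippet such as the one in `breakoutSketch` can only corrupt its own module, so the harness variables in `main.js` stay out of reach.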

The prior review's [!IMPORTANT] finding — the MCP mutating-tool gate fails open (no registry-mutating ⊆ MUTATING_TOOL_NAMES test, two un-synced copies of the set) — was not part of this commit and remains the one item worth closing before merge. It is unchanged code, so it is not re-anchored here.


…d test

Review (fails-open): the MCP run_code sandbox blocks mutations via a
hand-maintained MUTATING_TOOL_NAMES set with no test guaranteeing every mutating
registry tool is in it, kept in two drifting copies. run_code makes the set fail
OPEN (an ungated name runs its real handler), so:

- Move MUTATING_TOOL_NAMES into @maple/codemode as the single source of truth;
  apps/api/mutating.ts re-exports it and apps/chat-flue/approval.ts imports it,
  so the two copies can no longer drift.
- Add a fail-closed regression test: every registry tool whose name looks
  mutating (create_/update_/delete_/… via shared MUTATING_TOOL_PREFIXES) must be
  in the set, so a new mutating tool can't ship ungated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Makisuo commented Jun 23, 2026

Owner Author

Addressed the mutating-tool gating fails-open finding in 1d2e122:

  • Single source of truth: MUTATING_TOOL_NAMES now lives in @maple/codemode; apps/api/src/mcp/tools/mutating.ts re-exports it and apps/chat-flue/src/lib/approval.ts imports it, so the two copies can no longer drift.
  • Fail-closed regression test: mutating.test.ts now asserts the safety-critical inverse: every registry tool whose name looks mutating (create_/update_/delete_/… via a shared MUTATING_TOOL_PREFIXES list) must be in the set. Adding a mutating tool without gating it now fails CI instead of silently executing its side effect inside run_code.
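A small sketch of what such a prefix-based check might look like. The prefix list is cut down to three entries, and the helper names and tool names are invented here; the real `looksMutating` and `MUTATING_TOOL_PREFIXES` are not reproduced in this thread.

```ts
// Shortened, illustrative prefix list (the real list is longer).
const prefixesSketch = ["create_", "update_", "delete_"];

// A name "looks mutating" when it starts with a listed verb prefix.
function looksMutatingSketch(toolName: string): boolean {
  return prefixesSketch.some((prefix) => toolName.startsWith(prefix));
}

// Names the regression test would flag: they look mutating but are not gated.
function ungatedMutating(
  registryNames: string[],
  gated: ReadonlySet<string>,
): string[] {
  return registryNames.filter((n) => looksMutatingSketch(n) && !gated.has(n));
}

// Hypothetical registry: `update_widget` was added without gating it.
const flaggedBySketch = ungatedMutating(
  ["get_widget", "create_widget", "update_widget"],
  new Set(["create_widget"]),
);

// A verb outside the prefix list is invisible to the check.
const missedBySketch = ungatedMutating(["purge_widget"], new Set<string>());
```

The test passes only when the returned list is empty. In this sketch `flaggedBySketch` contains `update_widget`, while `missedBySketch` is empty because `purge_` is not a listed prefix, which is the limit of any name-based heuristic.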

On the open question — collapsed to one shared export (@maple/codemode) rather than keeping two copies. The three inline findings (recursion guard, harness isolation, batch cap) were addressed in ca1ebf63 and those threads are resolved.

🤖 Addressed by Claude Code

@pullfrog pullfrog Bot left a comment


✅ No new issues found.

Reviewed changes — re-review of 1d2e1221, which closes the last open finding from the initial pullfrog review (e88d9a83): the MCP sandbox's mutating-tool gate that failed open. No new issues introduced.

  • Single source of truth for MUTATING_TOOL_NAMES — the set now lives once in packages/codemode/src/mutating.ts; both apps/api/src/mcp/tools/mutating.ts and apps/chat-flue/src/lib/approval.ts re-export it, eliminating the two hand-synced literal copies that could drift.
  • Fail-closed regression test: apps/api/src/mcp/tools/mutating.test.ts now asserts the safety-critical inverse (registry-mutating ⊆ set) via looksMutating(name) && !MUTATING_TOOL_NAMES.has(name) resolving to [], so a conventionally-named mutating tool added to the registry but forgotten in the set fails CI instead of running ungated inside run_code.
  • MUTATING_TOOL_PREFIXES + looksMutating helper — a verb-prefix taxonomy (create_/update_/delete_/…/close_) added to @maple/codemode to back the heuristic, with its own unit tests asserting every gated name matches a prefix and read-only verbs don't.

I confirmed chat.http.ts's /chat/apply path still resolves MUTATING_TOOL_NAMES through the re-export (fail-closed path intact), no read-only tool name collides with a mutating prefix (which would false-fail the new test), and the full @maple/codemode suite (29 tests) is green. All four findings from the original review are now resolved.

ℹ️ Fail-closed test is bound to the verb-prefix heuristic

The new CI guard catches an ungated mutating tool only if its name starts with a verb in MUTATING_TOOL_PREFIXES. A future mutating tool named with an out-of-list verb — e.g. purge_, apply_, submit_, merge_, clear_, bulk_ — would be both ungated in the run_code sandbox and invisible to the test, quietly reintroducing the fail-open behavior this commit closed.

This is awareness-only, not a blocker: the prior review explicitly offered the prefix-heuristic as an acceptable fix, the current set is complete, and the structural alternative (a mutating: true flag on MapleToolDefinition) is a larger change. Worth keeping in mind when adding tools with unconventional verbs.

Technical details
# Fail-closed mutating-gate test is heuristic-bound to MUTATING_TOOL_PREFIXES

## Affected sites
- `packages/codemode/src/mutating.ts:50-79`: `MUTATING_TOOL_PREFIXES` is the closed verb list `looksMutating` checks against.
- `apps/api/src/mcp/tools/mutating.test.ts:14-26` — the fail-closed test filters registry tools by `looksMutating(name)`, so any mutating tool whose name doesn't start with a listed prefix is never asserted against the set.

## Required outcome
- A new mutating registry tool cannot ship ungated inside `run_code` regardless of its name's verb prefix.

## Suggested approach (optional)
- Cheapest incremental: when adding a tool with a new mutating verb, also add the prefix here (the existing prefix-coverage test in `packages/codemode/src/mutating.test.ts` keeps the prefix list and set aligned).
- Structural (eliminates the heuristic entirely): mark mutating-ness at registration via a `mutating: true` flag on `MapleToolDefinition` and derive both the set and the `run_code` gate from it — this was the prior review's first-choice suggestion and removes the hand-maintained list.

## Open questions for the human
- Is keeping the prefix-heuristic acceptable as the long-term guard, or is the `mutating: true`-at-registration derivation worth a follow-up issue?


Review (awareness-only): the fail-closed test only catches an ungated mutating
tool if its name starts with a verb in MUTATING_TOOL_PREFIXES. Broaden the list
with the verbs called out (purge_/apply_/submit_/merge_/clear_/bulk_) plus other
common mutating verbs (none collide with current read-only tools), shrinking the
heuristic's blind spot. The structural fix (a `mutating: true` flag at tool
registration) is left as a follow-up.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Makisuo commented Jun 23, 2026

Owner Author

Re: the ℹ️ heuristic-bound fail-closed test (b613a346) — and answering the open question.

Hardened the heuristic by widening MUTATING_TOOL_PREFIXES with the verbs you named (purge_/apply_/submit_/merge_/clear_/bulk_) plus other common mutating verbs (send_/sync_/cancel_/revoke_/rotate_/import_/retry_/trigger_/move_/copy_/duplicate_/upsert_/toggle_/approve_/deny_/…). None collide with current read-only tool names (the fail-closed test runs against the real registry and stays green).

Open question — I kept the prefix heuristic for this PR (the prior review accepted it, the current set is complete, and the structural change is larger), and opened a follow-up for the structural fix: a mutating: true flag declared at tool registration that derives both the set and the run_code gate exactly, removing the heuristic. That's the right long-term shape; it just doesn't need to block this PR.

🤖 Addressed by Claude Code

@pullfrog pullfrog Bot left a comment


✅ No new issues found.

Reviewed changes — re-review of b613a346, which closes the lone informational finding from the prior pullfrog review (1d2e1221): the fail-closed mutating-gate test was heuristic-bound to MUTATING_TOOL_PREFIXES, so a future mutating tool with an out-of-list verb would slip past both the gate and the CI guard. No new issues introduced.

  • Widened MUTATING_TOOL_PREFIXES — added the exact verbs the prior review named as the gap (purge_, apply_, submit_, merge_, clear_, bulk_) plus a broader defensive set (send_, sync_, cancel_, revoke_, grant_, rotate_, import_, retry_, trigger_, dispatch_, move_, copy_, duplicate_, upsert_, toggle_, approve_, deny_, unassign_), shrinking the window where a conventionally-named mutating tool ships ungated inside run_code.
  • Documented the heuristic-vs-structural tradeoff — the doc comment now states this is a heuristic guard, not a proof, and names the structural fix (a mutating: true flag declared at tool registration that derives both the set and the gate) as the tracked follow-up.

I confirmed no read-only registry tool name collides with any of the new prefixes (which would false-fail the looksMutating(name) && !MUTATING_TOOL_NAMES.has(name) === [] guard) — the read-only tools use find_/get_/list_/search_/inspect_/mine_/diagnose_/query_/service_/describe_/compare_/explore_/run_ and the prefix-less error_detail, none of which overlap. The codemode prefix-coverage and read-only-exclusion tests (get_/run_ still excluded) remain green. The change only loosens the heuristic toward more conservative gating, so it cannot regress safety.


…ration

Replaces the name-based gate with a per-tool structural flag, closing the
heuristic's blind spot (a mutating tool named with an out-of-list verb could
slip past the run_code sandbox).

- McpToolRegistrar gains `mutatingTool()` alongside `tool()`; the 19 mutating
  tools now register via `server.mutatingTool(...)`, so each declares its own
  mutating-ness at registration (and a copied tool carries the gating with it).
  MapleToolDefinition gains `mutating: boolean`.
- run_code's resolveCodeModeCall now blocks on `definition.mutating` (the
  structural flag), not just the name set — a mutating tool can't slip the gate
  regardless of its name.
- mutating.test.ts asserts the flag set EXACTLY equals the shared
  MUTATING_TOOL_NAMES (both directions) so the structural flag and the static
  list (still needed by chat-flue + /chat/apply, which see tools over MCP) can't
  drift. The verb-prefix heuristic is kept as belt-and-suspenders for an
  unflagged, conventionally-named mutating tool.

Closes the follow-up to the prior review's awareness-only finding.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Makisuo commented Jun 23, 2026

Owner Author

Follow-up landed (2a4e677c): the mutating-tool gate is now structural, not just name-based.

  • McpToolRegistrar gains mutatingTool(); the 19 mutating tools register via server.mutatingTool(...), so each declares its own mutating-ness at registration (and a copied tool carries the gating). MapleToolDefinition gains mutating: boolean.
  • run_code's resolveCodeModeCall now blocks on definition.mutating — a mutating tool can't slip the sandbox gate regardless of its name's verb.
  • A test asserts the flag set exactly equals the shared MUTATING_TOOL_NAMES (both directions), so the structural flag and the static list (still needed by chat + /chat/apply, which see tools over MCP) can't drift. The verb-prefix heuristic stays as belt-and-suspenders.
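The structural gate can be sketched as follows. The class, type, and function names are stand-ins invented for this illustration, and `get_dashboard` and `purge_cache` are hypothetical tools; only the `tool()` / `mutatingTool()` split, the `mutating` flag, and the `flag || set` gate come from the description above.

```ts
type ToolHandlerSketch = (input: unknown) => unknown;

interface ToolDefinitionSketch {
  readonly name: string;
  readonly mutating: boolean;
  readonly handler: ToolHandlerSketch;
}

// Registrar that stamps the flag at registration time.
class RegistrarSketch {
  readonly definitions: ToolDefinitionSketch[] = [];

  tool(name: string, handler: ToolHandlerSketch): void {
    this.definitions.push({ name, mutating: false, handler });
  }

  mutatingTool(name: string, handler: ToolHandlerSketch): void {
    this.definitions.push({ name, mutating: true, handler });
  }
}

// Static list kept for callers that only see tool names.
const staticMutatingNamesSketch: ReadonlySet<string> = new Set(["update_dashboard"]);

type CallResultSketch =
  | { ok: true; value: unknown }
  | { ok: false; error: string };

// Gate on the flag first, with the static list as a second check.
function resolveCallSketch(
  registrar: RegistrarSketch,
  toolName: string,
  input: unknown,
): CallResultSketch {
  const definition = registrar.definitions.find((d) => d.name === toolName);
  if (!definition) return { ok: false, error: `Unknown tool: ${toolName}` };
  if (definition.mutating || staticMutatingNamesSketch.has(toolName)) {
    return { ok: false, error: `Blocked: ${toolName} is mutating` };
  }
  return { ok: true, value: definition.handler(input) };
}

// Both directions of the equality check; both lists must be empty.
function flagListDrift(registrar: RegistrarSketch): { unlisted: string[]; unflagged: string[] } {
  const flagged = registrar.definitions.filter((d) => d.mutating).map((d) => d.name);
  return {
    unlisted: flagged.filter((n) => !staticMutatingNamesSketch.has(n)),
    unflagged: Array.from(staticMutatingNamesSketch).filter((n) => !flagged.includes(n)),
  };
}

const registrarSketch = new RegistrarSketch();
registrarSketch.tool("get_dashboard", () => "dashboard");
registrarSketch.mutatingTool("update_dashboard", () => "updated");
registrarSketch.mutatingTool("purge_cache", () => "purged");
```

In this sketch `purge_cache` is blocked by its flag even though its verb is unconventional and it is missing from the static list, and `flagListDrift` reports it as `unlisted`, which is the drift an equality test is meant to surface.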

This resolves the open question from the prior review — the structural derivation is now in place rather than deferred. 24/24 packages typecheck, knip clean.

🤖 Addressed by Claude Code

@pullfrog pullfrog Bot left a comment


✅ No new issues found.

Reviewed changes — re-review of 2a4e677c, which replaces the verb-prefix heuristic from b613a346 with the structural mutating-tool gate the original review named as its first-choice fix. No new issues introduced.

  • mutatingTool registration method: McpToolRegistrar gains a mutatingTool() alongside tool() (types.ts); all 19 state-changing tools now register via server.mutatingTool(...) instead of server.tool(...), so a copied/new mutating tool carries its own gating rather than relying on a name list.
  • Structural mutating flag: MapleToolDefinition carries readonly mutating: boolean; the registry collector stamps it (add(true/false, …)), and resolveCodeModeCall now gates on definition.mutating || MUTATING_TOOL_NAMES.has(name) — the flag is the primary gate, the set is belt-and-suspenders.
  • Drift-proofing test: mutating.test.ts asserts the flagged set EXACTLY equals MUTATING_TOOL_NAMES in both directions, plus retains the looksMutating ⊆ set belt-and-suspenders check; apps/api/.../mutating.ts collapses to a pure re-export of the codemode set.

I verified the 19 server.mutatingTool registrations match the 19 entries in @maple/codemode's MUTATING_TOOL_NAMES exactly; run_code itself registers via server.tool (mutating: false) and is handled by the existing RUN_CODE_TOOL_NAME recursion short-circuit; and server.ts's McpToolsLive registers every definition (including mutating ones) with the SDK, so direct invocation through the host's approval path is unaffected — the flag only gates the run_code sandbox. The api-side tests can't execute in this checkout (the @maple/* workspace symlinks are absent from node_modules, an install artifact unrelated to this PR), but @maple/codemode's own suite is green and the set/registration counts align by inspection. This is the cleaner long-term gate; nothing left open.


dashboard-mutations.test.ts builds an McpToolRegistrar mock and calls
registerUpdateDashboardTool to capture the handler. update_dashboard now
registers via server.mutatingTool, so the tool()-only mock threw
"server.mutatingTool is not a function". Capture from both tool() and
mutatingTool(). Full apps/api suite green (614/614).
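A sketch of a mock that captures the handler from either registration method. The interface and helper names are hypothetical and the signatures are simplified; the actual `McpToolRegistrar` shape and the test's mock are not shown in this thread.

```ts
type CapturedHandlerSketch = (input: unknown) => unknown;

// Simplified stand-in for the registrar surface a register function uses.
interface RegistrarLikeSketch {
  tool(name: string, handler: CapturedHandlerSketch): void;
  mutatingTool(name: string, handler: CapturedHandlerSketch): void;
}

// Run a register function against a mock and return the handler it registered.
function captureHandlerSketch(
  register: (server: RegistrarLikeSketch) => void,
): CapturedHandlerSketch {
  const captured: CapturedHandlerSketch[] = [];
  const capture = (_name: string, handler: CapturedHandlerSketch): void => {
    captured.push(handler);
  };
  // Wiring both methods keeps the mock valid whichever one the tool calls.
  register({ tool: capture, mutatingTool: capture });
  const first = captured[0];
  if (!first) throw new Error("registration did not call tool() or mutatingTool()");
  return first;
}

// Hypothetical register functions, one per registration method.
const updateHandlerSketch = captureHandlerSketch((server) =>
  server.mutatingTool("update_dashboard", () => "updated"),
);
const readHandlerSketch = captureHandlerSketch((server) =>
  server.tool("get_dashboard", () => "dashboard"),
);
```

A mock that implements only `tool()` would throw on the first register function, which matches the failure described above.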

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ristic

Code Mode is a core capability, not a flagged experiment — rework it to its
intended end state.

- Remove MAPLE_CODE_MODE everywhere. Code Mode is active whenever the Worker
  Loader sandbox (`LOADER`) is bound; the agent (chat) and run_code tool (MCP)
  key off the binding's presence, degrading to the direct tools when it's absent
  (e.g. local dev). Both alchemy deploys now attach `LOADER: WorkerLoader()`
  unconditionally — deploying requires Worker Loader beta access on the account.
- Drop the verb-prefix heuristic (MUTATING_TOOL_PREFIXES / looksMutating and its
  tests). It was a stepping stone before the structural `mutating` flag landed;
  with per-tool `server.mutatingTool(...)` registration + the exact flag<->list
  equality test, the heuristic is redundant scaffolding.

typecheck 24/24, knip clean, apps/api 613 + codemode 25 tests green, chat-flue
builds.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@pullfrog pullfrog Bot left a comment


✅ No new issues found.

Reviewed changes — re-review of 1c612468, which finalizes Code Mode as a core capability by dropping the MAPLE_CODE_MODE flag and removing the verb-prefix mutating heuristic. No new issues introduced.

  • Removed the MAPLE_CODE_MODE flag entirely — activation now keys solely off the LOADER (worker_loader) binding's presence: run-code.ts gates on if (!loader), chat maple-chat.ts on if (tools.length > 0 && ctx.env.LOADER), and the prompt block on the same. Grep confirms zero remaining references to the flag.
  • WorkerLoader() now attached unconditionally in both alchemy.run.ts files (was gated on process.env.MAPLE_CODE_MODE), with runtime degradation (if (!loader)) retained as the local-dev safety net.
  • Dropped MUTATING_TOOL_PREFIXES / looksMutating and their tests — redundant scaffolding now that the structural mutating flag (set at server.mutatingTool(...) registration) is the load-bearing gate, backed by the exact flag↔list equality test in mutating.test.ts. The run_code gate (definition.mutating || MUTATING_TOOL_NAMES.has(name)) and the 19-entry shared set are intact.
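The binding-presence gating can be sketched like this. The environment type, function names, and message text are assumptions for illustration; only the two conditions (`!loader` on the MCP side, `tools.length > 0 && LOADER` on the chat side) come from the review above.

```ts
// Assumption: the binding is either present (any truthy value) or absent.
interface EnvSketch {
  LOADER?: unknown;
}

interface RunCodeReplySketch {
  isError: boolean;
  text: string;
}

// MCP side: the tool is always registered and degrades at call time.
function runCodeSketch(env: EnvSketch, code: string): RunCodeReplySketch {
  const loader = env.LOADER;
  if (!loader) {
    return { isError: true, text: "Code Mode is not enabled (no LOADER binding)." };
  }
  // A real implementation would hand `code` to the sandbox here.
  return { isError: false, text: `would run ${code.length} characters in the sandbox` };
}

// Chat side: inject run_code only when there are tools and the binding exists.
function selectChatToolsSketch(env: EnvSketch, directTools: string[]): string[] {
  return directTools.length > 0 && env.LOADER
    ? [...directTools, "run_code"]
    : directTools;
}

const withoutBinding = selectChatToolsSketch({}, ["find_traces"]);
const withBinding = selectChatToolsSketch({ LOADER: {} }, ["find_traces"]);
```

Here `withoutBinding` keeps only the direct tools and `withBinding` gains `run_code`, so an unbound environment such as local dev behaves as it did before Code Mode.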

The structural mutating-gate guarantee is preserved: removing the prefix heuristic only drops a belt-and-suspenders check whose failures could only ever be false positives, since gating no longer depends on tool naming.

ℹ️ Unconditional WorkerLoader() makes Worker Loader beta access a hard deploy-time requirement

Previously the worker_loader binding deployed only when MAPLE_CODE_MODE was set, so accounts without Worker Loader beta access were unaffected and the runtime no-op was a soft fallback. With the binding now attached unconditionally in both alchemy.run.ts files, every deploy of apps/api and apps/chat-flue provisions it — so any stage whose Cloudflare account lacks beta access could fail to deploy, not just degrade at runtime.

The commit message states this is deliberate ("deploying requires Worker Loader beta access on the account"), so this is awareness-only — the runtime if (!loader) guard only covers unbound environments, not failed provisioning. Worth confirming every deploy target (dev / staging / prod) has beta access before this merges.

Technical details
# Unconditional WorkerLoader() binding → hard beta-access requirement on every stage

## Affected sites
- `apps/api/alchemy.run.ts:159`: `LOADER: WorkerLoader()` now unconditional (was `...(process.env.MAPLE_CODE_MODE?.trim() ? { LOADER: WorkerLoader() } : {})`).
- `apps/chat-flue/alchemy.run.ts:111` — same unconditional binding.

## Required outcome
- Every Cloudflare account/stage targeted by these two deploys has Worker Loader beta access, OR the team accepts that stages without it will fail to deploy until granted.

## Open questions for the human
- Do all current deploy targets (dev / staging / prod, plus any per-PR preview stages) have Worker Loader beta access? The runtime `if (!loader)` degradation does NOT cover a stage where alchemy can't provision the binding at all.


chat-flue was the only app whose dev script ran `flue dev` directly instead of
through portless, so portless never registered `chat-flue.localhost` and the web
app's `siblingUrl("chat-flue")` request got a bare portless 404 — surfacing in
the browser as a CORS error (the 404 has no CORS headers). The worker itself is
healthy on :3583 with correct CORS.

Mirror the other apps: `dev` → `portless`, `dev:app` runs
`flue dev --port ${PORT:-3583}`, and a `portless` block names the host
`chat-flue`. Pre-existing gap (since the Flue rework), unrelated to Code Mode.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
chat-flue's `flue dev` loaded no env (flue excludes .dev.vars/.env*, and no
--env was passed), so the worker had no MAPLE_AUTH_MODE / CLERK_* /
MAPLE_ROOT_PASSWORD — every /agents/* request 401'd while the web app sent a
Clerk token.
Pass `--env ../../.env.local` (the shared dev secrets the other apps load via
--env-file), which flue injects into the worker runtime env. Verified: with it,
the internal-token /workflows guard passes (404 on unknown workflow) vs 401
without — so MAPLE_AUTH_MODE=clerk + CLERK_SECRET_KEY + INTERNAL_SERVICE_TOKEN +
MAPLE_API_URL now reach the worker (fixes both /agents auth and the chat->MCP
connection).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>