feat(runner): fail-closed security parity on the pi backend (#525) by gewenyu99 · Pull Request #695 · PostHog/wizard

gewenyu99 · 2026-06-19T20:32:28Z

Epic #520 · implements #525 (fail-closed security parity) + #526 (Task/todo + controlled subagents). Top of the pi stack (#692 ← #693 ← #694 ← #695).

⚠️ Draft — pi is NOT at parity with the anthropic backend yet

Done in this stack:

Runner seam + wizard-runner flag (01 — Runner seam + multivariate wizard-runner flag (anthropic | pi) #521 · feat(runner): agent-backend seam + multivariate wizard-runner flag (#521) #692)
pi.dev backend through the gateway (04 — pi.dev runner (pi) #524 · feat(runner): pi.dev backend behind wizard-runner=pi (#524) #693)
Wizard tools as pi custom tools (feat(runner): wizard tools as pi custom tools — real integration on pi (#5) #694)
Fail-closed security — canUseTool allowlist + YARA, fence inherited by subagents; unit-tested + verified blocking live (05 — Security parity (canUseTool + YARA, fail-closed) for pi #525)
Task/todo tools + controlled subagents (read-only, fence-inherited, depth-capped) (06 — Subagent / Task dispatch parity for pi #526)
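The subagent constraints in the last item (read-only, fence-inherited, depth-capped) can be pictured with a small sketch. This is not the PR's code: `SessionSpec`, `childSpec`, `READ_ONLY_BUILTINS`, and `MAX_SUBAGENT_DEPTH` are hypothetical names invented here, and the explicit depth check is an added assumption (the PR describes the cap as a consequence of children having no custom tools, and therefore no dispatch_agent).

```typescript
// Hypothetical model of a session; not pi's real API.
interface SessionSpec {
  depth: number;
  builtinTools: string[];
  customTools: string[];
  extensions: string[];
}

// Read-only built-ins a child may keep. bash stays in the list but is still
// subject to the inherited fence (allowlisted commands only).
const READ_ONLY_BUILTINS = ["read", "grep", "find", "ls", "bash"];
const MAX_SUBAGENT_DEPTH = 1;

function childSpec(parent: SessionSpec): SessionSpec {
  // Assumption: an explicit guard, in addition to removing dispatch_agent.
  if (parent.depth >= MAX_SUBAGENT_DEPTH) {
    throw new Error("subagent depth cap reached");
  }
  return {
    depth: parent.depth + 1,
    // Drop write/edit and anything else that can mutate the project.
    builtinTools: parent.builtinTools.filter((t) => READ_ONLY_BUILTINS.includes(t)),
    // No custom tools: no .env writes and no nested dispatch.
    customTools: [],
    // Same security extension as the parent, so the fence is inherited.
    extensions: parent.extensions,
  };
}
```

Deriving the child from the parent, rather than building it from scratch, keeps the fence inherited by construction: the child can only ever end up with a subset of what the parent had.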

Open parity gaps (tracked, not yet done):

PostHog data/MCP tools — pi can't create dashboards/insights; the anthropic flow does
[STATUS]/[DASHBOARD_URL]/[NOTEBOOK_URL] marker parsing (outro link)
wizard_ask (interactive questions)
Task-list phrasing + status/log presentation parity
Fence ergonomics (agent retries blocked bash ls/find → slow runs)
Per-variant telemetry (07 — Rollout + per-variant observability (canary split) #527) + cross-runner parity suite (08 — Cross-runner parity test suite #528)

pi has no permission layer, so attach an extension that intercepts EVERY tool call, built-in (bash/read/edit/write/grep) and custom, via pi's tool_call hook. The extension reuses the exact anthropic policy: wizardCanUseTool (bash allowlist + .env fencing + disallowedTools) plus the YARA content scan (bash commands and written content, with the same wizard-doc posthog_pii suppression). A tool_result hook post-scans read/bash output for prompt injection.

Everything fails closed: a scanner error blocks the call, and a critical post-scan violation latches, so every later call is blocked and the run ends as AgentErrorType.YARA_VIOLATION. A runaway tool-call cap is enforced as well. extensionFactories load even with noExtensions:true, so the fence is always on while the target project can't inject its own extensions. Subagents reuse the same factory, so a child can't escape it.

Proven by unit test (no live key needed): the blocked-action corpus (cat .env, rm -rf, curl exfil, shell operators, direct .env read/write/edit/grep) is blocked; install/build, source edits, and the sanctioned env tools are allowed; the post-scan latch and runaway guard fire.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
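The fail-closed behaviour described in this commit message (pre-call policy check, content scan, post-scan latch, runaway cap) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: `createFence`, `ToolCall`, `Verdict`, `Policy`, and `Scanner` are hypothetical names, the policy and scanner are plain stand-ins for wizardCanUseTool and the YARA scan, and treating a scanner error during the post-scan as a latch is an assumption of this sketch.

```typescript
// Hypothetical types; pi's real hook signatures are not shown in this PR text.
type ToolCall = { tool: string; input: string };
type Verdict = { allow: true } | { allow: false; reason: string };
type Policy = (call: ToolCall) => Verdict; // stand-in for wizardCanUseTool
type Scanner = (text: string) => boolean;  // stand-in for a YARA scan; true = violation

function createFence(policy: Policy, scan: Scanner, maxCalls: number) {
  let calls = 0;
  let latched = false;

  return {
    // Gate run before every tool call, built-in or custom.
    onToolCall(call: ToolCall): Verdict {
      if (latched) {
        return { allow: false, reason: "latched after critical violation" };
      }
      calls += 1;
      if (calls > maxCalls) {
        return { allow: false, reason: "runaway tool-call cap reached" };
      }
      try {
        const verdict = policy(call);
        if (!verdict.allow) return verdict;
        if (scan(call.input)) {
          return { allow: false, reason: "content scan matched" };
        }
        return { allow: true };
      } catch {
        // Fail closed: an error in the policy or scanner blocks the call.
        return { allow: false, reason: "scanner error" };
      }
    },
    // Scan run on tool output; a hit latches the fence for the rest of the run.
    onToolResult(output: string): void {
      try {
        if (scan(output)) latched = true;
      } catch {
        latched = true; // assumption of this sketch: errors here also latch
      }
    },
    isLatched: (): boolean => latched,
  };
}
```

The key design point is that every exit from the `try` block that is not an explicit allow ends in a block, so a crash in the scanner can never be mistaken for a clean result.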

…#526)

Task/todo (#526): TaskCreate/Update/Get/List as pi tools backed by a shared store. Every mutation pushes the list to the TUI via getUI().syncTodos, so the todo panel updates live under pi, which is the parity that was missing.

Controlled subagents (#526): pi has no native subagents, so dispatch_agent spawns a nested createAgentSession that we construct ourselves, which closes the leak the claude-agent-sdk path warns about. Every child inherits:

the SAME security extension (canUseTool + YARA, shared cap + violation latch)
a read-only built-in toolset (read/grep/find/ls + allowlisted bash), with no write/edit
no custom tools, so no .env writes and no dispatch_agent (depth hard-capped at 1)

A child can research but never mutate the project or escape the fence.

Logging parity: log assistant turns ([pi] assistant: …) on message_end and tool I/O on tool_execution_*, and drive the single run spinner with one stable status at a time (no overlapping/garbled messages).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
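The "shared store that pushes on every mutation" idea from the Task/todo paragraph can be sketched as below. This is only an illustration: `TaskStore`, `Task`, `TaskStatus`, and the status values are hypothetical, and the `sync` callback stands in for getUI().syncTodos, whose real signature is not shown in this PR text.

```typescript
// Hypothetical task shape; the wizard's real fields may differ.
type TaskStatus = "pending" | "in_progress" | "completed";
interface Task {
  id: number;
  title: string;
  status: TaskStatus;
}

class TaskStore {
  private tasks = new Map<number, Task>();
  private nextId = 1;
  private sync: (tasks: Task[]) => void;

  // `sync` is a stand-in for something like getUI().syncTodos.
  constructor(sync: (tasks: Task[]) => void) {
    this.sync = sync;
  }

  create(title: string): Task {
    const task: Task = { id: this.nextId++, title, status: "pending" };
    this.tasks.set(task.id, task);
    this.push();
    return { ...task };
  }

  update(id: number, status: TaskStatus): Task | undefined {
    const task = this.tasks.get(id);
    if (!task) return undefined; // unknown id: nothing changed, nothing pushed
    task.status = status;
    this.push();
    return { ...task };
  }

  get(id: number): Task | undefined {
    const task = this.tasks.get(id);
    return task ? { ...task } : undefined;
  }

  // Returns copies so callers cannot mutate the store without a push.
  list(): Task[] {
    return Array.from(this.tasks.values()).map((t) => ({ ...t }));
  }

  // Every mutation funnels through here, so the UI never goes stale.
  private push(): void {
    this.sync(this.list());
  }
}
```

Routing all writes through one private `push` is what keeps the panel live: a tool handler cannot change a task without the UI hearing about it.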

github-actions · 2026-06-19T20:32:37Z

🧙 Wizard CI

Run the Wizard CI and test your changes against wizard-workbench example apps by replying with a GitHub comment using one of the following commands:

Test all apps:

/wizard-ci all

Test all apps in a directory:

/wizard-ci basic-integration
/wizard-ci error-tracking-upload-source-maps
/wizard-ci misc
/wizard-ci revenue

Test an individual app:

/wizard-ci basic-integration/android
/wizard-ci basic-integration/angular
/wizard-ci basic-integration/astro

/wizard-ci basic-integration/django
/wizard-ci basic-integration/fastapi
/wizard-ci basic-integration/flask
/wizard-ci basic-integration/javascript-node
/wizard-ci basic-integration/javascript-web
/wizard-ci basic-integration/laravel
/wizard-ci basic-integration/next-js
/wizard-ci basic-integration/nuxt
/wizard-ci basic-integration/python
/wizard-ci basic-integration/rails
/wizard-ci basic-integration/react-native
/wizard-ci basic-integration/react-router
/wizard-ci basic-integration/sveltekit
/wizard-ci basic-integration/swift
/wizard-ci basic-integration/tanstack-router
/wizard-ci basic-integration/tanstack-start
/wizard-ci basic-integration/vue
/wizard-ci error-tracking-upload-source-maps/android
/wizard-ci error-tracking-upload-source-maps/cicd-docker-node-raw
/wizard-ci error-tracking-upload-source-maps/cicd-github-actions-docker-node-raw
/wizard-ci error-tracking-upload-source-maps/cicd-github-actions-nested-docker-node-raw
/wizard-ci error-tracking-upload-source-maps/cicd-github-actions-node-raw
/wizard-ci error-tracking-upload-source-maps/cicd-gitlab-node-raw
/wizard-ci error-tracking-upload-source-maps/cicd-ssh-vps-node-raw
/wizard-ci error-tracking-upload-source-maps/flutter
/wizard-ci error-tracking-upload-source-maps/ios
/wizard-ci error-tracking-upload-source-maps/next
/wizard-ci error-tracking-upload-source-maps/next-no-posthog
/wizard-ci error-tracking-upload-source-maps/node-raw
/wizard-ci error-tracking-upload-source-maps/node-rollup
/wizard-ci error-tracking-upload-source-maps/node-rollup-typescript-plugin
/wizard-ci error-tracking-upload-source-maps/node-webpack
/wizard-ci error-tracking-upload-source-maps/nuxt-3-6
/wizard-ci error-tracking-upload-source-maps/nuxt-4-3
/wizard-ci error-tracking-upload-source-maps/react-native
/wizard-ci error-tracking-upload-source-maps/react-vite
/wizard-ci error-tracking-upload-source-maps/rust
/wizard-ci misc/quack-quack
/wizard-ci revenue/stripe

Results will be posted here when complete.

gewenyu99 · 2026-06-19T20:32:41Z

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite. Learn more about stacking.

gewenyu99 and others added 2 commits June 19, 2026 16:23

This was referenced Jun 19, 2026

feat(runner): agent-backend seam + multivariate wizard-runner flag (#521) #692

Draft

feat(runner): pi.dev backend behind wizard-runner=pi (#524) #693

Draft

gewenyu99 mentioned this pull request Jun 19, 2026

feat(runner): wizard tools as pi custom tools — real integration on pi (#5) #694

Draft

gewenyu99 wants to merge 2 commits into pi/03-wizard-tools-on-pi from pi/04-security-subagents
