
feat: storyboard-driven testing with stateless CLI #424

Merged
bokelley merged 12 commits into main from bokelley/issue-423 on Apr 7, 2026

bokelley commented on Apr 7, 2026

Summary

  • Adds src/lib/testing/storyboard/ module — a YAML-driven testing engine where each step maps directly to a SingleAgentClient method
  • Adds an adcp storyboard CLI with list, show, run, and step subcommands; the step command is stateless and designed for LLM consumption (context in, result + next-step preview out)
  • Bundles 12 storyboard YAMLs from the AdCP spec repo with platform_types tags for backwards compat with PlatformType
  • Storyboards become the primary testing concept; platform types become tags on storyboards
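The stateless step contract above (context in, result plus a preview of the next step out) can be illustrated with a pure function. This is a minimal sketch, not the module's code: `StepClient`, `TASK_MAP`, `runStepSketch`, and the output shape are hypothetical stand-ins for the real `runStoryboardStep()` and task map, and merging the raw response into the context simplifies the convention-based extraction done in context.ts.

```typescript
// Hypothetical client surface; the real SingleAgentClient exposes many more methods.
interface StepClient {
  syncAccounts(req: Record<string, unknown>): Promise<Record<string, unknown>>;
  getProducts(req: Record<string, unknown>): Promise<Record<string, unknown>>;
}

type StepContext = Record<string, unknown>;

interface StepDef {
  id: string;   // storyboard step id, e.g. "get_products_brief"
  task: string; // AdCP task name, e.g. "get_products"
}

// Hypothetical task map: AdCP task name -> client method call.
const TASK_MAP: Record<string, (c: StepClient, req: StepContext) => Promise<StepContext>> = {
  sync_accounts: (c, req) => c.syncAccounts(req),
  get_products: (c, req) => c.getProducts(req),
};

interface StepOutput {
  stepId: string;
  response: StepContext;
  context: StepContext;      // input context merged with the response (simplified)
  next: StepDef | undefined; // preview of the following step, if any
}

// Stateless: everything the step needs arrives as arguments; nothing is kept between calls.
async function runStepSketch(
  steps: StepDef[],
  stepId: string,
  context: StepContext,
  client: StepClient,
): Promise<StepOutput> {
  const index = steps.findIndex((s) => s.id === stepId);
  if (index === -1) throw new Error(`unknown step: ${stepId}`);
  const step = steps[index];
  const call = TASK_MAP[step.task];
  if (!call) throw new Error(`no client method mapped for task: ${step.task}`);
  // For brevity the context doubles as the request; the real runner builds requests from it.
  const response = await call(client, context);
  return {
    stepId,
    response,
    context: { ...context, ...response },
    next: steps[index + 1],
  };
}
```

Because nothing is held between calls, a driver (an LLM or a shell script) only has to feed the returned `context` into the next invocation, which is the role `--context` plays on the CLI.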

Closes #423

Architecture

storyboard/
├── types.ts        — Storyboard, Phase, Step, Result, Context types
├── task-map.ts     — Maps 30 AdCP task names → SingleAgentClient methods
├── context.ts      — Convention-based context extraction & $context.<key> injection
├── validations.ts  — response_schema, field_present, field_value, status_code checks
├── loader.ts       — YAML parsing, bundled storyboard loading, platform type filtering
├── runner.ts       — runStoryboard() and runStoryboardStep() (stateless core primitive)
└── index.ts        — Public exports
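context.ts injects earlier results into later requests through `$context.<key>` placeholders. The sketch below shows one plausible way to resolve such placeholders recursively; `injectContext` is a hypothetical name, and the assumption that a placeholder occupies a whole string value with a single flat key may not match the module's actual convention.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Matches a whole-string placeholder such as "$context.account_id" (assumed syntax).
const PLACEHOLDER = /^\$context\.([A-Za-z0-9_]+)$/;

// Recursively replaces "$context.<key>" strings with context[key].
// Placeholders whose key is missing from the context are left untouched.
function injectContext(value: Json, context: Record<string, Json>): Json {
  if (typeof value === "string") {
    const match = PLACEHOLDER.exec(value);
    if (match && Object.prototype.hasOwnProperty.call(context, match[1])) {
      return context[match[1]];
    }
    return value;
  }
  if (Array.isArray(value)) {
    return value.map((item) => injectContext(item, context));
  }
  if (value !== null && typeof value === "object") {
    // Object.fromEntries defines own properties, so a "__proto__" key stays a plain data key.
    return Object.fromEntries(
      Object.entries(value).map(([key, inner]): [string, Json] => [key, injectContext(inner, context)]),
    );
  }
  return value;
}
```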

CLI Examples

adcp storyboard list --platform-type retail_media --json
adcp storyboard show media_buy_seller
adcp storyboard step test-mcp media_buy_seller sync_accounts --json
adcp storyboard step test-mcp media_buy_seller get_products_brief \
  --context '{"account_id":"abc123"}' --json
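The `--platform-type` option in the first example filters bundled storyboards by their `platform_types` tags. A minimal sketch of that filter, assuming each parsed storyboard exposes an optional `platform_types` array (the field name comes from this PR; `StoryboardMeta` and `filterByPlatformType` are hypothetical):

```typescript
interface StoryboardMeta {
  id: string;
  title: string;
  platform_types?: string[]; // tags kept for backwards compat with PlatformType
}

// Returns storyboards tagged with the given platform type, or all of them when no filter is given.
function filterByPlatformType(storyboards: StoryboardMeta[], platformType?: string): StoryboardMeta[] {
  if (!platformType) return storyboards;
  return storyboards.filter((sb) => (sb.platform_types ?? []).includes(platformType));
}
```

Storyboards without tags simply never match a platform filter, so untagged compliance storyboards only show up in the unfiltered list.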

Test plan

  • npm run build:lib compiles clean
  • All 76 existing tests pass
  • adcp storyboard list shows 11 storyboards
  • adcp storyboard list --platform-type retail_media filters correctly
  • adcp storyboard show media_buy_seller displays phases and steps
  • npm pack --dry-run includes storyboard YAMLs and compiled module
  • CI passes

🤖 Generated with Claude Code

bokelley and others added 3 commits April 7, 2026 06:27
Storyboards become the primary testing concept, replacing scenario-based
comply retrofitting. Each YAML step maps directly to a SingleAgentClient
method via a stateless execution engine designed for LLM consumption.

- Add src/lib/testing/storyboard/ module (types, runner, loader, validations, context, task-map)
- Add CLI: adcp storyboard {list, show, run, step} with --json output
- Bundle 12 storyboard YAMLs from adcontextprotocol/adcp spec repo
- Tag storyboards with platform_types for backwards compat with PlatformType
- Add yaml as runtime dependency for YAML parsing

Closes #423

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Storyboards are now the single testing surface for comply(). The
compliance engine runs storyboard YAMLs instead of hand-written
scenario functions, while maintaining the existing ComplianceResult
interface for backwards compatibility.

YAML format extensions:
- expect_error: inverts pass/fail for negative testing
- requires_tool: skip steps when agent lacks a tool
- context_outputs/context_inputs: explicit data flow between steps
- error_code validation: check error codes in error responses
- track/required_tools: map storyboards to compliance tracks
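`expect_error` inverts pass/fail for negative tests, and `error_code` validation checks the code carried by an error response. A sketch of how the two rules might combine, assuming a simplified response shape (`ok`, `errorCode`); `evaluateStep` is hypothetical, and the real checks live in validations.ts. The error codes used in the example are illustrative.

```typescript
interface StepResponse {
  ok: boolean;        // did the agent return a success response?
  errorCode?: string; // error code, present when ok is false
}

interface StepExpectation {
  expect_error?: boolean;
  error_code?: string;
}

interface Verdict {
  passed: boolean;
  reason: string;
}

function evaluateStep(response: StepResponse, expectation: StepExpectation): Verdict {
  if (!expectation.expect_error) {
    // Normal step: success passes, any error fails.
    return response.ok
      ? { passed: true, reason: "success as expected" }
      : { passed: false, reason: `unexpected error ${response.errorCode ?? "(no code)"}` };
  }
  // Negative test: pass/fail is inverted, so a success response is a failure.
  if (response.ok) {
    return { passed: false, reason: "expected an error but the call succeeded" };
  }
  // Optionally require a specific error code.
  if (expectation.error_code && response.errorCode !== expectation.error_code) {
    return {
      passed: false,
      reason: `expected error ${expectation.error_code}, got ${response.errorCode ?? "(no code)"}`,
    };
  }
  return { passed: true, reason: "error returned as expected" };
}
```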

New compliance storyboards (10):
- governance_property_lists, governance_content_standards
- si_session, brand_rights, media_buy_state_machine
- error_compliance, schema_validation, behavioral_analysis
- audience_sync, deterministic_testing

Deprecates SCENARIO_REQUIREMENTS, DEFAULT_SCENARIOS, testAllScenarios()
in favor of storyboard execution. Old exports remain functional.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
bokelley and others added 6 commits April 7, 2026 09:29
Block __proto__, constructor, and prototype keys in setPath() to
prevent prototype-polluting assignments from crafted YAML paths.
Resolves CodeQL alerts #33 and #34.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
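A minimal sketch of a `setPath()` with the guard described here: it rejects `__proto__`, `constructor`, and `prototype` segments, and it also writes through `Object.defineProperty` and checks `isPlainObject` before descending, as the later CodeQL commits describe. The dot-only path format and exact signatures are assumptions; the real helper lives in path.ts and parses bracket indices too.

```typescript
const FORBIDDEN_KEYS = new Set(["__proto__", "constructor", "prototype"]);

// True only for {} literals and Object.create(null) objects.
function isPlainObject(value: unknown): value is Record<string, unknown> {
  if (value === null || typeof value !== "object" || Array.isArray(value)) return false;
  const proto = Object.getPrototypeOf(value);
  return proto === Object.prototype || proto === null;
}

// Sets target.a.b.c = value for path "a.b.c", creating (or replacing non-object) intermediates.
// Throws on forbidden segments so a crafted YAML path cannot reach Object.prototype.
function setPath(target: Record<string, unknown>, path: string, value: unknown): void {
  const keys = path.split(".");
  for (const key of keys) {
    if (FORBIDDEN_KEYS.has(key)) throw new Error(`forbidden path segment: ${key}`);
  }
  let current: Record<string, unknown> = target;
  for (const key of keys.slice(0, -1)) {
    const next = current[key];
    if (isPlainObject(next)) {
      current = next;
    } else {
      const created: Record<string, unknown> = {};
      Object.defineProperty(current, key, { value: created, enumerable: true, writable: true, configurable: true });
      current = created;
    }
  }
  const last = keys[keys.length - 1];
  Object.defineProperty(current, last, { value, enumerable: true, writable: true, configurable: true });
}
```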
Build valid requests from discovered context (products, accounts,
formats) instead of sending raw YAML sample_request payloads. Each
task has a builder that mirrors how hand-written scenarios construct
requests — selecting products with pricing options, building proper
asset records, generating valid date ranges, etc.

sample_request from YAML becomes documentation/fallback only. The
runner priority is: --request override > request builder > sample_request.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
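The runner priority above (`--request` override, then request builder, then `sample_request`) amounts to a fallback chain. A sketch with hypothetical names (`RequestBuilder`, `REQUEST_BUILDERS`, `resolveRequest`); it assumes a builder may decline by returning `undefined`, and the request field names are illustrative rather than the AdCP schema. The real builders do much more, such as selecting products with pricing options.

```typescript
type TaskRequest = Record<string, unknown>;
type DiscoveredContext = Record<string, unknown>;

// A builder derives a valid request from discovered context, or returns undefined if it cannot.
type RequestBuilder = (context: DiscoveredContext) => TaskRequest | undefined;

// Hypothetical builder for one task: reuse the account discovered by an earlier step.
const REQUEST_BUILDERS: Record<string, RequestBuilder> = {
  get_products: (context) =>
    typeof context.account_id === "string"
      ? { brief: "storyboard test brief", account_id: context.account_id }
      : undefined,
};

function resolveRequest(
  task: string,
  context: DiscoveredContext,
  sampleRequest: TaskRequest | undefined,
  override?: TaskRequest,
): TaskRequest {
  if (override) return override;              // 1. explicit --request override wins
  const builder = REQUEST_BUILDERS[task];
  const built = builder ? builder(context) : undefined;
  if (built) return built;                    // 2. builder output from discovered context
  return sampleRequest ?? {};                 // 3. YAML sample_request as fallback
}
```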
- Initialize MCP session in runner before executing steps (fixes
  Streamable HTTP → SSE degradation after a few calls)
- Close MCP connections after standalone storyboard/step runs
- Register 20+ missing response schemas in TOOL_RESPONSE_SCHEMAS
  (sync_accounts, list_accounts, governance, SI, capabilities, etc.)
- Fix get_media_buy_delivery validation path: media_buys → media_buy_deliveries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Bump @modelcontextprotocol/sdk from 1.27.1 to 1.29.0
- Retry StreamableHTTP on any StreamableHTTPError before falling back
  to SSE (workaround for SDK #1708 and #1852 — session expiry)
- Fix sync_creatives validation path: results[0].action → creatives[0].action
  (matches actual SyncCreativesSuccess schema)

Test agent media_buy_seller: 8/9 pass (only sync_governance fails due to
missing spec schema — adcontextprotocol/adcp#1978)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
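The transport workaround (retry Streamable HTTP on any `StreamableHTTPError` before falling back to SSE) is a retry-then-fallback pattern. The sketch is transport-agnostic and is not the MCP SDK's API: `connectStreamable`, `connectSse`, `isStreamableError`, and the default of two attempts are hypothetical parameters.

```typescript
interface TransportConnection {
  transport: "streamable-http" | "sse";
}

// Tries Streamable HTTP up to `attempts` times while failures are streamable-transport errors,
// then falls back to SSE. Any other error is rethrown immediately.
async function connectWithFallback(
  connectStreamable: () => Promise<TransportConnection>,
  connectSse: () => Promise<TransportConnection>,
  isStreamableError: (err: unknown) => boolean,
  attempts = 2,
): Promise<TransportConnection> {
  for (let i = 0; i < attempts; i++) {
    try {
      return await connectStreamable();
    } catch (err) {
      if (!isStreamableError(err)) throw err;
      // Streamable HTTP failed (e.g. an expired session); try again before degrading.
    }
  }
  return connectSse();
}
```

Retrying first keeps a transient session-expiry error from permanently downgrading the connection to SSE.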
Resync schemas from spec to pick up the new sync_governance response
schema (adcontextprotocol/adcp#1978). Register it in TOOL_RESPONSE_SCHEMAS.

Test agent media_buy_seller: 9/9 pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
bokelley and others added 3 commits April 7, 2026 14:45
Replace bracket assignment with Object.defineProperty so CodeQL's
static analysis recognizes the prototype pollution guard. The
FORBIDDEN_KEYS check was already correct but CodeQL doesn't trace
through Set.has().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Deduplicate parsePath (was in both context.ts and validations.ts).
Move resolvePath and setPath to path.ts as the canonical location.
Add isPlainObject type guard before Object.defineProperty in setPath.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
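With `parsePath` and `resolvePath` consolidated in path.ts, a `field_present` check reduces to resolving a path such as `creatives[0].action` against the response. A sketch under assumed semantics (bracket indices become plain segments, and only own properties are followed); `fieldPresent` and the response data are hypothetical.

```typescript
// "creatives[0].action" -> ["creatives", "0", "action"]
function parsePath(path: string): string[] {
  return path
    .replace(/\[(\d+)\]/g, ".$1")
    .split(".")
    .filter((segment) => segment.length > 0);
}

// Walks own properties only; returns undefined as soon as a segment is missing.
function resolvePath(root: unknown, path: string): unknown {
  let current: unknown = root;
  for (const segment of parsePath(path)) {
    if (current === null || typeof current !== "object") return undefined;
    if (!Object.prototype.hasOwnProperty.call(current, segment)) return undefined;
    current = (current as Record<string, unknown>)[segment];
  }
  return current;
}

// field_present: the path resolves to some defined value.
function fieldPresent(response: unknown, path: string): boolean {
  return resolvePath(response, path) !== undefined;
}
```

This also shows why the earlier `results[0].action` path failed against a response shaped like `SyncCreativesSuccess`: the lookup stops at the missing `results` key.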
Every storyboard now has a track field mapping it to a ComplianceTrack,
so comply() can discover and run storyboards for all 11 tracks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
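With a `track` field on every storyboard, finding the storyboards for a compliance track becomes a grouping step. A sketch, assuming `track` is a plain string and `required_tools` an optional array (both named in the commits above); `groupByTrack`, `runnableFor`, and the track and tool values in the example are hypothetical, and skipping storyboards whose tools the agent lacks is an assumption modeled on the per-step `requires_tool` behavior.

```typescript
interface TrackedStoryboard {
  id: string;
  track: string;             // ComplianceTrack name
  required_tools?: string[];
}

// Groups storyboards by their compliance track.
function groupByTrack(storyboards: TrackedStoryboard[]): Map<string, TrackedStoryboard[]> {
  const byTrack = new Map<string, TrackedStoryboard[]>();
  for (const sb of storyboards) {
    const bucket = byTrack.get(sb.track);
    if (bucket) bucket.push(sb);
    else byTrack.set(sb.track, [sb]);
  }
  return byTrack;
}

// Keeps only storyboards whose required tools the agent advertises.
function runnableFor(storyboards: TrackedStoryboard[], agentTools: Set<string>): TrackedStoryboard[] {
  return storyboards.filter((sb) => (sb.required_tools ?? []).every((tool) => agentTools.has(tool)));
}
```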
bokelley merged commit ea93508 into main on Apr 7, 2026
13 checks passed

Linked issue (closed by this PR): Integrate storyboard-driven testing into @adcp/client

2 participants