feat(evaluator): make skill-trigger target-agnostic #613

@christso

Objective

Make the skill-trigger evaluator work across all agent providers (pi, copilot, codex, etc.), not just Claude Code.

Currently, skill-trigger (in packages/core/src/evaluation/evaluators/skill-trigger.ts) hardcodes detection for two Claude Code-specific tool names:

  • Skill tool — checks input.skill for skill name substring
  • Read tool — checks input.file_path for skill name substring

No other provider (pi, copilot, codex) emits these tool names, so skill-trigger always fails against those targets, with either "No tool calls recorded" or "First tool was X — not Skill/Read".
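For readers unfamiliar with the evaluator, the behavior described above can be approximated as follows. This is a minimal sketch, not the actual contents of skill-trigger.ts: the `ToolCall` and `TriggerResult` shapes and the `checkSkillTrigger` name are assumptions made for illustration, and only the tool names, input fields, and failure messages are taken from this issue.

```typescript
// Hypothetical shape of a recorded tool call; the real type in the codebase may differ.
interface ToolCall {
  tool: string;
  input: Record<string, unknown>;
}

// Hypothetical result shape, used only for this illustration.
interface TriggerResult {
  pass: boolean;
  reason: string;
}

// Sketch of the Claude Code-specific check: only the first tool call is inspected,
// and only the "Skill" and "Read" tool names are recognized.
function checkSkillTrigger(calls: ToolCall[], skillName: string): TriggerResult {
  const first = calls[0];
  if (first === undefined) {
    return { pass: false, reason: "No tool calls recorded" };
  }
  if (first.tool === "Skill") {
    const skill = String(first.input["skill"] ?? "");
    return { pass: skill.includes(skillName), reason: `Skill tool input.skill was "${skill}"` };
  }
  if (first.tool === "Read") {
    const filePath = String(first.input["file_path"] ?? "");
    return { pass: filePath.includes(skillName), reason: `Read tool input.file_path was "${filePath}"` };
  }
  // Any provider that uses different tool names always lands here.
  return { pass: false, reason: `First tool was ${first.tool} — not Skill/Read` };
}
```

With this shape, a target whose first call is, say, a shell or file tool under a different name can never pass, regardless of whether it actually loaded the skill.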

Architecture Boundary

core-runtime — this is a built-in evaluator in packages/core/src/evaluation/evaluators/.

Design Latitude

Implementation approach is open. Some options to consider:

  1. Provider-specific tool name mappings — each provider declares which tool names correspond to "invoke skill" and "read file" semantics, and the evaluator checks against the active provider's mapping.
  2. Configurable tool matchers in EVAL.yaml — let the eval author specify which tool name + input field to match (e.g., tool: "Skill", input_field: "skill" as defaults, overridable per-provider or per-test).
  3. Generic first-tool-call pattern matcher — generalize beyond skill detection to "did the first tool call match pattern X", making skill-trigger a special case of a broader tool-match evaluator.
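As one possible starting point for option 1, the sketch below keys a list of matchers by provider. Everything here is an assumption for illustration: the `ToolMatcher` type, the `PROVIDER_MATCHERS` table, the provider keys, and the function name are hypothetical. Only the `claude-code` entries reflect tool names stated in this issue; the `example-provider` entry is a placeholder, since the real tool names emitted by pi, copilot, and codex would need to be confirmed.

```typescript
// Hypothetical shape of a recorded tool call.
interface ToolCall {
  tool: string;
  input: Record<string, unknown>;
}

// One matcher: the tool name to look for and the input field that carries
// the skill name or a path containing it.
interface ToolMatcher {
  tool: string;
  inputField: string;
}

// Hypothetical per-provider table. The "example-provider" values are placeholders,
// not verified tool names for any real provider.
const PROVIDER_MATCHERS: Record<string, ToolMatcher[]> = {
  "claude-code": [
    { tool: "Skill", inputField: "skill" },
    { tool: "Read", inputField: "file_path" },
  ],
  "example-provider": [{ tool: "read_file", inputField: "path" }],
};

// Returns true when the first tool call matches any matcher declared for the provider
// and the matched input field contains the skill name as a substring.
function firstCallTriggersSkill(provider: string, calls: ToolCall[], skillName: string): boolean {
  const matchers = PROVIDER_MATCHERS[provider] ?? [];
  const first = calls[0];
  if (first === undefined) {
    return false;
  }
  return matchers.some(
    (m) => m.tool === first.tool && String(first.input[m.inputField] ?? "").includes(skillName),
  );
}
```

The same matcher shape could also back option 2 (matchers supplied from EVAL.yaml) or option 3 (a general first-tool-call evaluator), so the choice between the options is mostly about where the table lives.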

The solution should preserve backward compatibility with existing skill-trigger assertions that assume Claude Code semantics.
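One way to meet the backward-compatibility requirement is to treat the Claude Code matchers as defaults that apply whenever an assertion supplies nothing else. The sketch below assumes a hypothetical `SkillTriggerConfig` with an optional `matchers` field; neither the field names nor `resolveMatchers` exist in the codebase as far as this issue states.

```typescript
interface ToolMatcher {
  tool: string;
  inputField: string;
}

// Defaults mirror the Claude Code semantics described in this issue.
const DEFAULT_MATCHERS: readonly ToolMatcher[] = [
  { tool: "Skill", inputField: "skill" },
  { tool: "Read", inputField: "file_path" },
];

// Hypothetical assertion config. `matchers` is an assumed optional override,
// not an existing skill-trigger option.
interface SkillTriggerConfig {
  skill: string;
  matchers?: ToolMatcher[];
}

// Existing assertions omit `matchers`, so they resolve to the Claude Code defaults
// and keep their current behavior without modification.
function resolveMatchers(config: SkillTriggerConfig): readonly ToolMatcher[] {
  if (config.matchers !== undefined && config.matchers.length > 0) {
    return config.matchers;
  }
  return DEFAULT_MATCHERS;
}
```

Under this approach, an eval written today resolves to exactly the two matchers it already relies on, while a cross-provider eval opts in by listing its own.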

Acceptance Signals

  • skill-trigger assertions pass when run against a pi or copilot target that invokes the equivalent skill/file-reading action
  • Existing Claude Code skill-trigger evals continue to work without modification
  • examples/features/agent-skills-evals/ demonstrates cross-provider skill trigger detection

Non-Goals

  • Changing how providers emit tool calls (that's provider-side, not evaluator-side)
  • Adding new provider implementations
  • Modifying the OTEL trace export pipeline
