Open
Description
Objective
Make the skill-trigger evaluator work across all agent providers (pi, copilot, codex, etc.), not just Claude Code.
Currently, skill-trigger (in packages/core/src/evaluation/evaluators/skill-trigger.ts) hardcodes detection for two Claude Code-specific tool names:
- Skill tool: checks input.skill for the skill name as a substring
- Read tool: checks input.file_path for the skill name as a substring
Any other provider (pi, copilot, codex) never emits these tool names, so skill-trigger always fails with "No tool calls recorded" or "First tool was X — not Skill/Read".
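To make the failure mode concrete, here is a minimal sketch of the check as described above. It is not the actual source of skill-trigger.ts: the ToolCall shape and the function name firstCallTriggersSkill are assumptions made for illustration, and the real evaluator presumably also returns a failure reason rather than a bare boolean.

```typescript
// Hypothetical shape of a recorded tool call; the real type in packages/core may differ.
interface ToolCall {
  tool: string;
  input: Record<string, unknown>;
}

// Sketch of the Claude Code-only check described in this issue (illustrative, not the real code).
function firstCallTriggersSkill(calls: ToolCall[], skillName: string): boolean {
  const first = calls[0];
  if (!first) return false; // the "No tool calls recorded" case

  // Only the two Claude Code tool names are recognized.
  const field =
    first.tool === "Skill" ? "skill" :
    first.tool === "Read" ? "file_path" :
    undefined;
  if (field === undefined) return false; // first tool was neither Skill nor Read

  const value = first.input[field];
  return typeof value === "string" && value.includes(skillName);
}
```

Under this sketch, a provider whose file-reading tool has any other name fails the check even when it reads the right skill file, which is the behavior reported above.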
Architecture Boundary
core-runtime — this is a built-in evaluator in packages/core/src/evaluation/evaluators/.
Design Latitude
Implementation approach is open. Some options to consider:
- Provider-specific tool name mappings: each provider declares which tool names correspond to "invoke skill" and "read file" semantics, and the evaluator checks against the active provider's mapping.
- Configurable tool matchers in EVAL.yaml: let the eval author specify which tool name and input field to match (e.g., tool: "Skill", input_field: "skill" as defaults, overridable per provider or per test).
- Generic first-tool-call pattern matcher: generalize beyond skill detection to "did the first tool call match pattern X", making skill-trigger a special case of a broader tool-match evaluator.
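As one possible shape for the provider-mapping option, the sketch below keeps the Claude Code matchers as the default so existing assertions behave the same. All names here (ToolMatcher, PROVIDER_MATCHERS, matchersFor, firstCallMatchesSkill) are hypothetical, and the "example-provider" entry with its read_file tool and path field is a placeholder, not the real tool name of pi, copilot, or codex.

```typescript
// Hypothetical types; the real ones in packages/core may differ.
interface ToolCall {
  tool: string;
  input: Record<string, unknown>;
}

interface ToolMatcher {
  tool: string;       // tool name emitted by the provider
  inputField: string; // input field expected to contain the skill name
}

// Defaults taken from the current Claude Code behavior described in this issue.
const CLAUDE_CODE_MATCHERS: ToolMatcher[] = [
  { tool: "Skill", inputField: "skill" },
  { tool: "Read", inputField: "file_path" },
];

// Placeholder registry: "example-provider" and its tool name are assumptions.
const PROVIDER_MATCHERS: Record<string, ToolMatcher[]> = {
  "claude-code": CLAUDE_CODE_MATCHERS,
  "example-provider": [{ tool: "read_file", inputField: "path" }],
};

// Unknown or missing providers fall back to Claude Code semantics.
function matchersFor(provider?: string): ToolMatcher[] {
  const found = provider !== undefined ? PROVIDER_MATCHERS[provider] : undefined;
  return found ?? CLAUDE_CODE_MATCHERS;
}

function firstCallMatchesSkill(
  calls: ToolCall[],
  skillName: string,
  provider?: string,
): boolean {
  const first = calls[0];
  if (!first) return false;
  return matchersFor(provider).some((m) => {
    if (m.tool !== first.tool) return false;
    const value = first.input[m.inputField];
    return typeof value === "string" && value.includes(skillName);
  });
}
```

The same ToolMatcher shape could also back the EVAL.yaml option, with the per-test configuration overriding the registry entry. Falling back to the Claude Code matchers when no provider is given is one way to keep existing evals unchanged.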
The solution should preserve backward compatibility with existing skill-trigger assertions that assume Claude Code semantics.
Acceptance Signals
- skill-trigger assertions pass when run against a pi or copilot target that invokes the equivalent skill/file-reading action
- Existing Claude Code skill-trigger evals continue to work without modification
- examples/features/agent-skills-evals/ demonstrates cross-provider skill trigger detection
Non-Goals
- Changing how providers emit tool calls (that's provider-side, not evaluator-side)
- Adding new provider implementations
- Modifying the OTEL trace export pipeline