Open
Description
Objective
Add an --assertion-type filter to agentv eval run so users can selectively run only specific assertion types during an evaluation. This enables running cheap deterministic judges without invoking expensive LLM judges.
Motivation
Currently agentv eval run executes ALL assertions in a test's assertions: array. The agentv-bench run_eval.py script works around this by implementing its own trigger detection outside of agentv's eval pipeline. With --assertion-type, users can achieve the same selective execution natively:
# Only run code-judge assertions (deterministic, zero cost)
agentv eval run EVAL.yaml --assertion-type code-judge
# Only run skill-trigger assertions
agentv eval run EVAL.yaml --assertion-type skill-trigger
# Run everything except LLM judges
agentv eval run EVAL.yaml --exclude-assertion-type llm-judge
Design latitude
- Flag naming: --assertion-type vs --judge-type vs --filter-assertion
- Whether to support include-only, exclude-only, or both
- Whether filtering applies per-test or globally
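One way to picture the "both include and exclude" option is a pair of repeatable flags. The sketch below uses Python's argparse purely for illustration; it is not agentv's actual CLI code, and the function names, the repeatable-flag behavior, and the rule that a type may not appear in both lists are assumptions rather than decisions recorded in this issue.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the `agentv eval run` argument parser.
    parser = argparse.ArgumentParser(prog="agentv eval run")
    parser.add_argument("eval_file")
    # Assumption: each flag may be repeated to list several assertion types.
    parser.add_argument("--assertion-type", action="append", dest="include_types")
    parser.add_argument("--exclude-assertion-type", action="append", dest="exclude_types")
    return parser


def parse_filters(argv):
    """Return (include, exclude) sets parsed from the argument list."""
    args = build_parser().parse_args(argv)
    include = set(args.include_types or [])
    exclude = set(args.exclude_types or [])
    # Assumption: naming the same type in both lists is treated as a user error.
    overlap = include & exclude
    if overlap:
        raise ValueError(f"assertion types both included and excluded: {sorted(overlap)}")
    return include, exclude
```

For example, parse_filters(["EVAL.yaml", "--assertion-type", "code-judge"]) yields an include set containing only code-judge and an empty exclude set. Applying the filter globally rather than per-test keeps the flag surface small; per-test filtering would need extra syntax in the eval file.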
Acceptance signals
- agentv eval run EVAL.yaml --assertion-type code-judge only executes code-judge assertions, skipping llm-judge/contains/etc.
- Tests with no matching assertions are skipped (or report N/A)
- Existing behavior unchanged when no filter is specified
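The acceptance signals above can be sketched as a small filtering step that runs before assertions execute. This is a minimal illustration, not agentv's implementation: it assumes each assertion is a mapping with a "type" key and each test is a mapping with "id" and "assertions" keys, and the function names and the "run"/"skipped" status values are hypothetical.

```python
def select_assertions(assertions, include=None, exclude=None):
    """Return the assertions that survive the include/exclude filters."""
    include_set = set(include or [])
    exclude_set = set(exclude or [])
    selected = []
    for assertion in assertions:
        # Assumption: the assertion type lives under a "type" key.
        assertion_type = assertion.get("type")
        if include_set and assertion_type not in include_set:
            continue
        if assertion_type in exclude_set:
            continue
        selected.append(assertion)
    return selected


def plan_test(test, include=None, exclude=None):
    """Decide which assertions of one test to execute, or mark the test skipped."""
    selected = select_assertions(test.get("assertions", []), include, exclude)
    filter_active = bool(include or exclude)
    # A test is skipped only when a filter is active and removed every assertion;
    # with no filter, the test is planned exactly as before.
    status = "skipped" if filter_active and not selected else "run"
    return {"id": test.get("id"), "status": status, "assertions": selected}
```

With no filter, every assertion passes through unchanged, which matches the requirement that existing behavior stays the same. Whether a fully filtered test is reported as skipped or as N/A is left open in this issue; the sketch picks "skipped" only to have a concrete value.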
Non-goals
- Changing the orchestrator's assertion execution model beyond filtering
- Supporting regex or glob patterns for assertion types in v1
Related
- Feat: support custom judges in transpile to evals.json #610: custom judges in transpiler (provides run-judge for individual execution)