feat: add --assertion-type filter to agentv eval run #616

## Objective

Add an `--assertion-type` filter to `agentv eval run` so users can run only selected assertion types during an evaluation. This makes it possible to run cheap deterministic judges without invoking expensive LLM judges.

## Motivation

Currently `agentv eval run` executes every assertion in a test's `assertions:` array. The agentv-bench `run_eval.py` script works around this by implementing its own trigger detection outside agentv's eval pipeline. With `--assertion-type`, users can get the same selective execution natively:

```bash
# Only run code-judge assertions (deterministic, zero cost)
agentv eval run EVAL.yaml --assertion-type code-judge

# Only run skill-trigger assertions
agentv eval run EVAL.yaml --assertion-type skill-trigger

# Run everything except LLM judges
agentv eval run EVAL.yaml --exclude-assertion-type llm-judge
```

## Design latitude

- Flag naming: `--assertion-type` vs `--judge-type` vs `--filter-assertion`
- Whether to support include-only, exclude-only, or both
- Whether filtering applies per-test or globally
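
To make the include/exclude question concrete, here is a minimal sketch of one possible option: both flags are accepted, each is repeatable, and the filter is applied globally. It is written in Python with `argparse` purely for illustration and is not agentv's actual CLI code. The program name and the `parse_filters` helper are hypothetical, and rejecting a type that appears in both lists is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the `agentv eval run` argument parser.
    parser = argparse.ArgumentParser(prog="eval-run-sketch")
    parser.add_argument("eval_file")
    # Repeatable flags: each occurrence appends one assertion type.
    parser.add_argument("--assertion-type", action="append", default=[],
                        dest="include_types")
    parser.add_argument("--exclude-assertion-type", action="append", default=[],
                        dest="exclude_types")
    return parser


def parse_filters(argv: list[str]) -> tuple[set[str], set[str]]:
    """Return (include, exclude) sets; an empty set means 'no constraint'."""
    args = build_parser().parse_args(argv)
    include, exclude = set(args.include_types), set(args.exclude_types)
    # Assumption: a type listed in both flags is treated as a user error.
    overlap = include & exclude
    if overlap:
        raise ValueError(f"types both included and excluded: {sorted(overlap)}")
    return include, exclude


if __name__ == "__main__":
    print(parse_filters(["EVAL.yaml", "--assertion-type", "code-judge"]))
```

Supporting both flags keeps the "everything except LLM judges" case short, at the cost of having to define what happens when the two lists conflict.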

## Acceptance signals

- `agentv eval run EVAL.yaml --assertion-type code-judge` executes only `code-judge` assertions and skips all other types (`llm-judge`, `contains`, etc.)
- Tests with no matching assertions are skipped (or reported as N/A)
- Existing behavior is unchanged when no filter is specified
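
The signals above can be sketched as a small filtering step applied before assertions run. This is an illustrative Python sketch, not agentv's implementation: the `id` and `type` keys, the dictionary shape of a test, and the `select_assertions` and `plan_run` helpers are all assumptions made for the example.

```python
from typing import Optional


def select_assertions(assertions: list[dict],
                      include: Optional[set[str]] = None,
                      exclude: Optional[set[str]] = None) -> list[dict]:
    """Keep assertions whose type passes the include/exclude filter.

    With no filter (both None or empty), every assertion is kept,
    which preserves the existing behavior.
    """
    selected = []
    for assertion in assertions:
        kind = assertion["type"]  # assumed field name
        if include and kind not in include:
            continue
        if exclude and kind in exclude:
            continue
        selected.append(assertion)
    return selected


def plan_run(tests: list[dict],
             include: Optional[set[str]] = None,
             exclude: Optional[set[str]] = None) -> dict[str, list[dict]]:
    """Map each test id to the assertions that would execute.

    An empty list marks a test with no matching assertions, which the
    caller could skip or report as N/A.
    """
    return {
        test["id"]: select_assertions(test.get("assertions", []), include, exclude)
        for test in tests
    }


if __name__ == "__main__":
    tests = [
        {"id": "t1", "assertions": [{"type": "code-judge"}, {"type": "llm-judge"}]},
        {"id": "t2", "assertions": [{"type": "llm-judge"}]},
    ]
    print(plan_run(tests, include={"code-judge"}))
```

With `include={"code-judge"}`, test `t1` keeps only its `code-judge` assertion and `t2` ends up with an empty list, which corresponds to the skipped or N/A case.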

## Non-goals

- Changing the orchestrator's assertion execution model beyond filtering
- Supporting regex or glob patterns for assertion types in v1

## Related

- #610 — Custom judges in transpiler (provides `run-judge` for individual execution)

