
Commit 2428b15

brannn and claude committed
Fix domain count in README (19 → 18 built-in)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent ffaa8da commit 2428b15

1 file changed

Lines changed: 1 addition & 1 deletion


README.md

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ agent-evals check ./agents/
 agent-evals test ./agents/ --provider anthropic
 ```
 
-The `check` command extracts domains from each agent's system prompt, computes pairwise overlap using Jaccard similarity and LCS-based prompt comparison, flags conflicts between overlapping agents, identifies coverage gaps across 19 recognized domain categories, and scores boundary awareness. It requires no API keys or network access.
+The `check` command extracts domains from each agent's system prompt, computes pairwise overlap using Jaccard similarity and LCS-based prompt comparison, flags conflicts between overlapping agents, identifies coverage gaps across 18 built-in domain categories (extensible via config), and scores boundary awareness. It requires no API keys or network access.
 
 The `test` command runs everything in `check`, then generates boundary questions tailored to each agent and sends them through your LLM provider. It measures whether agents hedge on out-of-scope questions, whether their self-reported confidence tracks actual capability, and whether responses stay consistent across repeated stochastic runs.
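For readers unfamiliar with the two overlap measures named in the changed paragraph, here is a minimal Python sketch of Jaccard similarity over domain sets and a longest-common-subsequence (LCS) comparison over prompt tokens. It is an illustration only, not the agent-evals implementation: the function names, the whitespace tokenization, the normalization of the LCS length by the longer prompt, and the sample domains and prompts are all assumptions made for this example.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; returns 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def lcs_length(x: list[str], y: list[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, start=1):
            if xi == yj:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_ratio(p: str, q: str) -> float:
    """LCS length over lowercase whitespace tokens, divided by the longer token count.

    The normalization is an assumption for this sketch.
    """
    x, y = p.lower().split(), q.lower().split()
    longest = max(len(x), len(y))
    return lcs_length(x, y) / longest if longest else 0.0


# Hypothetical domain sets for two agents.
backend = {"api", "databases", "security"}
devops = {"ci_cd", "security", "databases", "monitoring"}
print(jaccard(backend, devops))  # 2 shared domains out of 5 distinct ones

# Hypothetical system prompts.
print(lcs_ratio("You are a backend API expert", "You are a frontend UI expert"))
```

A high score on both measures would suggest two agents claim similar territory in similar words; how agent-evals combines or thresholds these scores is not stated in this commit.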
