refactor: rename judge → grader across codebase #618

@christso

Objective

Rename all user-facing "judge" terminology to "grader" across the agentv codebase, adopting the three-layer evaluation taxonomy:

| Layer | Term | Role | Example |
| --- | --- | --- | --- |
| Config | Assertion | What to check (YAML declaration) | type: llm-grader, type: contains |
| Engine | Evaluator | How to dispatch (runtime interface) | CodeEvaluator, LlmEvaluator |
| Scoring | Grader | Who scores (script or LLM) | .agentv/graders/format-check.ts |
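
To make the layering concrete, here is a minimal TypeScript sketch of how the three terms might relate. It is an illustration under assumptions, not agentv's actual API: `Assertion`, `Grader`, `SketchEvaluator`, and the `non-empty` grader are hypothetical names invented for this example.

```typescript
// Hypothetical sketch of the three layers; these are not agentv's real types.

// Scoring layer: a grader decides the score (a script here, an LLM call elsewhere).
type Grader = (output: string) => number;

// Config layer: an assertion declares what to check, roughly as it might look
// after the YAML has been parsed.
interface Assertion {
  type: "code-grader";
  grader: string; // name used to look the grader up
}

// Engine layer: an evaluator decides how to dispatch an assertion to a grader.
class SketchEvaluator {
  private readonly graders: Map<string, Grader>;

  constructor(graders: Map<string, Grader>) {
    this.graders = graders;
  }

  evaluate(assertion: Assertion, output: string): number {
    const grader = this.graders.get(assertion.grader);
    if (grader === undefined) {
      throw new Error(`Unknown grader: ${assertion.grader}`);
    }
    return grader(output);
  }
}

// Example grader: scores 1 when the output is non-empty, otherwise 0.
const nonEmpty: Grader = (output) => (output.trim().length > 0 ? 1 : 0);
const evaluator = new SketchEvaluator(
  new Map<string, Grader>([["non-empty", nonEmpty]]),
);
```

The point of the split is that the assertion only names what to check, the evaluator owns the lookup and dispatch, and the grader alone produces the score.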

Why

  1. Pipeline alignment — AgentV transpiles EVAL.yaml to agentskills evals.json. Both agentskills and skill-creator use "grader." The downstream consumer already chose the term.
  2. No framework uses "judge" as primary term — across 14 eval frameworks studied: Metric (4), Scorer (4), Grader (3), Evaluator (2), Eval (1). "Judge" is always secondary or informal.
  3. Internal inconsistency — agentv's own codebase uses Evaluator as the canonical interface (all classes are *Evaluator), but user-facing config says code-judge / llm-judge.
  4. Semantic mismatch — the "LLM-as-a-Judge" paper scoped "judge" to LLMs specifically, making code-judge semantically wrong for a TypeScript script.

Scope

User-facing renames

| Before | After |
| --- | --- |
| type: code-judge | type: code-grader |
| type: llm-judge | type: llm-grader |
| .agentv/judges/ | .agentv/graders/ |
| judge_target (targets.yaml) | grader_target |
| --judge-target (CLI flag) | --grader-target |
| defineCodeJudge() (SDK) | defineCodeGrader() |

Internal renames

| Before | After |
| --- | --- |
| LlmJudgeEvaluator | ~~LlmGraderEvaluator~~ LlmEvaluator |
| LlmJudgeEvaluatorConfig | ~~LlmGraderEvaluatorConfig~~ LlmEvaluatorConfig |
| judge-discovery.ts | grader-discovery.ts |
| llm-judge.ts | llm-grader.ts |
| discoverJudges() | discoverGraders() |

Backward compatibility

Accept old names with deprecation warnings (the same pattern as the assert: → assertions: rename in #604):

  • type: code-judge → accepted, warns "use code-grader"
  • type: llm-judge → accepted, warns "use llm-grader"
  • .agentv/judges/ → still discovered alongside .agentv/graders/, warns once
  • judge_target → accepted in targets.yaml, normalized to grader_target
  • defineCodeJudge() → re-exported as deprecated alias
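
The accept-and-warn behaviour listed above could look roughly like the following TypeScript sketch. All helper names here (`normalizeType`, `normalizeTarget`, `warnOnce`) are hypothetical and are not taken from the agentv codebase. Warning only once per deprecated name, and letting an explicit grader_target win over judge_target, are assumptions made for this example.

```typescript
// Hypothetical sketch: these helpers are illustrative, not agentv's actual code.

// Deprecated assertion type names mapped to their replacements.
const RENAMED_TYPES = new Map<string, string>([
  ["code-judge", "code-grader"],
  ["llm-judge", "llm-grader"],
]);

// Deprecated names already reported, so each one warns only once per process.
const alreadyWarned = new Set<string>();

function warnOnce(
  oldName: string,
  newName: string,
  warn: (message: string) => void,
): void {
  if (alreadyWarned.has(oldName)) return;
  alreadyWarned.add(oldName);
  warn(`"${oldName}" is deprecated; use "${newName}"`);
}

// Returns the current type name, accepting the deprecated spelling.
function normalizeType(
  type: string,
  warn: (message: string) => void = console.warn,
): string {
  const renamed = RENAMED_TYPES.get(type);
  if (renamed === undefined) return type;
  warnOnce(type, renamed, warn);
  return renamed;
}

// Returns a copy of a targets.yaml entry with judge_target moved to grader_target.
function normalizeTarget(
  target: Record<string, unknown>,
  warn: (message: string) => void = console.warn,
): Record<string, unknown> {
  if (!("judge_target" in target)) return target;
  const { judge_target, ...rest } = target;
  warnOnce("judge_target", "grader_target", warn);
  // Assumption: an explicit grader_target wins if both keys are present.
  return { grader_target: judge_target, ...rest };
}
```

Normalizing at the loader boundary like this would let the rest of the pipeline see only the new names, so the deprecated spellings can later be dropped in one place.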

Unchanged

  • assertions: YAML key (already correct)
  • agentv eval assert CLI command (config-layer verb, not scoring-layer)
  • Evaluator interface and EvaluatorRegistry (already correct)
  • Deterministic assertion types (contains, regex, equals — not graders)

Implementation plan

16 tasks in dependency order: types → evaluators → schemas → discovery → registry → orchestrator → targets → loaders → SDK → CLI → tests → examples → docs → deprecation → validation.

Full plan: agentevals-research/docs/plans/2026-03-15-eval-taxonomy-plan.md

Acceptance signals

  • All tests pass with new grader terminology
  • Old judge names accepted with deprecation warnings
  • agentv eval assert works with the .agentv/graders/ directory
  • CLI --grader-target replaces --judge-target
  • Docs updated across all MDX pages
  • Examples updated across all EVAL.yaml files

Non-goals

  • Renaming EvaluatorResult or other internal result types
  • Changing the agentv eval assert CLI command name
  • Renaming deterministic assertion types
  • Cross-repo changes (agentskills already uses "grader")

Design latitude

  • Exact deprecation warning wording is flexible
  • File rename order can be adjusted if it helps avoid intermediate breakage
  • Whether .agentv/assertions/ merges into .agentv/graders/ can be deferred

Research

Based on cross-framework taxonomy research of 14 eval frameworks:

| Framework | Primary Term |
| --- | --- |
| Promptfoo | Grader |
| agentskills | Grader |
| skill-creator | Grader |
| DeepEval | Metric |
| RAGAS | Metric |
| TruLens | Metric |
| lm-eval-harness | Metric |
| Braintrust | Scorer |
| Mastra | Scorer |
| inspect-ai | Scorer |
| convex-evals | Scorer |
| LangWatch | Evaluator |
| Arize Phoenix | Evaluator |
| OpenAI Evals | Eval |
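
As a quick cross-check of the per-term counts quoted under "Why", the table can be tallied with a few lines of TypeScript. This is only a sketch: the data is copied from the table above, and `tallyTerms` is a hypothetical helper written for this example.

```typescript
// Framework → primary term, copied from the research table.
const primaryTerms: Array<[string, string]> = [
  ["Promptfoo", "Grader"],
  ["agentskills", "Grader"],
  ["skill-creator", "Grader"],
  ["DeepEval", "Metric"],
  ["RAGAS", "Metric"],
  ["TruLens", "Metric"],
  ["lm-eval-harness", "Metric"],
  ["Braintrust", "Scorer"],
  ["Mastra", "Scorer"],
  ["inspect-ai", "Scorer"],
  ["convex-evals", "Scorer"],
  ["LangWatch", "Evaluator"],
  ["Arize Phoenix", "Evaluator"],
  ["OpenAI Evals", "Eval"],
];

// Counts how many frameworks use each primary term.
function tallyTerms(rows: Array<[string, string]>): Map<string, number> {
  const counts = new Map<string, number>();
  for (const [, term] of rows) {
    counts.set(term, (counts.get(term) ?? 0) + 1);
  }
  return counts;
}

const tally = tallyTerms(primaryTerms);
// tally: Grader 3, Metric 4, Scorer 4, Evaluator 2, Eval 1 (14 frameworks in total)
```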
