EPIC: Adapter-led integrations and benchmark runner boundaries #1467

Description

@christso

Track the public roadmap for integrations where AgentV stays the repo-native eval authoring, gating, and artifact surface while external systems provide complementary execution, observability, or import/export workflows.

Scope:

  • Keep AgentV YAML, grading semantics, run bundles, and CI exit behavior authoritative for AgentV-authored evals.
  • Add narrow adapters or documented recipes for external observability and evaluation systems when they either consume completed AgentV artifacts or emit trace data that AgentV can correlate with its own runs.
  • Delegate benchmark-grade execution to purpose-built runners where appropriate, then import or gate on their results through AgentV artifacts.
  • Preserve privacy and portability by making raw content export opt-in and by keeping metadata/provenance explicit.
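
As a rough illustration of the last scope item, the sketch below projects a completed run into an export record that leaves out raw content unless the caller opts in, and always carries explicit provenance. Every name here (`CaseResult`, `project_for_export`, and the field names) is hypothetical and is not taken from AgentV's actual artifact schema.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class CaseResult:
    """Hypothetical shape of one graded case in a completed run."""
    case_id: str
    score: float
    passed: bool
    input_text: str   # raw content: potentially sensitive
    output_text: str  # raw content: potentially sensitive


def project_for_export(
    run_id: str,
    results: list[CaseResult],
    *,
    include_raw_content: bool = False,
) -> dict[str, Any]:
    """Build a generic export record; raw content is included only on opt-in."""
    cases = []
    for r in results:
        # Metadata is always exported.
        case: dict[str, Any] = {
            "case_id": r.case_id,
            "score": r.score,
            "passed": r.passed,
        }
        # Raw inputs and outputs are exported only when explicitly requested.
        if include_raw_content:
            case["input"] = r.input_text
            case["output"] = r.output_text
        cases.append(case)
    return {
        # Provenance states where the record came from and what it contains.
        "provenance": {
            "source": "agentv",
            "run_id": run_id,
            "raw_content_included": include_raw_content,
        },
        "cases": cases,
    }
```

Making `include_raw_content` keyword-only with a default of `False` means an exporter has to ask for raw content by name, and the provenance block records whether it did.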

Initial themes:

  • Opik export/import recipes over completed AgentV runs and traces.
  • Harbor-backed benchmark execution and result import boundaries.
  • Promptfoo interoperability where format conversion helps users migrate or share eval inputs without adding runtime coupling.
  • Phoenix as a boundary/reference/read-through integration, tracked separately in "EPIC: Phoenix boundary notes and non-goals" (#1422), where the current decision is narrower.

Non-goals:

  • Do not replace AgentV with an external platform for CI authority.
  • Do not absorb benchmark-specific task schemas into AgentV core.
  • Do not create vendor-specific artifact formats when a generic projection or adapter boundary is enough.
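
One way to picture these non-goals together is a narrow adapter that translates an external runner's payload into a generic record, with the pass/fail gate computed locally from that record. This is only a sketch under stated assumptions: `ImportedCase`, `ResultAdapter`, `ExampleRunnerAdapter`, `gate_exit_code`, and the payload shape are all invented for illustration and do not describe AgentV's API or any real runner's output format.

```python
from dataclasses import dataclass
from typing import Any, Protocol


@dataclass(frozen=True)
class ImportedCase:
    """Hypothetical generic record; the only shape the core would see."""
    case_id: str
    passed: bool
    source: str  # provenance: which external system produced this result


class ResultAdapter(Protocol):
    """Boundary: runner-specific schemas stay behind this interface."""

    def to_cases(self, payload: dict[str, Any]) -> list[ImportedCase]: ...


class ExampleRunnerAdapter:
    """Adapter for a made-up payload: {"tasks": [{"name": ..., "ok": ...}]}."""

    def to_cases(self, payload: dict[str, Any]) -> list[ImportedCase]:
        return [
            ImportedCase(case_id=t["name"], passed=bool(t["ok"]), source="example-runner")
            for t in payload["tasks"]
        ]


def gate_exit_code(cases: list[ImportedCase], min_pass_rate: float) -> int:
    """Return 0 when the pass rate meets the threshold, 1 otherwise."""
    if not cases:
        return 1  # assumption: an empty import fails the gate
    pass_rate = sum(c.passed for c in cases) / len(cases)
    return 0 if pass_rate >= min_pass_rate else 1
```

In this arrangement the external runner only supplies results; the exit code is still decided on the importing side, and supporting a new runner means writing another adapter rather than defining a new artifact format.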

Status: Backlog