EPIC: Adapter-led integrations and benchmark runner boundaries #1467

Track the public roadmap for integrations where AgentV stays the repo-native eval authoring, gating, and artifact surface while external systems provide complementary execution, observability, or import/export workflows.

Scope:
- Keep AgentV YAML, grading semantics, run bundles, and CI exit behavior authoritative for AgentV-authored evals.
- Add narrow adapters or documented recipes for external observability and evaluation systems when they consume completed AgentV artifacts or emit trace data that AgentV can correlate with its own runs.
- Delegate benchmark-grade execution to purpose-built runners where appropriate, then import or gate on their results through AgentV artifacts.
- Preserve privacy and portability by making raw content export opt-in and by keeping metadata/provenance explicit.
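
To make the opt-in export and explicit provenance points above concrete, here is a minimal sketch of a vendor-neutral export projection. Everything in it is an assumption for illustration: `CaseResult`, `project_for_export`, the `include_raw_content` flag, and the payload field names are hypothetical and do not describe an existing AgentV API or artifact schema.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class CaseResult:
    """Hypothetical shape of one graded case in a completed run."""
    case_id: str
    score: float
    passed: bool
    input_text: str   # raw content, may be sensitive
    output_text: str  # raw content, may be sensitive


def project_for_export(
    run_id: str,
    source_commit: str,
    results: list[CaseResult],
    include_raw_content: bool = False,
) -> dict[str, Any]:
    """Build an export payload; raw content is left out unless requested."""
    cases = []
    for r in results:
        case: dict[str, Any] = {
            "case_id": r.case_id,
            "score": r.score,
            "passed": r.passed,
        }
        if include_raw_content:
            # Only reached when the caller opts in explicitly.
            case["input"] = r.input_text
            case["output"] = r.output_text
        cases.append(case)
    return {
        # Provenance is always present and records whether raw content
        # was included, so a consumer never has to guess.
        "provenance": {
            "run_id": run_id,
            "source_commit": source_commit,
            "raw_content_included": include_raw_content,
        },
        "cases": cases,
    }
```

With the default arguments, only scores and identifiers leave the run bundle; a caller has to pass `include_raw_content=True` to export prompts or outputs.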

Initial themes:
- Opik export/import recipes over completed AgentV runs and traces.
- Harbor-backed benchmark execution and result import boundaries.
- Promptfoo interoperability where format conversion helps users migrate or share eval inputs without adding runtime coupling.
- Phoenix as a boundary/reference/read-through integration, tracked separately in #1422, where the current decision is narrower.
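
As a rough illustration of the result import boundary, the sketch below maps an external runner's output onto a generic record and derives a CI exit code from it. The input JSON layout (`tasks`, `id`, `passed`), the `ImportedResult` type, and both function names are assumptions made for this example; they are not Harbor's actual output format or an AgentV interface.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ImportedResult:
    task_id: str
    passed: bool
    runner: str  # provenance: which external runner produced the result


def import_runner_results(raw_json: str, runner: str) -> list[ImportedResult]:
    """Map a hypothetical runner output onto a generic result record."""
    payload = json.loads(raw_json)
    return [
        ImportedResult(task_id=str(t["id"]), passed=bool(t["passed"]), runner=runner)
        for t in payload["tasks"]
    ]


def gate_exit_code(results: list[ImportedResult], min_pass_rate: float) -> int:
    """Return 0 when the pass rate meets the threshold, 1 otherwise."""
    if not results:
        return 1  # an empty import is treated as a failure, not a pass
    pass_rate = sum(r.passed for r in results) / len(results)
    return 0 if pass_rate >= min_pass_rate else 1
```

The external runner owns execution and its task schema; only the generic record and the pass/fail decision cross the boundary, which keeps the gate on the AgentV side.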

Non-goals:
- Do not replace AgentV with an external platform for CI authority.
- Do not absorb benchmark-specific task schemas into AgentV core.
- Do not create vendor-specific artifact formats when a generic projection or adapter boundary is enough.
