Defines trace export functionality for sending AgentV evaluation data to external observability platforms. Enables debugging, monitoring, and analysis of agent execution through industry-standard tooling.
The system SHALL support exporting evaluation traces to Langfuse when enabled via CLI flag.
- WHEN the user runs `agentv run eval.yaml --langfuse`
- AND `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` environment variables are set
- THEN the system creates a Langfuse trace for each completed eval case
- AND the trace includes the `eval_id` as the trace name
- AND the trace includes metadata for `target`, `dataset`, and `score`
- WHEN the user runs `agentv run eval.yaml` without the `--langfuse` flag
- THEN no traces are sent to Langfuse
- AND the evaluation proceeds normally without observability overhead
- WHEN the user runs `agentv run eval.yaml --langfuse`
- AND `LANGFUSE_PUBLIC_KEY` or `LANGFUSE_SECRET_KEY` is not set
- THEN the system emits a warning message
- AND evaluation proceeds without Langfuse export
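The three scenarios above reduce to one decision: export only when the flag is present and both keys are set, and warn when the flag is present but a key is missing. A minimal sketch of that decision follows. The function name `decideLangfuseExport`, the `ExportDecision` shape, and the warning wording are hypothetical, and the environment is passed in as a plain object rather than read from the process so the logic stays testable.

```typescript
// Hypothetical helper, not part of AgentV: decides whether Langfuse export runs.
// `env` stands in for the process environment.
interface ExportDecision {
  enabled: boolean;
  warning?: string; // set only when --langfuse was passed but a key is missing
}

function decideLangfuseExport(
  langfuseFlag: boolean,
  env: Record<string, string | undefined>,
): ExportDecision {
  // Without the flag nothing is exported and no warning is emitted.
  if (!langfuseFlag) return { enabled: false };

  // Collect whichever required credentials are unset or empty.
  const missing = ["LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY"].filter(
    (name) => !env[name],
  );
  if (missing.length > 0) {
    return {
      enabled: false,
      warning: `Langfuse export disabled: missing ${missing.join(", ")}`,
    };
  }
  return { enabled: true };
}
```

A caller would print `warning` when present and continue the evaluation either way.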
The system SHALL convert `output_messages` to a Langfuse-compatible trace structure.
- WHEN an `OutputMessage` has `role: "assistant"` and `content`
- THEN a Langfuse Generation is created with the content as output
- AND the Generation includes `gen_ai.request.model` if available from the target
- WHEN an `OutputMessage` contains a `toolCalls` array
- THEN each `ToolCall` becomes a Langfuse Span with `type: "tool"`
- AND the Span includes a `gen_ai.tool.name` attribute set to the tool name
- AND the Span includes `gen_ai.tool.call.id` if the tool call has an `id`
- WHEN an `EvaluationResult` is exported
- THEN the trace includes a Langfuse Score with `name: "eval_score"` and `value` set to the result score
- AND the Score includes a `comment` with the evaluation reasoning if available
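The mapping rules above can be sketched as two pure functions that build plain objects. This is an illustration under stated assumptions, not the Langfuse SDK's API: the `Observation` and `Score` shapes are hypothetical, as are the `name` field on `ToolCall` and the `reasoning` field on `EvaluationResult`. Only `role`, `content`, `toolCalls`, `id`, and the `gen_ai.*` attribute names come from the requirement.

```typescript
// Hypothetical shapes; the real AgentV types may carry more fields.
interface ToolCall {
  id?: string;
  name: string; // assumed field holding the tool name
}

interface OutputMessage {
  role: string;
  content?: string;
  toolCalls?: ToolCall[];
}

interface EvaluationResult {
  score: number;
  reasoning?: string; // assumed field holding the evaluation reasoning
}

type Observation =
  | { kind: "generation"; output: string; attributes: Record<string, string> }
  | { kind: "span"; type: "tool"; attributes: Record<string, string> };

interface Score {
  name: "eval_score";
  value: number;
  comment?: string;
}

function toObservations(messages: OutputMessage[], model?: string): Observation[] {
  const observations: Observation[] = [];
  for (const message of messages) {
    // Assistant messages with content become Generations.
    if (message.role === "assistant" && message.content) {
      const attributes: Record<string, string> = {};
      if (model) attributes["gen_ai.request.model"] = model;
      observations.push({ kind: "generation", output: message.content, attributes });
    }
    // Each tool call becomes a Span of type "tool".
    for (const call of message.toolCalls ?? []) {
      const attributes: Record<string, string> = { "gen_ai.tool.name": call.name };
      if (call.id) attributes["gen_ai.tool.call.id"] = call.id;
      observations.push({ kind: "span", type: "tool", attributes });
    }
  }
  return observations;
}

function toScore(result: EvaluationResult): Score {
  const score: Score = { name: "eval_score", value: result.score };
  if (result.reasoning) score.comment = result.reasoning;
  return score;
}
```

Keeping the conversion free of network calls means it can be unit-tested separately from whatever client eventually sends the observations.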
The system SHALL respect privacy settings when exporting trace content.
- WHEN `LANGFUSE_CAPTURE_CONTENT` is not set or set to `"false"`
- THEN message content is replaced with placeholder text `"[content hidden]"`
- AND tool call inputs are replaced with `{}`
- AND tool call outputs are replaced with `"[output hidden]"`
- WHEN `LANGFUSE_CAPTURE_CONTENT` is set to `"true"`
- THEN full message content is included in Generations
- AND full tool call inputs and outputs are included in Spans
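The privacy rules above amount to a default-deny switch. The sketch below shows one way to apply it. The helper names and the `ToolCallPayload` shape are hypothetical. It also assumes that any value other than the literal string `"true"` keeps content hidden, which the requirement only states for the unset and `"false"` cases.

```typescript
// Hypothetical payload shape for a tool call's captured data.
interface ToolCallPayload {
  input: unknown;
  output: unknown;
}

// Assumption: only the exact string "true" opts in to content capture.
function captureContentEnabled(env: Record<string, string | undefined>): boolean {
  return env["LANGFUSE_CAPTURE_CONTENT"] === "true";
}

function redactMessageContent(content: string, capture: boolean): string {
  return capture ? content : "[content hidden]";
}

function redactToolCall(call: ToolCallPayload, capture: boolean): ToolCallPayload {
  // When capture is off, inputs become {} and outputs become a placeholder.
  return capture ? call : { input: {}, output: "[output hidden]" };
}
```

Applying redaction before the trace structure is built, rather than at send time, keeps unredacted content out of any export queue.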
The system SHALL support self-hosted Langfuse instances.
- WHEN the `LANGFUSE_HOST` environment variable is set
- THEN the exporter sends traces to the specified host URL
- AND authentication uses the same `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY`
- WHEN `LANGFUSE_HOST` is not set
- THEN the exporter uses the default Langfuse cloud endpoint
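Host selection as described above is a single fallback. The sketch below assumes the default cloud endpoint is `https://cloud.langfuse.com`. The requirement does not name the URL, so treat that constant as an assumption to verify. `resolveConnection` and `LangfuseConnection` are hypothetical names.

```typescript
// Assumed default; the spec only says "the default Langfuse cloud endpoint".
const DEFAULT_LANGFUSE_HOST = "https://cloud.langfuse.com";

interface LangfuseConnection {
  baseUrl: string;
  publicKey: string | undefined;
  secretKey: string | undefined;
}

function resolveConnection(env: Record<string, string | undefined>): LangfuseConnection {
  return {
    // An unset or empty LANGFUSE_HOST falls back to the cloud endpoint.
    baseUrl: env["LANGFUSE_HOST"] || DEFAULT_LANGFUSE_HOST,
    // The same key pair is used for cloud and self-hosted instances.
    publicKey: env["LANGFUSE_PUBLIC_KEY"],
    secretKey: env["LANGFUSE_SECRET_KEY"],
  };
}
```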
The system SHALL handle export errors without disrupting evaluation.
- WHEN sending a trace to Langfuse fails due to network error
- THEN the system logs a warning with the error details
- AND the evaluation result is still written to the output file
- AND subsequent eval cases continue to attempt export
- WHEN all eval cases have completed
- THEN the system flushes any pending traces to Langfuse
- AND waits for the flush to complete, subject to a timeout, before exiting
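The error-handling behavior above can be sketched with two wrappers: one that turns a failed export into a warning, and one that bounds the final flush with a timer. The `TraceExporter` interface is hypothetical and is not the Langfuse SDK's API. The `warn` callback and the string result values are likewise illustrative.

```typescript
// Hypothetical exporter interface used only for this sketch.
interface TraceExporter {
  exportTrace(evalId: string): Promise<void>;
  flush(): Promise<void>;
}

// Attempts one export; a failure is reported through `warn` and never rethrown,
// so the caller can still write the result and move on to the next eval case.
async function exportSafely(
  exporter: TraceExporter,
  evalId: string,
  warn: (message: string) => void,
): Promise<boolean> {
  try {
    await exporter.exportTrace(evalId);
    return true;
  } catch (error) {
    const detail = error instanceof Error ? error.message : String(error);
    warn(`Langfuse export failed for ${evalId}: ${detail}`);
    return false;
  }
}

// Waits for the flush, but gives up after `timeoutMs` so the process can exit.
async function flushWithTimeout(
  exporter: TraceExporter,
  timeoutMs: number,
): Promise<"flushed" | "failed" | "timed_out"> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<"timed_out">((resolve) => {
    timer = setTimeout(() => resolve("timed_out"), timeoutMs);
  });
  try {
    return await Promise.race([
      exporter.flush().then(
        () => "flushed" as const,
        () => "failed" as const, // a rejected flush must not crash the run
      ),
      timeout,
    ]);
  } finally {
    // Clear the timer so it does not keep the process alive after a fast flush.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

Returning a status instead of throwing lets the CLI decide whether a timed-out flush deserves its own warning.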