Skip to content

test(tracing): add round-trip tests for ag.metrics.duration.cumulativ…#4575

Open
justin212407 wants to merge 1 commit into
Agenta-AI:mainfrom
justin212407:test-tracing-fix
Open

test(tracing): add round-trip tests for ag.metrics.duration.cumulativ…#4575
justin212407 wants to merge 1 commit into
Agenta-AI:mainfrom
justin212407:test-tracing-fix

Conversation

@justin212407
Copy link
Copy Markdown
Contributor

@justin212407 justin212407 commented Jun 7, 2026

Summary:

Fixes #4172 - ag.metrics.duration.cumulative query↔ingest schema asymmetry that caused spans to be silently dropped on re-ingest.

During ingestion, parsing.py computed wall-clock duration and wrote it as a raw scalar float. The original Pydantic model (AgMetricEntryAttributes) declared cumulative as Optional[Metrics] where Metrics = Dict[str, NumericJson] - a dict, not a scalar. This meant any span queried via /api/tracing/spans/query and re-submitted to /api/tracing/spans/ingest failed Pydantic validation silently, returning {"count": 0, "links": []} with no error surfaced to the caller.

Per the team's documented decision in docs/design/homomorphic-tracing-io/findings.md, the canonical shape for duration.cumulative is a scalar float. The production fixes (AgScalarMetricEntryAttributes in tracing.py, _coerce_scalar_metric_value in attributes.py, scalar write in parsing.py) were already present in the codebase. This PR adds the missing test coverage for those changes.

Testing:

Verified locally

# New tests only
uv run pytest oss/tests/pytest/unit/tracing/utils/test_parsing.py \
  -v -k "duration_cumulative" --no-header
# 2 passed

# Full tracing suite
uv run pytest oss/tests/pytest/unit/tracing/ -v --no-header -p no:warnings
# 74 passed

# Full unit suite
uv run pytest oss/tests/pytest/unit/ -q --no-header -p no:warnings
# 525 passed

# Linting and formatting
uv run ruff check oss/src/core/tracing/ oss/tests/pytest/unit/tracing/
uv run ruff format --check oss/tests/pytest/unit/tracing/utils/test_parsing.py
# All checks passed

Added or updated tests:

Two new unit tests added to

  1. api/oss/tests/pytest/unit/tracing/utils/test_parsing.py:
    test_parse_spans_from_request_accepts_scalar_duration_cumulative_through_ingest - Verifies that a span with metrics.duration.cumulative set as a scalar float passes through the full parse_spans_from_request pipeline without error and the output value is a plain float, not a dict. Covers the canonical ingest path.

  2. test_parse_spans_from_request_normalizes_legacy_dict_duration_cumulative_to_scalar - Verifies backward compatibility: a span carrying the old dict shape duration.cumulative = {"total": 1922.653} (as written by older ingestion code
    or replayed from a query response) is normalized to the canonical scalar 1922.653 by _coerce_scalar_metric_value inside initialize_ag_attributes. No timestamps are supplied so the computed-duration override path is skipped,
    isolating the normalization behavior.

QA follow-up:

  • Verify that existing traces in production with duration.cumulative as a scalar are still returned correctly by /api/tracing/spans/query.
  • Verify that a full query→re-ingest round-trip (the exact repro from the issue) no longer returns {"count": 0, "links": []}.
  • Verify that traces ingested by older SDK versions (which may have written duration.cumulative = {"total": v}) are normalized correctly on the way through ingest without requiring a backfill.

Demo:

N/A - no UI changes.

Checklist:

  • I have included a video or screen recording for UI changes, or marked Demo as N/A.
  • Relevant tests pass locally.
  • Relevant linting and formatting pass locally.
  • I have signed the CLA, or I will sign it when the bot prompts me.

Contributor Resources:

…e scalar shape

Signed-off-by: justin212407 <charlesjustin2124@gmail.com>
Copilot AI review requested due to automatic review settings June 7, 2026 20:02
@dosubot dosubot Bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label Jun 7, 2026
@vercel
Copy link
Copy Markdown

vercel Bot commented Jun 7, 2026

@justin212407 is attempting to deploy a commit to the agenta projects Team on Vercel.

A member of the Team first needs to authorize it.

@dosubot dosubot Bot added the tests label Jun 7, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 7, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

Two pytest unit tests are added to test_parsing.py to validate parse_spans_from_request metric duration normalization. The first test verifies that scalar ag.metrics.duration.cumulative values are accepted and overwritten by computed wall-clock duration when start_time and end_time are present. The second test confirms that legacy dict-shaped duration metrics ({"total": ...}) are normalized into scalar form when timestamps are absent.

Changes

Metric duration ingestion normalization tests

Layer / File(s) Summary
Scalar and dict metric duration handling
api/oss/tests/pytest/unit/tracing/utils/test_parsing.py
parse_spans_from_request accepts canonical scalar ag.metrics.duration.cumulative values and overwrites them with computed wall-clock duration when timestamps are present, keeping the result scalar; it also normalizes legacy dict shape ({"total": ...}) into scalar when timestamps are absent.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 60.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The PR title correctly identifies the main change: adding test coverage for ag.metrics.duration.cumulative round-trip behavior.
Linked Issues check ✅ Passed The PR adds test coverage for the production fixes that resolve issue #4172's schema asymmetry: it verifies scalar duration.cumulative acceptance through ingest and legacy dict-to-scalar normalization.
Out of Scope Changes check ✅ Passed The PR contains only test additions to verify existing production code behavior; no out-of-scope changes to production code are present.
Description check ✅ Passed The pull request description is clearly related to the changeset. It explains the issue being fixed, the production code changes already in place, and documents the test coverage being added for schema asymmetry in ag.metrics.duration.cumulative.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds unit tests around parse_spans_from_request to ensure duration metric normalization and precedence rules are stable across ingest/parsing.

Changes:

  • Adds a test ensuring scalar duration.cumulative is accepted and overridden by computed duration when timestamps are present.
  • Adds a test ensuring legacy {"total": v} dict shape for duration.cumulative is normalized to a scalar when timestamps are absent.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

# The computed duration (1000ms from timestamps) overwrites the pre-set value;
# the key contract is that the result is a plain scalar, not a dict.
assert isinstance(metrics["duration"]["cumulative"], (int, float))
assert metrics["duration"]["cumulative"] == 1000.0
metrics = parsed[0].attributes["ag"]["metrics"]
# Legacy {"total": v} dict is collapsed to the canonical scalar by
# _coerce_scalar_metric_value inside initialize_ag_attributes.
assert metrics["duration"]["cumulative"] == 1922.653
assert len(response[0].span_id) == 16


def test_parse_spans_from_request_accepts_scalar_duration_cumulative_through_ingest():
assert metrics["duration"]["cumulative"] == 1000.0


def test_parse_spans_from_request_normalizes_legacy_dict_duration_cumulative_to_scalar():
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:XS This PR changes 0-9 lines, ignoring generated files. tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants