test(tracing): add round-trip tests for ag.metrics.duration.cumulativ…#4575
test(tracing): add round-trip tests for ag.metrics.duration.cumulativ…#4575justin212407 wants to merge 1 commit into
Conversation
…e scalar shape Signed-off-by: justin212407 <charlesjustin2124@gmail.com>
|
@justin212407 is attempting to deploy a commit to the agenta projects Team on Vercel. A member of the Team first needs to authorize it. |
📝 WalkthroughWalkthroughTwo pytest unit tests are added to ChangesMetric duration ingestion normalization tests
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds unit tests around parse_spans_from_request to ensure duration metric normalization and precedence rules are stable across ingest/parsing.
Changes:
- Adds a test ensuring scalar
duration.cumulativeis accepted and overridden by computed duration when timestamps are present. - Adds a test ensuring legacy
{"total": v}dict shape forduration.cumulativeis normalized to a scalar when timestamps are absent.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # The computed duration (1000ms from timestamps) overwrites the pre-set value; | ||
| # the key contract is that the result is a plain scalar, not a dict. | ||
| assert isinstance(metrics["duration"]["cumulative"], (int, float)) | ||
| assert metrics["duration"]["cumulative"] == 1000.0 |
| metrics = parsed[0].attributes["ag"]["metrics"] | ||
| # Legacy {"total": v} dict is collapsed to the canonical scalar by | ||
| # _coerce_scalar_metric_value inside initialize_ag_attributes. | ||
| assert metrics["duration"]["cumulative"] == 1922.653 |
| assert len(response[0].span_id) == 16 | ||
|
|
||
|
|
||
| def test_parse_spans_from_request_accepts_scalar_duration_cumulative_through_ingest(): |
| assert metrics["duration"]["cumulative"] == 1000.0 | ||
|
|
||
|
|
||
| def test_parse_spans_from_request_normalizes_legacy_dict_duration_cumulative_to_scalar(): |
Summary:
Fixes #4172 -
ag.metrics.duration.cumulativequery↔ingest schema asymmetry that caused spans to be silently dropped on re-ingest.During ingestion,
parsing.pycomputed wall-clock duration and wrote it as a raw scalar float. The original Pydantic model (AgMetricEntryAttributes) declaredcumulativeasOptional[Metrics]whereMetrics = Dict[str, NumericJson]- a dict, not a scalar. This meant any span queried via/api/tracing/spans/queryand re-submitted to/api/tracing/spans/ingestfailed Pydantic validation silently, returning{"count": 0, "links": []}with no error surfaced to the caller.Per the team's documented decision in
docs/design/homomorphic-tracing-io/findings.md, the canonical shape forduration.cumulativeis a scalar float. The production fixes (AgScalarMetricEntryAttributesintracing.py,_coerce_scalar_metric_valueinattributes.py, scalar write inparsing.py) were already present in the codebase. This PR adds the missing test coverage for those changes.Testing:
Verified locally
Added or updated tests:
Two new unit tests added to
api/oss/tests/pytest/unit/tracing/utils/test_parsing.py:test_parse_spans_from_request_accepts_scalar_duration_cumulative_through_ingest- Verifies that a span withmetrics.duration.cumulativeset as a scalar float passes through the fullparse_spans_from_requestpipeline without error and the output value is a plainfloat, not a dict. Covers the canonical ingest path.test_parse_spans_from_request_normalizes_legacy_dict_duration_cumulative_to_scalar- Verifies backward compatibility: a span carrying the old dict shapeduration.cumulative = {"total": 1922.653}(as written by older ingestion codeor replayed from a query response) is normalized to the canonical scalar
1922.653by_coerce_scalar_metric_valueinsideinitialize_ag_attributes. No timestamps are supplied so the computed-duration override path is skipped,isolating the normalization behavior.
QA follow-up:
duration.cumulativeas a scalar are still returned correctly by/api/tracing/spans/query.{"count": 0, "links": []}.duration.cumulative = {"total": v}) are normalized correctly on the way through ingest without requiring a backfill.Demo:
N/A - no UI changes.
Checklist:
Contributor Resources: