fix(unit-only): Re-tune the dataset-eval e2e timing budget and export SPA... (#1522)#52
Draft
aidandaly24 wants to merge 1 commit into
Draft
fix(unit-only): Re-tune the dataset-eval e2e timing budget and export SPA... (#1522)#52aidandaly24 wants to merge 1 commit into
aidandaly24 wants to merge 1 commit into
Conversation
…N_INGESTION_DELAY_MS (aws#1522) Export SPAN_INGESTION_DELAY_MS so the progress log and the per-it timeout derive from the real 180s span-ingestion floor instead of stale hardcoded values. Raise the per-it ceiling 300000 -> 420000 (within the 600000 suite cap), lower retries 18 -> 2, and add an in-test guard asserting the timeout covers the full retry budget. Correct the misleading 'Waiting for span ingestion (15s)...' log to render the real 180s wait. Refs aws#1522
Coverage Report
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Refs aws#1522
Issues
e2e: dataset eval integration > runs evaluation using dataset as inputaws/agentcore-cli#1522 — The e2e test "dataset eval integration > runs evaluation using dataset as input" deterministically times out at the 300sitceiling whenever the firstagentcore run eval --datasetattempt does not succeed, because the retry budget cannot fit a second attempt. This is a flaky/broken CI test only; no user-facing product hang exists.Root cause
Timing-budget mismatch: hardcoded 180s collectSpans sleep (span-collector.ts:16,84) means each
run eval --datasetattempt has a ~180s+ floor, but the test gives 300000ms (test:185) and retries 18x with 10s gaps (test:161,181-183) — two attempts (180+10+180=370s) exceed 300s, so any non-first-try failure deterministically times out at the observed 300083ms. No product hang; all waits in the path are bounded.The fix
Re-tune the test, not product code: lower retry 18 -> 1-2 and raise the per-
ittimeout 300000 -> ~420000 (suite cap already 600000 at vitest.config.ts:68-69). Also fix the misleadingWaiting for span ingestion (15s)...log at dataset-session-provider.ts:141 (real wait is 180s) to serve the issue's logging request; optionally make SPAN_INGESTION_DELAY_MS env-overridable for e2e.Files touched: e2e-tests/dataset-eval-integration.test.ts:158-186 (retry count 18 -> 1-2; it-timeout 300000 -> ~420000). Secondary: src/cli/operations/eval/shared/dataset-session-provider.ts:141 (correct the stale
(15s)message). Optional: src/cli/operations/eval/shared/span-collector.ts:16 (make SPAN_INGESTION_DELAY_MS env-overridable for e2e). Suite-level timeouts already sufficient at vitest.config.ts:68-69.Validation evidence
The fix was verified by reproducing the original symptom and re-running after the change:
Test suite: green.
Staged on the fork as a draft for human review. Promote to aws/agentcore-cli after vetting.