Skip to content

[CRE-4716] vault system tests more resiliant to intermittent failures#22808

Open
Tofel wants to merge 2 commits into
developfrom
cre-4716-fix-vault-test
Open

[CRE-4716] vault system tests more resiliant to intermittent failures#22808
Tofel wants to merge 2 commits into
developfrom
cre-4716-fix-vault-test

Conversation

@Tofel

@Tofel Tofel commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Summary

  • identifier_validation flake — under concurrent vault DON load, the gateway's allowlist-auth retry loop (up to 10×3s) can exceed the HTTP server's 14s RequestTimeoutMillis, returning 503 "Request timed out" before the gateway validates the invalid identifier. Since invalid-identifier requests are rejected at the gateway before fanOutToVaultNodes, the DON never receives them and there is no replay-guard risk. Fixed by wrapping both assertion sites in retry.Do (avast retry-go v4, 8 attempts, 3s fixed delay) so the test retries after the syncer has caught up.
  • parallel_independent_crud + jwt_must_not_flip flakes — valid vault create/update/delete/list requests time out at the gateway when the OCR queue is saturated by the two parallel test suites sharing the same vault DON. sendVaultSignedOCRRequestToGateway used to call require.Equal(t, http.StatusOK, statusCode) immediately on 503, killing the test before workflow-phase verification could confirm state. Fixed by returning a zero-value sentinel response on 503 "Request timed out" and adding if jsonResponse.ID == "" guards in all 5 callers, allowing the test to proceed to the authoritative workflow-phase checks.

Should fix flakes like this one: https://github.com/smartcontractkit/chainlink/actions/runs/27160638734/job/80181874164?pr=22766

@github-actions

github-actions Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

✅ No conflicts with other open PRs targeting develop

@Tofel Tofel force-pushed the cre-4716-fix-vault-test branch from 2e2c5d8 to ecc191a Compare June 11, 2026 13:36
@trunk-io

trunk-io Bot commented Jun 11, 2026

Copy link
Copy Markdown

Static BadgeStatic BadgeStatic BadgeStatic Badge

Failed Test Failure Summary Logs
TestIntegration_MercuryV2_Plugin The test failed without a specific error message, indicating an unspecified failure during the integration test. Logs ↗︎

View Full Report ↗︎Docs

@Tofel Tofel marked this pull request as ready for review June 12, 2026 11:58
@Tofel Tofel requested review from a team as code owners June 12, 2026 11:58
@Tofel Tofel requested a review from prashantkumar1982 June 12, 2026 11:58
@Tofel Tofel changed the title vault system tests more resiliant to intermittent failures [CRE-4716] vault system tests more resiliant to intermittent failures Jun 12, 2026
Comment thread system-tests/tests/smoke/cre/vault_don_test.go Outdated
russell-stern
russell-stern previously approved these changes Jun 12, 2026
@cl-sonarqube-production

Copy link
Copy Markdown

Quality Gate failed Quality Gate failed

Failed conditions
14.8% Duplication on New Code (required ≤ 10%)

See analysis details on SonarQube

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants