Skip to content

fix(e2e): widen TUI chat correlation retry guard to cover partial missing replies#4572

Draft
hunglp6d wants to merge 1 commit into
mainfrom
fix/nightly-e2e-tui-chat-partial-retry-a25b393
Draft

fix(e2e): widen TUI chat correlation retry guard to cover partial missing replies#4572
hunglp6d wants to merge 1 commit into
mainfrom
fix/nightly-e2e-tui-chat-partial-retry-a25b393

Conversation

@hunglp6d
Copy link
Copy Markdown
Contributor

Summary

The openclaw-tui-chat-correlation-e2e job in nightly run #26698759656 failed because the looksLikeEventCaptureFailure retry guard was too narrow: it only triggered when all three replies were missing (chatEvents.length === 0), but this nightly saw a partial missing-reply scenario — prompts A and B received replies while prompt C timed out with zero chat events.

Changes

  • Widen looksLikeEventCaptureFailure to fire whenever missingReplies.length > 0 (not just === sentRuns.length), while keeping the correlation-bug checks (no empty finals, no duplicates, no uncorrelated replies) so real regressions still fail immediately.
  • Remove the chatEvents.length === 0 requirement that blocked retries on partial failures.
  • Update the retry log message and unit test to reflect the widened guard.

Root Cause

The LLM inside the sandbox timed out on the third rapid-fire prompt (C2603) within the 120-second polling deadline. Since 2 of 3 replies arrived, the retry guard saw non-zero chatEvents and missingReplies.length (1) !== sentRuns.length (3), so it did not retry. The test failed with expected [ 'C2603-REPLY' ] to deeply equal [].

Validation

The GITHUB_TOKEN used by this CI run does not include the workflow scope, so the -custom-e2e validation branch could not be pushed. To validate manually:

gh workflow run nightly-e2e.yaml --repo NVIDIA/NemoClaw \
  --ref fix/nightly-e2e-tui-chat-partial-retry-a25b393 \
  -f jobs=openclaw-tui-chat-correlation-e2e

Signed-off-by: Hung Le hple@nvidia.com

…sing replies

The looksLikeEventCaptureFailure guard only triggered a retry when ALL
three replies were missing (chatEvents.length === 0). In nightly run
26698759656, prompts A and B received replies but prompt C timed out —
a partial missing-reply scenario that bypassed the retry guard entirely.

Widen the condition to fire whenever missingReplies.length > 0 (not just
=== sentRuns.length) while keeping the correlation-bug checks (no empty
finals, no duplicates, no uncorrelated replies) so real regressions still
fail immediately.

Signed-off-by: Hung Le <hple@nvidia.com>
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 31, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 31, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5263009d-b62a-4bb6-a78f-6110941597a4

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/nightly-e2e-tui-chat-partial-retry-a25b393

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: None
Optional E2E: openclaw-tui-chat-correlation-e2e

Dispatch hint: openclaw-tui-chat-correlation-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • None. No required E2E: the PR is tests-only and changes the retry/classification behavior of an existing E2E harness, not NemoClaw runtime or user-flow implementation. The affected existing E2E is listed as optional for harness validation.

Optional E2E

  • openclaw-tui-chat-correlation-e2e (high; provisions a fresh cloud-backed OpenClaw sandbox and may take up to 75 minutes): Optional validation of the modified live TUI/webchat correlation harness and retry behavior. The PR changes only test logic, so this should not be merge-blocking unless maintainers specifically want to prove the adjusted flake guard against a real sandbox.

New E2E recommendations

  • None.

Dispatch hint

  • Workflow: nightly-e2e.yaml
  • jobs input: openclaw-tui-chat-correlation-e2e

@github-actions
Copy link
Copy Markdown
Contributor

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

  • None. Changed file is a non-scenario test outside test/e2e-scenario/ and does not affect scenario E2E workflows, metadata, runtime, suites, or scenario-relevant helpers.

Optional scenario E2E

  • None.

Relevant changed files

  • None.

@github-actions
Copy link
Copy Markdown
Contributor

PR Review Advisor

Findings: 0 needs attention, 2 worth checking, 0 nice ideas
Top item: Retry guard omits missing/duplicate user-turn checks

Review findings

🛠️ Needs attention

  • None.

🔎 Worth checking

  • Source-of-truth review needed: Live TUI chat correlation retry workaround: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: The widened predicate is a fallback/recovery path. It does not include `missingUserTurns` or `duplicateUserTurns`, despite comments saying it preserves duplicate-turn signal and despite final live assertions checking both fields.
  • Retry guard omits missing/duplicate user-turn checks (test/openclaw-tui-chat-correlation.test.ts:537): The broadened fallback now retries whenever any reply is missing and there are no empty finals, duplicate replies, or uncorrelated replies. However, the same regression test explicitly treats missing and duplicate user turns as failures. A first live attempt that has missing replies plus a missing or duplicate chat.history user turn would now be classified as a transient capture failure and retried, potentially hiding the original [Linux][Agent&Skills] TUI chat previous message disappears from UI after reconnect or scroll #2603 signal if the second attempt passes.
    • Recommendation: Keep the widened missing-reply retry, but also require `analysis.missingUserTurns.length === 0` and `analysis.duplicateUserTurns.length === 0` before retrying. Add negative unit coverage showing that missing replies do not trigger a retry when history has missing or duplicate submitted prompts.
    • Evidence: `looksLikeEventCaptureFailure` checks `analysis.missingReplies.length > 0`, `emptyFinalsForSubmittedRuns.length === 0`, `duplicateReplies.length === 0`, and `uncorrelatedReplies.length === 0`, but omits `missingUserTurns` and `duplicateUserTurns`. The live assertions later require both `analysis.missingUserTurns` and `analysis.duplicateUserTurns` to equal `[]`, and `ISSUE_2603_FIX_EXPECTATIONS` includes `chat.history contains exactly one user turn per submitted prompt`.

🌱 Nice ideas

  • None.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant