fix(e2e): widen TUI chat correlation retry guard to cover partial missing replies by hunglp6d · Pull Request #4572 · NVIDIA/NemoClaw

hunglp6d · 2026-05-31T01:11:57Z

Summary

The openclaw-tui-chat-correlation-e2e job in nightly run #26698759656 failed because the looksLikeEventCaptureFailure retry guard was too narrow: it only triggered when all three replies were missing (chatEvents.length === 0), but this nightly saw a partial missing-reply scenario — prompts A and B received replies while prompt C timed out with zero chat events.

Changes

Widen looksLikeEventCaptureFailure to fire whenever missingReplies.length > 0 (not just === sentRuns.length), while keeping the correlation-bug checks (no empty finals, no duplicates, no uncorrelated replies) so real regressions still fail immediately.
Remove the chatEvents.length === 0 requirement that blocked retries on partial failures.
Update the retry log message and unit test to reflect the widened guard.

Root Cause

The LLM inside the sandbox timed out on the third rapid-fire prompt (C2603) within the 120-second polling deadline. Since 2 of 3 replies arrived, the retry guard saw non-zero chatEvents and missingReplies.length (1) !== sentRuns.length (3), so it did not retry. The test failed with expected [ 'C2603-REPLY' ] to deeply equal [].

Validation

The GITHUB_TOKEN used by this CI run does not include the workflow scope, so the -custom-e2e validation branch could not be pushed. To validate manually:

gh workflow run nightly-e2e.yaml --repo NVIDIA/NemoClaw \
  --ref fix/nightly-e2e-tui-chat-partial-retry-a25b393 \
  -f jobs=openclaw-tui-chat-correlation-e2e

Original failing run: #26698759656 on a25b3931

Signed-off-by: Hung Le hple@nvidia.com

…sing replies The looksLikeEventCaptureFailure guard only triggered a retry when ALL three replies were missing (chatEvents.length === 0). In nightly run 26698759656, prompts A and B received replies but prompt C timed out — a partial missing-reply scenario that bypassed the retry guard entirely. Widen the condition to fire whenever missingReplies.length > 0 (not just === sentRuns.length) while keeping the correlation-bug checks (no empty finals, no duplicates, no uncorrelated replies) so real regressions still fail immediately. Signed-off-by: Hung Le <hple@nvidia.com>

copy-pr-bot · 2026-05-31T01:12:02Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai · 2026-05-31T01:12:04Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5263009d-b62a-4bb6-a78f-6110941597a4

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/nightly-e2e-tui-chat-partial-retry-a25b393

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-05-31T01:13:09Z

E2E Advisor Recommendation

Required E2E: None
Optional E2E: openclaw-tui-chat-correlation-e2e

Dispatch hint: openclaw-tui-chat-correlation-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

None. No required E2E: the PR is tests-only and changes the retry/classification behavior of an existing E2E harness, not NemoClaw runtime or user-flow implementation. The affected existing E2E is listed as optional for harness validation.

Optional E2E

openclaw-tui-chat-correlation-e2e (high; provisions a fresh cloud-backed OpenClaw sandbox and may take up to 75 minutes): Optional validation of the modified live TUI/webchat correlation harness and retry behavior. The PR changes only test logic, so this should not be merge-blocking unless maintainers specifically want to prove the adjusted flake guard against a real sandbox.

New E2E recommendations

None.

Dispatch hint

Workflow: nightly-e2e.yaml
jobs input: openclaw-tui-chat-correlation-e2e

github-actions · 2026-05-31T01:13:10Z

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

None. Changed file is a non-scenario test outside test/e2e-scenario/ and does not affect scenario E2E workflows, metadata, runtime, suites, or scenario-relevant helpers.

Optional scenario E2E

None.

Relevant changed files

None.

github-actions · 2026-05-31T01:13:56Z

PR Review Advisor

Findings: 0 needs attention, 2 worth checking, 0 nice ideas
Top item: Retry guard omits missing/duplicate user-turn checks

Review findings

🛠️ Needs attention

None.

🔎 Worth checking

Source-of-truth review needed: Live TUI chat correlation retry workaround: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: The widened predicate is a fallback/recovery path. It does not include `missingUserTurns` or `duplicateUserTurns`, despite comments saying it preserves duplicate-turn signal and despite final live assertions checking both fields.
Retry guard omits missing/duplicate user-turn checks (test/openclaw-tui-chat-correlation.test.ts:537): The broadened fallback now retries whenever any reply is missing and there are no empty finals, duplicate replies, or uncorrelated replies. However, the same regression test explicitly treats missing and duplicate user turns as failures. A first live attempt that has missing replies plus a missing or duplicate chat.history user turn would now be classified as a transient capture failure and retried, potentially hiding the original [Linux][Agent&Skills] TUI chat previous message disappears from UI after reconnect or scroll #2603 signal if the second attempt passes.
- Recommendation: Keep the widened missing-reply retry, but also require `analysis.missingUserTurns.length === 0` and `analysis.duplicateUserTurns.length === 0` before retrying. Add negative unit coverage showing that missing replies do not trigger a retry when history has missing or duplicate submitted prompts.
- Evidence: `looksLikeEventCaptureFailure` checks `analysis.missingReplies.length > 0`, `emptyFinalsForSubmittedRuns.length === 0`, `duplicateReplies.length === 0`, and `uncorrelatedReplies.length === 0`, but omits `missingUserTurns` and `duplicateUserTurns`. The live assertions later require both `analysis.missingUserTurns` and `analysis.duplicateUserTurns` to equal `[]`, and `ISSUE_2603_FIX_EXPECTATIONS` includes `chat.history contains exactly one user turn per submitted prompt`.

🌱 Nice ideas

None.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

hunglp6d mentioned this pull request May 31, 2026

nightly-e2e: TUI chat correlation retry guard too narrow for partial missing replies #4574

Open

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(e2e): widen TUI chat correlation retry guard to cover partial missing replies#4572

fix(e2e): widen TUI chat correlation retry guard to cover partial missing replies#4572
hunglp6d wants to merge 1 commit into
mainfrom
fix/nightly-e2e-tui-chat-partial-retry-a25b393

hunglp6d commented May 31, 2026

Uh oh!

copy-pr-bot Bot commented May 31, 2026

Uh oh!

coderabbitai Bot commented May 31, 2026

Review skipped

Uh oh!

github-actions Bot commented May 31, 2026

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Dispatch hint

Uh oh!

github-actions Bot commented May 31, 2026

E2E Scenario Advisor

Required scenario E2E

Optional scenario E2E

Relevant changed files

Uh oh!

github-actions Bot commented May 31, 2026

🛠️ Needs attention

🔎 Worth checking

🌱 Nice ideas

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

hunglp6d commented May 31, 2026

Summary

Changes

Root Cause

Validation

Uh oh!

copy-pr-bot Bot commented May 31, 2026

Uh oh!

coderabbitai Bot commented May 31, 2026

Review skipped

Uh oh!

github-actions Bot commented May 31, 2026

E2E Advisor Recommendation

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Dispatch hint

Uh oh!

github-actions Bot commented May 31, 2026

E2E Scenario Advisor Recommendation

E2E Scenario Advisor

Required scenario E2E

Optional scenario E2E

Relevant changed files

Uh oh!

github-actions Bot commented May 31, 2026

PR Review Advisor

🛠️ Needs attention

🔎 Worth checking

🌱 Nice ideas

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant