Fix ReconnectApp self-re-arming loop via clock-skew-safe receipt seeding #93426

Draft
adhorodyski wants to merge 2 commits into Expensify:main from callstack-internal:audit-reconnect-patterns


@adhorodyski adhorodyski commented Jun 12, 2026

Contributor

Explanation of Change

subscribeToFullReconnect fires reconnectApp() whenever lastFullReconnectTime < reconnectAppIfFullReconnectBefore. The bug: every OpenApp/full-ReconnectApp response re-delivers the NVP in response.onyxData, which is applied before the successData that writes the loop-breaking receipt, and that receipt was written as plain client-now. On a device whose clock is behind the server, the receipt therefore always compares below the server-stamped NVP, so every receipt write re-triggers the subscription and fires another full reconnect, one after another at round-trip pace. In production this showed up as ~816k ReconnectApp calls over 14 days, with storm traces of 31 calls in 9s.
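
To make the loop concrete, here is a minimal simulation of the response cycle. It is a simplified model rather than the app's code: every name is hypothetical, timestamps are epoch milliseconds instead of the app's real format, the server demand is held fixed, and `maxCalls` caps the storm so the loop terminates.

```typescript
// Simplified model; all names are hypothetical and timestamps are epoch milliseconds.
type SeedFn = (clientNow: number, demand: number) => number;

const ROUND_TRIP_MS = 300;

function countReconnects(clockSkewMs: number, seed: SeedFn, maxCalls = 10): number {
    let serverNow = 1_000_000;
    const demand = serverNow; // server-stamped NVP (reconnectAppIfFullReconnectBefore)
    let receipt = 0; // lastFullReconnectTime
    let calls = 0;

    // The subscription fires while the receipt compares below the demand.
    while (receipt < demand && calls < maxCalls) {
        calls++;
        serverNow += ROUND_TRIP_MS;
        // The response re-delivers the same demand, then successData writes the receipt.
        receipt = seed(serverNow + clockSkewMs, demand);
    }
    return calls;
}

// Pre-fix receipt: plain client-now. Post-fix receipt: max(client-now, demand).
const plainClientNow: SeedFn = (clientNow) => clientNow;
const maxOfNowAndDemand: SeedFn = (clientNow, demand) => Math.max(clientNow, demand);

const FIVE_MINUTES_BEHIND = -5 * 60 * 1000;
const inSync = countReconnects(0, plainClientNow); // 1
const skewedPreFix = countReconnects(FIVE_MINUTES_BEHIND, plainClientNow); // 10 (hits the cap)
const skewedPostFix = countReconnects(FIVE_MINUTES_BEHIND, maxOfNowAndDemand); // 1
```

With an in-sync clock the plain receipt already ends the cycle after one call. With the clock five minutes behind, the plain receipt keeps landing below the demand until the cap is reached, while the max-based receipt stops after a single call.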

Fix: write the receipt as max(client-now, NVP) at both write sites, so it can never compare below the demand it answers, in any clock regime.

  • New pure module src/libs/FullReconnectUtils.ts with shouldTriggerFullReconnect (single authoritative trigger decision) and getFullReconnectSeedTime (the max computation; doc comment carries the clock-skew rationale).
  • src/libs/subscribeToFullReconnect.ts now calls the new triggerFullReconnect action, which optimistically seeds the receipt to max(now, NVP) before firing reconnectApp() — the re-delivered NVP can't re-arm the trigger, and the seed write's re-entrant callback is a no-op by construction.
  • src/libs/actions/App.ts: new triggerFullReconnect action; the OpenApp/full-ReconnectApp successData receipt changes from plain DateUtils.getDBTime() to getFullReconnectSeedTime(nvp) (NVP mirrored via connectWithoutView). This is the write that actually kills the skew chain.
  • Legitimate reconnects are preserved: a genuinely newer NVP always beats the receipt, since max only raises the receipt to the already-answered demand.
  • The already-seeded paths (clear/reset, sign-in, delegate, supportal via clearOnyxAndSeedFullReconnect) are untouched: post-clear the NVP is empty so max ≡ client-now.
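
The two helpers and the seed-before-fire ordering described above could look roughly like the sketch below. It is an illustration under assumptions, not the code in this PR: timestamps are assumed to be DB-format UTC strings that sort lexicographically, `getFullReconnectSeedTime` takes the client time as an explicit argument here (the PR describes it as taking only the NVP), and the `store` object and `onStoreChange` function are hypothetical stand-ins for Onyx and the subscription callback.

```typescript
// Assumption: timestamps are DB-format UTC strings ('YYYY-MM-DD HH:mm:ss.SSS'),
// which order correctly under plain string comparison.
type DBTime = string;

/** Single trigger decision: reconnect only if the receipt is older than the demand. */
function shouldTriggerFullReconnect(lastFullReconnectTime: DBTime | undefined, reconnectAppIfFullReconnectBefore: DBTime | undefined): boolean {
    if (!reconnectAppIfFullReconnectBefore) {
        return false;
    }
    return (lastFullReconnectTime ?? '') < reconnectAppIfFullReconnectBefore;
}

/** Receipt value: max(client-now, NVP), so it never compares below the demand it answers. */
function getFullReconnectSeedTime(clientNow: DBTime, nvp: DBTime | undefined): DBTime {
    if (!nvp) {
        return clientNow;
    }
    return clientNow > nvp ? clientNow : nvp;
}

// Hypothetical in-memory stand-in for the two Onyx values and the subscription.
const store: {receipt?: DBTime; nvp?: DBTime} = {};
let reconnectCalls = 0;

function onStoreChange(clientNow: DBTime): void {
    if (!shouldTriggerFullReconnect(store.receipt, store.nvp)) {
        return;
    }
    // Seed first: the write re-enters the callback, which is now a no-op.
    store.receipt = getFullReconnectSeedTime(clientNow, store.nvp);
    onStoreChange(clientNow);
    reconnectCalls++; // stands in for reconnectApp()
}

const skewedNow: DBTime = '2026-06-12 10:00:00.000'; // client clock five minutes behind
store.nvp = '2026-06-12 10:05:00.000';
onStoreChange(skewedNow); // fires once; receipt is raised to the NVP
onStoreChange(skewedNow); // same NVP re-delivered: no-op
store.nvp = '2026-06-12 10:06:00.000';
onStoreChange(skewedNow); // genuinely newer demand: fires again
```

After this sequence `reconnectCalls` is 2: the re-delivered NVP does not re-arm the trigger, but a newer NVP still beats the receipt, which matches the "legitimate reconnects are preserved" point above.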

Fixed Issues

$ #92541
PROPOSAL:

Tests

  1. Open the app in a browser with DevTools → Network open, filtered by ReconnectApp.
  2. Toggle the network offline (DevTools → Network → Offline) and back online to trigger a reconnect.
  3. Verify exactly one ReconnectApp request fires per offline → online transition, with no repeated ReconnectApp requests looping afterwards.
  4. Set the OS clock a few minutes behind the real time (the storm regime) and repeat steps 2–3: still exactly one ReconnectApp per reconnect, and the app keeps responding to clicks.
  5. Run the regression suites: npx jest tests/unit/FullReconnectUtilsTest.ts tests/unit/SubscribeToFullReconnectTest.ts tests/actions/AppTest.ts — the integration suite simulates a full response cycle on a client clock behind the server (onyxData applied before successData) and asserts exactly one ReconnectApp fires; it fails on pre-fix code.
  • Verify that no errors appear in the JS console

Offline tests

  1. Go offline, perform a few actions (open reports, navigate around).
  2. Come back online.
  3. Verify the app reconnects with a single ReconnectApp request and queued actions process normally — no repeated full-payload reconnect downloads.

QA Steps

Same as Tests (steps 1–4). This change only affects when the ReconnectApp API command fires; there are no UI changes.

  • Verify that no errors appear in the JS console

PR Author Checklist

  • I linked the correct issue in the ### Fixed Issues section above
  • I wrote clear testing steps that cover the changes made in this PR
    • I added steps for local testing in the Tests section
    • I added steps for the expected offline behavior in the Offline steps section
    • I added steps for Staging and/or Production testing in the QA steps section
    • I added steps to cover failure scenarios (i.e. verify an input displays the correct error message if the entered data is not correct)
    • I turned off my network connection and tested it while offline to ensure it matches the expected behavior (i.e. verify the default avatar icon is displayed if app is offline)
    • I tested this PR with a High Traffic account against the staging or production API to ensure there are no regressions (e.g. long loading states that impact usability).
  • I included screenshots or videos for tests on all platforms
  • I ran the tests on all platforms & verified they passed on:
    • Android: Native
    • Android: mWeb Chrome
    • iOS: Native
    • iOS: mWeb Safari
    • MacOS: Chrome / Safari
  • I verified there are no console errors (if there's a console error not related to the PR, report it or open an issue for it to be fixed)
  • I followed proper code patterns (see Reviewing the code)
    • I verified that any callback methods that were added or modified are named for what the method does and never what callback they handle (i.e. toggleReport and not onIconClick)
    • I verified that comments were added to code that is not self explanatory
    • I verified that any new or modified comments were clear, correct English, and explained "why" the code was doing something instead of only explaining "what" the code was doing.
    • I verified any copy / text shown in the product is localized by adding it to src/languages/* files and using the translation method
    • I verified all numbers, amounts, dates and phone numbers shown in the product are using the localization methods
    • I verified any copy / text that was added to the app is grammatically correct in English. It adheres to proper capitalization guidelines (note: only the first word of header/labels should be capitalized), and is either coming verbatim from figma or has been approved by marketing (in order to get marketing approval, ask the Bug Zero team member to add the Waiting for copy label to the issue)
    • I verified proper file naming conventions were followed for any new files or renamed files. All non-platform specific files are named after what they export and are not named "index.js". All platform-specific files are named for the platform the code supports as outlined in the README.
    • I verified the JSDocs style guidelines (in STYLE.md) were followed
  • If a new code pattern is added I verified it was agreed to be used by multiple Expensify engineers
  • I followed the guidelines as stated in the Review Guidelines
  • I tested other components that can be impacted by my changes (i.e. if the PR modifies a shared library or component like Avatar, I verified the components using Avatar are working as expected)
  • I verified all code is DRY (the PR doesn't include any logic written more than once, with the exception of tests)
  • I verified any variables that can be defined as constants (ie. in CONST.ts or at the top of the file that uses the constant) are defined as such
  • I verified that if a function's arguments changed that all usages have also been updated correctly
  • If any new file was added I verified that:
    • The file has a description of what it does and/or why is needed at the top of the file if the code is not self explanatory
  • If a new CSS style is added I verified that:
    • A similar style doesn't already exist
    • The style can't be created with an existing StyleUtils function (i.e. StyleUtils.getBackgroundAndBorderStyle(theme.componentBG))
  • If new assets were added or existing ones were modified, I verified that:
    • The assets are optimized and compressed (for SVG files, run npm run compress-svg)
    • The assets load correctly across all supported platforms.
  • If the PR modifies code that runs when editing or sending messages, I tested and verified there is no unexpected behavior for all supported markdown - URLs, single line code, code blocks, quotes, headings, bold, strikethrough, and italic.
  • If the PR modifies a generic component, I tested and verified that those changes do not break usages of that component in the rest of the App (i.e. if a shared library or component like Avatar is modified, I verified that Avatar is working as expected in all cases)
  • If the PR modifies a component related to any of the existing Storybook stories, I tested and verified all stories for that component are still working as expected.
  • If the PR modifies a component or page that can be accessed by a direct deeplink, I verified that the code functions as expected when the deeplink is used - from a logged in and logged out account.
  • If the PR modifies the UI (e.g. new buttons, new UI components, changing the padding/spacing/sizing, moving components, etc) or modifies the form input styles:
    • I verified that all the inputs inside a form are aligned with each other.
    • I added Design label and/or tagged @Expensify/design so the design team can review the changes.
  • If a new page is added, I verified it's using the ScrollView component to make it scrollable when more elements are added to the page.
  • I added unit tests for any new feature or bug fix in this PR to help automatically prevent regressions in this user flow.
  • If the main branch was merged into this PR after a review, I tested again and verified the outcome was still expected according to the Test steps.

Screenshots/Videos

Android: Native
Android: mWeb Chrome
iOS: Native
iOS: mWeb Safari
MacOS: Chrome / Safari

… AppTest

The subscription now reaches reconnectApp through the internal
triggerFullReconnect call, so the AppTest spy on the reconnectApp export
no longer observes the full-reconnect trigger. Spying on
triggerFullReconnect (called via the live export binding) restores a
real assertion in both subscription tests.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>