Harden SD/I2C failure paths against on-track reboots & corruption #70
Merged
A code-quality sweep prompted by spontaneous on-track reboots (suspected failing SD card tray). Fixes four edge cases where a flaky card or a glitched I2C bus turns into a reboot loop or silent corruption:

- i2cBusRecover(): feed the watchdog before each blocking I2C re-init step so the recovery routine can't itself trip the 4 s WDT (boot loop under sustained ignition EMI); recover a hung bus inline instead of one frame late.
- DOVEX header pre-fill: verify each write() and abort log init cleanly on a short write instead of marking logging ready over a truncated header.
- Mid-session write failure: attempt writeDovexHeader() before close so the session's lap times survive a dying card.
- buildTrackList(): hold SD_ACCESS_TRACK_PARSE for the directory walk (the lone SD consumer that bypassed the mutex) and check the directory open.

Host unit tests pass; CHANGELOG updated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01WkdH4kE9pBKSfRBM7FTG1k
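The DOVEX header pre-fill fix described above (verify each write, abort on a short one) could look roughly like this host-side sketch. It is not the firmware's code: `FlakySink` and `prefillHeader` are hypothetical names invented for illustration, the 64-byte chunk size is an assumption, and the only behaviour relied on is that a `write()` call returns the number of bytes actually written.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for an SD file handle. write() returns the number of
// bytes actually stored, which can be fewer than requested on a failing card.
struct FlakySink {
    std::size_t capacity;  // bytes the "card" accepts before it stops storing
    std::size_t written;   // bytes stored so far

    explicit FlakySink(std::size_t cap) : capacity(cap), written(0) {}

    std::size_t write(const std::uint8_t* /*buf*/, std::size_t len) {
        const std::size_t room = capacity - written;
        const std::size_t n = len < room ? len : room;
        written += n;
        return n;
    }
};

// Pre-fill `total` bytes of padding in fixed-size chunks (chunk size assumed).
// Returns false on the first short write so the caller can abort log init
// instead of marking logging ready over a truncated header region.
template <typename Sink>
bool prefillHeader(Sink& sink, std::size_t total) {
    std::uint8_t pad[64];
    std::memset(pad, ' ', sizeof(pad));
    while (total > 0) {
        const std::size_t chunk = total < sizeof(pad) ? total : sizeof(pad);
        if (sink.write(pad, chunk) != chunk) {
            return false;  // short write: stop here, do not continue streaming
        }
        total -= chunk;
    }
    return true;
}
```

A caller would treat a `false` return as "log not ready", close the file, and try again on a later tick rather than continuing to stream rows.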
Summary
A code-quality sweep prompted by spontaneous on-track reboots (suspected failing SD card tray). The reboot itself is the 4 s hardware watchdog doing its job when SdFat blocks on a card that's physically dropping out mid-write — that can't be fully prevented in software. But four spots turned a flaky card or a glitched I2C bus into a reboot loop or silent corruption; this fixes those:
- `i2cBusRecover()` could boot-loop. It re-inits Wire + the OLED over I2C but never fed the watchdog (unlike the GPS baud-recovery path), so under sustained ignition EMI it could trip the WDT while recovering and re-trip on reboot. Now pets the watchdog before each blocking re-init step, and `safeDisplayUpdate()` recovers a hung bus inline instead of one frame late.
- The DOVEX header pre-fill did not check the `write()` return, so a card dropping sectors during log creation was still marked "ready" and streamed rows into a truncated region. Now verifies every write and aborts log init cleanly (retries next second).
- A mid-session write failure could lose the session's lap times on a dying card. Now attempts `writeDovexHeader()` before closing.
- `buildTrackList()` bypassed the SD mutex. It was the lone SD consumer touching SdFat raw, leaving a window where a BLE track upload/delete completing during a logging teardown could hit SdFat from two tasks. Now holds `SD_ACCESS_TRACK_PARSE` for the directory walk (both callers release first, so no self-deadlock) and checks the directory open.

Type of change
How it was verified
- Host unit tests pass (`ctest --test-dir tests/build`); pure units untouched, ran as a regression check
- `clang-tidy` clean (relying on CI)

Checklist
- `CHANGELOG.md` updated under `[Unreleased]`
- `ARCHITECTURE.md` / `CLAUDE.md` updated (no module or interface changed)
- `tests/` (changes are Arduino/SdFat-bound failure paths, not host-testable pure logic)

Notes
A related finding from the same sweep was left out as a separate concern: auto-race can fire while viewing replay, and live racing + replay share the same `lapHistory[]` buffer (stale-read hazard). Also worth wiring `NRF_POWER->RESETREAS` to a debug surface to distinguish watchdog resets (card stalls) from hard faults.
Generated by Claude Code