[2.3.0] performance enhancements#60
Draft
TheAngryRaven wants to merge 29 commits into
Draft
Conversation
…k task The SD access 'mutex' was a non-atomic check-then-set on a volatile int, called from both the main loop and the Bluefruit callback task, and several BLE commands ran SdFat directly in the callback: - LIST/TLIST iterated directories for seconds while only peeking the flag - GET:/TGET: started transfers in the callback, closing a file the main loop could be mid-read() on - DELETE: took no lock at all and could remove the file being streamed Any of these could interleave with 25 Hz session logging: FAT corruption, lost logs, SPI wedge — triggerable from radio range. Fixes: - acquireSDAccess()/releaseSDAccess() now commit state transitions inside a FreeRTOS critical section (taskENTER_CRITICAL — BASEPRI-masked, so SoftDevice radio interrupts are unaffected). - The grant/deny decision table is extracted into the host-tested sd_access_policy pure unit (single source of truth for the SD_ACCESS_* values; covered by tests/sd_access_policy_test.cpp, wired into the unit-test, coverage, and clang-tidy CI jobs). - LIST/GET:/DELETE:/TLIST/TGET: are deferred to BLUETOOTH_LOOP() via a one-deep command buffer, like the settings/track/OTA commands already were — SdFat is never touched from the callback task. A second command while one is queued gets the protocol's BUSY / TERR:BUSY reply. - Directory listings hold the SD lock for the whole walk; bleDeleteFile() takes the lock and refuses with BUSY while a transfer is streaming (the idempotent same-mode re-acquire would otherwise let it delete the streamed file and then drop the transfer's lock). - Queued commands are dropped on disconnect/BLE_STOP so a stale command can't execute on the next session. https://claude.ai/code/session_01Voi2VL5zjikyhEqgTvzmAL
…ion-mhs8yd Fix SD arbitration race (CR-1): atomic acquire + no SdFat in the BLE callback task
Coverage — host-testable units📂 Overall coverage
📄 File coverage
|
CR-4: Sleep mode was permanently defeated after the first engine pulse. The tach ISR set tachHavePeriod and nothing ever cleared it, so every sleep entry (long-press, 5-min menu idle, USB) read the stale flag and instantly bounced through exitSleepMode(true) back into race mode with logging enabled — silently starting a session and draining the pack overnight. New TACH_SLEEP() (called from enterSleepMode()) re-arms the wake trigger and drops stale ring-buffer/Kalman state so the wake's RPM computation starts clean; the ISR stays attached and the next real pulse still RPM-wakes straight into race mode as designed. H-1/H-2: Lap-time formatting was six divergent inline copies of the 60000/%1000 math with four padding behaviors. Three live pages (current/best/optimal lap) appended the millisecond zero-padding AFTER the value — 1:23.007 rendered as 1:23.700, a 693 ms error — and the lap-history list did no padding at all (1:5.7). All six sites now call one host-tested pure unit, lap_format::formatLapTime() (ms always three zero-padded digits), with three zero-minutes styles preserving each page's layout: kOmit (replay results, already-correct rendering unchanged), kShow (lap list, now a proper 0:SS.mmm fixed field), and kSpace (big-font live pages, column-stable decimal point). Covered by tests/lap_format_test.cpp including the exact H-1 regression case; wired into the unit-test, coverage, and clang-tidy CI jobs. https://claude.ai/code/session_01Voi2VL5zjikyhEqgTvzmAL
…ormat-mhs8yd Fix sleep-mode RPM-wake latch (CR-4) and lap-time rendering (H-1/H-2)
…aster/release pin v2.3.1 beta.yml installs the lap-timer library from its BETA branch so the two beta channels move together. release.yml pins the known-good v2.3.1 tag instead of floating on the library's default-branch tip. compile-sketch picks the ref dynamically from the build's target branch (pull_request base_ref or push/dispatch ref_name): BETA-targeted builds use the library's BETA branch, everything else uses v2.3.1 — one file content works unchanged on both branches, so BETA→master merges stay clean and nothing needs flipping at release time. https://claude.ai/code/session_01Voi2VL5zjikyhEqgTvzmAL
CI: DovesLapTimer per channel — BETA builds track library BETA, master/release pin v2.3.1
…0, not v2.3.1
v2.3.1 doesn't exist in the DovesLapTimer repo, so the library checkout
failed ('pathspec v2.3.1 did not match any file(s)') and the
compile-sketch jobs went down before compiling anything. Docs updated to
match.
https://claude.ai/code/session_01Voi2VL5zjikyhEqgTvzmAL
…llowup-mhs8yd CI: fix DovesLapTimer pin to v4.1.0 — #62 merged before the correction landed
…ery (H-3/H-4/H-5, partial H-8) H-3: trackDetectionLoop() gated only on gpsData.fix, which stays true between PVT updates, so the O(N) software-double haversine manifest scan (several ms at the 200-entry ceiling on the single-precision-FPU M4F) ran every ~250 Hz loop iteration, collapsing the loop rate. Throttled to 1 Hz — still instant at driving pace. H-4: the tach pulse ISR advanced the ring head unconditionally. An SD GC stall (documented 100 ms–2 s — the same failure the GPS serial ring exists for) at racing RPM overruns the 16-entry buffer; lapping the consumer breaks the SPSC invariant and TACH_LOOP computes a confident-but-wrong RPM that lands in the CSV and feeds the >500 RPM auto-race trigger. The ISR now does the SPSC full check (one slot sacrificed, matching the GPS serial ISR), drops the pulse, and sets tachRingOverflow; TACH_LOOP discards the carried prev-timestamp so the one period spanning the gap is never computed and the Kalman estimate coasts. tachRingTail becomes volatile (the ISR reads it now); the wake trigger still fires on dropped pulses. H-5: GPS_BAUD_RECOVERY() runs from GPS_LOOP() under the armed ~4 s WDT but can block for up to three ~1.1 s myGNSS.begin() probes plus ~500 ms of delays — a genuinely hung module out-waits the watchdog and the one path built to recover a sick GPS becomes a reset/recovery boot loop. It now pets the WDT before each blocking probe (same treatment fwStageToFlash() already had). H-8 (partial): the tach Kalman filter math (predict/update, Q/R/floor tuning constants, period→RPM conversion) is extracted to the host-tested tach_filter pure unit per the project convention, with tests covering convergence, monotonic approach, R-scaling with period count, the uncertainty floor, and reset semantics. The OTA protocol state machine and BLE command dispatcher extractions are deliberately deferred — they are large refactors that deserve their own PRs. https://claude.ai/code/session_01Voi2VL5zjikyhEqgTvzmAL
…r (sibling passed in 2.5 min) https://claude.ai/code/session_01Voi2VL5zjikyhEqgTvzmAL
…ening-mhs8yd Fix hot-path track scan, tach ring overflow, WDT-unsafe GPS recovery (H-3/H-4/H-5, partial H-8)
Logs dumped from a fleet of devices are hard to tell apart, so add a persistent device_name setting. On first boot (or upgrade), a friendly default is generated by picking two of twelve racing-adjacent words at random (e.g. ApexTurbo). Editable on a computer or over BLE like any other setting. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01FCfoUzUcLQgDKhG2T41G1h
Add the logging device's name to the reserved 1 KB DOVEX header as a new trailing metadata column (after optimal_ms). Appending it keeps full backwards compatibility: older readers stop at optimal_ms and ignore the extra column, and pre-existing logs (no column) parse back with an empty device name. The pure dovex_header::format()/parse() unit carries the new field; writeDovexHeader() feeds it the device_name setting so every session logs which unit produced it. Host tests updated, with explicit legacy-header and empty-device round-trip coverage. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01FCfoUzUcLQgDKhG2T41G1h
…uhewjs Add device_name setting and record it in the DOVEX header
The Transfer screen now opens a Bluetooth-vs-USB submenu. Bluetooth keeps the existing BLE file-transfer flow untouched; USB presents the SD card to a host computer as a standard drive (TinyUSB MSC) for drag-and-drop of track files and logs. - New usb_msc module: Adafruit_USBD_MSC wrapping SdFat's block device (readSectors/writeSectors/syncDevice). USB_MSC_SETUP() registers the callbacks at boot but presents no drive, so charging/plug-in is unchanged. USB_MSC_ENABLE() acquires the SD mutex, sets capacity, and re-enumerates via detach/attach so the host mounts the drive on demand. USB_MSC_DISABLE() reboots (NVIC_SystemReset) so the firmware remounts a clean filesystem after host edits and the drive drops. - New SD_ACCESS_USB_MSC mode in the host-tested sd_access_policy unit (a normal exclusive holder); policy tests extended to cover it. - New PAGE_TRANSFER_MENU and PAGE_USB_STORAGE pages wired into the menu dispatch, menuLimit, and selection handlers. - CHANGELOG and CLAUDE.md (File Map, SD modes, pages, subsystem 12) updated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_013agh5b8yPouZYESrPDZhvh
The SD card runs at a slow 2 MHz to harden against ignition EMI, which caps throughput at ~250 KB/s — painfully slow for USB mass-storage drag-and-drop. File transfers only happen parked with the motor off, so the EMI rationale doesn't apply during a transfer. - New sdSetSpiClock()/sdSetTransferSpeed() in sd_functions: re-running SD.begin() is the supported runtime SPI-clock switch. SD_SETUP() now shares sdSetSpiClock(). sdSetTransferSpeed(true) bumps to SD_SPI_SPEED_FAST (8 MHz) and falls back to 2 MHz if the fast re-init fails. - BLE: fast at the top of BLE_SETUP(), restored in BLE_STOP() (the auto-reboot on disconnect also restores it). - USB: fast in USB_MSC_ENABLE() before block I/O; restored on the early-bail path, and by the reboot in USB_MSC_DISABLE() on normal exit. - SD_SPI_SPEED_FAST is a named constant (8 MHz) — bump to 16 MHz later if the board sustains it. - CHANGELOG and CLAUDE.md (SD subsystem, EMI section, constants) updated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_013agh5b8yPouZYESrPDZhvh
…nsfer-godwkr Add USB mass-storage file transfer with Transfer-mode submenu
While the USB mass-storage drive is mounted the host PC owns the SD card, but the main loop kept running GPS_LOOP(), trackDetectionLoop(), checkAutoIdle() etc., so the firmware and host could both drive the FAT. Corruption was prevented only as a side effect of the SD-access policy denying those acquires, not by design. Add a usbMscActive early-return branch in loop() mirroring the existing bleActive branch: skip all GPS/tach/lap/SD processing, service only the Exit button (Select) and the status-page refresh, then return. The host now owns the card uncontested. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Gjbi1LXFAFetL5GPohypVu
The only exit from mass-storage mode was the on-device Exit button. Pulling the USB cable — the natural way to finish a drag-and-drop session — left the device stuck with usbMscActive true, holding SD_ACCESS_USB_MSC and the EMI-unsafe 8 MHz SD clock forever. A subsequent driving session would then be silently denied SD logging with no indication. The parked USB loop branch now checks isUsbConnected() (VBUS via NRF_POWER->USBREGSTATUS) each iteration and calls USB_MSC_DISABLE() (reboot) when the cable is removed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Gjbi1LXFAFetL5GPohypVu
USB_MSC_ENABLE() ignored the bool returned by usb_msc.begin(); on a failed MSC interface registration it still set usbMscActive and re-enumerated, leaving the user on a "drive active" screen with no drive. Now bail cleanly on failure: drop media-ready, restore the normal SD clock, release the SD lock, and return false (the UI already shows the SD-busy warning on false). Also raise the detach()->attach() re-enumeration delay from 10 ms to 50 ms; 10 ms is below the USB spec's recommended detach settle window and some hosts would not re-mount the drive reliably. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Gjbi1LXFAFetL5GPohypVu
msc_read_cb/msc_write_cb computed the sector count with bufsize / 512 and trusted the comment that bufsize is always a multiple of 512. A malformed host transfer with a non-multiple size would silently read/write fewer sectors than the host believed succeeded. Reject non-aligned/zero sizes with -1, and retry the SD op up to 3x to match the EMI-hardening retry pattern used elsewhere instead of aborting the whole transfer on one glitch. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Gjbi1LXFAFetL5GPohypVu
USB_MSC_DISABLE() went straight to delay(50) + NVIC_SystemReset(), relying entirely on the host having ejected first. If the user hit the on-device Exit (or pulled the cable) without safely ejecting, any buffered write was lost and the FAT could be left inconsistent across the hard reset. Now drop setUnitReady(false) and SD.card()->syncDevice() before resetting. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Gjbi1LXFAFetL5GPohypVu
The file-request callback copies at most 64 bytes into buffer[65], so a command longer than that was silently truncated and then run through filename validation and path building as a mangled-but-plausible name. The raw-data paths (track upload, OTA image chunks) already return earlier using data/len directly, so by the command-dispatch point anything len >= 65 is a malformed command, never a valid (FAT-short) filename command. Reject with ERROR rather than acting on the truncated string. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Gjbi1LXFAFetL5GPohypVu
The lap_format refactor changed the big-font live pages from space-padded to zero-padded seconds once a lap passes one minute (1:05.007 vs the old 1: 5.007). Same field width, so the column stays stable, but the change was undocumented. Add an explicit column-stability test across the 1-minute boundary and call out the cosmetic change in the CHANGELOG. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Gjbi1LXFAFetL5GPohypVu
Document the new usbMscActive loop-parking branch, VBUS-unplug exit, begin()-failure bailout, longer re-enumeration delay, and the flush/media-ready drop before the exit reboot. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Gjbi1LXFAFetL5GPohypVu
Code-review fixes for #60: USB MSC hardening + low-severity cleanups
A code-quality sweep prompted by spontaneous on-track reboots (suspected failing SD card tray). Fixes four edge cases where a flaky card or a glitched I2C bus turns into a reboot loop or silent corruption: - i2cBusRecover(): feed the watchdog before each blocking I2C re-init step so the recovery routine can't itself trip the 4 s WDT (boot loop under sustained ignition EMI); recover a hung bus inline instead of one frame late. - DOVEX header pre-fill: verify each write() and abort log init cleanly on a short write instead of marking logging ready over a truncated header. - Mid-session write failure: attempt writeDovexHeader() before close so the session's lap times survive a dying card. - buildTrackList(): hold SD_ACCESS_TRACK_PARSE for the directory walk (the lone SD consumer that bypassed the mutex) and check the directory open. Host unit tests pass; CHANGELOG updated. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01WkdH4kE9pBKSfRBM7FTG1k
…ses-dpbi7p Harden SD/I2C failure paths against on-track reboots & corruption
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Rollup of the BETA hardening work driven by the professional code review — fixing both criticals that weren't slipped, three HIGHs, plus CI supply-chain pinning and a small per-device logging feature. Five PRs merged so far: #59, #61, #62, #65, #66 — plus the
device_namefeature in #67 (open into BETA).Added
Per-device
device_namesetting + DOVEX header column (#67). Logs dumped from a fleet of devices were hard to tell apart, so a new persistentdevice_namesetting identifies the unit that produced each log. On first boot (or upgrade) a friendly default is generated by picking two of twelve racing-adjacent words at random (e.g.ApexTurbo) — the word list lives in flash with a tiny pointer table, so it costs almost no RAM. It's editable on a computer or over BLE like any other setting, and is recorded in the reserved 1 KB DOVEX header as a new trailing metadata column (afteroptimal_ms). Appending it keeps full backwards compatibility: older readers stop atoptimal_msand ignore the extra column, and pre-existing logs parse back with an empty device name. Thedevicefield is carried through the host-testeddovex_headerunit with new legacy-header and empty/null-device round-trip coverage.Security
SD card arbitration race — fixable from radio range (CR-1, #59).
acquireSDAccess()was a non-atomic check-then-set on a shared flag, and five BLE commands (LIST,GET:,DELETE:,TLIST,TGET:) ran SdFat directly in the Bluefruit callback task, concurrent with 25 Hz session logging — overlapping commands could close a file mid-read, delete the file being streamed (DELETE:took no lock at all), or corrupt FAT outright. Now: lock transitions are atomic (FreeRTOS critical section; grant/deny table lives in the host-testedsd_access_policyunit), every SD-touching BLE command is deferred toBLUETOOTH_LOOP()so SdFat is single-task, listings hold the lock for the whole walk, andDELETErefuses while a transfer streams. Protocol impact: overlapping commands get the existingBUSY/TERR:BUSYreplies instead of executing concurrently.Fixed
Sleep mode actually sleeps (CR-4, #61). The tach ISR latched
tachHavePeriodon the first engine pulse since boot and nothing cleared it, so every sleep entry instantly RPM-bounced back into race mode with logging enabled, draining the pack overnight.enterSleepMode()now callsTACH_SLEEP()to re-arm the wake trigger; a real engine pulse still wakes straight into race mode.Lap times rendered wrong on the live pages (H-1/H-2, #61). Three pages appended millisecond zero-padding after the value (
1:23.007displayed as1:23.700— a 693 ms error), and the lap list did no padding at all. All six divergent inline copies of the ms→M:SS.mmmmath were replaced by the host-testedlap_formatunit (three zero-minutes styles preserve each page's layout; replay's already-correct rendering unchanged).Track detection no longer throttles the main loop (H-3, #66). The O(N) software-double haversine manifest scan was gated on
gpsData.fix(which stays true between PVT updates) and ran every ~250 Hz loop iteration. Now throttled to 1 Hz.Tach ring can't be lapped during SD stalls (H-4, #66). The pulse ISR advanced head unconditionally; a documented 100 ms–2 s SD GC stall at racing RPM overran the 16-entry ring, breaking the SPSC invariant and logging confident-but-wrong RPM that also fed the >500 RPM auto-race trigger. The ISR now checks full, drops, and flags;
TACH_LOOP()discards the one period spanning the gap.GPS baud recovery can't trip the watchdog (H-5, #66).
GPS_BAUD_RECOVERY()(up to three ~1.1 s probes + ~500 ms delays) could out-wait the 4 s WDT against a hung module — reset → recovery → reset boot loop. It now pets the WDT before each blocking probe.Changed
CI pins DovesLapTimer per channel (#62 + #65). BETA-targeted builds track the library's
BETAbranch so the two beta channels move together; master CI and release/tag builds pinv4.1.0instead of floating on the library's default-branch tip.compile-sketchselects the ref dynamically from the build's target branch, so the same file content is correct on both branches and nothing needs flipping at release time. (Master got the identical workflow commit directly via #63.)Test infrastructure
Three new host-tested pure units per the project convention —
sd_access_policy(arbitration decision table),lap_format(lap-time rendering incl. the exact H-1 regression case), andtach_filter(Kalman predict/update math, partial H-8) — all wired into the unit-test, coverage, and clang-tidy CI jobs. Thedevice_namework (#67) adds thedovex_headerdevice column with legacy/empty-device coverage. Host suite is now 120 test cases / 1605 assertions.Known-remaining (deliberately not in this release)
BirdsEye.inogod-file decomposition and layering — needs module-boundary decisions first.Full details per change in
CHANGELOG.md[Unreleased], which rolls into2.3.0with this merge.