Skip to content

[2.3.0] performance enhancements#60

Draft
TheAngryRaven wants to merge 29 commits into
masterfrom
BETA
Draft

[2.3.0] performance enhancements#60
TheAngryRaven wants to merge 29 commits into
masterfrom
BETA

Conversation

@TheAngryRaven

@TheAngryRaven TheAngryRaven commented Jun 11, 2026

Copy link
Copy Markdown
Owner

Rollup of the BETA hardening work driven by the professional code review — fixing both criticals that weren't slipped, three HIGHs, plus CI supply-chain pinning and a small per-device logging feature. Five PRs merged so far: #59, #61, #62, #65, #66 — plus the device_name feature in #67 (open into BETA).

Added

Per-device device_name setting + DOVEX header column (#67). Logs dumped from a fleet of devices were hard to tell apart, so a new persistent device_name setting identifies the unit that produced each log. On first boot (or upgrade) a friendly default is generated by picking two of twelve racing-adjacent words at random (e.g. ApexTurbo) — the word list lives in flash with a tiny pointer table, so it costs almost no RAM. It's editable on a computer or over BLE like any other setting, and is recorded in the reserved 1 KB DOVEX header as a new trailing metadata column (after optimal_ms). Appending it keeps full backwards compatibility: older readers stop at optimal_ms and ignore the extra column, and pre-existing logs parse back with an empty device name. The device field is carried through the host-tested dovex_header unit with new legacy-header and empty/null-device round-trip coverage.

Security

SD card arbitration race — fixable from radio range (CR-1, #59). acquireSDAccess() was a non-atomic check-then-set on a shared flag, and five BLE commands (LIST, GET:, DELETE:, TLIST, TGET:) ran SdFat directly in the Bluefruit callback task, concurrent with 25 Hz session logging — overlapping commands could close a file mid-read, delete the file being streamed (DELETE: took no lock at all), or corrupt FAT outright. Now: lock transitions are atomic (FreeRTOS critical section; grant/deny table lives in the host-tested sd_access_policy unit), every SD-touching BLE command is deferred to BLUETOOTH_LOOP() so SdFat is single-task, listings hold the lock for the whole walk, and DELETE refuses while a transfer streams. Protocol impact: overlapping commands get the existing BUSY/TERR:BUSY replies instead of executing concurrently.

Fixed

Sleep mode actually sleeps (CR-4, #61). The tach ISR latched tachHavePeriod on the first engine pulse since boot and nothing cleared it, so every sleep entry instantly RPM-bounced back into race mode with logging enabled, draining the pack overnight. enterSleepMode() now calls TACH_SLEEP() to re-arm the wake trigger; a real engine pulse still wakes straight into race mode.

Lap times rendered wrong on the live pages (H-1/H-2, #61). Three pages appended millisecond zero-padding after the value (1:23.007 displayed as 1:23.700 — a 693 ms error), and the lap list did no padding at all. All six divergent inline copies of the ms→M:SS.mmm math were replaced by the host-tested lap_format unit (three zero-minutes styles preserve each page's layout; replay's already-correct rendering unchanged).

Track detection no longer throttles the main loop (H-3, #66). The O(N) software-double haversine manifest scan was gated on gpsData.fix (which stays true between PVT updates) and ran every ~250 Hz loop iteration. Now throttled to 1 Hz.

Tach ring can't be lapped during SD stalls (H-4, #66). The pulse ISR advanced head unconditionally; a documented 100 ms–2 s SD GC stall at racing RPM overran the 16-entry ring, breaking the SPSC invariant and logging confident-but-wrong RPM that also fed the >500 RPM auto-race trigger. The ISR now checks full, drops, and flags; TACH_LOOP() discards the one period spanning the gap.

GPS baud recovery can't trip the watchdog (H-5, #66). GPS_BAUD_RECOVERY() (up to three ~1.1 s probes + ~500 ms delays) could out-wait the 4 s WDT against a hung module — reset → recovery → reset boot loop. It now pets the WDT before each blocking probe.

Changed

CI pins DovesLapTimer per channel (#62 + #65). BETA-targeted builds track the library's BETA branch so the two beta channels move together; master CI and release/tag builds pin v4.1.0 instead of floating on the library's default-branch tip. compile-sketch selects the ref dynamically from the build's target branch, so the same file content is correct on both branches and nothing needs flipping at release time. (Master got the identical workflow commit directly via #63.)

Test infrastructure

Three new host-tested pure units per the project convention — sd_access_policy (arbitration decision table), lap_format (lap-time rendering incl. the exact H-1 regression case), and tach_filter (Kalman predict/update math, partial H-8) — all wired into the unit-test, coverage, and clang-tidy CI jobs. The device_name work (#67) adds the dovex_header device column with legacy/empty-device coverage. Host suite is now 120 test cases / 1605 assertions.

Known-remaining (deliberately not in this release)

  • Two slipped criticals, accepted as known issues for now.
  • H-6/H-7: BirdsEye.ino god-file decomposition and layering — needs module-boundary decisions first.
  • H-8 remainder: extracting the OTA protocol state machine and BLE command dispatcher to tested units — each deserves its own reviewed PR.

Full details per change in CHANGELOG.md [Unreleased], which rolls into 2.3.0 with this merge.

claude and others added 2 commits June 11, 2026 03:32
…k task

The SD access 'mutex' was a non-atomic check-then-set on a volatile int,
called from both the main loop and the Bluefruit callback task, and several
BLE commands ran SdFat directly in the callback:

- LIST/TLIST iterated directories for seconds while only peeking the flag
- GET:/TGET: started transfers in the callback, closing a file the main
  loop could be mid-read() on
- DELETE: took no lock at all and could remove the file being streamed

Any of these could interleave with 25 Hz session logging: FAT corruption,
lost logs, SPI wedge — triggerable from radio range.

Fixes:
- acquireSDAccess()/releaseSDAccess() now commit state transitions inside
  a FreeRTOS critical section (taskENTER_CRITICAL — BASEPRI-masked, so
  SoftDevice radio interrupts are unaffected).
- The grant/deny decision table is extracted into the host-tested
  sd_access_policy pure unit (single source of truth for the SD_ACCESS_*
  values; covered by tests/sd_access_policy_test.cpp, wired into the
  unit-test, coverage, and clang-tidy CI jobs).
- LIST/GET:/DELETE:/TLIST/TGET: are deferred to BLUETOOTH_LOOP() via a
  one-deep command buffer, like the settings/track/OTA commands already
  were — SdFat is never touched from the callback task. A second command
  while one is queued gets the protocol's BUSY / TERR:BUSY reply.
- Directory listings hold the SD lock for the whole walk; bleDeleteFile()
  takes the lock and refuses with BUSY while a transfer is streaming (the
  idempotent same-mode re-acquire would otherwise let it delete the
  streamed file and then drop the transfer's lock).
- Queued commands are dropped on disconnect/BLE_STOP so a stale command
  can't execute on the next session.

https://claude.ai/code/session_01Voi2VL5zjikyhEqgTvzmAL
…ion-mhs8yd

Fix SD arbitration race (CR-1): atomic acquire + no SdFat in the BLE callback task
@github-actions

github-actions Bot commented Jun 11, 2026

Copy link
Copy Markdown

Coverage — host-testable units

📂 Overall coverage

Metric Coverage
Lines 🟢 259/260 (99.6%)
Functions 🟢 26/26 (100.0%)
Branches 🟡 237/271 (87.5%)

📄 File coverage

File Lines Functions Branches
BirdsEye/crc32.cpp 🟢 30/30 (100.0%) 🟢 4/4 (100.0%) 🟢 24/24 (100.0%)
BirdsEye/dovex_header.cpp 🟢 100/101 (99.0%) 🟢 6/6 (100.0%) 🔴 61/92 (66.3%)
BirdsEye/filename_validator.cpp 🟢 14/14 (100.0%) 🟢 1/1 (100.0%) 🟢 30/30 (100.0%)
BirdsEye/gps_time.cpp 🟢 45/45 (100.0%) 🟢 6/6 (100.0%) 🟢 30/32 (93.8%)
BirdsEye/gps_validation.cpp 🟢 24/24 (100.0%) 🟢 2/2 (100.0%) 🟢 66/66 (100.0%)
BirdsEye/haversine.cpp 🟢 8/8 (100.0%) 🟢 1/1 (100.0%) ⚫ 0/0 (0.0%)
BirdsEye/lap_format.cpp 🟢 18/18 (100.0%) 🟢 1/1 (100.0%) 🟢 9/9 (100.0%)
BirdsEye/sd_access_policy.cpp 🟢 5/5 (100.0%) 🟢 2/2 (100.0%) 🟢 10/10 (100.0%)
BirdsEye/tach_filter.cpp 🟢 15/15 (100.0%) 🟢 3/3 (100.0%) 🟡 7/8 (87.5%)

@TheAngryRaven TheAngryRaven changed the title [2.4.0] performance enhancements [2.3.0] performance enhancements Jun 11, 2026
claude and others added 24 commits June 11, 2026 03:50
CR-4: Sleep mode was permanently defeated after the first engine pulse.
The tach ISR set tachHavePeriod and nothing ever cleared it, so every
sleep entry (long-press, 5-min menu idle, USB) read the stale flag and
instantly bounced through exitSleepMode(true) back into race mode with
logging enabled — silently starting a session and draining the pack
overnight. New TACH_SLEEP() (called from enterSleepMode()) re-arms the
wake trigger and drops stale ring-buffer/Kalman state so the wake's RPM
computation starts clean; the ISR stays attached and the next real pulse
still RPM-wakes straight into race mode as designed.

H-1/H-2: Lap-time formatting was six divergent inline copies of the
60000/%1000 math with four padding behaviors. Three live pages
(current/best/optimal lap) appended the millisecond zero-padding AFTER
the value — 1:23.007 rendered as 1:23.700, a 693 ms error — and the
lap-history list did no padding at all (1:5.7). All six sites now call
one host-tested pure unit, lap_format::formatLapTime() (ms always three
zero-padded digits), with three zero-minutes styles preserving each
page's layout: kOmit (replay results, already-correct rendering
unchanged), kShow (lap list, now a proper 0:SS.mmm fixed field), and
kSpace (big-font live pages, column-stable decimal point). Covered by
tests/lap_format_test.cpp including the exact H-1 regression case;
wired into the unit-test, coverage, and clang-tidy CI jobs.

https://claude.ai/code/session_01Voi2VL5zjikyhEqgTvzmAL
…ormat-mhs8yd

Fix sleep-mode RPM-wake latch (CR-4) and lap-time rendering (H-1/H-2)
…aster/release pin v2.3.1

beta.yml installs the lap-timer library from its BETA branch so the two
beta channels move together. release.yml pins the known-good v2.3.1 tag
instead of floating on the library's default-branch tip. compile-sketch
picks the ref dynamically from the build's target branch (pull_request
base_ref or push/dispatch ref_name): BETA-targeted builds use the
library's BETA branch, everything else uses v2.3.1 — one file content
works unchanged on both branches, so BETA→master merges stay clean and
nothing needs flipping at release time.

https://claude.ai/code/session_01Voi2VL5zjikyhEqgTvzmAL
CI: DovesLapTimer per channel — BETA builds track library BETA, master/release pin v2.3.1
…0, not v2.3.1

v2.3.1 doesn't exist in the DovesLapTimer repo, so the library checkout
failed ('pathspec v2.3.1 did not match any file(s)') and the
compile-sketch jobs went down before compiling anything. Docs updated to
match.

https://claude.ai/code/session_01Voi2VL5zjikyhEqgTvzmAL
…llowup-mhs8yd

CI: fix DovesLapTimer pin to v4.1.0 — #62 merged before the correction landed
…ery (H-3/H-4/H-5, partial H-8)

H-3: trackDetectionLoop() gated only on gpsData.fix, which stays true
between PVT updates, so the O(N) software-double haversine manifest scan
(several ms at the 200-entry ceiling on the single-precision-FPU M4F)
ran every ~250 Hz loop iteration, collapsing the loop rate. Throttled to
1 Hz — still instant at driving pace.

H-4: the tach pulse ISR advanced the ring head unconditionally. An SD GC
stall (documented 100 ms–2 s — the same failure the GPS serial ring
exists for) at racing RPM overruns the 16-entry buffer; lapping the
consumer breaks the SPSC invariant and TACH_LOOP computes a
confident-but-wrong RPM that lands in the CSV and feeds the >500 RPM
auto-race trigger. The ISR now does the SPSC full check (one slot
sacrificed, matching the GPS serial ISR), drops the pulse, and sets
tachRingOverflow; TACH_LOOP discards the carried prev-timestamp so the
one period spanning the gap is never computed and the Kalman estimate
coasts. tachRingTail becomes volatile (the ISR reads it now); the wake
trigger still fires on dropped pulses.

H-5: GPS_BAUD_RECOVERY() runs from GPS_LOOP() under the armed ~4 s WDT
but can block for up to three ~1.1 s myGNSS.begin() probes plus ~500 ms
of delays — a genuinely hung module out-waits the watchdog and the one
path built to recover a sick GPS becomes a reset/recovery boot loop. It
now pets the WDT before each blocking probe (same treatment
fwStageToFlash() already had).

H-8 (partial): the tach Kalman filter math (predict/update, Q/R/floor
tuning constants, period→RPM conversion) is extracted to the host-tested
tach_filter pure unit per the project convention, with tests covering
convergence, monotonic approach, R-scaling with period count, the
uncertainty floor, and reset semantics. The OTA protocol state machine
and BLE command dispatcher extractions are deliberately deferred — they
are large refactors that deserve their own PRs.

https://claude.ai/code/session_01Voi2VL5zjikyhEqgTvzmAL
…ening-mhs8yd

Fix hot-path track scan, tach ring overflow, WDT-unsafe GPS recovery (H-3/H-4/H-5, partial H-8)
Logs dumped from a fleet of devices are hard to tell apart, so add a
persistent device_name setting. On first boot (or upgrade), a friendly
default is generated by picking two of twelve racing-adjacent words at
random (e.g. ApexTurbo). Editable on a computer or over BLE like any
other setting.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01FCfoUzUcLQgDKhG2T41G1h
Add the logging device's name to the reserved 1 KB DOVEX header as a new
trailing metadata column (after optimal_ms). Appending it keeps full
backwards compatibility: older readers stop at optimal_ms and ignore the
extra column, and pre-existing logs (no column) parse back with an empty
device name.

The pure dovex_header::format()/parse() unit carries the new field;
writeDovexHeader() feeds it the device_name setting so every session logs
which unit produced it. Host tests updated, with explicit legacy-header
and empty-device round-trip coverage.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01FCfoUzUcLQgDKhG2T41G1h
…uhewjs

Add device_name setting and record it in the DOVEX header
The Transfer screen now opens a Bluetooth-vs-USB submenu. Bluetooth keeps
the existing BLE file-transfer flow untouched; USB presents the SD card to
a host computer as a standard drive (TinyUSB MSC) for drag-and-drop of
track files and logs.

- New usb_msc module: Adafruit_USBD_MSC wrapping SdFat's block device
  (readSectors/writeSectors/syncDevice). USB_MSC_SETUP() registers the
  callbacks at boot but presents no drive, so charging/plug-in is unchanged.
  USB_MSC_ENABLE() acquires the SD mutex, sets capacity, and re-enumerates
  via detach/attach so the host mounts the drive on demand. USB_MSC_DISABLE()
  reboots (NVIC_SystemReset) so the firmware remounts a clean filesystem
  after host edits and the drive drops.
- New SD_ACCESS_USB_MSC mode in the host-tested sd_access_policy unit (a
  normal exclusive holder); policy tests extended to cover it.
- New PAGE_TRANSFER_MENU and PAGE_USB_STORAGE pages wired into the menu
  dispatch, menuLimit, and selection handlers.
- CHANGELOG and CLAUDE.md (File Map, SD modes, pages, subsystem 12) updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_013agh5b8yPouZYESrPDZhvh
The SD card runs at a slow 2 MHz to harden against ignition EMI, which caps
throughput at ~250 KB/s — painfully slow for USB mass-storage drag-and-drop.
File transfers only happen parked with the motor off, so the EMI rationale
doesn't apply during a transfer.

- New sdSetSpiClock()/sdSetTransferSpeed() in sd_functions: re-running
  SD.begin() is the supported runtime SPI-clock switch. SD_SETUP() now shares
  sdSetSpiClock(). sdSetTransferSpeed(true) bumps to SD_SPI_SPEED_FAST (8 MHz)
  and falls back to 2 MHz if the fast re-init fails.
- BLE: fast at the top of BLE_SETUP(), restored in BLE_STOP() (the auto-reboot
  on disconnect also restores it).
- USB: fast in USB_MSC_ENABLE() before block I/O; restored on the early-bail
  path, and by the reboot in USB_MSC_DISABLE() on normal exit.
- SD_SPI_SPEED_FAST is a named constant (8 MHz) — bump to 16 MHz later if the
  board sustains it.
- CHANGELOG and CLAUDE.md (SD subsystem, EMI section, constants) updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_013agh5b8yPouZYESrPDZhvh
…nsfer-godwkr

Add USB mass-storage file transfer with Transfer-mode submenu
While the USB mass-storage drive is mounted the host PC owns the SD card,
but the main loop kept running GPS_LOOP(), trackDetectionLoop(),
checkAutoIdle() etc., so the firmware and host could both drive the FAT.
Corruption was prevented only as a side effect of the SD-access policy
denying those acquires, not by design.

Add a usbMscActive early-return branch in loop() mirroring the existing
bleActive branch: skip all GPS/tach/lap/SD processing, service only the
Exit button (Select) and the status-page refresh, then return. The host
now owns the card uncontested.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Gjbi1LXFAFetL5GPohypVu
The only exit from mass-storage mode was the on-device Exit button. Pulling
the USB cable — the natural way to finish a drag-and-drop session — left the
device stuck with usbMscActive true, holding SD_ACCESS_USB_MSC and the
EMI-unsafe 8 MHz SD clock forever. A subsequent driving session would then
be silently denied SD logging with no indication.

The parked USB loop branch now checks isUsbConnected() (VBUS via
NRF_POWER->USBREGSTATUS) each iteration and calls USB_MSC_DISABLE() (reboot)
when the cable is removed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Gjbi1LXFAFetL5GPohypVu
USB_MSC_ENABLE() ignored the bool returned by usb_msc.begin(); on a failed
MSC interface registration it still set usbMscActive and re-enumerated,
leaving the user on a "drive active" screen with no drive. Now bail cleanly
on failure: drop media-ready, restore the normal SD clock, release the SD
lock, and return false (the UI already shows the SD-busy warning on false).

Also raise the detach()->attach() re-enumeration delay from 10 ms to 50 ms;
10 ms is below the USB spec's recommended detach settle window and some
hosts would not re-mount the drive reliably.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Gjbi1LXFAFetL5GPohypVu
msc_read_cb/msc_write_cb computed the sector count with bufsize / 512 and
trusted the comment that bufsize is always a multiple of 512. A malformed
host transfer with a non-multiple size would silently read/write fewer
sectors than the host believed succeeded. Reject non-aligned/zero sizes
with -1, and retry the SD op up to 3x to match the EMI-hardening retry
pattern used elsewhere instead of aborting the whole transfer on one glitch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Gjbi1LXFAFetL5GPohypVu
USB_MSC_DISABLE() went straight to delay(50) + NVIC_SystemReset(), relying
entirely on the host having ejected first. If the user hit the on-device
Exit (or pulled the cable) without safely ejecting, any buffered write was
lost and the FAT could be left inconsistent across the hard reset. Now drop
setUnitReady(false) and SD.card()->syncDevice() before resetting.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Gjbi1LXFAFetL5GPohypVu
The file-request callback copies at most 64 bytes into buffer[65], so a
command longer than that was silently truncated and then run through
filename validation and path building as a mangled-but-plausible name. The
raw-data paths (track upload, OTA image chunks) already return earlier using
data/len directly, so by the command-dispatch point anything len >= 65 is a
malformed command, never a valid (FAT-short) filename command. Reject with
ERROR rather than acting on the truncated string.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Gjbi1LXFAFetL5GPohypVu
The lap_format refactor changed the big-font live pages from space-padded
to zero-padded seconds once a lap passes one minute (1:05.007 vs the old
1: 5.007). Same field width, so the column stays stable, but the change was
undocumented. Add an explicit column-stability test across the 1-minute
boundary and call out the cosmetic change in the CHANGELOG.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Gjbi1LXFAFetL5GPohypVu
Document the new usbMscActive loop-parking branch, VBUS-unplug exit,
begin()-failure bailout, longer re-enumeration delay, and the
flush/media-ready drop before the exit reboot.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Gjbi1LXFAFetL5GPohypVu
Code-review fixes for #60: USB MSC hardening + low-severity cleanups
claude and others added 2 commits June 27, 2026 15:32
A code-quality sweep prompted by spontaneous on-track reboots (suspected
failing SD card tray). Fixes four edge cases where a flaky card or a
glitched I2C bus turns into a reboot loop or silent corruption:

- i2cBusRecover(): feed the watchdog before each blocking I2C re-init step
  so the recovery routine can't itself trip the 4 s WDT (boot loop under
  sustained ignition EMI); recover a hung bus inline instead of one frame
  late.
- DOVEX header pre-fill: verify each write() and abort log init cleanly on
  a short write instead of marking logging ready over a truncated header.
- Mid-session write failure: attempt writeDovexHeader() before close so the
  session's lap times survive a dying card.
- buildTrackList(): hold SD_ACCESS_TRACK_PARSE for the directory walk (the
  lone SD consumer that bypassed the mutex) and check the directory open.

Host unit tests pass; CHANGELOG updated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01WkdH4kE9pBKSfRBM7FTG1k
…ses-dpbi7p

Harden SD/I2C failure paths against on-track reboots & corruption
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants