Skip to content

Fix SDIO blocking delays and state management issues#11674

Open
sensei-hacker wants to merge 2 commits into
iNavFlight:release/9.1from
sensei-hacker:fix/sdio-retry-blocking-delay
Open

Fix SDIO blocking delays and state management issues#11674
sensei-hacker wants to merge 2 commits into
iNavFlight:release/9.1from
sensei-hacker:fix/sdio-retry-blocking-delay

Conversation

@sensei-hacker

Copy link
Copy Markdown
Member

Summary

Remove blocking delay(1) calls from SDIO retry paths and fix write-path state management. Blocking delays on the flight-control hot path cause ~3 ms CPU stalls during transient SD card DMA rejections, resulting in loop overruns and dropped gyro samples during active logging.

Problem

Blocking delay on hot path:
The driver calls delay(1) when DMA operations are rejected before retrying. In INAV's cooperative scheduler, this stalls the entire flight-control loop, gyro sampling, and PID control — exactly when loop timing matters most (during active blackbox logging).

State management bug:
The write path sets sdcard.state = SDCARD_STATE_SENDING_WRITE before the DMA call. On rejection, state remains SENDING_WRITE instead of READY. The poll cycle then calls SD_CheckWrite() on a transfer that never started, potentially invoking callbacks for failed writes.

Solution

Remove blocking delays: Both retry branches now return SDCARD_OPERATION_BUSY/false immediately. The asyncfatfs layer re-invokes on the next loop naturally, pacing retries without blocking.

Fix state assignment: Only set sdcard.state = SDCARD_STATE_SENDING_WRITE after confirming SD_WriteBlocks_DMA() succeeds. This mirrors the read-path pattern and ensures correct retry behavior.

Testing

Build: MATEKH743 (H7 SDIO target) compiles cleanly, no warnings
Code review: Both commits approved
Backend audit: No similar issues in other SD-card drivers

Compliance

Retry-before-reset principle preserved — still retries 3 times before reset
No new complexity — pure removal of blocking calls
No behavior change — only timing/reliability improvements

Files Changed

  • src/main/drivers/sdcard/sdcard_sdio.c

Related

  • Commit 146ff5616c (introduced retry logic)
  • Project: claude/projects/active/fix-sdio-retry-blocking-delay/

Remove blocking `delay(1)` calls from SDIO SD-card driver retry logic in
both read and write paths. These delays stalled the flight-control loop
on transient DMA/bus rejections, causing loop overruns and dropped gyro
samples during active blackbox logging.

The retries are naturally paced by the asyncfatfs/blackbox polling loop
(one iteration per PID loop), which eliminates the need for an explicit
delay. By returning SDCARD_OPERATION_BUSY / false immediately on rejection,
the cooperative scheduler continues running other tasks instead of spinning
the CPU for 1 ms per retry.

Impact:
- Eliminates up to ~3 ms of blocking CPU stall per transient rejection
  (3 retries × 1 ms delay at 2 kHz loop rate)
- Retries still occur up to SDCARD_MAX_OPERATION_RETRIES (3) times before
  card reset, preserving the retry-before-reset intent
- Retry spacing now depends on loop rate:
  - 1 kHz loop: 1 ms spacing (same as before)
  - 2-4 kHz loop: 250-500 µs spacing (tighter, safer for fast loops)

Backend audit: sdcard_spi.c, sdmmc_sdio_f4xx.c, and sdmmc_sdio_hal.c
all have no per-block hot-path blocking delays (only init/reset delays).
Move state assignment after confirming DMA operation starts successfully.
Previously, sdcard.state was set to SDCARD_STATE_SENDING_WRITE before the
SD_WriteBlocks_DMA() call. On DMA rejection and retry, the state remained
SENDING_WRITE, causing the next poll cycle to call SD_CheckWrite() on a
transfer that never actually started, leading to stale status
misinterpretation.

This fix mirrors the read-path behavior (sdcardSdio_readBlock) where state
is only set after confirming the DMA operation succeeded. This ensures:
- Retry mechanism works correctly (state stays READY for next attempt)
- Poll cycle sees accurate state
- No risk of invoking completion callbacks for failed operations

This bug becomes critical after removing blocking delays from retry paths,
as retries now happen immediately on the next loop iteration instead of
after 1ms, exposing the state inconsistency more frequently.

Related: commit 146ff56 (introduced the retry logic with this bug)
@qodo-code-review

Copy link
Copy Markdown
Contributor

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant