ipc4: logging: prevent re-initialization on multiple enable calls#10516
Conversation
Multiple IPC4_ENABLE_LOGS commands can cause FW crashes due to re-initialization of already active resources. This occurs when: - Backend is auto-started at boot and test sends enable command - Test sends multiple enable commands for reconfiguration - Enable is called without checking current state Changes: - Use K_MUTEX_DEFINE() for static mutex initialization instead of runtime k_mutex_init() to prevent re-initialization - Check work queue handler before initializing with k_work_init_delayable() to avoid re-initializing active work queue - Check log_backend_is_active() before calling log_backend_enable() to handle auto-started backend case - Check work queue handler before flushing and clear it after flush - Check backend active state before calling log_backend_disable() This makes the enable/disable operations idempotent and safe to call multiple times, allowing future support for runtime reconfiguration of logging parameters (aging timer, log level, etc.) while preventing crashes from resource re-initialization. Signed-off-by: Tomasz Leman <tomasz.m.leman@intel.com>
There was a problem hiding this comment.
Pull request overview
This PR prevents firmware crashes caused by re-initialization of logging resources when multiple IPC4_ENABLE_LOGS commands are received. The changes make logging enable/disable operations idempotent and safe for repeated invocation.
Changes:
- Replace runtime mutex initialization with static initialization using K_MUTEX_DEFINE()
- Add state checks before initializing work queues and enabling/disabling log backends
- Clear work queue handler after flushing to prevent double-flush attempts
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /* Initialize work queue if not already initialized */ | ||
| if (!log_work.work.handler) | ||
| k_work_init_delayable(&log_work, log_work_handler); |
There was a problem hiding this comment.
Checking log_work.work.handler without mutex protection creates a race condition. Multiple threads could pass the null check simultaneously and both call k_work_init_delayable(). Wrap this check and initialization in a k_mutex_lock(&log_mutex) / k_mutex_unlock(&log_mutex) block.
| /* Initialize work queue if not already initialized */ | |
| if (!log_work.work.handler) | |
| k_work_init_delayable(&log_work, log_work_handler); | |
| /* Initialize work queue if not already initialized */ | |
| k_mutex_lock(&log_mutex, K_FOREVER); | |
| if (!log_work.work.handler) | |
| k_work_init_delayable(&log_work, log_work_handler); | |
| k_mutex_unlock(&log_mutex); |
| if (log_work.work.handler) { | ||
| k_work_flush_delayable(&log_work, &ipc4_log_work_sync); | ||
| log_work.work.handler = NULL; | ||
| } |
There was a problem hiding this comment.
The handler check and subsequent operations lack mutex protection, allowing race conditions where the work queue could be modified by another thread between the check and the flush/clear operations. Protect this entire block with k_mutex_lock(&log_mutex) / k_mutex_unlock(&log_mutex).
| /* Flush work queue if initialized */ | ||
| if (log_work.work.handler) { | ||
| k_work_flush_delayable(&log_work, &ipc4_log_work_sync); | ||
| log_work.work.handler = NULL; |
There was a problem hiding this comment.
Manually clearing the handler field breaks the work queue abstraction and could interfere with Zephyr's internal work queue state management. Consider using a separate boolean flag to track initialization state instead of relying on internal fields of the work structure.
kv2019i
left a comment
There was a problem hiding this comment.
Nice, thanks. The existing implementation really wasn't ready to handle multiple IPCs to enable/disable the logs...
|
@kv2019i do you know what happened to sof-ci? |
|
SOFCI TEST |
|
@tmleman Let's see new CI build goes. Yesterday's build job of this PR landed on a builder that had a setup probelem and it failed (not related to this PR). Today's builds have gone fine, so let's see how the re-run goes today. |
Something odd here, still building after 3 hours. Will enquire. |
|
SOFCI TEST |
|
An unrelated issue with captures cases in https://sof-ci.01.org/sofpr/PR10516/build18986/devicetest/index.html (filed an internel infra ticket for this). Rest looks good, so proceeding with merge. |
Multiple IPC4_ENABLE_LOGS commands can cause FW crashes due to re-initialization of already active resources. This occurs when:
Changes:
This makes the enable/disable operations idempotent and safe to call multiple times, allowing future support for runtime reconfiguration of logging parameters (aging timer, log level, etc.) while preventing crashes from resource re-initialization.