Skip to content

ipc4: logging: prevent re-initialization on multiple enable calls#10516

Merged
kv2019i merged 1 commit intothesofproject:mainfrom
tmleman:topic/upstream/pr/ipc4/logging/fix_crash
Feb 4, 2026
Merged

ipc4: logging: prevent re-initialization on multiple enable calls#10516
kv2019i merged 1 commit intothesofproject:mainfrom
tmleman:topic/upstream/pr/ipc4/logging/fix_crash

Conversation

@tmleman
Copy link
Contributor

@tmleman tmleman commented Feb 3, 2026

Multiple IPC4_ENABLE_LOGS commands can cause FW crashes due to re-initialization of already active resources. This occurs when:

  • Backend is auto-started at boot and test sends enable command
  • Test sends multiple enable commands for reconfiguration
  • Enable is called without checking current state

Changes:

  • Use K_MUTEX_DEFINE() for static mutex initialization instead of runtime k_mutex_init() to prevent re-initialization
  • Check work queue handler before initializing with k_work_init_delayable() to avoid re-initializing active work queue
  • Check log_backend_is_active() before calling log_backend_enable() to handle auto-started backend case
  • Check work queue handler before flushing and clear it after flush
  • Check backend active state before calling log_backend_disable()

This makes the enable/disable operations idempotent and safe to call multiple times, allowing future support for runtime reconfiguration of logging parameters (aging timer, log level, etc.) while preventing crashes from resource re-initialization.

Multiple IPC4_ENABLE_LOGS commands can cause FW crashes due to
re-initialization of already active resources. This occurs when:
- Backend is auto-started at boot and test sends enable command
- Test sends multiple enable commands for reconfiguration
- Enable is called without checking current state

Changes:
- Use K_MUTEX_DEFINE() for static mutex initialization instead of
  runtime k_mutex_init() to prevent re-initialization
- Check work queue handler before initializing with
  k_work_init_delayable() to avoid re-initializing active work queue
- Check log_backend_is_active() before calling log_backend_enable() to
  handle auto-started backend case
- Check work queue handler before flushing and clear it after flush
- Check backend active state before calling log_backend_disable()

This makes the enable/disable operations idempotent and safe to call
multiple times, allowing future support for runtime reconfiguration of
logging parameters (aging timer, log level, etc.) while preventing
crashes from resource re-initialization.

Signed-off-by: Tomasz Leman <tomasz.m.leman@intel.com>
Copilot AI review requested due to automatic review settings February 3, 2026 15:06
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR prevents firmware crashes caused by re-initialization of logging resources when multiple IPC4_ENABLE_LOGS commands are received. The changes make logging enable/disable operations idempotent and safe for repeated invocation.

Changes:

  • Replace runtime mutex initialization with static initialization using K_MUTEX_DEFINE()
  • Add state checks before initializing work queues and enabling/disabling log backends
  • Clear work queue handler after flushing to prevent double-flush attempts

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +144 to +146
/* Initialize work queue if not already initialized */
if (!log_work.work.handler)
k_work_init_delayable(&log_work, log_work_handler);
Copy link

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking log_work.work.handler without mutex protection creates a race condition. Multiple threads could pass the null check simultaneously and both call k_work_init_delayable(). Wrap this check and initialization in a k_mutex_lock(&log_mutex) / k_mutex_unlock(&log_mutex) block.

Suggested change
/* Initialize work queue if not already initialized */
if (!log_work.work.handler)
k_work_init_delayable(&log_work, log_work_handler);
/* Initialize work queue if not already initialized */
k_mutex_lock(&log_mutex, K_FOREVER);
if (!log_work.work.handler)
k_work_init_delayable(&log_work, log_work_handler);
k_mutex_unlock(&log_mutex);

Copilot uses AI. Check for mistakes.
Comment on lines +163 to +166
if (log_work.work.handler) {
k_work_flush_delayable(&log_work, &ipc4_log_work_sync);
log_work.work.handler = NULL;
}
Copy link

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The handler check and subsequent operations lack mutex protection, allowing race conditions where the work queue could be modified by another thread between the check and the flush/clear operations. Protect this entire block with k_mutex_lock(&log_mutex) / k_mutex_unlock(&log_mutex).

Copilot uses AI. Check for mistakes.
/* Flush work queue if initialized */
if (log_work.work.handler) {
k_work_flush_delayable(&log_work, &ipc4_log_work_sync);
log_work.work.handler = NULL;
Copy link

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Manually clearing the handler field breaks the work queue abstraction and could interfere with Zephyr's internal work queue state management. Consider using a separate boolean flag to track initialization state instead of relying on internal fields of the work structure.

Copilot uses AI. Check for mistakes.
Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks. The existing implementation really wasn't ready to handle multiple IPCs to enable/disable the logs...

@tmleman
Copy link
Contributor Author

tmleman commented Feb 4, 2026

@kv2019i do you know what happened to sof-ci?

@tmleman
Copy link
Contributor Author

tmleman commented Feb 4, 2026

SOFCI TEST

@kv2019i
Copy link
Collaborator

kv2019i commented Feb 4, 2026

@tmleman Let's see new CI build goes. Yesterday's build job of this PR landed on a builder that had a setup probelem and it failed (not related to this PR). Today's builds have gone fine, so let's see how the re-run goes today.

@lgirdwood
Copy link
Member

@kv2019i do you know what happened to sof-ci?

Something odd here, still building after 3 hours. Will enquire.

@lgirdwood
Copy link
Member

SOFCI TEST

@kv2019i
Copy link
Collaborator

kv2019i commented Feb 4, 2026

An unrelated issue with captures cases in https://sof-ci.01.org/sofpr/PR10516/build18986/devicetest/index.html (filed an internel infra ticket for this). Rest looks good, so proceeding with merge.

@kv2019i kv2019i merged commit 15f94f6 into thesofproject:main Feb 4, 2026
46 of 52 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants