Conversation
Pull request overview
Fixes an IDC submission race in the Zephyr/P4WQ-based IDC path by changing how work items are allocated/submitted per target core, and aligns a notifier call site with the codebase’s “local core mask” convention.
Changes:
- Expand IDC per-target work storage to two slots and add a mutex intended to serialize submissions.
- Adjust IDC work selection logic to avoid re-submitting a work item that hasn’t been fully released by P4WQ yet.
- Replace a direct core-bitmask expression with NOTIFIER_TARGET_CORE_LOCAL in the DAI trigger notifier.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/idc/zephyr_idc.c | Attempts to prevent P4WQ re-submit race by adding 2 work items/core and a mutex around selection/submission. |
| src/audio/dai-zephyr.c | Uses NOTIFIER_TARGET_CORE_LOCAL instead of BIT(cpu_get_id()) when firing local-core notifier events. |
Comments suppressed due to low confidence (1)
src/idc/zephyr_idc.c:156
k_mutex_lock(&idc_mutex, ...) is not released on the error path when the target core is down. The early return -EACCES; will leave the mutex locked and can deadlock all subsequent IDC sends. Ensure the mutex is unlocked before returning (or check cpu_is_core_enabled() before taking the mutex).
```c
k_mutex_lock(&idc_mutex, K_FOREVER);
if (work->thread)
	/* See comment above the idc_work[] array. */
	work++;

/* Same priority as the IPC thread which is an EDF task and under Zephyr */
work->priority = CONFIG_EDF_THREAD_PRIORITY;
work->deadline = 0;
work->handler = idc_handler;
work->sync = mode == IDC_BLOCKING;

if (!cpu_is_core_enabled(target_cpu)) {
	tr_err(&zephyr_idc_tr, "Core %u is down, cannot sent IDC message", target_cpu);
	return -EACCES;
}
```
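A minimal sketch of the fix suggested in the comment above: check the target core before taking the mutex, so the error path never holds the lock. This is a self-contained model, not SOF code — pthread_mutex stands in for k_mutex, cpu_is_core_enabled() and idc_send_msg_sketch() are illustrative stand-ins, and the core-state table is example data.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t idc_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Example core state: core 1 is down. */
static bool core_enabled[4] = { true, false, true, true };

/* Stand-in for the real cpu_is_core_enabled(). */
static bool cpu_is_core_enabled(unsigned int core)
{
	return core < 4 && core_enabled[core];
}

/* Reordered send path: the core check happens before the lock, so
 * returning -EACCES cannot leave idc_mutex held and deadlock later
 * senders. */
static int idc_send_msg_sketch(unsigned int target_cpu)
{
	if (!cpu_is_core_enabled(target_cpu))
		return -EACCES;

	pthread_mutex_lock(&idc_mutex);
	/* ... select a work slot and submit it here, as in the patch ... */
	pthread_mutex_unlock(&idc_mutex);
	return 0;
}
```

The alternative fix (unlocking before the early return) works too; hoisting the check simply keeps the critical section free of error paths.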
Under Zephyr, SOF IDC uses Zephyr's P4WQ for message passing. In it, k_p4wq_submit() checks that the new work item isn't already on the queue, and it does that by checking its .thread member. That pointer is set to NULL in p4wq_loop() after calling the work handler, with no locks held. So the work handler could have completed handling an IPC, a response could have been sent, and a new IPC could arrive while p4wq_loop() still hasn't cleared the .thread pointer. To fix this, use a mutex in idc_send_msg() to protect the work items.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
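The two-slot-plus-mutex scheme from the commit message can be modeled with a small self-contained sketch. This is not SOF code: pthread_mutex stands in for k_mutex, a bare pointer stands in for the P4WQ .thread field, and idc_pick_slot() is an invented name for illustration.

```c
#include <pthread.h>
#include <stddef.h>

/* Simplified model of a P4WQ work item: .thread stays non-NULL until
 * p4wq_loop() releases the item, which can lag behind the handler
 * actually finishing. */
struct idc_work_model {
	void *thread;
	int payload;
};

/* Two slots per target core, as in the patch: if slot 0 has not been
 * released by the work queue yet, fall back to the spare slot. */
static struct idc_work_model idc_work[2];
static pthread_mutex_t idc_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Returns the slot index chosen for submission. The mutex serializes
 * the check-then-use sequence, so two concurrent senders cannot both
 * pick the same still-busy slot. */
static int idc_pick_slot(int payload)
{
	struct idc_work_model *work = idc_work;
	int slot = 0;

	pthread_mutex_lock(&idc_mutex);
	if (work->thread) {
		/* Slot 0 is still owned by the queue: use the spare. */
		work++;
		slot = 1;
	}
	work->payload = payload;
	/* The real code would call k_p4wq_submit() here, still inside
	 * the critical section. */
	pthread_mutex_unlock(&idc_mutex);
	return slot;
}
```

With a single slot and no lock, a sender could re-submit the item in the window between the handler returning and p4wq_loop() clearing .thread; the second slot absorbs that window and the mutex keeps slot selection atomic.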
Use NOTIFIER_TARGET_CORE_LOCAL instead of open coding it.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
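For context on the second patch: my understanding (an assumption, not verified against the SOF headers) is that NOTIFIER_TARGET_CORE_LOCAL expands to the bit for the calling core, i.e. the BIT(cpu_get_id()) expression being replaced. A toy model with BIT and cpu_get_id() mocked:

```c
/* Mocked definitions for illustration only. */
#define BIT(n) (1u << (n))

/* The real cpu_get_id() returns the current DSP core index;
 * hard-coded to 2 here for the sketch. */
static unsigned int cpu_get_id(void)
{
	return 2;
}

/* Assumed SOF definition: a mask selecting only the calling core. */
#define NOTIFIER_TARGET_CORE_LOCAL BIT(cpu_get_id())
```

The change is purely cosmetic either way: the macro names the intent ("notify the local core") instead of spelling out the mask.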
kv2019i
left a comment
This is a nice fix. I have a faint memory we've been looking at this problem as well, but can't find any PR now (I recall there was an attempt to have a pool of idc_work items). But this seems simpler and does the job.
jsarha
left a comment
LGTM, and also "fixes" the same assert when I try to use my "user" IPC thread to play audio on pipelines that are not on core 0 (it still does not work, as is expected, but does not hit an assert either).
fix an IDC race when using P4WQ