-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Description
Describe the bug
Referring to FreeRTOS SMP tag: V11.2.0 with the following relevant configuration:
#define configNUMBER_OF_CORES 2
#define configRUN_MULTIPLE_PRIORITIES 1
#define configUSE_PREEMPTION 1
#define configUSE_TASK_PREEMPTION_DISABLE 1
#define configUSE_TIME_SLICING 1
A task is performing a critical operation and cannot be interrupted. It calls vTaskPreemptionDisable(NULL) before the operation and vTaskPreemptionEnable(NULL) after the operation. There is another task at the same priority, which could run via time slicing but for the pre-emption being disabled. The critical operation is lengthy and involves switching the hardware into different modes, so an attempted context switch during the operation will crash the system. However, interrupts need to remain enabled and be serviced as normal.
During the critical operation, a timer tick occurs, causing an interrupt which calls xTaskIncrementTick(). The line
Line 4882 in 0adc196
| xYieldPendings[ xCoreID ] = pdTRUE; |
xYieldPendings[ xCoreID ] = pdTRUE, because there is another task at the same priority that could run via time slicing, but the line Line 4931 in 0adc196
| if( pxCurrentTCBs[ xCoreID ]->xPreemptionDisable == pdFALSE ) |
pxCurrentTCBs[ xCoreID ]->xPreemptionDisable != pdFALSE and thus it neither sets xSwitchRequired = pdTRUE nor calls prvYieldCore( xCoreID ). So far, so good.
Then, still during the critical operation, an unrelated I/O operation completes, causing an interrupt which calls xTaskGenericNotifyFromISR(). The unrelated task being notified, was blocked waiting for this notification, so it gets to line
Line 8154 in 0adc196
| listINSERT_END( &( xPendingReadyList ), &( pxTCB->xEventListItem ) ); |
Line 8182 in 0adc196
| prvYieldForTask( pxTCB ); |
prvYieldForTask( pxTCB ) but this routine does not do anything because it correctly sees that pxCurrentTCBs[ xCoreID ]->xPreemptionDisable != pdFALSE and so context switching is not allowed. Still good.
The bug is that upon return from prvYieldForTask(), the next line
Line 8184 in 0adc196
| if( xYieldPendings[ portGET_CORE_ID() ] == pdTRUE ) |
xYieldPendings[ portGET_CORE_ID() ] == pdTRUE, obviously intending determine the action taken by prvYieldForTask() which is a void function so doesn't inform whether it yielded or not. Although prvYieldForTask() did not yield, there is a leftover flag in xYieldPendings[] which was set earlier by xTaskIncrementTick(), even though that flag was not acted upon at the time. So it thinks that prvYieldForTask() has yielded and goes on to line Line 8188 in 0adc196
| *pxHigherPriorityTaskWoken = pdTRUE; |
*pxHigherPriorityTaskWoken = pdTRUE.
My I/O completion interrupt handler ends with a line like portYIELD_FROM_ISR(task_woken), and on the Xtensa port that I'm using, it ends up calling vTaskSwitchContext() from the assembly ISR wrapper, instead of returning to the user code that is performing my critical operation. The actual context switching is done in prvSelectHighestPriorityTask() and the if-block at
Line 1086 in 0adc196
| if( pxTCB->xTaskRunState == taskTASK_NOT_RUNNING ) |
pxCurrentTCBs[ xCoreID ]->xPreemptionDisable. It does have such a check further down, at line Line 1250 in 0adc196
| if( pxCurrentTCBs[ uxCore ]->xPreemptionDisable == pdFALSE ) |
This bug has been verified on real hardware. I had to reconstruct the above sequence of events through tracing, so it is possible that the exact failure sequence could vary slightly from what I described. Indeed it fails in several different ways depending on the exact timing of the timer ticks and unrelated I/O operations (of which several are going on simultaneously). I can see two other possible failure paths through xTaskResumeAll() or xTaskResumeFromISR(), and these are called from basically everywhere -- I think another of my failure cases is caused by my calling xSemaphoreGiveFromISR() -> xQueueGenericSend() -> xTaskResumeAll(), but I haven't definitively verified this. What I can say, is that it is definitely dangerous to leave a true value in xYieldPendings[] because there are a number of ways that it could be acted upon later.
My diagnosis is that the incorrect logic is in xTaskIncrementTick(), and that the bug was introduced with commit https://github.com/FreeRTOS/FreeRTOS-Kernel/tree/294569e495515e238dc890a8b4613d01260d1f06 which was intended to optimize the logic for configNUMBER_OF_CORES > 1. See #1118 for the discussion. The original logic was correct, as far as I can see. Backing out this commit fixes the issue for me on real hardware.
Incidentally, the discussion of that PR is claiming that the original code line
BaseType_t xYieldRequiredForCore[ configNUMBER_OF_CORES ] = { pdFALSE };
was leaving some array members uninitialized. That's not the case. It has always been legal C to initialize just the first array member and rely on the compiler to fill the rest with zeros, even for an auto array as here. MISRA may have clouded the issue -- I believe MISRA accepts an initializer of {0} to initialize the entire array.
Target
- Development board: ESP32-S3-MINI-1-N4R2
- Instruction Set Architecture: Xtensa LX
- IDE and version: command line
- Toolchain and version: xtensa-esp-elf-gcc (crosstool-NG esp-13.2.0_20240530) 13.2.0
Host
- Host OS: Devuan
- Version: Excalibur
To Reproduce
Suggest to reproduce using ESP-IDF with its OTA update feature, which needs to disable pre-emption in order to write the flash chip. A non-trivial test program would have to be written with the following features:
- Use menuconfig to set the FreeRTOS type to FreeRTOS-SMP rather than Espressif's custom FreeRTOS.
- Manually upgrade the FreeRTOS-SMP version to the appropriate tag -- Espressif uses an older one without the bug, when they eventually upgrade to a newer version they are likely to encounter this bug eventually.
- Have either semaphores or direct-to-task notifications happening during the OTA update. Have this happen from interrupts marked as IRAM-safe.
- Have multiple tasks of the same priority as the one writing the flash during the OTA update, that could execute but for the preemption being disabled.
- Suggest to have task(s) blocked waiting on the semaphores or direct-to-task notifications. Suggest these additional tasks should have higher priority.
Expected behavior
A task which has disabled pre-emption should never be pre-empted, regardless of what interrupts happen.
Screenshots
Screenshots of the failure mode and the diagnostic tracing can be provided upon request. Additional tests can be performed upon request if there is a suggestion that the current logic should have worked or that the failure mode is different than described.
Additional context
None