fix(zkernel): mark ISR-reachable list helpers K_ISR_SAFE (IRAM) — closes #53#54
Closed
swoisz wants to merge 135 commits into
Closed
fix(zkernel): mark ISR-reachable list helpers K_ISR_SAFE (IRAM) — closes #53#54swoisz wants to merge 135 commits into
swoisz wants to merge 135 commits into
Conversation
docs: CI badge, update URL
fix: linker fragment collecting SYS_INITs so IDF doesn't drop them
fix: many resolved and validated iterable sections for logging, shell…
fix: deferred logging bringup, redundant static and unused variables …
The sdkconfig.boreas requirement lived only in the zkernel component README -- the top-level Usage section (the first thing an integrator reads) now wires it into the submodule instructions, including the one-time config regeneration step. CHANGELOG.md captures the 2026-06 hardening series as a downstream migration checklist: the defaults-list addition (#41), the k_work return-code audit (#37), the k_work_cancel polarity inversion (#38), the cancel_delayable_sync collapse of triple-cancel workarounds, the reserved notification index, and the k_sem_take abort restriction. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
feat!: notification-backed k_sem (own state, no FreeRTOS control block)
Second primitive converted under #40, on the architecture proven by the k_sem conversion: events word and waiter list in the caller-owned struct under a portMUX; blocking on the shared reserved notification index; stack-resident waiter nodes severed under the lock before any return; the consume-the-in-flight-give protocol for the timeout race. A post wakes ALL waiters whose conditions become met: satisfied nodes are unlinked into a local chain in one lock pass and notified after unlock (safe -- woken waiters block in the consume path until the notification lands). This retires the FreeRTOS event-group backend and with it three parity gaps plus a hard ceiling: - Full 32-bit events (EventBits_t reserved the top byte: 24 usable). - k_event_set now REPLACES the tracked set (upstream: setting differs from posting); previously it merged, aliasing k_event_post. - wait/wait_all reset=true zeroes the ENTIRE tracked set BEFORE waiting (previously: cleared only the matched bits, after). - wait_all on timeout returns 0 (previously could return a truthy partial match). New upstream surface: k_event_set_masked, k_event_test, k_event_wait_safe / k_event_wait_all_safe (atomically consume matched bits), all mutators return the previous value of the affected bits, K_EVENT_DEFINE is a true compile-time initializer, and k_event_init returns void (upstream signature; it can no longer fail). The notification index and its config requirement move to zkernel_internal.h (Z_KERNEL_NOTIFY_INDEX), shared by k_sem and k_event -- safe because a task blocks on at most one primitive at a time and every blocking call drains in-flight notifications before returning. BREAKING: k_event_set merge->replace (use k_event_post to accumulate); reset=true semantics as above; k_event_init signature. In-tree tests updated intentionally; the MT multi-setter now uses k_event_post and the consume-on-wake test uses k_event_wait_safe. Tests: 190 -> 196 (set-replaces vs post-merges, previous-value returns, set_masked, wait_safe consumption, wait_all timeout-zero, full-32-bit bits 24/31, K_EVENT_DEFINE static init). Linux suite green x3. Part of #40 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- The post-unlock chain walk's safety premise is now stated precisely at the source: within the protocol exactly one in-flight notification can exist per blocked waiter (targeting is exclusive under the lock; every return path drains what woke it), so a chained waiter cannot be released before our notify -- the premise IS the notification-index reservation, the same one k_sem_give's post-unlock handle use rests on. - ISR-context posts accumulate the higher-priority-woken flag across the wake chain and yield once after the loop instead of per waiter. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
feat!: notification-backed k_event (full 32 bits, upstream semantics)
Closes the coverage gaps from the #41/#42 design review (issue #43): - timeout-vs-give stress (k_sem) and timeout-vs-post stress (k_event): sweep the giver/poster firing time across the taker's 20 ms timeout (early / same-tick / late) for 100 iterations, checking conservation invariants on every outcome and probing the reserved notification index for stranded notifications after each cycle. The giver runs as a raw FreeRTOS task pinned to the other core on multicore targets (k_thread_create pins to core 0). - give-racing-the-park: 1000 zero-delay handshake iterations hammer the unlock-sample-relock recheck window and the enqueue->park window inside k_sem_take. - multi-waiter beyond two: 4-waiter conservation, FIFO order among equal-priority waiters (pins the strict '>' in z_sem_pop_waiter), k_sem_reset waking 3 waiters, single k_event_post waking 3 waiters. - FromISR coverage (HW-gated on CONFIG_K_TIMER_DISPATCH_ISR): k_sem_give from real ISR context with a prompt-wake latency bound proving the portYIELD_FROM_ISR path (mid-tick expiry via K_USEC); k_sem_take(K_NO_WAIT) from ISR (upstream isr-ok contract); k_event_post from ISR waking 3 waiters (accumulate-yield-once path). Doc notes on the declarations (inline-divergence convention): - k_sem_take: ISR-legal only with K_NO_WAIT (upstream contract); waiter priority is cached at enqueue (k_thread_priority_set on a blocked waiter does not re-sort the wake order -- upstream re-sorts). - k_sem_give: documented wake order (highest priority, FIFO among equals). - k_sem_reset: task context only. Upstream verification showed upstream's k_sem_reset is NOT isr-ok either (no @isr_ok; calls z_reschedule), so this is parity, not a divergence -- the #43 item proposing an ISR-safe reset was based on a wrong premise. The stale-priority note is NOT added to k_event_wait: k_event wakes ALL satisfied waiters with no priority ordering, so there is no stale value to document. linux suite: 204/0 (197 + 7; the 3 ISR tests compile out on linux). Refs #43 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Review fan-out findings folded: - k_sem_take, z_event_wait_internal, and the four k_event_wait* wrappers are now K_ISR_SAFE (IRAM-resident). The documented isr-ok-with-K_NO_WAIT contract was unsound for flash-resident code: the esp_timer ISR is allocated ESP_INTR_FLAG_IRAM, so the take/wait fast paths would fault if the ISR fired during a concurrent flash operation. New HW-gated tests pin the IRAM residency (prior art: the iram_attr tests in test_k_timer.c). - Prompt-wake test retuned: expiry at 10.1 ms (was 10.5) makes a missing portYIELD_FROM_ISR cost ~900 us instead of ~500, so the 700 us bound has wide margin against both false pass (deferred wake) and false fail (cache-cold first run, tick coincidence). - The both-outcomes sweep asserts are now HW-only: on the linux target a loaded CI host can stall the tick clock and collapse the sweep onto one outcome. The per-iteration conservation invariants (which hold under every interleaving) still run everywhere. - BUILD_ASSERT(CONFIG_FREERTOS_HZ >= 1000): the 18..22 ms sweep separation quantizes away at coarser ticks. - Multi-waiter state hygiene: mw_result/mw_order sentinel-reset per test (stale values from a prior test can no longer satisfy assertions), mw_next is a Zephyr atomic_t. - kernel.h: precise wording on the K_NO_WAIT-from-ISR note (spinlock only, no task-notify/blocking calls, IRAM-resident). - test_k_event_mt.c: pre-existing sizeof(stack) call sites migrated to K_THREAD_STACK_SIZEOF; FreeRTOS-priority-convention reminder on the priority-wake test. linux suite: 204/0 x3. Refs #43 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ning test: stress coverage for k_sem/k_event race windows (#43)
Rewrites k_timer_status_sync (both backends) from the k_msleep(1) busy-poll to a true blocking wait on a binary k_sem embedded in struct k_timer -- the original PR-4 design, now safe: the April crash shapes that killed it were root-caused to the pre-#18 k_thread zombies (#21) and are green regression tests, and the sem itself is notification-backed (#41) with synchronous severance. Upstream-verified semantics (kernel/timer.c): - single-waiter model (upstream's wait_q holds "the (single) thread waiting on this timer"; expiry/stop wake exactly one) -> a binary sem latch matches exactly - woken by expiry OR stop; status re-read after wake; status reset to 0 on every path; returns 0 immediately when stopped - the wake fires AFTER the expiry callback / stop_fn (upstream order) The sem is a wake LATCH, not a counter -- the expiry count stays in timer->status. A give latched while nobody waited would satisfy a later take early, so status_sync re-checks and re-blocks in a loop; pinned by the new test_timer_status_sync_after_status_get_blocks regression. Wakes are now immediate instead of quantized to the FreeRTOS tick; the busy-poll divergence note on the declaration is replaced. k_sem_give is ISR-safe (IRAM), so the expiry-path give works under CONFIG_K_TIMER_DISPATCH_ISR=y, the ESP_TIMER_TASK fallback, and the linux dispatcher. K_TIMER_DEFINE gains the compile-time sem initializer; k_work_init_delayable inherits the sem init via k_timer_init. struct k_timer grows by sizeof(struct k_sem) (~24 B), including inside k_work_delayable. k_timer_remaining_get needed no change: it already returns uint32_t milliseconds like upstream's k_ticks_to_ms_floor32 shape. linux suite: 206/0 x3 (204 + 2). Closes #28 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Review fan-out findings folded: - One-shot exchange-vs-running window closed: between status_sync's status exchange and its running load, a one-shot expiry could complete entirely (status=1, running=false, give latched) and the waiter returned 0 where upstream's spinlocked read returns 1 -- terminally lost for a one-shot in a while(status_sync()) drain loop. The !running path now re-reads status instead of returning 0; the callback orders status++ (RELEASE) before running=false (RELEASE), so an ACQUIRE load of running==false is guaranteed to observe the increment. - test_timer_status_sync_after_status_get_blocks made deterministic: one-shot expiries only and t0 captured BEFORE the restart, so the elapsed lower bound holds under arbitrary host stall (the periodic re-arm racing the entry was CI-flaky both directions). - @note on k_timer_start: restart does not wake a blocked status_sync waiter (upstream parity). - @note on k_timer_init: re-init while a status_sync waiter is blocked is caller error (clobbers the embedded sem's waiter list). - Z_SEM_INITIALIZER single-sources the compile-time k_sem initializer body for K_SEM_DEFINE and K_TIMER_DEFINE. Adversarial review refuted (no change needed): zeroed-sem gives are structurally unreachable before k_timer_init (running-guard + handle-guard); stale latches cost at most one extra loop iteration (binary sem caps at 1, no busy-spin); SMP ordering is sound (RELEASE-before-give, portMUX barriers, aligned 32-bit status). linux suite: 206/0 x3. Refs #28 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Upstream unpends aborted threads from the timer wait queue; Boreas cannot, so this restriction is a divergence, not parity -- matching the wording already on k_sem_take. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
fix: k_timer_status_sync blocks on an embedded sem, not a poll loop (#28)
Near-verbatim port of upstream's ring_buffer.h + lib/utils/ring_buffer.c (both Apache-2.0), the canonical companion to the UART IRQ pattern and the byte transport the direct-UART shell (#31) and a modbus serial backend want. Pure arithmetic + memcpy; no kernel dependencies, nothing blocks. Byte-mode API: ring_buf_init, RING_BUF_DECLARE, reset, size/space/ capacity_get, is_empty, put/get/peek, and the zero-copy claim/finish pairs. Indices use upstream's base-offset scheme (head/tail/base of ring_buf_idx_t, byte offset = head-base with one wrap correction, RING_BUFFER_MAX_SIZE caps size to half the index range so unsigned subtraction disambiguates full vs empty). CONFIG_RING_BUFFER_LARGE widens indices to 32-bit, matching upstream. Adaptations for Boreas (documented in the files): - include layout (zephyr/sys/util.h) and lowercase upstream min() -> MIN. - toolchain shims added to sys/util.h: __ASSERT_NO_MSG, likely/unlikely, __noinit (mapped to zero-init storage -- correct here, and avoids __NOINIT_ATTR wrongly surviving deep sleep), __deprecated. Also made __ASSERT self-contained (#include <stdlib.h> for abort()). Deliberate divergence: the upstream item-mode API (ring_buf_item_put/ _get, RING_BUF_ITEM_DECLARE*) is NOT ported -- it is @deprecated upstream ("use <zephyr/sys/ringq.h>") and unused by the byte-transport consumers this targets. Concurrency is unchanged: lock-free for single-producer/single-consumer in separate contexts; multiple producers/consumers must serialize externally. Tests (15): accounting, round-trip, capacity saturation, partial get, NULL-discard, peek-no-consume, reset, claim/finish (put+get), surplus return, over-claim -EINVAL, and the wrap-sensitive cases -- claim splits at the physical end, put/get spanning the wrap, and a 20000-cycle stream through a 7-byte (non-power-of-two) buffer exercising base advancement and the index arithmetic across many wraps. linux suite: 221/0 x3 (206 + 15). Closes #45 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tests Review fan-out (fidelity + conformance + adversarial) findings folded: - Restore the public RING_BUF_INIT macro (upstream API completeness; RING_BUF_DECLARE is now defined in terms of it, matching upstream structure). A downstream `struct ring_buf rb = RING_BUF_INIT(...)` now compiles. - Retain upstream's "Copyright (c) 2015 Intel Corporation" line on the two verbatim files alongside the Intercreate port line (Apache-2.0 4(c) preservation for a substantial verbatim derivative; the repo is going public). - Document the __noinit limitation: Boreas wires no .noinit section, so no-init/retained-RAM semantics are not available (storage is plain zero-init BSS). - Tests: assert ring_buf_put_claim returns 0 on a full buffer (the zero-copy path UART IRQ uses), and assert ring_buf_space_get across a wrap (the put.head - get.tail subtraction path). Adversarial index-arithmetic probe found zero constructible failures: the UINT16_MAX/2 cap is exact and inclusive (size=32768 would alias head-base to 0 mod 65536 and is rejected at build/init), and the non-power-of-two-size x uint16-index-wrap case is safe because only modular differences are used and tail-base is held in [0,size) by one base bump per wrap. The 20000-cycle test crosses the uint16 boundary, so it covers that class. linux suite: 221/0 x3. Refs #45 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lic) A pre-public licensing audit compared every Zephyr-API file against the upstream original (function bodies and macro expansions, not just names). Apache-2.0 4(c) requires retaining the upstream copyright only where actual copyrightable expression was copied. Retain upstream copyright (genuine verbatim/near-verbatim content): - sys/util.h: IS_ENABLED's _XXXX##/_YYYY token-paste trick is upstream verbatim -> Copyright (c) 2011-2014 Wind River Systems, Inc. - sys/byteorder.h: the 32-bit be/le composition mirrors upstream's canonical chaining -> Copyright (c) 2015-2016 Intel Corporation. - (sys/ring_buffer.h + ring_buf.c already carry Copyright (c) 2015 Intel Corporation from the port commit.) Deliberately NOT attributed (independent reimplementations -- adding an upstream notice would be false attribution): sys/dlist.h, sys/slist.h (upstream is macro-generated; ours is hand-written over a different struct layout), sys/atomic.h (inline __atomic wrappers vs upstream's extern decls), sys/time_units.h (us-stored k_timeout_t + FreeRTOS ticks vs upstream's Hz-based z_tmcvt), and all of zshell/zsys/zdevice (the shell, logging, and device model are independent implementations over ESP-IDF/FreeRTOS; only the public API surface matches Zephyr). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Folded findings from the code-review fan-out (no correctness bugs were found; these are robustness/coverage/doc improvements): - ring_buffer.h + ring_buf.c now #include "sdkconfig.h" explicitly. struct ring_buf's index width is selected by CONFIG_RING_BUFFER_LARGE, so the config must be visible wherever the struct is defined. It was safe before only by a transitive chain (util.h -> esp_attr.h -> sdkconfig.h); making it explicit removes a silent cross-TU ABI hazard if that chain ever changes, and matches the sibling .c convention. - util.h: corrected the __noinit comment. The prior rationale was wrong (ESP-IDF DOES ship __NOINIT_ATTR + a .noinit section, and that attribute survives warm reset, not deep sleep). The real reasons to map __noinit to plain BSS: ring_buf never reads unwritten bytes (so no-init vs zero-init is invisible), and plain BSS avoids a section attribute that does not port to the Mach-O host toolchain. The no-init limitation is documented. - util.h: dropped the unused __deprecated shim (no users; add toolchain shims when a real user lands). likely/unlikely kept (unlikely is used by ring_buf.c; both are #ifndef-guarded and identical in value to ESP-IDF's, so the include-order race is codegen-only). - test: added test_ring_buf_full_empty_across_wrap -- fills to full / drains to empty for 40 cycles (163 KB through a 4 KB buffer, crossing the uint16 index wrap ~2.5x), asserting the full (space==0, put refused) and empty discriminations AT high `allocated` across the wrap. The existing many-wrap test drains within a 7-byte span, so it never exercised full-vs-empty near the index wrap. linux suite: 222/0 x3. Refs #45 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
util.h was accreting symbols whose upstream homes are <zephyr/toolchain.h> (likely/unlikely, __weak, ALWAYS_INLINE, __noinit) and <zephyr/sys/__assert.h> (__ASSERT, __ASSERT_NO_MSG), while the repo otherwise mirrors upstream include paths exactly -- the devlog TODO already records a downstream smf port hand-editing upstream #includes because of this. Thin headers at the upstream paths fix that now, while they have 1-2 consumers; util.h re-includes both so existing ports keep working. Also two correctness fixes folded in: - __noinit now maps to __NOINIT_ATTR (real .noinit on hardware, no-op on the linux target via esp_attr.h's CONFIG_IDF_TARGET_LINUX gate in both IDF v5.4 and v5.5). The old empty shim silently gave zero-init BSS semantics to any future warm-reset-retention user, and its stated Mach-O blocker was false -- the macOS host build of the linux target compiles clean. - likely/unlikely now document the divergence from esp_compiler.h (ESP-IDF's are CONFIG_COMPILER_OPTIMIZATION_PERF-gated; Boreas matches upstream Zephyr's unconditional __builtin_expect; both are #ifndef-guarded so the include-order race is codegen-only). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The header advertises the byte API for thread + ISR use (the UART IRQ pattern), and the repo convention is that ISR-callable primitives are K_ISR_SAFE (k_sem, k_msgq, k_event, k_work): ESP-IDF ISRs registered with ESP_INTR_FLAG_IRAM fire during flash-cache-disabled windows (e.g. NVS commits), where calling flash-resident code panics with "Cache disabled but cached memory region accessed". All five functions are now IRAM-resident (verified via nm: 0x4037xxxx alongside k_sem_give). The internal __ASSERT_NO_MSGs are replaced with k_panic() checks: the upstream asserts compile out under its default CONFIG_ASSERT=n, but Boreas's __ASSERT is always on and its ESP_LOGE+abort failure path is itself flash-resident (util.h documents it as not IRAM-safe and prescribes k_panic() -- k_work.c already set the precedent of replacing upstream asserts on K_ISR_SAFE paths). memcpy is ROM-resident on ESP32-S3, so the copy loops remain safe. ring_buf_peek's NULL check is also hoisted above the copy loop so the documented "Cannot be NULL" contract is enforced on an empty buffer too, instead of silently returning 0 until data arrives. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…retvals Doc-only fixes from review (code unchanged, upstream-verbatim): - Concurrency note rewritten: it overclaimed "lock-free SPSC" without stating that no field is volatile and no barriers are issued. Now documents that the query inlines read one index from each side (tear-free aligned loads, conservatively stale), forbids busy-waiting on them (the compiler may hoist the load out of a call-free loop), and requires per-handoff k_sem pairing on SMP. Also notes the implementation is K_ISR_SAFE. - RING_BUF_INIT: @warning that size8 > RING_BUFFER_MAX_SIZE is NOT checked on this path (unlike RING_BUF_DECLARE/ring_buf_init) and silently corrupts; macro body stays upstream-verbatim. - put_claim/get_claim/peek: @notes that finishing rewinds head to tail, so an outstanding claim must be finished before any other same-side call (including the copy-mode functions); restored upstream's dropped "with a non-zero `size`" qualifier on the peek doc. - put_finish/get_finish: -EINVAL condition corrected -- the code (here and upstream) rejects size > outstanding claimed bytes, not "exceeds free space"/"valid bytes" as upstream's doc has always misstated. - size_get: adopted current upstream main's "available data" wording (v3.7's "used space" miscounts outstanding claims); three @retval-with-prose fixed to @return (also fixed upstream post-v3.7). - Removed seven @warnings referencing the unported ring_buf_item_ API; RING_BUF_DECLARE gains a file-scope-only @note. - Kconfig: RING_BUFFER_LARGE costs 12 extra bytes per struct ring_buf on the 32-bit target (6 index fields x 2 bytes; measured 20 -> 32), not 6 as the help text claimed. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Two coverage gaps closed: - test_ring_buf_index_wrap_seeded: the traffic-volume stress tests can only reach the default uint16_t 65536 wrap; under CONFIG_RING_BUFFER_LARGE the 2^32 wrap was silently untested in exactly the config the option exists for. Seeding the indices just below the wrap via ring_buf_internal_reset (the technique upstream's ringbuffer test suite uses) makes the full/empty discrimination coverage index-width-independent. Verified locally: full suite passes on the linux target with CONFIG_RING_BUFFER_LARGE=y. - test_ring_buf_peek_across_wrap_multi_claim_get: peek was only tested contiguous-at-offset-0; now covers peek walking both physical segments of wrapped data and rewinding via its internal get_finish(0), plus multiple get-side claims completed by a single finish. Cleanups: per-byte TEST_ASSERT_EQUAL_UINT8 loops (~300k Unity calls across the two stress tests) replaced with per-chunk TEST_ASSERT_EQUAL_UINT8_ARRAY (better failure diagnostics: reports the element index); put_claim_wraps now drains and verifies its patterned writes landed at the correct physical offsets instead of leaving them unchecked; NULL-discard priming replaces drain-into-out + memset; dead sequence fill dropped from the saturates test. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…e (Copilot) The file-level comment claimed all definitions are #ifndef-guarded, but ALWAYS_INLINE is intentionally #undef'd and redefined unconditionally (the RISC-V -Werror=attributes fix). Call out the exception so the include-order contract isn't misleading. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
feat: port ring_buf (sys/ring_buffer.h) + pre-public copyright audit (#45)
gpio_dt.h defined GPIO_ACTIVE_LOW but not its companion GPIO_ACTIVE_HIGH. Active-high pins already behaved correctly (active-high is the absence of the active-low bit), but code referencing the symbol explicitly failed to compile. Define GPIO_ACTIVE_HIGH as (0 << 0), mirroring upstream Zephyr's named-zero form in dt-bindings/gpio/gpio.h. Add a test asserting it is zero and that a spec declared with it behaves identically to the active-high default. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
feat(zdevice): add GPIO_ACTIVE_HIGH flag macro
k_sem_give() is K_ISR_SAFE (IRAM_ATTR) so it can run from an ESP-IDF IRAM ISR while the flash cache is disabled, but on its hot path it called z_sem_pop_waiter(), a flash-resident static helper. When the give came from an IRAM ISR during a concurrent flash op (e.g. an IRAM GPIO ISR submitting work -> k_work_submit -> k_sem_give), the fetch of z_sem_pop_waiter faulted with a cache-access error and the chip panicked (issue #53, found and fixed locally by a downstream). Mark z_sem_pop_waiter K_ISR_SAFE so it relocates into IRAM. It is pure list-walking over the caller-owned waiter list with no FreeRTOS calls, so it is safe in IRAM. Its other caller, k_sem_reset(), is flash-resident, but flash code calling an IRAM helper is fine. Sweeping the rest of the K_ISR_SAFE call graph for the same class (an IRAM function calling a flash-resident static helper) turned up one sibling: z_event_match() in k_event.c, called on both ISR-safe paths -- the post family (z_event_post_internal) and the wait family fast path (z_event_wait_internal). Mark it K_ISR_SAFE too; it is pure arithmetic with no FreeRTOS calls. No behavioral change; host (linux target) suite unaffected (224/224), clang-format clean. The fault is a target-only memory-placement issue the host build cannot observe. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The issue-#53 fault is invisible to the host (linux) suite: it is a target-only memory-placement property. This guard runs against a real target ELF and fails if any K_ISR_SAFE helper landed in a flash-mapped text section instead of .iram0.text -- catching the exact regression where an IRAM_ATTR caller reaches a flash-resident static helper. Section-name based (.iram0.text vs .flash.text), so it is ISA-agnostic across Xtensa and RISC-V targets. Validated on esp32s3: z_sem_pop_waiter and z_event_match resolve into .iram0.text, while non-ISR helpers (k_sem_reset, k_timer_start) remain in .flash.text. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Wire tools/check_iram_symbols.sh into the existing build-esp32s3 job so every PR verifies that K_ISR_SAFE helpers are IRAM-resident on a real target ELF -- the issue-#53 regression the host (linux) suite cannot observe. Reuses the target build already produced by that job, so no extra CI cost. Also make the script derive its symbol set from source (every K_ISR_SAFE definition in components/*/src) instead of a hand-maintained list, so a newly-added ISR-safe helper is covered automatically. Inlined/gc'd symbols report "skip" (safe); only an out-of-line, flash-resident symbol fails. Validated on esp32s3: 27 symbols derived, 25 confirmed in .iram0.text (incl. z_sem_pop_waiter and z_event_match), 2 skipped. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #53.
k_sem_give()isK_ISR_SAFE(IRAM_ATTR) so it can run from an ESP-IDF IRAM ISR while the flash cache is disabled, but on its hot path it calledz_sem_pop_waiter()— a flash-residentstatichelper. When the give comes from an IRAM ISR during a concurrent flash op (e.g. an IRAM GPIO ISR submitting work →k_work_submit→k_sem_give), fetching the flash-mapped helper faults with a cache-access error and the chip panics. Found and fixed locally by a downstream project (verified on esp32c5).Changes
z_sem_pop_waiter→K_ISR_SAFE(k_sem.c). Pure list-walk over the caller-owned waiter list, no FreeRTOS calls — safe in IRAM. Its other callerk_sem_reset()is flash-resident, but flash → IRAM calls are fine.z_event_match→K_ISR_SAFE(k_event.c). A sweep of the wholeK_ISR_SAFEcall graph for the same bug class (an IRAM function calling a flash-residentstatichelper) turned up exactly one sibling:z_event_match, called on both ISR-safe paths — the post family (z_event_post_internal) and the wait fast path (z_event_wait_internal). Pure arithmetic, no FreeRTOS calls.tools/check_iram_symbols.sh+ a step in the existingbuild-esp32s3job). Derives theK_ISR_SAFEsymbol set from source and asserts each is in.iram0.texton the target ELF — the regression the host (linux) suite structurally cannot observe. Inlined/gc'd symbols reportskip; only an out-of-line flash-resident symbol fails.Call-graph sweep
Audited every
K_ISR_SAFE/IRAM_ATTRentry point (k_sem, k_event, k_work, k_msgq, k_thread, k_timer, ring_buf, watchdog, gpio_dt). The rest were already clean: shared helpers (z_kernel_lock/unlock, allsys_dlist_*,ring_buf_*_claim/finish,k_timeout_*) are headerstatic inline/ALWAYS_INLINE(inline into the IRAM caller, no symbol);k_panic()is#define … __builtin_trap()(no symbol); FreeRTOS*FromISRpaths are IDF's IRAM responsibility.Two items outside this bug class, flagged for follow-up (not changed here):
esp_timer_start_periodicreached fromk_timer_esp_callbackunderCONFIG_K_TIMER_DISPATCH_ISR— an IDF API on the ISR-dispatch path.memcpyinring_buf.c— documented ROM-resident on ESP32-S3; confirm the same on RISC-V parts (C5/C6).Verification
z_sem_pop_waiterandz_event_matchresolve into.iram0.text(0x4037xxxx); non-ISR helpers likek_sem_reset/k_timer_startcorrectly stay in.flash.text(0x4201xxxx), confirming the check discriminates rather than rubber-stamps. 27 K_ISR_SAFE symbols derived, 25 confirmed IRAM, 2 skipped (inlined/not linked), exit 0.Follow-up offered (not in this PR)
A RISC-V (esp32c5) CI job would extend the guard to the exact ISA the downstream hit, and would cover the
memcpy-residency concern above. Held back because esp32c5 is preview in IDF v5.4 and could destabilize the matrix — happy to add it if wanted.