Skip to content

Introduce the PanicWriter trait, a standard mechanism for implementing synchronous writers for panic messages#4684

Merged
alevy merged 24 commits intomasterfrom
dev/panic-writer2
Apr 22, 2026
Merged

Introduce the PanicWriter trait, a standard mechanism for implementing synchronous writers for panic messages#4684
alevy merged 24 commits intomasterfrom
dev/panic-writer2

Conversation

@bradjc
Copy link
Copy Markdown
Contributor

@bradjc bradjc commented Dec 5, 2025

Pull Request Overview

For a long time, Tock's kernel-provided panic utilities required some type that implements IoWrite to display panic messages. How a board created such a type was up to each board, and the io.rs files we have in boards have used a variety of methods to do this. This has had the effect of creating three issues:

  1. These types (that implement IoWrite) are often instantiated as static muts, which we are trying to get rid of.
  2. These types often make assumptions about the state of the underlying hardware when the panic happens, and assume the kernel initialized the hardware correctly and that the object used by the normal kernel operation can still be used during a panic.
  3. Implementations of the types are duplicated across many boards, sometimes identically and sometimes with minor changes.

To try to fix these issues, this PR (from the static mut task force) proposes a new trait PanicWriter that chips would implement to provide a synchronous writer suitable for panic messages. Boards then only need to select the implementation of PanicWriter they want to use for panics, and the chip-provided implementation is used by debug.rs to correctly display the panic message.

The trait looks like this:

pub trait PanicWriter {
    /// The configuration data the mechanism needs to configure the writer for panic output.
    type Config;

    /// Create a new synchronous writer capable of sending panic messages.
    unsafe fn create_panic_writer(config: Self::Config) -> impl IoWrite;
}

The config type permits the board to provide whatever settings are necessary for the implementation to correctly configure the hardware for panic output.

This addresses the three concerns by:

  1. The create_panic_writer() function must instantiate the object on the stack, so it is not declared as a static mut.
  2. The chip-provided implementations must create a new hardware object instantiation, eliminating sharing between the main kernel and the panic handler. Also, the config provides a method for communicating exactly the settings needed by the writer.
  3. The implementations are provided by chips and not each board, eliminating redundancy.

Testing Strategy

todo

TODO or Help Wanted

Thoughts?

Need to port every chip.

Documentation Updated

  • Updated the relevant files in /docs, or no updates are required.

Formatting

  • Ran make prepush.

Copy link
Copy Markdown
Member

@lschuermann lschuermann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I agree with this approach. Given the implementation of PanicWriter will be chip-specific, and should carefully reason about whether an implementation of PanicWriter is actually safe for each UART driver, I think it might make sense to split these up over multiple PRs.

Comment thread kernel/src/platform/chip.rs Outdated
Comment on lines +192 to +198
/// Create a new synchronous writer capable of sending panic messages.
///
/// The constructed writer must be created on the stack. Because panic
/// will never return this is effectively a static allocation.
///
/// The writer must implement [`IoWrite`].
unsafe fn create_panic_writer(config: Self::Config) -> impl IoWrite;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't really enforce that the return value will be stored on the stack, and I don't think it actually matters.

What does matter are the other guarantees we ought to require callers of this unsafe function to uphold:

  • the function must only be called once: that ensures that we will never have two instances of the resulting impl IoWrite around that may be trampling over each other, and
  • this function must only be called in the panic handler. In particular, after this function is called, no other code will access the hardware backing this writer. In a sense, this function will forcefully "take over" the underlying hardware, whichever state it will be in. A full system reset is required before any other code is allowed to access the underlying hardware again.

These promises by the caller are, I think, required to build reliable and sound panic writer implementations.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other two guarantees make sense and I will add.

Re: where the writer is stored, aren't the three options statically, on the heap, or on the stack? We don't have a heap. Is there a way for the implementation to declare it statically that we would find acceptable?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bradjc I think you're right, but I'm not even sure what the reader is supposed to do with this:

   /// The constructed writer must be created on the stack. Because panic
   /// will never return this is effectively a static allocation.

I don't necessarily even think it's the case that it must have be static. If this drops, isn't it probably fine?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's to help people like me, who are used to tock drivers being statically allocated, and being confused about how to to do that within this interface. Also, I'm worried about implementors wanting to use static_init in the implementation.

Copy link
Copy Markdown
Member

@lschuermann lschuermann Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what bugs me about this comment specifically is the word "must", which implies that I have to do something special to make this work (which is not the case). When rephrasing this into more of an explanation, as opposed to a thing that users of this function "must" do, it is less confusing. Something like:

Because the panic handler will never return, and the returned value is only used for synchronously writing a panic message once, it is fine for the return value of this function to be stored on the panic handler's stack. No static_init! or similar is necessary.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still would like to see this addressed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Just removed the comment since we have examples.

Comment thread chips/nrf52/src/uart.rs Outdated
Base automatically changed from dev/panic-handler-stv-transition to master December 11, 2025 00:00
Comment thread kernel/src/utilities/io_write.rs Outdated
///
/// See also the tracking issue:
/// <https://github.com/rust-lang/rfcs/issues/2262>.
pub trait IoWrite: core::fmt::Write {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason why we restrict implementations to impl core::fmt::Write types?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Saves some copying and pasting. Reverted in c481254

@github-actions github-actions Bot added the WG-OpenTitan In the purview of the OpenTitan working group. label Dec 23, 2025
@bradjc
Copy link
Copy Markdown
Contributor Author

bradjc commented Dec 24, 2025

Ok based on last week's meeting I've:

  1. Removed the apollo3 changes we can't easily test.
  2. Added back the old panic routines as panic_old().
  3. Fixed import errors so CI passes.

@bradjc
Copy link
Copy Markdown
Contributor Author

bradjc commented Jan 9, 2026

I updated the nrf52840dk, including adding an implementation for segger rtt. Please take a look and see if you think it is reasonable. It doesn't try to create a brand new SeggerRttMemory, but instead uses what is already setup.

@alevy alevy self-assigned this Jan 14, 2026
brghena
brghena previously approved these changes Jan 21, 2026
Copy link
Copy Markdown
Contributor

@brghena brghena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I think this is an improvement on the current state of the world and we should merge it.

@bradjc
Copy link
Copy Markdown
Contributor Author

bradjc commented Mar 11, 2026

I just added support for the esp32 and tested it.

Thoughts on moving forward with this?

@bradjc bradjc force-pushed the dev/panic-writer2 branch from 6aa8e76 to e204f1a Compare April 9, 2026 18:01
Copy link
Copy Markdown
Member

@lschuermann lschuermann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems close. I'd like to see the open comments addressed.

Also, have you actually gone through the implementations of the various UARTs and reasoned about whether it is safe to construct the PanicWriter at an arbitrary time? For some of the implementations, I can see that some code was changed to ensure that prior, asynchronous operations are stopped and the hardware is returned to a well-defined state. For others, I think we've just been moving code from io.rs to the respective chip crate. That's fine, but we should discuss that we haven't yet carefully reasoned about the safety of these particular PanicWriter implementations.

Comment thread kernel/src/platform/chip.rs Outdated
Comment on lines +192 to +198
/// Create a new synchronous writer capable of sending panic messages.
///
/// The constructed writer must be created on the stack. Because panic
/// will never return this is effectively a static allocation.
///
/// The writer must implement [`IoWrite`].
unsafe fn create_panic_writer(config: Self::Config) -> impl IoWrite;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still would like to see this addressed.

unsafe fn create_panic_writer(config: Self::Config) -> impl IoWrite;
/// The writer must implement [`IoWrite`] (which is just `std:io::Write`
/// implemented for no_std).
unsafe fn create_panic_writer(config: Self::Config) -> impl IoWrite + core::fmt::Write;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm still a little confused, why does the returned object need to implement core::fmt::Write? Couldn't we implement that with a zero-sized wrapper over the impl IoWrite that's being returned?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. These are interleaved throughout the kernel and I don't know what it is "supposed" to be.

Also, maybe what you are describing is what this PR was? Should I remove c481254?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't think we need to block this PR on that. I still don't understand why we have both IoWrite and core::io::Write, as these seem effectively redundant (esp. the write function. Perhaps core::io::Write could be a supertrait of IoWrite, and the latter be renamed to be called RingBufferWrite? Either way, not something to hold this up on.

///
/// The writer must implement [`IoWrite`] (which is just `std:io::Write`
/// implemented for no_std).
unsafe fn create_panic_writer(config: Self::Config) -> impl IoWrite + core::fmt::Write;
Copy link
Copy Markdown
Contributor

@alexandruradovici alexandruradovici Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
unsafe fn create_panic_writer(config: Self::Config) -> impl IoWrite + core::fmt::Write;
unsafe fn with_writer(config: Self::Config, f: F) -> !
where F: FnOnce(|&writer: impl IoWrite + core::fmt::Write| -> !;

If we want to make sure that this is called from and only from the panic, can we make it divergent and make it call a divergent closure?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amit? @lschuermann? We met and discussed this 5 months ago and arrived at the current design and I can't remember everything else we considered.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. I think unsafe does the trick here.

@bradjc
Copy link
Copy Markdown
Contributor Author

bradjc commented Apr 16, 2026

Also, have you actually gone through the implementations of the various UARTs and reasoned about whether it is safe to construct the PanicWriter at an arbitrary time? For some of the implementations, I can see that some code was changed to ensure that prior, asynchronous operations are stopped and the hardware is returned to a well-defined state. For others, I think we've just been moving code from io.rs to the respective chip crate. That's fine, but we should discuss that we haven't yet carefully reasoned about the safety of these particular PanicWriter implementations.

We're 5 months in...no one seems to want to take up this PR. I think this is a step in the right direction. Because this is happening when the entire board is crashing, I personally don't care that much about what happens.

@alexandruradovici
Copy link
Copy Markdown
Contributor

alexandruradovici commented Apr 16, 2026

Also, have you actually gone through the implementations of the various UARTs and reasoned about whether it is safe to construct the PanicWriter at an arbitrary time? For some of the implementations, I can see that some code was changed to ensure that prior, asynchronous operations are stopped and the hardware is returned to a well-defined state. For others, I think we've just been moving code from io.rs to the respective chip crate. That's fine, but we should discuss that we haven't yet carefully reasoned about the safety of these particular PanicWriter implementations.

We're 5 months in...no one seems to want to take up this PR. I think this is a step in the right direction. Because this is happening when the entire board is crashing, I personally don't care that much about what happens.

I tend to agree with @bradjc here, the board panics anyway, something went wrong and I don't think it matters to much.

Panics should never happen in safery critical use cases, these need to make sure at link time the panic is never called. La

@lschuermann
Copy link
Copy Markdown
Member

I think I made a bad typo here:

we should discuss that we haven't yet carefully reasoned about the safety of these particular PanicWriter implementations

I meant to say: we should document that we haven't reasoned about this.

I'm in full agreement that this PR improves over the current state. That being said, we'd ultimately want to check each implementation at some point in the future, to check that they're actually sound. That should in no way hold up this PR though.

@bradjc
Copy link
Copy Markdown
Contributor Author

bradjc commented Apr 20, 2026

I think I made a bad typo here:

we should discuss that we haven't yet carefully reasoned about the safety of these particular PanicWriter implementations

I meant to say: we should document that we haven't reasoned about this.

I'm in full agreement that this PR improves over the current state. That being said, we'd ultimately want to check each implementation at some point in the future, to check that they're actually sound. That should in no way hold up this PR though.

Like this? 1564b17

@bradjc bradjc force-pushed the dev/panic-writer2 branch from 1564b17 to 01a31f1 Compare April 20, 2026 19:33
@lschuermann
Copy link
Copy Markdown
Member

Like this? 1564b17

Yes, apologies if I missed this before.

@alevy alevy added this pull request to the merge queue Apr 22, 2026
Merged via the queue into master with commit b037ee4 Apr 22, 2026
23 checks passed
@alevy alevy deleted the dev/panic-writer2 branch April 22, 2026 16:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kernel nrf WG-OpenTitan In the purview of the OpenTitan working group.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants