You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
According to POSIX, malloc is not async-signal-safe, meaning that it's not guaranteed to work after fork in a multi-thread process. In practice, this sometimes leads to deadlocks, although some libcs take liberty in making it fine to call malloc/free after fork.
2024 revision of POSIX says that "the application shall ensure that the child process only executes async-signal-safe operations until such time as one of the exec functions is successful". This does not clearly indicate that executing them is UB. However, the previous revision says that, under the same conditions, it is undefined behavior to invoke fork if async-signal-unsafe pthread_atfork handlers are registered. Since this is just invoking async-signal-unsafe functions with a layer of indirection, I believe this wording implies executing async-signal-unsafe operations after fork is UB in general. Musl authors seemingly agree with this interpretation.
Yet it can be tricky to know if the function is allocating, or at least not immediately obvious, as allocations are hidden and every Rust function can allocate. This problem is exacerbated by lack of tools to detect allocations inside pre_exec: Miri can't analyze pre_exec hooks, clippy does not lint against using these functions and Valgrind remains silent. We can't make pre_exec safe, but we can at least help people recognise bugs in their code.
The pre_exec allocations footgun is mostly hit by external code, though. @purplesyringa recently found many projects that unknowingly use allocating Error::other or Error::new in pre_exec hooks. It is not clear how developers could find this bug.
The idea is to make allocations inside spawn variant pre_exec a library UB and integrate checks into ub-checks efforts. Although POSIX mentions only multi-threaded processes, std could possibly have created helper threads, and lack of helper threads should probably be considered an implementation detail by treating all programs as multi-threaded in the pre_exec case.
While malloc(3) is async-signal-unsafe, certain custom allocators may be guaranteed to work correctly after fork (e.g. bump allocators). Therefore the check must be opt-in for the allocator.
Since async-signal-safe contexts also appear in signal handlers, it would be prudent to make the API useful for that purpose as well. We don't have to expose it right away, but the developed solution should be forward-compatible.
All in all, the plan is to expose a function to query whether we are in an "async signal" context (like std::thread::panicking) and a "guard" (like DropGuard).
// Names are up to bikeshedmod thread {#[thread_local]staticASYNC_SIGNAL_NESTING:Cell<usize> = const{Cell::new(0)};pubfnis_async_signal_context() -> bool{ASYNC_SIGNAL_NESTING.get() != 0}pubstructAsyncSignalGuard;implAsyncSignalGuard{fnnew() -> Self{ASYNC_SIGNAL_NESTING.set(ASYNC_SIGNAL_NESTING.get() + 1);Self}}implDropforAsyncSignalGuard{fndrop(&mutself){ASYNC_SIGNAL_NESTING.set(ASYNC_SIGNAL_NESTING.get() - 1);}}}
Allocators would use this API to make sure they were not called inside an async signal context:
debug_assert!(
!std::thread::is_async_signal_context(),"This allocator does not support allocating in async-signal-safe context.");
The allocator, or a custom wrapper, could alternatively choose to use assert! to run such checks even in release mode. The System allocator would have to use a different mechanism than debug_assert! if std is pre-built.
Users of raw fork, like Command::spawn, and signal handlers may enter the async signal context with AsyncSignalGuard:
These functions should be stubbed on #[cfg(not(unix))] so that allocators don't have to always use #[cfg(unix)] (e.g. because we may want to introduce such checks on other platforms).
Note that panic! inside pre_exec will not show traceback now (playground), and using this approach with allocator will produce a near-useless message "you allocate in async-signal-unsafe context" without telling where exactly that allocation happened. Though I believe one can add a breakpoint to panic and view traceback in a debugger, it has a less nicer experience than desired.
To sum up, this proposal consists of two parts:
A new public API, and
Updating spawn to enter an async signal context and the System allocator to panic when invoked in an async signal context.
Alternatives
Custom allocator published on crates.io
It is possible to write a custom allocator that would wrap System and register pthread_atfork handler to set a flag to disable allocations (this only detects fork and not _Fork; though another fork detection mechanism may not have this issue). However:
Users would need to actively opt in this allocator, and third-party tools lack good discoverability compared to first-party tools like Miri. To use this debug mechanism, you'd have to be aware of the possibility of allocations, but the main reason this problem is wide-spread is lack of this knowledge.
It does not open opportunity to mark other, non-allocating std functions as async-signal-unsafe, limiting the applicability.
There is a many-to-many correspondence between async signal contexts (fork, manual syscalls, manually set up signal handlers) and async-signal-unsafe functions (allocators, libc wrappers, unwinders). For the checks to work, both have to use the same registry to enter async signal contexts and assert safety, so there would have to be exactly one such crate in the ecosystem, with no possibility of switching without losing functionality. That crate might as well be std.
Replace "async signal context" with "no-alloc context"
Focusing on "no-alloc context" will make it impossible to track non-allocating async-signal-unsafe functions, e.g. functions using global mutexes or non-reentrant functions. However, "no-alloc context" is a platform-independent idea and might be useful outside of cfg(unix). It is not clear which approach is better.
Clippy can't see through functions, and a lot of APIs unknowingly allocate. The lint would have a lot of false negatives and therefore be near-useless. A more complicated analysis would just be an ad-hoc implementation of effects.
This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.
Possible responses
The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):
We think this problem seems worth solving, and the standard library might be the right place to solve it.
We think that this probably doesn't belong in the standard library.
Second, if there's a concrete solution:
We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.
Footnotes
Command can be materialized on unix in two ways -- Command::spawn and CommandExt::exec. All pre_exec madness about async-signal-safety applies only to spawn variant. ↩
Proposal
Problem statement
Allocating inside
CommandExt::pre_exec1 is a footgun.According to POSIX,
mallocis not async-signal-safe, meaning that it's not guaranteed to work after fork in a multi-thread process. In practice, this sometimes leads to deadlocks, although some libcs take liberty in making it fine to callmalloc/freeafter fork.2024 revision of POSIX says that "the application shall ensure that the child process only executes async-signal-safe operations until such time as one of the
execfunctions is successful". This does not clearly indicate that executing them is UB. However, the previous revision says that, under the same conditions, it is undefined behavior to invokeforkif async-signal-unsafepthread_atforkhandlers are registered. Since this is just invoking async-signal-unsafe functions with a layer of indirection, I believe this wording implies executing async-signal-unsafe operations afterforkis UB in general. Musl authors seemingly agree with this interpretation.Yet it can be tricky to know if the function is allocating, or at least not immediately obvious, as allocations are hidden and every Rust function can allocate. This problem is exacerbated by lack of tools to detect allocations inside
pre_exec: Miri can't analyzepre_exechooks, clippy does not lint against using these functions and Valgrind remains silent. We can't makepre_execsafe, but we can at least help people recognise bugs in their code.Motivating examples or use cases
During efforts to make panicking work normally after fork it was discovered that a simple
catch_unwindsolution does not reliably work across systems because of allocations and it was suggested to make global allocator abort after fork. Thoughstdmoved forward with the other solution to "don't unwind pastfork()in child", it still could be helpful to know where these allocations were.The
pre_execallocations footgun is mostly hit by external code, though. @purplesyringa recently found many projects that unknowingly use allocatingError::otherorError::newinpre_exechooks. It is not clear how developers could find this bug.chdir(2)is async-signal-safe, Rust'sstd::env::set_current_diris not, because it has to allocate to append NUL-byte to path. This applies to all filesystem-related functions. (Note thatCommand::current_dircould not be used in Non async-signal-safe closure inpre_execalacritty/alacritty#8751 because the intention was to swallow errors.)pre_exec, like filesystem accesses insideCgroup::add_taskin Allocations inpre_execopenanolis/cryptpilot#54, or could be abstracted away, like inloggerin Fix non-async-signal-safety inpre_execalacritty/alacritty#8756, so simple clippy lints could not help detect allocations.pre_execclosure reubeno/brush#777 (comment).Solution sketch
The idea is to make allocations inside
spawnvariantpre_execa library UB and integrate checks intoub-checksefforts. Although POSIX mentions only multi-threaded processes, std could possibly have created helper threads, and lack of helper threads should probably be considered an implementation detail by treating all programs as multi-threaded in thepre_execcase.While
malloc(3)is async-signal-unsafe, certain custom allocators may be guaranteed to work correctly afterfork(e.g. bump allocators). Therefore the check must be opt-in for the allocator.Since async-signal-safe contexts also appear in signal handlers, it would be prudent to make the API useful for that purpose as well. We don't have to expose it right away, but the developed solution should be forward-compatible.
All in all, the plan is to expose a function to query whether we are in an "async signal" context (like
std::thread::panicking) and a "guard" (likeDropGuard).Allocators would use this API to make sure they were not called inside an async signal context:
The allocator, or a custom wrapper, could alternatively choose to use
assert!to run such checks even in release mode. TheSystemallocator would have to use a different mechanism thandebug_assert!ifstdis pre-built.Users of raw
fork, likeCommand::spawn, and signal handlers may enter the async signal context withAsyncSignalGuard:These functions should be stubbed on
#[cfg(not(unix))]so that allocators don't have to always use#[cfg(unix)](e.g. because we may want to introduce such checks on other platforms).Note that
panic!insidepre_execwill not show traceback now (playground), and using this approach with allocator will produce a near-useless message "you allocate in async-signal-unsafe context" without telling where exactly that allocation happened. Though I believe one can add a breakpoint topanicand view traceback in a debugger, it has a less nicer experience than desired.To sum up, this proposal consists of two parts:
spawnto enter an async signal context and theSystemallocator to panic when invoked in an async signal context.Alternatives
Custom allocator published on
crates.ioIt is possible to write a custom allocator that would wrap
Systemand registerpthread_atforkhandler to set a flag to disable allocations (this only detectsforkand not_Fork; though another fork detection mechanism may not have this issue). However:stdfunctions as async-signal-unsafe, limiting the applicability.fork, manual syscalls, manually set up signal handlers) and async-signal-unsafe functions (allocators, libc wrappers, unwinders). For the checks to work, both have to use the same registry to enter async signal contexts and assert safety, so there would have to be exactly one such crate in the ecosystem, with no possibility of switching without losing functionality. That crate might as well bestd.Replace "async signal context" with "no-alloc context"
Focusing on "no-alloc context" will make it impossible to track non-allocating async-signal-unsafe functions, e.g. functions using global mutexes or non-reentrant functions. However, "no-alloc context" is a platform-independent idea and might be useful outside of
cfg(unix). It is not clear which approach is better.Just wait for effects
Suggested in bootc-dev/containers-image-proxy-rs#109 (comment)
Effects in Rust are not coming soon, whereas the proposed solution is simple enough for an incremental improvement.
Clippy lint against certain functions
Suggested in jamesmcm/vopono#340 (comment)
Clippy can't see through functions, and a lot of APIs unknowingly allocate. The lint would have a lot of false negatives and therefore be near-useless. A more complicated analysis would just be an ad-hoc implementation of effects.
Links and related work
fork: std::process::unix: Command: Do not unwind past fork(), in child rust#80263, Do not allocate or unwind after fork rust#81858, Tracking Issue for panic::always_abort() rust#84438Error::new/Error::other: Document Error::{new,other} as to be avoided in pre_exec rust#148971What happens now?
This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.
Possible responses
The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):
Second, if there's a concrete solution:
Footnotes
Commandcan be materialized onunixin two ways --Command::spawnandCommandExt::exec. Allpre_execmadness about async-signal-safety applies only tospawnvariant. ↩