Skip to content

Fix erlang:raise/3 assert with built stacktrace#2252

Merged
bettio merged 2 commits intoatomvm:release-0.7from
bettio:fix-raise-built-stacktrace
Apr 3, 2026
Merged

Fix erlang:raise/3 assert with built stacktrace#2252
bettio merged 2 commits intoatomvm:release-0.7from
bettio:fix-raise-built-stacktrace

Conversation

@bettio
Copy link
Copy Markdown
Collaborator

@bettio bettio commented Apr 1, 2026

  • Fix assert in stacktrace_exception_class when erlang:raise/3 is called with a built stacktrace and the re-raised exception hits a non-matching catch clause (triggering the compiler-generated OP_RAISE)
  • Wrap the built list into a raw 6-tuple in stacktrace_create_raw_mfa
  • Route OP_RAW_RAISE through HANDLE_ERROR() so both the NIF path and the raw_raise opcode path go through the wrapping logic
  • Add early exit in stacktrace_build for pre-built stacktraces

Fixes #2185

These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).

SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later

@bettio bettio force-pushed the fix-raise-built-stacktrace branch from 07263bb to 7e62129 Compare April 1, 2026 16:50
@bettio bettio added this to the v0.7.0 milestone Apr 1, 2026
@bettio bettio force-pushed the fix-raise-built-stacktrace branch from 7e62129 to 86ff1db Compare April 1, 2026 16:52
@bettio bettio force-pushed the fix-raise-built-stacktrace branch from 86ff1db to 949690a Compare April 1, 2026 19:32
@petermm
Copy link
Copy Markdown
Contributor

petermm commented Apr 2, 2026

https://ampcode.com/threads/T-019d4d69-3ac6-718e-8d7c-1ad32f287c7b

PR Review: Fix erlang:raise/3 assert with built stacktrace

Commit: 949690aFix erlang:raise/3 assert with built stacktrace
Author: Davide Bettio
Files changed: 6 (stacktrace.c, opcodesswitch.h, test_raise_built_stacktrace.erl, CMakeLists.txt, test.c, CHANGELOG.md)


Summary

When erlang:raise/3 is called with a built stacktrace (list of {M,F,A,Loc} tuples) and the
re-raised exception passes through a try/catch whose clauses do not match, the compiler-generated
catch-all (OP_RAISE) calls stacktrace_exception_class(), which asserts on a 6-tuple. The built
list was stored as-is, never wrapped.

The fix:

  1. Changes OP_RAW_RAISE from goto handle_error to HANDLE_ERROR() so stacktrace_create_raw_mfa() is called
  2. stacktrace_create_raw_mfa() detects a built stacktrace (non-empty list) and wraps it into a raw 6-tuple
  3. stacktrace_build() detects num_frames == 0 and returns the pre-built list directly

Verdict: ✅ Approve with suggestions

The core fix is correct and well-reasoned for the interpreter path. GC root handling is safe,
the num_frames == 0 sentinel is sound (normal stacktraces always have num_frames >= 1), and
TUPLE_SIZE(6) allocation matches. The test covers both the raw_raise opcode and NIF paths.


Issues Found

🔴 Critical: JIT raw_raise path is NOT fixed

The same bug exists in the JIT path. jit_raw_raise() calls jit_handle_error(ctx, jit_state, 0),
which only calls stacktrace_create_raw() when offset != 0 || term_is_invalid_term(stacktrace).
With a built list and offset=0, normalization is skipped — the list flows through unchanged and
will hit the same assertion in jit_raise()stacktrace_exception_class().

File: src/libAtomVM/jit.c

// Line 288-294 — jit_handle_error skips normalization for built lists
static Context *jit_handle_error(Context *ctx, JITState *jit_state, int offset)
{
    if (offset || term_is_invalid_term(ctx->exception_stacktrace)) {
        ctx->exception_stacktrace
            = stacktrace_create_raw(ctx, jit_state->module, offset);
    }
    // ← A built list (from raise/3) passes through here unchanged
// Line 415-422 — jit_raw_raise passes offset=0, built list survives
static Context *jit_raw_raise(Context *ctx, JITState *jit_state)
{
    context_set_exception_class(ctx, ctx->x[0]);
    ctx->exception_reason = ctx->x[1];
    ctx->exception_stacktrace = ctx->x[2];
    return jit_handle_error(ctx, jit_state, 0);
}

Suggested fix:

 static Context *jit_handle_error(Context *ctx, JITState *jit_state, int offset)
 {
-    if (offset || term_is_invalid_term(ctx->exception_stacktrace)) {
+    if (offset || term_is_invalid_term(ctx->exception_stacktrace)
+            || term_is_nonempty_list(ctx->exception_stacktrace)) {
         ctx->exception_stacktrace
             = stacktrace_create_raw(ctx, jit_state->module, offset);
     }

Also, jit_raise() at line 409 calls stacktrace_exception_class(stacktrace) directly on its
argument. If a built list reaches this path, it will assert-fail. This would also need the same
wrapping treatment.


🟡 Minor: Stale comment in OP_RAW_RAISE

The old comment at opcodesswitch.h:5248-5250 says:

"This is an optimization from the compiler where we don't need to call stacktrace_create_raw
here because the stack trace has already been created and set in x[2]."

This is now misleading — the whole point of the fix is that we do call stacktrace_create_raw
(via HANDLE_ERROR()) to normalize built lists. The comment should be updated.

Suggested fix:

             case OP_RAW_RAISE: {

                 TRACE("raw_raise/0\n");

-                // This is an optimization from the compiler where we don't need to call
-                // stacktrace_create_raw here because the stack trace has already been created
-                // and set in x[2].
+                // The compiler emits raw_raise when re-raising with an already
+                // captured stacktrace. We route through HANDLE_ERROR() so that
+                // stacktrace_create_raw_mfa normalizes any built stacktrace
+                // (list of {M,F,A,Loc} tuples) into the internal raw 6-tuple.
                 term ex_class = x_regs[0];

🟡 Minor: Empty stacktrace [] not preserved

erlang:raise(error, badarg, []) is valid in OTP but won't be detected by the
term_is_nonempty_list() check. The empty list will fall through to the normal stacktrace
construction path, producing a new stacktrace instead of preserving [].

This may be acceptable if AtomVM doesn't need full OTP compatibility here, but worth documenting
as a known limitation.


🟢 Nitpick: CHANGELOG wording

"causing an assert" reads slightly better as "causing an assertion failure":

-- Fixed `erlang:raise/3` with a built stacktrace causing an assert when the re-raised exception
+- Fixed `erlang:raise/3` with a built stacktrace causing an assertion failure when the re-raised exception
 passes through a non-matching catch clause

🟢 Nitpick: Test coverage could be stronger

The tests verify "no crash" but don't assert that the preserved stacktrace is correct. Adding a
check that the caught stacktrace remains a list matching the original would strengthen the test.
Also missing: exit/throw exception classes and raise(error, badarg, []).


What's Good

  • GC root handling is correctexception_stacktrace and exception_reason are properly
    rooted through ctx->x[0..1] before memory_ensure_free_with_roots
  • NOLINT(term-use-after-gc) for exception_class is safeexception_class is always an
    atom (immediate value), never a heap term, so GC cannot move it
  • num_frames == 0 sentinel is sound — normal stacktrace construction always does
    num_frames++ at line 211, so 0 is a safe sentinel for "pre-built"
  • No performance regression — the added branch is only on exception paths
  • Test covers both opcode and NIF paths — good coverage of the two entry points

@bettio bettio force-pushed the fix-raise-built-stacktrace branch from 949690a to b76f2c3 Compare April 2, 2026 12:48
@bettio
Copy link
Copy Markdown
Collaborator Author

bettio commented Apr 2, 2026

PR Review: Fix erlang:raise/3 assert with built stacktrace

I think I fixed all of them.

@petermm
Copy link
Copy Markdown
Contributor

petermm commented Apr 2, 2026

looks good, this is suggested for the codeql complaining, no strong preference on my part..

CodeQL Fix: Use-after-GC false positive in stacktrace.c

CodeQL flagged exception_class as potentially used after garbage collection
(memory_ensure_free_with_roots). While safe in practice (atoms are immediates, not heap-allocated),
the fix eliminates the pattern by reading context_exception_class(ctx) directly at point of use.

 term stacktrace_create_raw_mfa(Context *ctx, Module *mod, int current_offset, term module_atom, term function_atom, int arity)
 {
-    term exception_class = context_exception_class(ctx);
-
     if (term_is_list(ctx->exception_stacktrace)) {
         // Already a built stacktrace (possibly empty) from erlang:raise/3
         ...
         term_put_tuple_element(stack_info, 4, built_stacktrace);
-        // NOLINT(term-use-after-gc) exception_class is always an atom
-        term_put_tuple_element(stack_info, 5, exception_class);
+        term_put_tuple_element(stack_info, 5, context_exception_class(ctx));
         return stack_info;
     }

     ...
     ctx->x[live_x_regs - 1] = ctx->exception_reason;
-    // NOLINT(term-use-after-gc) exception_class is always an atom
     if (UNLIKELY(memory_ensure_free_with_roots(ctx, requested_size, live_x_regs, ctx->x, MEMORY_CAN_SHRINK) != MEMORY_GC_OK)) {
         ...

     ...
     term_put_tuple_element(stack_info, 4, raw_stacktrace);
-    term_put_tuple_element(stack_info, 5, exception_class);
+    term_put_tuple_element(stack_info, 5, context_exception_class(ctx));

     return stack_info;
 }

No behavioral change — context_exception_class() returns an atom (immediate value), so GC never
moves it. The local variable and NOLINT suppressions are simply no longer needed.

ctx->exception_stacktrace
= stacktrace_create_raw(ctx, jit_state->module, offset);
if (offset || term_is_invalid_term(ctx->exception_stacktrace)
|| term_is_nonempty_list(ctx->exception_stacktrace)) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
|| term_is_nonempty_list(ctx->exception_stacktrace)) {
|| term_is_list(ctx->exception_stacktrace)) {

ctx->x[1] = ctx->exception_reason;
if (UNLIKELY(memory_ensure_free_with_roots(ctx, TUPLE_SIZE(6), 2, ctx->x, MEMORY_CAN_SHRINK)
!= MEMORY_GC_OK)) {
return OUT_OF_MEMORY_ATOM;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't log here but below we do:

        fprintf(stderr, "WARNING: Unable to allocate heap space for raw stacktrace\n");

Arguably, this OOM is less likely.

// OP_RAISE can later process.
term ex_class = x_regs[0];
if (UNLIKELY(ex_class != ERROR_ATOM && ex_class != LOWERCASE_EXIT_ATOM && ex_class != THROW_ATOM)) {
x_regs[0] = BADARG_ATOM;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have a bug here. We should raise. We should exercise this with a test.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to remove this with the OTP26+ change

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this still happen?

When erlang:raise/3 is called with a built stacktrace (list of
{M,F,A,Loc} tuples) and the re-raised exception passes through a
try/catch whose clauses do not match, the compiler-generated catch-all
(OP_RAISE) calls stacktrace_exception_class, which asserts on a
6-tuple. The built list was stored as-is, never wrapped.

Wrap the built stacktrace into a raw 6-tuple in stacktrace_create_raw_mfa
so OP_RAISE and stacktrace_build can process it. Route OP_RAW_RAISE
through HANDLE_ERROR() so it also hits the wrapping path.

Use term_invalid_term() instead of term_nil() as the
exception_stacktrace sentinel so that an empty stacktrace [] passed to
erlang:raise/3 is correctly detected and preserved, matching OTP
behavior.

Signed-off-by: Davide Bettio <davide@uninstall.it>
@bettio bettio force-pushed the fix-raise-built-stacktrace branch from b76f2c3 to 0cb6d10 Compare April 2, 2026 22:51
@bettio
Copy link
Copy Markdown
Collaborator Author

bettio commented Apr 2, 2026

Everything should be fixed now.

@bettio bettio force-pushed the fix-raise-built-stacktrace branch from 0cb6d10 to 1c34917 Compare April 2, 2026 22:52
@petermm
Copy link
Copy Markdown
Contributor

petermm commented Apr 3, 2026

1c34917 changes invalid erlang:raise/3 handling to raise error:badarg, but BEAM still returns the atom badarg as a normal value here. That affects both the NIF path and compiler-generated raw_raise, so test_exception_classes and test_raw_raise now fail because the tests were updated to expect an exception instead of a return value. This looks like a BEAM-parity regression rather than a platform-specific bug.`

@pguyot
Copy link
Copy Markdown
Collaborator

pguyot commented Apr 3, 2026

My bad. I thought BEAM raised and we had a bug, but it doesn't and we hadn't. Sorry!

As evaluating this function causes the process to terminate, it has no return value unless the arguments are invalid, in which case the function returns the error reason badarg. If you want to be sure not to return, you can call error(erlang:raise(Class, Reason, Stacktrace)) and hope to distinguish exceptions later.

See https://www.erlang.org/doc/apps/erts/erlang.html#raise/3

Signed-off-by: Davide Bettio <davide@uninstall.it>
@bettio bettio force-pushed the fix-raise-built-stacktrace branch from 1c34917 to 665cd64 Compare April 3, 2026 08:59
@bettio bettio merged commit d053d69 into atomvm:release-0.7 Apr 3, 2026
216 of 222 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants