@teamchong teamchong commented Dec 29, 2025

Problem

When wasm_exec_env_destroy() is called, exec_env_tls (thread-local storage used by signal handlers for hardware bounds checking) may still point to the exec_env being destroyed. On subsequent WASM executions in the same thread, if a signal occurs (e.g., SIGSEGV for bounds checking), the signal handler accesses freed memory and crashes.

Solution

Clear exec_env_tls if it points to the exec_env being destroyed. This is a simple defensive check that prevents dangling pointer issues.

#ifdef OS_ENABLE_HW_BOUND_CHECK
    WASMExecEnv *current_tls = wasm_runtime_get_exec_env_tls();
    if (current_tls == exec_env) {
        wasm_runtime_set_exec_env_tls(NULL);
    }
#endif
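
The check above can be exercised in isolation with a small model. This is only a sketch under stated assumptions: `MockExecEnv` and the `mock_*` functions are hypothetical stand-ins, not WAMR APIs, and a plain C11 `_Thread_local` pointer stands in for exec_env_tls.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for WASMExecEnv; the real struct is much larger. */
typedef struct MockExecEnv {
    int id;
} MockExecEnv;

/* Models exec_env_tls: one pointer per thread (assumption: C11 TLS). */
static _Thread_local MockExecEnv *mock_exec_env_tls = NULL;

static MockExecEnv *
mock_get_exec_env_tls(void)
{
    return mock_exec_env_tls;
}

static void
mock_set_exec_env_tls(MockExecEnv *exec_env)
{
    mock_exec_env_tls = exec_env;
}

static MockExecEnv *
mock_exec_env_create(int id)
{
    MockExecEnv *exec_env = malloc(sizeof(*exec_env));
    if (exec_env)
        exec_env->id = id;
    return exec_env;
}

/* Destroy with the defensive check: compare before freeing, so the
   thread-local pointer never outlives the object it refers to. */
static void
mock_exec_env_destroy(MockExecEnv *exec_env)
{
    if (mock_get_exec_env_tls() == exec_env) {
        mock_set_exec_env_tls(NULL);
    }
    free(exec_env);
}
```

In this model, destroying an exec_env that the thread-local pointer does not refer to leaves the pointer untouched, so the check should not disturb an unrelated exec_env that is still in use on the same thread.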

Use Case

Daemon-style execution patterns (like Cloudflare Workers) where the same thread runs multiple WASM modules sequentially without forking. Each module creates its own exec_env, runs, then destroys it. Without this fix, the TLS can point to a destroyed exec_env, causing crashes on subsequent runs.

Testing

  • Tested in production daemon-style execution with 100+ consecutive AOT runs without crashes
  • No regressions are expected in existing tests: the fix only adds a pointer comparison and, when it matches, resets exec_env_tls to NULL

When an exec_env is destroyed, check if it matches the current thread's
exec_env_tls and clear it to avoid dangling pointer issues.

Without this fix, in daemon-style execution where the same thread runs
multiple WASM modules sequentially (like Cloudflare Workers), the
exec_env_tls can point to freed memory after an exec_env is destroyed,
causing crashes on subsequent executions when the signal handler tries
to access it.

This is critical for AOT mode with hardware bounds checking enabled,
where signal handlers rely on exec_env_tls to handle SIGSEGV properly.
@lum1n0us
Contributor

lum1n0us commented Jan 5, 2026

We are hoping to get more details about how the host side is using WAMR's APIs, especially regarding getting/setting exec_env_tls and calling WASM functions. A reproducible case would be great.

From my perspective, if you follow the pattern mentioned here, every call to a WASM function would have the proper exec_env_tls, and runtime_signal_handler() will not encounter a dangling pointer.

#include "core/iwasm/common/wasm_runtime_common.h"

void
call_worker(void)
{
    /* Save the caller's exec_env_tls, then clear it before the nested call. */
    WASMExecEnv *exec_env_backup = wasm_runtime_get_exec_env_tls();
    wasm_runtime_set_exec_env_tls(NULL);

    /* Call the worker wasm module here, passing its local exec_env.
       call_wasm_with_hw_bound_check() will call
       wasm_runtime_set_exec_env_tls(exec_env) and reset it with
       wasm_runtime_set_exec_env_tls(NULL) when execution finishes. */

    /* Restore the caller's exec_env_tls. */
    wasm_runtime_set_exec_env_tls(exec_env_backup);
}
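
To make the save/restore pattern concrete, here is a minimal self-contained model of it. It is a sketch, not WAMR code: `MockExecEnv`, `mock_tls`, `mock_call_wasm` and `mock_call_worker` are hypothetical names, and the rule that a call is rejected when the thread-local pointer already holds a different exec_env is an assumption based on the "invalid exec env" failure described in this thread.

```c
#include <stddef.h>

/* Hypothetical stand-in for WASMExecEnv. */
typedef struct MockExecEnv {
    int id;
} MockExecEnv;

/* Models exec_env_tls for the current thread. */
static _Thread_local MockExecEnv *mock_tls = NULL;

/* Models the guarded call path: reject the call if TLS already holds a
   different exec_env, otherwise set TLS on entry and clear it on exit.
   Returns 1 on success, 0 for "invalid exec env". */
static int
mock_call_wasm(MockExecEnv *exec_env)
{
    if (mock_tls != NULL && mock_tls != exec_env)
        return 0;
    mock_tls = exec_env;
    /* ... guest code would run here ... */
    mock_tls = NULL;
    return 1;
}

/* Save/restore wrapper: hide the caller's TLS value for the duration of
   the nested call, then put it back. */
static int
mock_call_worker(MockExecEnv *worker_env)
{
    MockExecEnv *backup = mock_tls;
    int ok;

    mock_tls = NULL;                 /* clear */
    ok = mock_call_wasm(worker_env); /* nested call sets and clears TLS */
    mock_tls = backup;               /* restore */
    return ok;
}
```

In this model a nested call made directly while another exec_env is current is rejected, whereas the same call made through the wrapper succeeds and leaves the outer value in place afterwards.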

@teamchong
Author

Thanks for the feedback! I've added a reproducible test case that demonstrates the bug.

The Bug

The issue is in invoke_native_with_hw_bound_check (both aot_runtime.c and wasm_runtime.c):

// exec_env_tls is SET here
wasm_runtime_set_exec_env_tls(exec_env);

// Early return WITHOUT clearing exec_env_tls!
if (!wasm_runtime_detect_native_stack_overflow(exec_env)) {
   return false;  // BUG: TLS never cleared
}

When the native stack overflow check fails, the function returns early without clearing exec_env_tls. If the application then destroys the exec_env and creates a new one, subsequent WASM calls fail with "invalid exec env" because exec_env_tls still points to the destroyed exec_env.

Reproducible Test Case

Added tests/standalone/test-exec-env-tls/ with a test that:

  1. Creates exec_env_A
  2. Sets native_stack_boundary high to trigger overflow check failure
  3. Calls WASM → fails with "native stack overflow", but TLS is not cleared
  4. Destroys exec_env_A → TLS is now a dangling pointer
  5. Creates exec_env_B
  6. Calls WASM → fails with "invalid exec env" (without fix)
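
The six steps can also be walked through in a small model that needs no WAMR build. This is a hedged sketch rather than the test added under tests/standalone/test-exec-env-tls/: every name below is hypothetical, a boolean flag stands in for setting native_stack_boundary high, and stack-allocated structs with an `alive` flag replace real allocation so the model never touches freed memory.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for WASMExecEnv. */
typedef struct MockExecEnv {
    bool alive;          /* false once "destroyed" */
    bool stack_overflow; /* models a native_stack_boundary set too high */
} MockExecEnv;

typedef enum {
    CALL_OK,
    CALL_STACK_OVERFLOW,
    CALL_INVALID_EXEC_ENV
} CallResult;

/* Models exec_env_tls for the current thread. */
static _Thread_local MockExecEnv *mock_tls = NULL;

/* Models the call path described above, including the early return
   that leaves the thread-local pointer set. */
static CallResult
mock_call_wasm(MockExecEnv *exec_env)
{
    if (mock_tls != NULL && mock_tls != exec_env)
        return CALL_INVALID_EXEC_ENV;
    mock_tls = exec_env;
    if (exec_env->stack_overflow)
        return CALL_STACK_OVERFLOW; /* early return: TLS not cleared */
    mock_tls = NULL;
    return CALL_OK;
}

/* Models wasm_exec_env_destroy(), with the defensive check optional. */
static void
mock_destroy(MockExecEnv *exec_env, bool with_fix)
{
    if (with_fix && mock_tls == exec_env)
        mock_tls = NULL;
    exec_env->alive = false;
}

/* Runs steps 1-6 and returns the result of the final call. */
static CallResult
run_scenario(bool with_fix)
{
    MockExecEnv a = { true, true }; /* steps 1-2 */
    CallResult result;

    mock_tls = NULL;
    result = mock_call_wasm(&a);    /* step 3 */
    if (result != CALL_STACK_OVERFLOW)
        return result;
    mock_destroy(&a, with_fix);     /* step 4 */

    MockExecEnv b = { true, false }; /* step 5 */
    result = mock_call_wasm(&b);     /* step 6 */
    mock_tls = NULL; /* do not leave a pointer to a local behind */
    return result;
}
```

Under these assumptions, `run_scenario(false)` ends with `CALL_INVALID_EXEC_ENV` and `run_scenario(true)` ends with `CALL_OK`, which mirrors the behavior the test case is described as checking.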

About the save/restore pattern

The save/restore pattern you mentioned would work if the application explicitly manages TLS. However, in this case:

  • The bug is inside WAMR's invoke_native_with_hw_bound_check function
  • The application just calls the public wasm_runtime_call_wasm API
  • The early return path doesn't clear TLS, leaving it in an inconsistent state

The fix is a defensive cleanup in wasm_exec_env_destroy(): if TLS points to the exec_env being destroyed, clear it. This handles any case where TLS wasn't properly cleared.

Add test case that reproduces the bug where exec_env_tls is not cleared
on early return paths in invoke_native_with_hw_bound_check.

The test triggers native stack overflow check failure, which causes
wasm_runtime_call_wasm to return early after setting exec_env_tls but
without clearing it. This leaves exec_env_tls pointing to a destroyed
exec_env, causing subsequent calls to fail with "invalid exec env".

Test confirms the fix in wasm_exec_env_destroy correctly clears
exec_env_tls when destroying the exec_env it points to.
@teamchong teamchong force-pushed the fix-exec-env-tls-dangling-pointer branch from 0f18a9c to 9f73f59 Compare January 8, 2026 12:27