Add corruption-detection test for probabilistic mitigations#848

Open
mjp41 wants to merge 2 commits into microsoft:main from mjp41:coverage_failures

mjp41 commented May 10, 2026

Adds a new functional test, func/corruption_detection, that validates snmalloc's probabilistic memory-safety mitigations actually fire on the corruption patterns they are designed to catch. Without it, regressions that silently weakened a mitigation would be invisible to the existing suite, since every other test exercises only the non-failing arm of the integrity checks.

Each scenario runs in a forked child so the expected abort does not kill the harness. Detection is reported as the child being killed by SIGABRT/SIGSEGV/SIGBUS/SIGILL; a clean exit means the corruption went undetected and the test fails.
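
The fork-and-classify step described above can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: `detected_in_child` is a hypothetical name, and the only behaviour taken from the description is that death by SIGABRT/SIGSEGV/SIGBUS/SIGILL counts as detection while a clean exit does not.

```cpp
#include <cassert>
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: run `scenario` in a forked child and report whether the
// child was killed by one of the signals treated as "corruption detected".
inline bool detected_in_child(void (*scenario)())
{
  assert(scenario != nullptr);

  pid_t pid = fork();
  if (pid < 0)
    return false; // fork failed; nothing was detected

  if (pid == 0)
  {
    scenario();
    _exit(0); // reached only if nothing stopped the scenario
  }

  int status = 0;
  if (waitpid(pid, &status, 0) != pid)
    return false;
  if (!WIFSIGNALED(status))
    return false; // clean (or non-signal) exit: corruption went undetected

  switch (WTERMSIG(status))
  {
    case SIGABRT:
    case SIGSEGV:
    case SIGBUS:
    case SIGILL:
      return true;
    default:
      return false;
  }
}
```

A harness built this way would call the helper once per scenario and fail the test whenever it returns false.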

Six scenarios are covered, spanning the local-thread, remote-thread and large-allocation paths:

  • double_free - small alloc, two local frees of the same slot. Detected by freelist_backward_edge when the resulting cycle is later traversed.
  • uaf_freelist - small alloc, free, then write garbage into the freed slot's first two words (the obfuscated next/prev). Detected by check_prev on the next freelist consumption.
  • oob_into_neighbor - tiny allocs, free even slots, overrun from an odd live slot into freed neighbours. Detected by check_prev when the neighbour is later allocated.
  • remote_double_free - small alloc, free locally, then free again from a different thread (the second free travels via the remote message queue). Detected as !meta->is_unused() in the dealloc path.
  • remote_uaf - small alloc, free via a different thread, then write garbage through the dangling pointer while the slot sits on the owning allocator's pending-remote queue. Detected by check_prev during handle_message_queue_slow's drain - a code path no other test exercises.
  • large_double_free - allocation larger than any small sizeclass (handled by the chunk allocator and per-chunk metadata rather than the slab freelist), freed twice. Detected as !meta->is_unused() in the large-dealloc path.

The test is Linux-only (uses fork()/waitpid()) and is a no-op when SNMALLOC_CHECK_CLIENT is not defined, since the mitigations it relies on are then compiled out.
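
The two skip conditions could be expressed with preprocessor gating along these lines. This is only a sketch: `corruption_test_main` and `run_all_scenarios` are hypothetical names, and the skip messages are invented for illustration.

```cpp
#include <cstdio>

// Hypothetical stand-in for the code that forks one child per scenario.
inline int run_all_scenarios()
{
  return 0;
}

// Hypothetical entry point showing the order of the two gates.
inline int corruption_test_main()
{
#if !defined(__linux__)
  std::puts("skipping: test requires fork()/waitpid()");
  return 0;
#elif !defined(SNMALLOC_CHECK_CLIENT)
  std::puts("skipping: mitigations are compiled out");
  return 0;
#else
  return run_all_scenarios();
#endif
}
```

Returning 0 on the skip paths keeps the test green in configurations where the mitigations cannot fire.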

The test is also instrumented to cooperate with clang source-based coverage: the forked child re-resolves LLVM_PROFILE_FILE with its own pid (the parent's %p expansion is otherwise inherited and all children would write to the same file) and a signal handler flushes .profraw before re-raising the fatal signal. The runtime entry points are declared as weak symbols so the test still links in non-coverage builds.
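
A rough sketch of the coverage cooperation follows. `__llvm_profile_write_file` and `__llvm_profile_set_filename` are entry points of clang's profile runtime; whether the PR uses exactly these two, and how it derives the per-child file name from LLVM_PROFILE_FILE, is not stated here, so the `child-<pid>.profraw` name and the helper names are assumptions.

```cpp
#include <csignal>
#include <cstdio>
#include <initializer_list>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Weak declarations: null unless the binary is linked with the profile runtime.
extern "C" int __llvm_profile_write_file(void) __attribute__((weak));
extern "C" void __llvm_profile_set_filename(const char*) __attribute__((weak));

// Flush coverage data (if available), then die from the original signal.
inline void flush_and_reraise(int sig)
{
  if (__llvm_profile_write_file != nullptr)
    __llvm_profile_write_file();
  std::signal(sig, SIG_DFL); // restore the default (fatal) action
  std::raise(sig);           // delivered once this handler returns
}

// Hypothetical child-side setup: per-pid output file plus flush handlers.
inline void child_coverage_setup()
{
  if (__llvm_profile_set_filename != nullptr)
  {
    // Static storage: the runtime may keep the pointer rather than copy it.
    static char name[64];
    std::snprintf(name, sizeof(name), "child-%ld.profraw",
                  static_cast<long>(getpid()));
    __llvm_profile_set_filename(name);
  }
  for (int sig : {SIGABRT, SIGSEGV, SIGBUS, SIGILL})
    std::signal(sig, flush_and_reraise);
}

// Demonstration: returns the signal that killed the child, 0 if it was not
// killed by a signal, or -1 if fork()/waitpid() failed.
inline int terminating_signal_of_child(int sig)
{
  pid_t pid = fork();
  if (pid < 0)
    return -1;
  if (pid == 0)
  {
    child_coverage_setup();
    std::raise(sig);
    _exit(0);
  }
  int status = 0;
  if (waitpid(pid, &status, 0) != pid)
    return -1;
  return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Because the handler re-raises after restoring the default action, the parent still observes the original fatal signal. Writing a profile from a signal handler is not async-signal-safe in general, so this is a best-effort flush.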

The test is picked up automatically by make_tests, so it runs as both func-corruption_detection-fast and func-corruption_detection-check; the fast variant exits immediately with the "skip" message because the mitigations are off.

mjp41 added 2 commits May 9, 2026 21:25
Three issues surfaced in CI for the new corruption-detection test:

1. Linux: `large_double_free` did not detect any corruption. The
   subtest used `LARGE_SIZE = MIN_CHUNK_SIZE * 4 = 64 KiB`, which on
   the default Linux config equals `MAX_SMALL_SIZECLASS_SIZE` (i.e.
   the largest *small* sizeclass), so the allocations went through
   the slab free-list path and never reached the chunk-allocator
   double-free check at all. Use `MAX_SMALL_SIZECLASS_SIZE * 2` so
   the size unambiguously falls into the large range. Once the test
   actually exercises the right path, the existing
   `is_backend_owned()` check in `dealloc_remote` (gated on the
   `sanity_checks` mitigation, which is part of `full_checks` in a
   default `SNMALLOC_CHECK_CLIENT` build) flags the double-free.

2. Mac: `-Wunused-function` errors for every `try_*` helper. The
   helpers are referenced only from `run_in_child`, which is
   already gated on `__linux__`. Move the helpers and the LLVM
   profile externs inside the same `#if defined(__linux__)` block
   so non-Linux builds compile cleanly. The non-Linux `main`
   already prints a "skipping" message and returns 0.

3. Windows: `__attribute__((weak))` is not portable to MSVC and
   there is no `SNMALLOC_WEAK` macro in `defines.h`. The weak
   symbols are only used by the Linux-only fork harness for
   coverage-flush, so gating them on `__linux__` is the natural
   fix.
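
The size arithmetic behind issue 1 can be checked at compile time. The constants below are hard-coded from the figures quoted above for the default Linux configuration (in snmalloc they come from the library), and `takes_large_path` is a hypothetical predicate that treats anything above the largest small sizeclass as large.

```cpp
#include <cstddef>

constexpr std::size_t KiB = 1024;

// Figures quoted in the commit message for the default Linux config.
constexpr std::size_t MAX_SMALL_SIZECLASS_SIZE = 64 * KiB;
// Implied by MIN_CHUNK_SIZE * 4 = 64 KiB.
constexpr std::size_t MIN_CHUNK_SIZE = 16 * KiB;

// Hypothetical predicate: only sizes above the largest small sizeclass
// bypass the slab freelist.
constexpr bool takes_large_path(std::size_t size)
{
  return size > MAX_SMALL_SIZECLASS_SIZE;
}

constexpr std::size_t OLD_LARGE_SIZE = MIN_CHUNK_SIZE * 4;
constexpr std::size_t NEW_LARGE_SIZE = MAX_SMALL_SIZECLASS_SIZE * 2;

static_assert(OLD_LARGE_SIZE == MAX_SMALL_SIZECLASS_SIZE,
              "old size lands exactly on the largest small sizeclass");
static_assert(!takes_large_path(OLD_LARGE_SIZE),
              "old size still uses the slab freelist path");
static_assert(takes_large_path(NEW_LARGE_SIZE),
              "new size is unambiguously large");
```

Deriving the test size from `MAX_SMALL_SIZECLASS_SIZE` rather than `MIN_CHUNK_SIZE` keeps the subtest on the large path even if the configuration changes the ratio between the two.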

Also use `static_cast<uintptr_t>(0xDEADBEEFu)`-style literals for
the UAF freelist-corruption writes so MSVC does not warn about
narrowing on 32-bit Windows (C4305/C4309). The exact bit pattern
does not matter: any non-zero garbage in the freelist node header
will fail domestication or the doubly-linked invariant check.
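
A corruption write in that style might look like the following. `scribble_freelist_header` is a hypothetical helper and the second pattern is arbitrary; the point is that a 32-bit literal cast to `uintptr_t` fits on both 32-bit and 64-bit targets.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: overwrite the first two words of a slot with non-zero
// garbage that is representable in uintptr_t on 32- and 64-bit targets.
inline void scribble_freelist_header(void* slot)
{
  assert(slot != nullptr);
  auto* words = static_cast<std::uintptr_t*>(slot);
  words[0] = static_cast<std::uintptr_t>(0xDEADBEEFu);
  words[1] = static_cast<std::uintptr_t>(0xCAFEBABEu); // arbitrary pattern
}
```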

Verified locally: all 6 subtests now detect corruption (including
large_double_free, which detects via signal 4 / SIGILL from the
sanity_checks mitigation).