linux-user: Avoid PT_LOAD overlap on 16K hosts #304

Open
LaurenIsACoder wants to merge 1 commit into lat-opensource:master from LaurenIsACoder:elfload-16k-ptload-overlap

@LaurenIsACoder
Contributor

Summary

This PR fixes an ELF PT_LOAD loading bug on 16K-page LoongArch hosts
running 4K-page x86_64 guests.

The issue was reproduced with the official Claude Code Linux x64 release
asset claude-linux-x64.tar.gz from:

  • https://claude.ai/install.sh
  • https://github.com/anthropics/claude-code/releases

The root cause is not incorrect instruction translation. A later writable
PT_LOAD could be widened backward to the host-page boundary and overlap
the tail page of an earlier executable PT_LOAD, zeroing .plt/.iplt
bytes before guest execution begins.

This PR keeps host-page behavior for aligned non-overlapping segments,
and falls back to guest-page granularity only when host-page widening
would overlap another PT_LOAD.

It also adds focused regression tests for:

  1. overlap must fall back to guest-page granularity;
  2. aligned non-overlapping segments must still keep host-page behavior;
  3. small p_align must still use TARGET_PAGE_SIZE.

English Details

Problem

The issue was reproduced with the official Claude Code x86_64 Linux
native binary release asset claude-linux-x64.tar.gz.

Source:

  • official installer entry point: https://claude.ai/install.sh
  • official releases page: https://github.com/anthropics/claude-code/releases

On a 16K-page LoongArch kernel, the following minimal command crashed
immediately:

export LATX_AOT=0 LATX_KZT=0
./build64/latx-x86_64 /tmp/claude-code-test/claude --help

Before this patch:

  • SIGSEGV
  • exit code 139

After this patch:

  • exits successfully
  • exit code 0
  • prints the Claude Code help output

Symptom

At the crash site, execution reached guest bytes 00 00, which decode as:

add byte ptr [rax], al

With rax == 0, the translated LoongArch memory access faulted on
address 0.

That observation was correct, but it was only the symptom. The more
important question was why the guest reached zero-filled bytes there
at all.
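
The decode of 00 00 can be checked by hand. The following is a minimal sketch, not LATX's decoder: the struct and function names are made up for illustration. It splits the ModRM byte into its three fields and confirms that opcode 0x00 (ADD r/m8, r8) followed by ModRM 0x00 selects a [rax] memory operand with AL as the source.

```c
#include <stdint.h>

/* Fields of an x86 ModRM byte: mod (bits 7-6), reg (bits 5-3), rm (bits 2-0). */
struct modrm {
    uint8_t mod;
    uint8_t reg;
    uint8_t rm;
};

/* Hypothetical helper: split one ModRM byte into its fields. */
static struct modrm decode_modrm(uint8_t byte)
{
    struct modrm m = {
        .mod = (uint8_t)(byte >> 6),
        .reg = (uint8_t)((byte >> 3) & 7),
        .rm  = (uint8_t)(byte & 7),
    };
    return m;
}

/*
 * Returns 1 when the two bytes encode "add byte ptr [rax], al":
 * opcode 0x00 is ADD r/m8, r8; ModRM 0x00 means mod=0 (memory operand,
 * no displacement), reg=0 (AL) and rm=0 ([RAX] in 64-bit mode).
 */
static int is_add_rax_al(const uint8_t insn[2])
{
    struct modrm m = decode_modrm(insn[1]);
    return insn[0] == 0x00 && m.mod == 0 && m.reg == 0 && m.rm == 0;
}
```

Because every pair of zero bytes decodes this way, a zero-filled code page faults on the first instruction as soon as rax holds an unmapped address.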

Root Cause

The Claude binary contains two adjacent PT_LOAD segments:

  • an executable PT_LOAD (R E) ending at 0x60ec1a0
  • a writable PT_LOAD (RW) starting at 0x60ed1a0

From the guest 4K-page point of view, these are distinct guest pages:

0x60ec000 - 0x60ecfff   RX page
0x60ed000 - 0x60edfff   RW page

On a 16K host, both guest pages fall into the same host page:

0x60ec000 - 0x60effff

The old loader logic could choose host-page ELF loading granularity for
a segment if its p_align matched the host page size. For the later
writable PT_LOAD, p_align = 0x4000, so the loader considered it
eligible for 16K widening and computed:

PAGESTART(0x60ed1a0) = 0x60ec000

instead of the guest-page-granular:

PAGESTART(0x60ed1a0) = 0x60ed000

That caused the later writable segment to be loaded starting from the
same 16K host page that already contained the tail of the earlier
executable segment.

As a result, the earlier .plt/.iplt bytes were overwritten during ELF
loading.
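
The two PAGESTART results above differ only in the page size used for rounding. A small sketch of that arithmetic follows; the function names are illustrative and are not taken from the LATX sources, and both page sizes are assumed to be powers of two.

```c
#include <stdint.h>

/* Round v down to a page boundary, as in the PAGESTART results quoted above. */
static uint64_t pagestart(uint64_t v, uint64_t pagesize)
{
    return v & ~(pagesize - 1);
}

/*
 * Number of bytes below the guest-page start that a host-page-granular
 * load of a segment at vaddr would also cover. Zero means widening adds
 * nothing in front of the segment.
 */
static uint64_t widened_bytes_below(uint64_t vaddr, uint64_t guest_page,
                                    uint64_t host_page)
{
    return pagestart(vaddr, guest_page) - pagestart(vaddr, host_page);
}
```

For the RW segment at 0x60ed1a0 with a 4K guest page and a 16K host page, the widened load covers 0x1000 extra bytes, exactly the guest page 0x60ec000 - 0x60ecfff that holds the tail of the RX segment.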

This was verified at a breakpoint in do_init_thread, before guest
execution began: the bytes at 0x60ec080 and 0x60ec120 were already
all zeros before the guest ran, proving this was a loader bug rather
than a runtime corruption bug.

Why 4K Hosts Worked but 16K Hosts Failed

On a 4K host, the two guest pages above are also different host pages,
so they do not interfere.

On a 16K host, they share one host page:

0x60ec000 - 0x60effff

If the later writable segment is widened backward to 0x60ec000, it
overwrites the earlier executable tail page.

So:

  • 4K host: works
  • 16K host: fails
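
The difference between the two hosts reduces to one comparison. The sketch below uses a hypothetical helper name and assumes the host page size is a power of two; it checks whether the host-page start of the later segment lands below the end of the earlier one.

```c
#include <stdint.h>

/*
 * Returns 1 when rounding next_vaddr down to a host page boundary reaches
 * below prev_end, i.e. into bytes that belong to the earlier segment.
 */
static int widening_clobbers_prev(uint64_t prev_end, uint64_t next_vaddr,
                                  uint64_t host_page)
{
    uint64_t widened_start = next_vaddr & ~(host_page - 1);
    return widened_start < prev_end;
}
```

With prev_end = 0x60ec1a0 and next_vaddr = 0x60ed1a0, a 4K host page gives a widened start of 0x60ed000 (no overlap), while a 16K host page gives 0x60ec000, which is below 0x60ec1a0.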

Design History / Risk Assessment

This area already has a non-trivial history in upstream QEMU.

  • In 2014, linux-user: Tell guest about big host page sizes
    (a70daba3771) exposed larger host page granularity via AT_PAGESZ
    so the guest would not assume mappings smaller than the host can honor.

  • In 2018, linux-user: fix ELF load alignment error
    (33143c446e) added the fallback to TARGET_PAGE_SIZE when a
    PT_LOAD's p_align is smaller than the host page size.

  • Later in 2018, linux-user: elf: mmap all the target-pages of
    hostpage for data segment (94894ff2d13) extended host-page behavior
    for aligned data segments so glibc could consume the remainder of
    the last host page.

However, the current LATX tree no longer fully preserves those original
upstream assumptions:

  1. AT_PAGESZ is forced back to TARGET_PAGE_SIZE, rather than exposing
    the larger host page size;
  2. file-backed PT_LOADs are no longer mapped by rounding the mapping
    length up to the widened host-page length.

So the current LATX tree is not implementing the full original
2014/2018 upstream behavior anymore.

To reduce risk, this patch does not force every case where the host page
size exceeds the guest page size back to guest-page granularity. Instead:

  • if widening a PT_LOAD down to the host-page boundary does not
    overlap another PT_LOAD, host-page behavior is preserved;
  • only if widening would overlap another PT_LOAD, we fall back to
    TARGET_PAGE_SIZE for that segment.

Fix

This patch extracts page-granularity selection into:

  • linux-user/elfload-pagesize.h
  • linux-user/elfload-pagesize.c

The new logic is:

  1. Determine whether the current PT_LOAD is eligible for host-page
    alignment based on p_align.

  2. If it is eligible, check whether widening it downward to the host page
    start would overlap the memory range of any other PT_LOAD.

  3. If widening would overlap another PT_LOAD:

    • use TARGET_PAGE_SIZE

  4. Otherwise:

    • keep host-page ELF loading granularity
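
The four steps above can be sketched as a single selection function. This is an illustration under stated assumptions, not the code in linux-user/elfload-pagesize.c: the struct, the function name, and the exact eligibility rule (p_align being a nonzero multiple of the host page size) are all hypothetical, and the overlap check covers only the bytes between the host-page start and the guest-page start of the segment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a PT_LOAD program header. */
struct load_seg {
    uint64_t p_vaddr;
    uint64_t p_memsz;
    uint64_t p_align;
};

static uint64_t pagestart(uint64_t v, uint64_t pagesize)
{
    return v & ~(pagesize - 1);
}

/*
 * Pick the load granularity for segs[idx]. Both page sizes must be powers
 * of two with host_page >= target_page.
 */
static uint64_t select_load_pagesize(const struct load_seg *segs, size_t n,
                                     size_t idx, uint64_t host_page,
                                     uint64_t target_page)
{
    const struct load_seg *cur = &segs[idx];

    /* Step 1 (assumed rule): p_align must be a nonzero multiple of host_page. */
    if (cur->p_align == 0 || (cur->p_align & (host_page - 1)) != 0) {
        return target_page;
    }

    /* Bytes that host-page widening adds below the guest-page start. */
    uint64_t gap_lo = pagestart(cur->p_vaddr, host_page);
    uint64_t gap_hi = pagestart(cur->p_vaddr, target_page);
    if (gap_lo == gap_hi) {
        return host_page;   /* nothing is widened, so nothing can overlap */
    }

    /* Steps 2-3: another PT_LOAD intersecting [gap_lo, gap_hi) forces fallback. */
    for (size_t i = 0; i < n; i++) {
        if (i == idx) {
            continue;
        }
        uint64_t lo = segs[i].p_vaddr;
        uint64_t hi = lo + segs[i].p_memsz;
        if (lo < gap_hi && gap_lo < hi) {
            return target_page;
        }
    }

    /* Step 4: no overlap, keep host-page granularity. */
    return host_page;
}
```

Making the decision per segment keeps the change narrow: a segment whose widened range touches no other PT_LOAD still gets the host page size.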

This does not remove host-page behavior in general. It only prevents a
later segment from extending into bytes that belong to an earlier one.

16K Host Page Layout

Affected host page:

0x60ec000 - 0x60effff

Before the fix:

16K host page: 0x60ec000 - 0x60effff

[0x60ec000 ---------------- 0x60ecfff]   tail of earlier RX PT_LOAD
                                         contains .plt/.iplt
                                         but gets overwritten by widening
                                         the later RW PT_LOAD downward

[0x60ed000 ---------------- 0x60edfff]   beginning of later RW PT_LOAD
[0x60ee000 ---------------- 0x60eefff]   RW PT_LOAD
[0x60ef000 ---------------- 0x60effff]   RW PT_LOAD

Observed broken bytes before guest execution:

0x60ec080: 00 00 00 00 ...
0x60ec120: 00 00 00 00 ...

After the fix:

16K host page: 0x60ec000 - 0x60effff

[0x60ec000 ---------------- 0x60ecfff]   preserved tail of earlier RX PT_LOAD
                                         .plt/.iplt bytes remain intact

[0x60ed000 ---------------- 0x60edfff]   later RW PT_LOAD starts here
[0x60ee000 ---------------- 0x60eefff]   RW PT_LOAD
[0x60ef000 ---------------- 0x60effff]   RW PT_LOAD

Observed correct bytes at do_init_thread after the fix:

0x60ec080:
ff 25 f2 07 0e 00 68 dc 01 00 00 e9 20 e2 ff ff

0x60ec120:
ff 25 a2 07 0e 00 68 00 00 00 00 e9 80 e1 ff ff
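
Both dumps begin with ff 25, the x86_64 encoding of an indirect jmp through a RIP-relative memory slot, which is what a PLT entry is expected to look like. As a sanity check, the sketch below (helper name hypothetical) computes the address of the slot such an instruction reads its target from.

```c
#include <stdint.h>

/*
 * For a 6-byte "ff 25 disp32" instruction (jmp qword ptr [rip + disp32]),
 * return the address of the memory slot that holds the jump target.
 * Returns 0 if the bytes are not that encoding.
 */
static uint64_t plt_jmp_slot(uint64_t insn_addr, const uint8_t insn[6])
{
    if (insn[0] != 0xff || insn[1] != 0x25) {
        return 0;
    }
    /* disp32 is stored little-endian and sign-extended to 64 bits. */
    uint32_t raw = (uint32_t)insn[2] | ((uint32_t)insn[3] << 8) |
                   ((uint32_t)insn[4] << 16) | ((uint32_t)insn[5] << 24);
    int64_t disp = (int32_t)raw;
    /* RIP-relative addressing is based on the address of the next instruction. */
    return insn_addr + 6 + (uint64_t)disp;
}
```

For the entry at 0x60ec080 (displacement 0xe07f2) this gives 0x61cc878, and for the entry at 0x60ec120 (displacement 0xe07a2) it gives 0x61cc8c8. On the zeroed page from before the fix, the helper returns 0 because the ff 25 prefix is missing.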

Tests

The LATX tree had no automated regression test covering this part of
the loader, so this PR adds a focused unit test:

  • tests/unit/test-elfload-pagesize.c

It covers three cases:

  1. overlap must fall back to guest-page granularity;
  2. aligned non-overlapping segments must still keep host-page behavior;
  3. small p_align must still use TARGET_PAGE_SIZE.

Test result:

meson test -C build64 test-elfload-pagesize --print-errorlogs
# 1/1 test-elfload-pagesize OK

The original reproducer was also rerun successfully:

export LATX_AOT=0 LATX_KZT=0
./build64/latx-x86_64 /tmp/claude-code-test/claude --help
# exit code 0

Result

This PR fixes the Claude Code startup crash on 16K-page LoongArch hosts
caused by overlapping PT_LOAD widening during ELF loading.

After the fix:

  • .plt/.iplt is no longer zeroed during load;
  • execution no longer reaches 00 00 -> add [rax], al;
  • claude --help runs successfully;
  • focused regression tests are added to cover both the overlap case and
    the older non-overlapping host-page-aligned behavior.
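
Chinese Description

Overview

This PR fixes incorrect handling of adjacent PT_LOAD segments during
linux-user ELF loading on LoongArch hosts with 16K pages.

The problem was first reproduced with the official Claude Code x86_64
Linux native binary. The corresponding release asset is:

claude-linux-x64.tar.gz

Source:

  • official installer entry point: https://claude.ai/install.sh
  • official releases page: https://github.com/anthropics/claude-code/releases

On a LoongArch Linux kernel with 16K pages, the following minimal
command segfaults immediately: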

export LATX_AOT=0 LATX_KZT=0
./build64/latx-x86_64 /tmp/claude-code-test/claude --help

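Before the fix:

  • immediate SIGSEGV
  • exit code 139

After the fix:

  • exits normally
  • exit code 0
  • prints the Claude Code help output

Direct Symptom

At the crash, execution had reached guest bytes 00 00, which decode as:

add byte ptr [rax], al

At that point rax == 0, so the translated LoongArch memory access
faulted on address 0.

That behavior is correct in itself, but it is only the symptom. The real
question is why the guest executed a region that should never have been
zero.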

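Root Cause Analysis

The Claude binary contains two adjacent PT_LOAD segments:

  • an executable PT_LOAD (R E) ending at 0x60ec1a0
  • a writable PT_LOAD (RW) starting at 0x60ed1a0

From the guest 4K-page point of view, they occupy:

0x60ec000 - 0x60ecfff   RX page
0x60ed000 - 0x60edfff   RW page

This is a valid layout.

On a 16K host, however, both guest 4K pages fall into the same host page:

0x60ec000 - 0x60effff

When p_align satisfied host-page alignment, the old logic computed the
PT_LOAD's PAGESTART at host-page granularity. For the later RW PT_LOAD,
p_align = 0x4000, so the result was:

PAGESTART(0x60ed1a0) = 0x60ec000

rather than the value that matches guest 4K semantics:

PAGESTART(0x60ed1a0) = 0x60ed000

As a result, loading the later writable segment reached back into the
first part of the same 16K page, where the earlier executable segment
ends, and overwrote .plt/.iplt.

I checked at a breakpoint in do_init_thread and confirmed that, before
the guest started executing, the following locations:

  • 0x60ec080
  • 0x60ec120

which should belong to .plt/.iplt, were already all 0x00 before the fix.

In other words:

  • this is not corruption at run time;
  • the translator did not mistranslate an instruction;
  • the page was already loaded incorrectly during ELF loading.

Execution then entered this page of zero bytes:

00 00

which decodes normally under x86_64 semantics as:

add byte ptr [rax], al

and finally triggered a segmentation fault with rax == 0.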

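Why 4K Kernels Work but 16K Kernels Fail

On a 4K host:

  • 0x60ec000 - 0x60ecfff
  • 0x60ed000 - 0x60edfff

are two different host pages to begin with, so they cannot overwrite
each other.

On a 16K host:

  • both guest pages share the host page 0x60ec000 - 0x60effff

If the later PT_LOAD is wrongly extended downward to 0x60ec000, it
overwrites the tail page of the earlier segment.

Therefore:

  • 4K host: works
  • 16K host: fails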

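Design History / Risk Assessment

This area already has a fairly complicated history in upstream QEMU.

  • In 2014, linux-user: Tell guest about big host page sizes
    (a70daba3771) told the guest about the larger host page granularity
    via AT_PAGESZ.

  • In 2018, linux-user: fix ELF load alignment error
    (33143c446e) added the fallback to TARGET_PAGE_SIZE when a
    PT_LOAD's p_align is smaller than the host page size.

  • Later in 2018, linux-user: elf: mmap all the target-pages of
    hostpage for data segment (94894ff2d13) went on to keep host-page
    behavior for aligned data segments, so that glibc would not fail
    when consuming the last host page.

However, the current LATX tree no longer fully preserves the assumptions
upstream made at the time:

  1. AT_PAGESZ is now fixed to TARGET_PAGE_SIZE, so the larger host
    page size is no longer reported to the guest;
  2. the load path for file-backed PT_LOADs no longer mmaps the whole
    widened host-page length.

So the current LATX tree does not fully follow the 2014/2018 upstream
"large host page semantics" design.

To reduce risk, this PR does not bluntly force every host > guest case
back to guest pages. It handles the problem more narrowly:

  • if extending a PT_LOAD downward to the host page does not overlap
    any other PT_LOAD, host-page behavior is kept;
  • only when extending downward to the host page would overlap another
    PT_LOAD does the segment fall back to TARGET_PAGE_SIZE.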

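Fix

This fix extracts the page-granularity decision into a standalone helper:

  • linux-user/elfload-pagesize.h
  • linux-user/elfload-pagesize.c

The new logic is:

  1. First decide, by the existing rule, whether the current segment is
    eligible for host-page granularity:

    • does p_align satisfy the host-page alignment requirement?

  2. If it is eligible, additionally check:

    • whether the address range added in front of the segment by
      extending it downward to the host page intersects the memory
      range of any other PT_LOAD.

  3. If it intersects:

    • the segment falls back to TARGET_PAGE_SIZE.

  4. If it does not intersect:

    • host-page granularity is kept.

In other words, this PR does not change host-page behavior itself; it
prevents host-page alignment from wrongly extending a later segment into
an earlier one.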

Layout Within the 16K Page

The affected host page is:

0x60ec000 - 0x60effff

Before the fix:

16K host page: 0x60ec000 - 0x60effff

[0x60ec000 ---------------- 0x60ecfff]   tail page of the earlier RX PT_LOAD
                                         contains .plt/.iplt
                                         but is overwritten when the later
                                         RW PT_LOAD is extended downward

[0x60ed000 ---------------- 0x60edfff]   start of the later RW PT_LOAD
[0x60ee000 ---------------- 0x60eefff]   RW PT_LOAD
[0x60ef000 ---------------- 0x60effff]   RW PT_LOAD

Contents observed before the fix:

0x60ec080: 00 00 00 00 ...
0x60ec120: 00 00 00 00 ...

After the fix:

16K host page: 0x60ec000 - 0x60effff

[0x60ec000 ---------------- 0x60ecfff]   tail page of the earlier RX PT_LOAD is preserved
                                         .plt/.iplt contents are correct

[0x60ed000 ---------------- 0x60edfff]   later RW PT_LOAD starts here
[0x60ee000 ---------------- 0x60eefff]   RW PT_LOAD
[0x60ef000 ---------------- 0x60effff]   RW PT_LOAD

After the fix, inspection before do_init_thread showed:

0x60ec080:
ff 25 f2 07 0e 00 68 dc 01 00 00 e9 20 e2 ff ff

0x60ec120:
ff 25 a2 07 0e 00 68 00 00 00 00 e9 80 e1 ff ff

These bytes match .plt/.iplt in the binary file, which shows that the
load stage no longer wrongly zeroes this page.

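Tests

The current tree had no existing automated test covering this
historical design issue, so this PR adds a minimal unit test:

  • tests/unit/test-elfload-pagesize.c

It covers 3 scenarios:

  1. the overlap case must fall back to guest pages;
  2. the aligned, non-overlapping case keeps host-page behavior;
  3. when p_align is smaller than the host page size, the segment still
    falls back to guest pages.

Result: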

meson test -C build64 test-elfload-pagesize --print-errorlogs
# 1/1 test-elfload-pagesize OK

The original reproducer was also re-verified successfully:

export LATX_AOT=0 LATX_KZT=0
./build64/latx-x86_64 /tmp/claude-code-test/claude --help
# exit code 0

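Result

This PR fixes the startup crash of the Claude Code x86_64 binary on
16K LoongArch hosts, caused by a PT_LOAD reaching back into the adjacent
earlier PT_LOAD.

After the fix:

  • .plt/.iplt is no longer wrongly zeroed
  • execution no longer reaches 00 00 -> add [rax], al
  • claude --help runs normally
  • regression tests targeting this historical design risk are added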

On host pages larger than TARGET_PAGE_SIZE, keep using host-page
ELF load granularity only when widening a PT_LOAD down to the host
page boundary does not overlap another PT_LOAD.

The Claude Code x86_64 binary places the tail of an RX PT_LOAD and
start of a later RW PT_LOAD in the same 16K host page. Widening the
RW PT_LOAD backward to the host page start overwrites the earlier
.plt/.iplt bytes with zeros before guest execution begins. Execution
then reaches 0x00 0x00, decodes it as 'add byte ptr [rax], al', and
faults when the translated access touches address 0.

Fix this by selecting TARGET_PAGE_SIZE only for PT_LOADs whose
host-page widening would overlap another PT_LOAD. This preserves the
older host-page behavior for aligned non-overlapping data segments
while preventing a later writable segment from clobbering an earlier
executable tail page.

Add unit tests covering the overlap fallback, the aligned
non-overlapping host-page case, and the small-p_align fallback.
@LaurenIsACoder
Contributor Author

@xiangzhai This PR should address the 16K-host-page Claude Code startup
crash you reported on LoongArch.

The fix is narrowed so that we only fall back to guest-page granularity
when widening a PT_LOAD to the host page would overlap another PT_LOAD;
aligned non-overlapping cases still keep the older host-page behavior.

If convenient, could you help verify it on your setup?

@xiangzhai
Contributor

Hi @LaurenIsACoder

elfload works correctly now:

(gdb) x/20hex 0x60ec080
0x60ec080:	0x25ff	0x07f2	0x000e	0xdc68	0x0001	0xe900	0xe220	0xffff

The bytes now match .plt at 0x60ec080:

00000000060ec080 <secure_getenv@plt>:
 60ec080:       ff 25 f2 07 0e 00       jmpq   *0xe07f2(%rip)        # 61cc878 <secure_getenv@GLIBC_2.17>

Thanks,
Leslie Zhai

@sunhaiyong1978
Contributor

Tested: it now runs with 16K pages.
