Fix uninitialized variable in float_to_bf16_rtn_asm() causing incorrect rounding under -O3 by zhyajie · Pull Request #3715 · ROCm/composable_kernel

zhyajie · 2026-02-24T08:42:18Z

Summary

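Fix the uninitialized tmp variable in float_to_bf16_rtn_asm(). Because tmp is declared as a read+write operand of the inline assembly but never initialized, the compiler under -O3 is free to place it in the same register as another input, which produces wrong BF16 conversion results for ~50% of inputs.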

Problem

float_to_bf16_rtn_asm() in bfloat16.hpp (used when CK_TILE_FLOAT_TO_BFLOAT16_DEFAULT=3 / standard_asm) produces incorrect Round-to-Nearest-Even results when compiled with -O3.

Environment: hipcc (AMD clang 19.0.0, ROCm 6.4.3), -O3, gfx942
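For reference, the value the rounding sequence is meant to produce can be sketched in portable C++. This is a minimal sketch, not the ck_tile implementation: `bf16_rne_from_bits()` and `float_to_bf16_rne_ref()` are hypothetical names, and mapping every NaN to the quiet NaN 0x7FC0 is an assumption, since the PR does not show how the FP32_NAN path is encoded. The rounding itself mirrors the bits + lsb + ROUND_BIAS_FOR_BF16 (0x7fff) computation performed by the inline assembly.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical reference: FP32 bit pattern -> BF16 bit pattern, round-to-nearest-even.
inline std::uint16_t bf16_rne_from_bits(std::uint32_t bits) {
    // Assumption: any NaN becomes the canonical quiet NaN 0x7FC0.
    if ((bits & 0x7fffffffu) > 0x7f800000u) {
        return static_cast<std::uint16_t>(0x7fc0u);
    }
    // lsb of the kept upper half; adding it makes exact ties round to even.
    const std::uint32_t lsb = (bits >> 16) & 1u;
    // Same arithmetic as v_add3_u32: bits + lsb + 0x7fff, then keep the upper 16 bits.
    return static_cast<std::uint16_t>((bits + lsb + 0x7fffu) >> 16);
}

// Hypothetical convenience wrapper taking a float.
inline std::uint16_t float_to_bf16_rne_ref(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // bit-cast without undefined behavior
    return bf16_rne_from_bits(bits);
}
```

Adding lsb before the 0x7fff bias is what turns round-half-up into ties-to-even: the exact tie 0x3F808000 stays at 0x3F80, while the tie 0x3F818000 rounds up to the even pattern 0x3F82.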

Root Cause

The inline assembly declares tmp with a "+v" (read+write) constraint but never initializes it:

uint32_t tmp;  // uninitialized
asm volatile("..."
    : "=s"(check_nan), "+v"(tmp), "+v"(u.fp32)   // %0, %1, %2
    : "v"(ROUND_BIAS_FOR_BF16), "v"(FP32_NAN));  // %3, %4

Because tmp (%1) is declared as read+write but its initial value is undefined, the compiler under -O3 is free to give it any register, and the register allocator assigns %1 (tmp) and %3 (ROUND_BIAS_FOR_BF16 = 0x7fff) to the same VGPR.

-O3 generated assembly (BROKEN): %1 and %3 both mapped to v5

v_bfe_u32 v5, v4, 16, 1         ; v5 = lsb (overwrites the 0x7fff value!)
v_add3_u32 v5, v4, v5, v5       ; v5 = bits + lsb + lsb  (should be bits + lsb + 0x7fff)

-O0 generated assembly (CORRECT): %1 → v9, %3 → v12

v_bfe_u32 v9, v8, 16, 1         ; v9 = lsb
v_add3_u32 v9, v8, v9, v12      ; v9 = bits + lsb + 0x7fff

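The v_bfe_u32 instruction writes to %1, destroying the value of %3 when they share the same register. v_add3_u32 then computes bits + 2·lsb instead of bits + lsb + 0x7fff, which almost never carries into the upper 16 bits, so the result is effectively truncated rather than rounded. Every input that should have rounded up, roughly half of all FP32 inputs, ends up 1 ULP too small after the FP32-to-BF16 conversion.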
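The failure can be reproduced without a GPU by emulating the two instructions on a toy register file in which the operand-to-VGPR mapping is a parameter. This is an illustrative sketch: `OperandMap`, `run_rounding_sequence()` and `count_mismatches()` are hypothetical helpers, only v_bfe_u32 and v_add3_u32 are modelled (the NaN check and any final shift in the real asm are omitted), and the register numbers come from the -O0 and -O3 listings above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: which VGPR the compiler picked for each inline-asm operand.
struct OperandMap {
    int tmp;   // VGPR index assigned to %1 (tmp)
    int bits;  // VGPR index assigned to %2 (u.fp32)
    int bias;  // VGPR index assigned to %3 (ROUND_BIAS_FOR_BF16)
};

// Runs the two-instruction rounding sequence on a toy 16-entry VGPR file.
inline std::uint32_t run_rounding_sequence(std::uint32_t fp32_bits, OperandMap m) {
    std::uint32_t vgpr[16] = {};
    vgpr[m.bits] = fp32_bits;
    vgpr[m.bias] = 0x7fffu;  // ROUND_BIAS_FOR_BF16

    // v_bfe_u32 %1, %2, 16, 1: extract bit 16. If %1 and %3 share a register,
    // this write clobbers the bias.
    vgpr[m.tmp] = (vgpr[m.bits] >> 16) & 1u;
    // v_add3_u32 %1, %2, %1, %3: bits + lsb + bias (bias read after the write above).
    vgpr[m.tmp] = vgpr[m.bits] + vgpr[m.tmp] + vgpr[m.bias];

    return vgpr[m.tmp] >> 16;  // upper half holds the rounded BF16 pattern
}

// Counts inputs in [base, base + count) where the -O3 mapping disagrees with -O0.
inline std::uint32_t count_mismatches(std::uint32_t base, std::uint32_t count) {
    const OperandMap correct{9, 8, 12};  // -O0: %1 -> v9, %2 -> v8, %3 -> v12
    const OperandMap aliased{5, 4, 5};   // -O3: %1 and %3 both -> v5
    std::uint32_t mismatches = 0;
    for (std::uint32_t i = 0; i < count; ++i) {
        const std::uint32_t good = run_rounding_sequence(base + i, correct);
        const std::uint32_t bad = run_rounding_sequence(base + i, aliased);
        if (good != bad) {
            // Aliasing only ever loses the round-up: off by exactly 1 ULP (mod 2^16).
            assert(((good - bad) & 0xffffu) == 1u);
            ++mismatches;
        }
    }
    return mismatches;
}
```

Sweeping the 2^17 inputs starting at 1.0f (0x3F800000), which covers every combination of the lsb bit and the 16 discarded bits, `count_mismatches(0x3F800000u, 0x20000u)` returns 65533, just under half, consistent with the ~50% failure rate reported above.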

…s under -O3

Initialize `tmp` to 0 before the inline assembly in `float_to_bf16_rtn_asm()`. Without initialization, the compiler under -O3 may alias the `tmp` operand (%1) with the ROUND_BIAS_FOR_BF16 input operand (%3) in the same VGPR, causing v_bfe_u32 to overwrite the 0x7fff bias before v_add3_u32 reads it. This produces incorrect BF16 rounding for ~50% of inputs.

zhyajie requested review from Snektron, ThomasNing, afagaj, andriy-ca, aosewski, asleepzzz, bartekxk, carlushuang, cgmillette, coderfeli, geyyer, illsilin, poyenc, qianfengz, shumway, tenpercent, vidyasagar-amd and vpietila-amd as code owners February 24, 2026 08:42

zhyajie wants to merge 1 commit into ROCm:develop from zhyajie:fix/bf16-asm-uninitialized-tmp
