Skip to content

clamp angular weight span before narrowing to int#649

Open
sahvx655-wq wants to merge 1 commit into
ARM-software:mainfrom
sahvx655-wq:weight-span-int-clamp
Open

clamp angular weight span before narrowing to int#649
sahvx655-wq wants to merge 1 commit into
ARM-software:mainfrom
sahvx655-wq:weight-span-int-clamp

Conversation

@sahvx655-wq

Copy link
Copy Markdown
Contributor

compute_lowest_and_highest_weight() in the angular weight search narrows the per-step span with float_to_int(maxidx - minidx + 1). The span is derived from the block's ideal decimated weights, which in turn come from the input texels, so an HDR source carrying +/-inf or NaN (both legal in fp16/fp32 and routinely present in EXR content) makes the span non-finite and the conversion runs outside the representable int range. UBSan reports "inf is outside the range of representable values of type int" at the vecmathlib float_to_int, reached directly from astcenc_compress_image on a normal HDR compress; the value is silently garbage on the platforms where it does not trap.

The span is already clamped into [2, max_quant_steps + 3] on the next two lines, so I clamp the float into that same range before the conversion rather than altering the shared float_to_int primitive. That keeps the change at the one spot that can observe non-finite input and stays bit-identical for every finite in-range value, leaving the existing integer min/max untouched.

@solidpixel

Copy link
Copy Markdown
Contributor

This is the wrong fix for this - it slows down the codec for an unlikely corner case. Inf and NaN should instead be stripped as part of the LNS encoding on entry.

Can you share a reproducer that triggers this one?

@sahvx655-wq

Copy link
Copy Markdown
Contributor Author

Makes sense, and you know the codec internals better than I do here. I came at it from the narrowing site because that's where UBSan trapped, but scrubbing inf/NaN during the LNS conversion on entry is clearly the better place: it fixes it once at the boundary and keeps the hot angular loop untouched, so the per-step clamp I added becomes redundant. Happy to drop this in favour of that. I'll leave the PR open for now in case you'd rather I retarget it, but no objection to you handling it on the entry path.

On the reproducer: I hit it fuzzing astcenc_compress_image with HDR F32 inputs that carry non-finite lanes (legal in fp16/fp32/EXR). The span passed to float_to_int(maxidx - minidx + 1) in compute_lowest_and_highest_weight goes non-finite because the ideal decimated weights inherit the inf/NaN from the texels, and the conversion is then out of int range. Minimal trigger built against the core library with -fsanitize=undefined:

// repro.cpp
#include "astcenc.h"
#include <cstdint>
#include <limits>
#include <vector>

int main()
{
    const unsigned int dim = 4;
    astcenc_config config{};
    astcenc_config_init(ASTCENC_PRF_HDR, dim, dim, 1, ASTCENC_PRE_MEDIUM, 0, &config);

    astcenc_context* ctx = nullptr;
    astcenc_context_alloc(&config, 1, &ctx, nullptr);

    std::vector<float> texels(dim * dim * 4, 1.0f);
    texels[0]  =  std::numeric_limits<float>::infinity();   // R
    texels[5]  = -std::numeric_limits<float>::infinity();   // G
    texels[10] =  std::numeric_limits<float>::quiet_NaN();  // B

    void* slices[1] = { texels.data() };
    astcenc_image image{};
    image.dim_x = dim; image.dim_y = dim; image.dim_z = 1;
    image.data_type = ASTCENC_TYPE_F32;
    image.data = slices;

    astcenc_swizzle swz { ASTCENC_SWZ_R, ASTCENC_SWZ_G, ASTCENC_SWZ_B, ASTCENC_SWZ_A };

    uint8_t out[16];
    astcenc_compress_image(ctx, &image, &swz, out, sizeof(out), 0);
    astcenc_context_free(ctx);
    return 0;
}

Built with the default scalar float_to_int (ISA none), UBSan reports "inf is outside the range of representable values of type int" at astcenc_vecmathlib_none_4.h:985, with the stack running float_to_int -> compute_lowest_and_highest_weight (astcenc_weight_align.cpp:229) -> compute_angular_endpoints_for_quant_levels -> compute_angular_endpoints_2planes -> compress_symbolic_block_for_partition_2planes -> astcenc_compress_image. A short random fuzz loop over 4x4..12x12 HDR blocks with a few inf/NaN lanes trips it within a few thousand iterations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants