Three Hexagon bugs found by Gemini Pro. by mcourteaux · Pull Request #8989 · halide/Halide

mcourteaux · 2026-03-06T16:20:52Z

Below the relevant pieces of explanation of the LLM:

The Context

When Halide deinterleaves an RGB vector (stride 3), it translates to chained shuffle_vectors operations. Because a stride of 3 does not fit the power-of-two permutations required by Hexagon's butterfly network (vdelta/vrdelta), shuffle_vectors falls back to the generalized table lookup function: vlut(lut, indices).

For standard Hexagon operations (like 128-byte vectors), the concatenated input vectors (a and b) result in a lut containing 256 elements. Deinterleaving might even push this max_index higher if double-vectors are involved.

The Bugs

Inside CodeGen_Hexagon::vlut(Value *lut, Value *idx, int min_index, int max_index), when max_index >= 256, the code breaks the lookup into manageable 256-element chunks. However, there are two critical off-by-one errors in the chunking loop:

1. Truncating the Highest Index (`range_extent_i` vs `size`)

// Existing buggy code
int range_extent_i = std::min(max_index - min_index_i, 255);
Value *range_i = vlut256(slice_vector(lut, min_index_i, range_extent_i),
                         indices, 0, range_extent_i);

Here, range_extent_i is calculated to be a maximum of 255. It is then passed as the size argument to slice_vector.
If you slice 255 elements, the valid indices for that slice are 0 through 254. The element at relative index 255 is completely dropped.
When a vector lane requests index 255, it attempts an out-of-bounds access on the 255-element slice, loading garbage/undef memory into that lane.

2. Premature Loop Termination (Skipping the Final Chunk)

// Existing buggy code
for (int min_index_i = 0; min_index_i < max_index; min_index_i += 256) {

The loop condition uses strictly less-than (<). If max_index happens to be exactly a multiple of 256 (e.g., exactly 256), the loop reaches min_index_i = 256, evaluates 256 < 256 as false, and exits.
Because the chunk responsible for indices 256–511 is never generated, any lane requesting index 256 relies on the first chunk (0–255). But wait! The first chunk truncates 16-bit indices to 8-bit (256 & 255 = 0). Consequently, the lane requesting index 256 will silently read the value for index 0.

The Fix

To fix the interleave_rgb failure, the vlut chunking loop needs to be adjusted so that it operates inclusively (<= max_index) and slices the correct capacity (local_max_index + 1):

// Corrected loop
for (int min_index_i = 0; min_index_i <= max_index; min_index_i += 256) {
    // Make a vector of the indices shifted such that the min of
    // this range is at 0. Use 16-bit indices for this.
    Value *min_index_i_val = create_vector(i16x_t, min_index_i);
    Value *indices = builder->CreateSub(idx16, min_index_i_val);

    // Create a condition value for which elements of the range are valid
    // for this index.
    Value *use_index = builder->CreateICmpSGT(indices, minus_one);

    // After we've eliminated the invalid elements, we can
    // truncate to 8 bits, as vlut requires.
    indices = call_intrin(i8x_t, "halide.hexagon.pack.vh", {indices});

    // FIX: Calculate the max local index (up to 255) and the required slice size (+ 1)
    int local_max_index = std::min(max_index - min_index_i, 255);
    int slice_size = local_max_index + 1;
    
    Value *range_i = vlut256(slice_vector(lut, min_index_i, slice_size),
                             indices, 0, local_max_index);
    ranges.emplace_back(range_i, use_index);
}

This perfectly aligns the size of the sliced vector with the max_index expected internally by vlut256, preventing both the garbage memory reads and the index aliasing.

Based on an analysis of the provided Halide Hexagon backend code, there are two critical bugs and one missed optimization (potential logical bug) specifically related to how Shuffles are handled.

Both primary bugs revolve around the mishandling of -1 sentinel values, which Halide uses in shuffle masks to represent undef elements (i.e., "don't care" indices).

1. `min` index calculation ignores `undef` elements, breaking optimizations

In CodeGen_Hexagon::shuffle_vectors, there is an attempt to optimize shuffles that strictly access the second vector (b) by ignoring a.

// Try to rewrite shuffles that only access the elements of b.
int min = indices[0];
for (size_t i = 1; i < indices.size(); i++) {
    if (indices[i] != -1 && indices[i] < min) {
        min = indices[i];
    }
}
if (min >= a_elements) {
    ... // Re-write optimization
}

The Bug:
If the very first element of the shuffle mask happens to be an undef (i.e. indices[0] == -1), min is initialized to -1. Because all valid indices are >= 0, the loop condition indices[i] < min evaluates to indices[i] < -1, which is always false.

Consequently, min will remain -1 forever, and the condition if (min >= a_elements) will fail (since a_elements > 0). This quietly and permanently disables the b-only rewrite optimization for any shuffle mask that starts with an undef.

The Fix:
Initialize min to a sufficiently large value (e.g., INT_MAX) instead of indices[0].

3. Missing Optimization Fallback (Delegating to the Wrong Overload)

Going back to CodeGen_Hexagon::shuffle_vectors:

if (min >= a_elements) {
    vector<int> shifted_indices(indices);
    for (int &i : shifted_indices) {
        if (i != -1) { i -= a_elements; }
    }
    return shuffle_vectors(b, shifted_indices);
}

The Bug / Missed Optimization:
Notice that it recursively calls shuffle_vectors(b, shifted_indices). That is the two-argument overload. However, CodeGen_Hexagon only implements the three-argument override shuffle_vectors(Value *a, Value *b, ...).

Because of the using CodeGen_Posix::shuffle_vectors; directive at the top of the class, this delegates completely out of the Hexagon backend and directly to the parent LLVM POSIX implementation. As a result, the compiler emits standard, generic LLVM IR shufflevector instructions instead of trying to further lower this single-vector shuffle using Hexagon-specific optimized instructions (like vdelta, vlut, etc.) that are mapped inside the CodeGen_Hexagon scope.

The Fix:
This should likely be updated to call shuffle_vectors(b, b, shifted_indices); (or similar) so it stays within the CodeGen_Hexagon logic.

Co-authored-by: Gemini 3 Pro <gemini@aistudio.google.com>

alexreinking

It would be great if Gemini could also derive regression tests for these bugs.

alexreinking · 2026-03-06T18:15:07Z

src/CodeGen_Hexagon.cpp

+    int min = INT_MAX;
+    for (int idx : indices) {
+        if (idx != -1 && idx < min) {
+            min = idx;
        }
    }


This is exactly a "premature optimization is the root of all evil" situation.

I do miss Haskell sometimes... minimum (filter (/= -1) indices)

…care indices.

mcourteaux · 2026-03-07T13:01:50Z

I tried to get it to produce a test, but not much luck. It basically independently rewrote interleave_rgb, being convinced that would trigger it. I don't know if I'm missing a small difference between interleave_rgb and the new LLM-provided test, but I wouldn't expect them to behave differently. I should maybe still try that test on main.

mcourteaux and others added 2 commits March 6, 2026 17:20

Two bugs found by Gemini Pro.

6f71253

Co-authored-by: Gemini 3 Pro <gemini@aistudio.google.com>

Another bug found by Gemini Pro.

7d9370d

Co-authored-by: Gemini 3 Pro <gemini@aistudio.google.com>

mcourteaux requested review from abadams and alexreinking March 6, 2026 16:22

mcourteaux changed the title ~~Two bugs found by Gemini Pro.~~ Three bugs found by Gemini Pro. Mar 6, 2026

mcourteaux changed the title ~~Three bugs found by Gemini Pro.~~ Three Hexagon bugs found by Gemini Pro. Mar 6, 2026

alexreinking approved these changes Mar 6, 2026

View reviewed changes

Fix infinite recursion on shuffles of vectors with exclusively don't-…

3c56378

…care indices.

mcourteaux requested a review from halidebuildbots March 6, 2026 22:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Three Hexagon bugs found by Gemini Pro.#8989

Three Hexagon bugs found by Gemini Pro.#8989
mcourteaux wants to merge 3 commits intohalide:mainfrom
mcourteaux:hvx-bugs

mcourteaux commented Mar 6, 2026 •

edited

Loading

Uh oh!

alexreinking left a comment

Uh oh!

alexreinking Mar 6, 2026

Uh oh!

alexreinking Mar 6, 2026

Uh oh!

mcourteaux commented Mar 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mcourteaux commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

The Context

The Bugs

1. Truncating the Highest Index (range_extent_i vs size)

2. Premature Loop Termination (Skipping the Final Chunk)

The Fix

1. min index calculation ignores undef elements, breaking optimizations

3. Missing Optimization Fallback (Delegating to the Wrong Overload)

Uh oh!

alexreinking left a comment

Choose a reason for hiding this comment

Uh oh!

alexreinking Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

alexreinking Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

mcourteaux commented Mar 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mcourteaux commented Mar 6, 2026 •

edited

Loading

1. Truncating the Highest Index (`range_extent_i` vs `size`)

1. `min` index calculation ignores `undef` elements, breaking optimizations