Three Hexagon bugs found by Gemini Pro.#8989
Open
mcourteaux wants to merge 3 commits intohalide:mainfrom
Open
Conversation
Co-authored-by: Gemini 3 Pro <gemini@aistudio.google.com>
Co-authored-by: Gemini 3 Pro <gemini@aistudio.google.com>
alexreinking
approved these changes
Mar 6, 2026
Member
alexreinking
left a comment
There was a problem hiding this comment.
It would be great if Gemini could also derive regression tests for these bugs.
Comment on lines
+1161
to
1166
| int min = INT_MAX; | ||
| for (int idx : indices) { | ||
| if (idx != -1 && idx < min) { | ||
| min = idx; | ||
| } | ||
| } |
Member
There was a problem hiding this comment.
This is exactly a "premature optimization is the root of all evil" situation.
Member
There was a problem hiding this comment.
I do miss Haskell sometimes... minimum (filter (/= -1) indices)
Contributor
Author
|
I tried to get it to produce a test, but not much luck. It basically independently rewrote interleave_rgb, being convinced that would trigger it. I don't know if I'm missing a small difference between interleave_rgb and the new LLM-provided test, but I wouldn't expect them to behave differently. I should maybe still try that test on main. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Below the relevant pieces of explanation of the LLM:
The Context
When Halide deinterleaves an RGB vector (stride 3), it translates to chained
shuffle_vectorsoperations. Because a stride of 3 does not fit the power-of-two permutations required by Hexagon's butterfly network (vdelta/vrdelta),shuffle_vectorsfalls back to the generalized table lookup function:vlut(lut, indices).For standard Hexagon operations (like
128-byte vectors), the concatenated input vectors (aandb) result in alutcontaining256elements. Deinterleaving might even push thismax_indexhigher if double-vectors are involved.The Bugs
Inside
CodeGen_Hexagon::vlut(Value *lut, Value *idx, int min_index, int max_index), whenmax_index >= 256, the code breaks the lookup into manageable 256-element chunks. However, there are two critical off-by-one errors in the chunking loop:1. Truncating the Highest Index (
range_extent_ivssize)Here,
range_extent_iis calculated to be a maximum of255. It is then passed as thesizeargument toslice_vector.If you slice
255elements, the valid indices for that slice are0through254. The element at relative index255is completely dropped.When a vector lane requests index
255, it attempts an out-of-bounds access on the 255-element slice, loading garbage/undef memory into that lane.2. Premature Loop Termination (Skipping the Final Chunk)
The loop condition uses strictly less-than (
<). Ifmax_indexhappens to be exactly a multiple of 256 (e.g., exactly256), the loop reachesmin_index_i = 256, evaluates256 < 256asfalse, and exits.Because the chunk responsible for indices
256–511is never generated, any lane requesting index256relies on the first chunk (0–255). But wait! The first chunk truncates 16-bit indices to 8-bit (256 & 255 = 0). Consequently, the lane requesting index256will silently read the value for index0.The Fix
To fix the
interleave_rgbfailure, thevlutchunking loop needs to be adjusted so that it operates inclusively (<= max_index) and slices the correct capacity (local_max_index + 1):This perfectly aligns the size of the sliced vector with the
max_indexexpected internally byvlut256, preventing both the garbage memory reads and the index aliasing.Based on an analysis of the provided Halide Hexagon backend code, there are two critical bugs and one missed optimization (potential logical bug) specifically related to how Shuffles are handled.
Both primary bugs revolve around the mishandling of
-1sentinel values, which Halide uses in shuffle masks to representundefelements (i.e., "don't care" indices).1.
minindex calculation ignoresundefelements, breaking optimizationsIn
CodeGen_Hexagon::shuffle_vectors, there is an attempt to optimize shuffles that strictly access the second vector (b) by ignoringa.The Bug:
If the very first element of the shuffle mask happens to be an
undef(i.e.indices[0] == -1),minis initialized to-1. Because all valid indices are>= 0, the loop conditionindices[i] < minevaluates toindices[i] < -1, which is always false.Consequently,
minwill remain-1forever, and the conditionif (min >= a_elements)will fail (sincea_elements > 0). This quietly and permanently disables theb-only rewrite optimization for any shuffle mask that starts with anundef.The Fix:
Initialize
minto a sufficiently large value (e.g.,INT_MAX) instead ofindices[0].3. Missing Optimization Fallback (Delegating to the Wrong Overload)
Going back to
CodeGen_Hexagon::shuffle_vectors:The Bug / Missed Optimization:
Notice that it recursively calls
shuffle_vectors(b, shifted_indices). That is the two-argument overload. However,CodeGen_Hexagononly implements the three-argument overrideshuffle_vectors(Value *a, Value *b, ...).Because of the
using CodeGen_Posix::shuffle_vectors;directive at the top of the class, this delegates completely out of the Hexagon backend and directly to the parent LLVM POSIX implementation. As a result, the compiler emits standard, generic LLVM IRshufflevectorinstructions instead of trying to further lower this single-vector shuffle using Hexagon-specific optimized instructions (likevdelta,vlut, etc.) that are mapped inside theCodeGen_Hexagonscope.The Fix:
This should likely be updated to call
shuffle_vectors(b, b, shifted_indices);(or similar) so it stays within theCodeGen_Hexagonlogic.