[AIMIGRAPHX-885] Add slice squeeze matcher#5004

Merged: causten merged 10 commits into develop from add_slice_squeeze_matcher_clean on Jun 29, 2026

@TedThemistokleous TedThemistokleous commented Jun 22, 2026

Collaborator

Motivation

Further split of the changes for the MLP tower components (#4723).

Handle slice/squeeze matching so that we can rewrite slice->squeeze->pointwise/reduce into slice->pointwise/reduce->squeeze.

This is used in the MLP tower before a horizontal fusion is performed further down the pipeline for the pointwise activation op. In that case the activation is SiLU, but the rewrite can be used for other activations.
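
As an illustration of why this rewrite preserves results, here is a minimal sketch in plain Python using nested lists. It is not MIGraphX code: the helper names (`slice_axis1`, `squeeze_axis1`, `unsqueeze_axis1`, `mul_silu_2d`, `mul_silu_3d`) and the shapes are assumptions chosen for the example. It compares slice->squeeze->pointwise against slice->pointwise->squeeze, where the second pointwise input is unsqueezed to match.

```python
import math

def silu(v):
    # SiLU activation: v * sigmoid(v)
    return v / (1.0 + math.exp(-v))

def slice_axis1(t, start, end):
    # [N, M, K] -> [N, end - start, K]
    return [row[start:end] for row in t]

def squeeze_axis1(t):
    # [N, 1, K] -> [N, K]; axis 1 must have length 1
    assert all(len(row) == 1 for row in t)
    return [row[0] for row in t]

def unsqueeze_axis1(t):
    # [N, K] -> [N, 1, K]
    return [[row] for row in t]

def mul_silu_2d(a, b):
    # Hypothetical binary pointwise op on rank-2 inputs: silu(a) * b
    return [[silu(p) * q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def mul_silu_3d(a, b):
    # Same pointwise op applied to rank-3 inputs
    return [mul_silu_2d(pa, pb) for pa, pb in zip(a, b)]

# x has shape [2, 3, 2]; y is the other pointwise input with shape [2, 2]
x = [[[float(i * 6 + j * 2 + k) for k in range(2)] for j in range(3)]
     for i in range(2)]
y = [[0.5, -1.0], [2.0, 3.0]]

sliced = slice_axis1(x, 1, 2)  # shape [2, 1, 2]

# Original order: slice -> squeeze -> pointwise
before = mul_silu_2d(squeeze_axis1(sliced), y)

# Rewritten order: slice -> pointwise -> squeeze, with y unsqueezed
after = squeeze_axis1(mul_silu_3d(sliced, unsqueeze_axis1(y)))

assert before == after
```

Because a pointwise op acts on each element independently, moving the squeeze after it only changes where the length-1 axis is dropped, not the values computed.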

Technical Details

Changelog Category

Add a CHANGELOG.md entry for any option other than Not Applicable

    • Added: New functionality.
    • Changed: Changes to existing functionality.
    • Removed: Functionality or support that has been removed. (Compared to a previous release)
    • Optimized: Component performance that has been optimized or improved.
    • Resolved Issues: Known issues from a previous version that have been resolved.
    • Not Applicable: This PR is not to be included in the changelog.

Port the find_slice_squeeze matcher from the MLP_prediction_towers branch.
This matcher rewrites slice->squeeze->pointwise/reduce into
slice->pointwise/reduce->squeeze (unsqueezing the other inputs), which lets
the squeeze propagate downstream and parallel slice branches merge back
together. Includes the associated unit tests.

Replace the brittle "not pointwise" check with the shared is_reduce helper
from find_op_shape_transform_op so reduce/argmin/argmax detection is precise
and consistent with the rest of the pass.

Delegate the reduce/argmin axis remapping in find_slice_squeeze to the shared
insert() helper by building a source->common axes map for the unsqueeze.
This removes the hand-rolled axis-shifting logic, keeps behavior consistent
with find_op_shape_transform_op, and additionally handles layout permutations
for free.
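
To make the axis remapping concrete, the following is a rough sketch in plain Python of how reduce axes expressed on the squeezed tensor could be translated to the unsqueezed tensor. It does not reproduce the shared insert() helper or any MIGraphX API; `unsqueezed_axis_map` and `remap_reduce_axes` are hypothetical names, and the sketch ignores layout permutations.

```python
def unsqueezed_axis_map(squeezed_rank, squeeze_axes):
    # For each axis of the squeezed tensor, give its index in the
    # tensor before the squeeze (hypothetical helper for illustration).
    full_rank = squeezed_rank + len(squeeze_axes)
    removed = set(squeeze_axes)
    return [ax for ax in range(full_rank) if ax not in removed]

def remap_reduce_axes(reduce_axes, squeezed_rank, squeeze_axes):
    # Translate reduce axes from squeezed coordinates to unsqueezed ones.
    axis_map = unsqueezed_axis_map(squeezed_rank, squeeze_axes)
    return [axis_map[ax] for ax in reduce_axes]

# A rank-4 tensor is squeezed on axis 1, leaving rank 3.
# Reducing axis 1 of the squeezed tensor touches axis 2 of the original.
print(remap_reduce_axes([1], 3, [1]))     # [2]
print(remap_reduce_axes([0, 2], 3, [1]))  # [0, 3]
```

Building one map from squeezed axes to original axes and indexing into it avoids per-axis shifting arithmetic, which is the kind of hand-rolled logic the commit message describes removing.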
codecov Bot commented Jun 22, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #5004      +/-   ##
===========================================
+ Coverage    92.69%   92.69%   +0.01%     
===========================================
  Files          596      596              
  Lines        31603    31631      +28     
===========================================
+ Hits         29292    29320      +28     
  Misses        2311     2311              
Files with missing lines Coverage Δ
src/simplify_reshapes.cpp 98.06% <100.00%> (+0.06%) ⬆️
@TedThemistokleous TedThemistokleous added the bugfix Fixes a bug found in the code. label Jun 22, 2026
gh-app-migraphx-bot-pr-write Bot commented Jun 23, 2026

| Test | Batch | New Rate (5ee371) | Old Rate (90b175)* | Diff | Status |
| --- | --- | --- | --- | --- | --- |
| torchvision-resnet50 | 64 | 3,154.84 | 886.14 | 256.02% | 🔆 |
| torchvision-resnet50_fp16 | 64 | 6,639.57 | 1,735.02 | 282.68% | 🔆 |
| torchvision-densenet121 | 32 | 2,695.44 | 700.76 | 284.65% | 🔆 |
| torchvision-densenet121_fp16 | 32 | 4,536.64 | 1,439.55 | 215.14% | 🔆 |
| torchvision-inceptionv3 | 32 | 1,799.18 | 538.77 | 233.94% | 🔆 |
| torchvision-inceptionv3_fp16 | 32 | 2,819.61 | 182.21 | 1447.41% | 🔆 |
| cadene-inceptionv4 | 16 | 822.82 | 205.30 | 300.78% | 🔆 |
| cadene-resnext64x4 | 16 | 783.45 | 179.33 | 336.88% | 🔆 |
| slim-mobilenet | 64 | 8,389.35 | 1,806.11 | 364.50% | 🔆 |
| slim-nasnetalarge | 64 | 228.93 | 115.20 | 98.73% | 🔆 |
| slim-resnet50v2 | 64 | 3,165.04 | 1,029.11 | 207.55% | 🔆 |
| bert-mrpc-onnx | 8 | 1,171.24 | 137.82 | 749.83% | 🔆 |
| bert-mrpc-tf | 1 | 485.62 | 17.79 | 2629.72% | 🔆 |
| pytorch-examples-wlang-gru | 1 | 329.27 | 35.57 | 825.58% | 🔆 |
| pytorch-examples-wlang-lstm | 1 | 450.98 | 145.48 | 209.99% | 🔆 |
| torchvision-resnet50_1 | 1 | 748.50 | 45.24 | 1554.46% | 🔆 |
| cadene-dpn92_1 | 1 | 441.14 | 27.50 | 1504.13% | 🔆 |
| cadene-resnext101_1 | 1 | 366.42 | 28.35 | 1192.51% | 🔆 |
| onnx-taau-downsample | 1 | 400.33 | 103.47 | 286.89% | 🔆 |
| dlrm-criteoterabyte | 1 | 32.40 | 23.55 | 37.59% | 🔆 |
| dlrm-criteoterabyte_fp16 | 1 | 51.90 | 28.67 | 81.03% | 🔆 |
| agentmodel | 1 | 10,218.53 | 640.03 | 1496.56% | 🔆 |
| unet_fp16 | 2 | 57.01 | 12.84 | 344.07% | 🔆 |
| resnet50v1_fp16 | 1 | 936.73 | 186.82 | 401.42% | 🔆 |
| resnet50v1_int8 | 1 | 941.90 | 150.02 | 527.86% | 🔆 |
| bert_base_cased_fp16 | 64 | 1,097.77 | 424.43 | 158.65% | 🔆 |
| bert_large_uncased_fp16 | 32 | 346.48 | 191.64 | 80.79% | 🔆 |
| bert_large_fp16 | 1 | 203.74 | 203.46 | 0.14% | |
| distilgpt2_fp16 | 16 | 2,094.56 | 791.60 | 164.60% | 🔆 |
| yolov5s | 1 | 594.95 | 157.53 | 277.67% | 🔆 |
| tinyllama | 1 | 16.94 | 28.67 | -40.91% | 🔴 |
| vicuna-fastchat | 1 | 43.87 | 33.14 | 32.39% | 🔆 |
| whisper-tiny-encoder | 1 | 417.64 | 418.36 | -0.17% | |
| whisper-tiny-decoder | 1 | 415.17 | 128.92 | 222.04% | 🔆 |
| llama2_7b | 1 | 20.34 | 11.70 | 73.90% | 🔆 |
| qwen1.5-7b | 1 | 13.50 | 15.16 | -10.97% | 🔴 |
| phi3-3.8b | 1 | 26.70 | 13.69 | 95.02% | 🔆 |
| llama3-8b | 1 | 8.55 | 15.19 | -43.73% | 🔴 |
| whisper-large-encoder | 1 | 4.15 | 5.78 | -28.24% | 🔴 |
| whisper-large-decoder | 1 | 24.42 | 31.17 | -21.65% | 🔴 |
| mistral-7b | 1 | 23.75 | 21.52 | 10.38% | 🔆 |
| FLUX.1-schnell | 1 | 771.91 | 586.06 | 31.71% | 🔆 |

Regressions detected 🔴

* No develop baseline was found for this PR's branch point; compared against the latest available develop run instead.

gh-app-migraphx-bot-pr-write Bot commented Jun 23, 2026

| Test | Status | Result |
| --- | --- | --- |
| bert-mrpc-onnx | PASSED | MIGraphX meets tolerance |
| bert-mrpc-tf | PASSED | MIGraphX meets tolerance |
| pytorch-examples-wlang-gru | PASSED | MIGraphX meets tolerance |
| pytorch-examples-wlang-lstm | PASSED | MIGraphX meets tolerance |
| dlrm-criteoterabyte | PASSED | MIGraphX meets tolerance |
| agentmodel | PASSED | MIGraphX meets tolerance |
| unet | PASSED | MIGraphX meets tolerance |
| resnet50v1 | PASSED | MIGraphX meets tolerance |
| bert_base_cased_fp16 | PASSED | MIGraphX meets tolerance |
| bert_large_uncased_fp16 | 🔴 FAILED | MIGraphX is not within tolerance - check verbose output |
| bert_large | PASSED | MIGraphX meets tolerance |
| yolov5s | PASSED | MIGraphX meets tolerance |
| tinyllama | PASSED | MIGraphX meets tolerance |
| vicuna-fastchat | PASSED | MIGraphX meets tolerance |
| whisper-tiny-encoder | PASSED | MIGraphX meets tolerance |
| whisper-tiny-decoder | PASSED | MIGraphX meets tolerance |
| distilgpt2_fp16 | 🔴 FAILED | MIGraphX is not within tolerance - check verbose output |
| llama2_7b | PASSED | MIGraphX meets tolerance |
| qwen1.5-7b | PASSED | MIGraphX meets tolerance |
| phi3-3.8b | PASSED | MIGraphX meets tolerance |
| llama3-8b | PASSED | MIGraphX meets tolerance |
| whisper-large-decoder | PASSED | MIGraphX meets tolerance |
| mistral-7b | PASSED | MIGraphX meets tolerance |
| FLUX.1-schnell | PASSED | MIGraphX meets tolerance |

@TedThemistokleous TedThemistokleous changed the title from "[AIMIGRAPHX-885] Add slice squeeze matcher clean" to "[AIMIGRAPHX-885] Add slice squeeze matcher" Jun 23, 2026
@TedThemistokleous TedThemistokleous self-assigned this Jun 23, 2026
@TedThemistokleous TedThemistokleous added high priority A PR with high priority for review and merging. and removed bugfix Fixes a bug found in the code. labels Jun 23, 2026
@TedThemistokleous TedThemistokleous marked this pull request as ready for review June 23, 2026 04:13
@TedThemistokleous TedThemistokleous requested a review from a team as a code owner June 23, 2026 04:19
Comment thread src/simplify_reshapes.cpp Outdated
Comment thread src/simplify_reshapes.cpp Outdated
@causten causten requested a review from pfultz2 June 25, 2026 20:43
Comment thread test/optimize_module_test.cpp
@causten causten merged commit 2b90a79 into develop Jun 29, 2026
38 of 40 checks passed
@causten causten deleted the add_slice_squeeze_matcher_clean branch June 29, 2026 15:16
Labels

high priority A PR with high priority for review and merging.

4 participants