fix(cloud.infer): reduce Qwen3-MoE export OOM risk #821
Merged
quic-rishinr merged 5 commits into quic:main on Mar 11, 2026
Auto-enable ONNX subfunctions for `qwen3_moe` in `cloud.infer` when not explicitly set, while allowing explicit override via `--use-onnx-subfunctions`/`--no-use-onnx-subfunctions`. Also add focused infer unit tests and update quick-start/text-generation docs for the new behavior.

Fixes quic#702

Signed-off-by: jd316 <jd316biswas@gmail.com>
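The initial proposal in the commit message above (auto-enable for `qwen3_moe` unless the user overrides it) can be sketched as a tri-state flag. This is only an illustration, not the code from this PR: `AUTO_SUBFUNCTION_MODEL_TYPES`, `build_parser`, and `resolve_use_onnx_subfunctions` are hypothetical names, and only the two flag spellings come from the description.

```python
import argparse

# Hypothetical: model types that would get ONNX subfunctions when the flag is unset.
AUTO_SUBFUNCTION_MODEL_TYPES = {"qwen3_moe"}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="tri-state flag sketch")
    # BooleanOptionalAction (Python 3.9+) registers both --use-onnx-subfunctions
    # and --no-use-onnx-subfunctions; default=None means "not explicitly set".
    parser.add_argument(
        "--use-onnx-subfunctions",
        action=argparse.BooleanOptionalAction,
        default=None,
    )
    return parser


def resolve_use_onnx_subfunctions(explicit, model_type):
    # An explicit True/False from the command line always wins.
    if explicit is not None:
        return explicit
    # Otherwise fall back to the model-specific default.
    return model_type in AUTO_SUBFUNCTION_MODEL_TYPES
```

Under these assumptions, an unset flag resolves to `True` for `qwen3_moe` and `False` for other model types, while `--no-use-onnx-subfunctions` forces `False` for any model. The later comments in this thread replace this model-specific default with an explicit opt-in.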
vbaddi requested changes Mar 4, 2026
Signed-off-by: jd316 <jd316biswas@gmail.com>
b09495a to 28f0a90
Contributor (Author)
PR description has been updated to match the latest implementation in 28f0a90 (explicit opt-in only for ONNX subfunctions). Remaining pending items are reviewer approval and maintainer workflow approval.
Contributor
@jd316 there is a formatting issue in `tests/cloud/test_infer.py`. Could you get it resolved?
Contributor (Author)
@quic-rishinr Updating!
Signed-off-by: jd316 <jd316biswas@gmail.com>
f60cd06 to a6c59eb
Contributor (Author)
@vbaddi @quic-rishinr pushed follow-up commit a6c59eb to address the test
quic-rishinr approved these changes Mar 10, 2026
Contributor
Thanks @jd316 for adding the changes. Merging the PR.
quic-dhirajku pushed a commit to asmigosw/efficient-transformers that referenced this pull request Mar 13, 2026
Summary
- Keep `use_onnx_subfunctions` disabled by default in `QEfficient.cloud.infer`
- Provide explicit opt-in via `--use-onnx-subfunctions` only
- Remove `--no-use-onnx-subfunctions`
- Update infer unit tests for explicit-enable and default-disabled behavior
- Update quick-start and text-generation docs to reflect explicit opt-in behavior

Why
- Align infer behavior with reviewer feedback to keep defaults unchanged and avoid model-specific auto-enable behavior.

Fixes
- Fixes quic#702

Validation
- `python -m py_compile QEfficient/cloud/infer.py tests/cloud/test_infer.py`
- `ruff check QEfficient/cloud/infer.py tests/cloud/test_infer.py`
- `pytest -q tests/cloud/test_infer.py -m "not on_qaic"` (2 passed, 5 deselected)

---------

Signed-off-by: jd316 <jd316biswas@gmail.com>
Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
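The explicit opt-in behavior summarized above can be sketched with a plain `store_true` flag. This is a minimal illustration under assumptions, not the actual `QEfficient/cloud/infer.py` code: `build_parser` and `export_kwargs` are hypothetical helpers, and only the flag name and the `use_onnx_subfunctions` keyword are taken from the summary.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="explicit opt-in sketch")
    # store_true keeps the option disabled unless the user passes the flag.
    # No --no-use-onnx-subfunctions counterpart is registered.
    parser.add_argument(
        "--use-onnx-subfunctions",
        action="store_true",
        default=False,
        help="Opt in to ONNX subfunctions during export (off by default).",
    )
    return parser


def export_kwargs(args: argparse.Namespace) -> dict:
    # Hypothetical helper: forward the flag unchanged, with no per-model override.
    return {"use_onnx_subfunctions": args.use_onnx_subfunctions}


if __name__ == "__main__":
    parser = build_parser()
    print(export_kwargs(parser.parse_args([])))
    print(export_kwargs(parser.parse_args(["--use-onnx-subfunctions"])))
```

With this shape the default is the same for every model type, and passing `--no-use-onnx-subfunctions` makes argparse exit with an unrecognized-argument error. The two cases named in the summary (default-disabled and explicit-enable) map directly onto parsing an empty argument list and a list containing only the flag.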
qcdipankar pushed a commit to qcdipankar/efficient-transformers that referenced this pull request Mar 17, 2026
tv-karthikeya pushed a commit to tv-karthikeya/efficient-transformers that referenced this pull request Mar 25, 2026