fix(cloud.infer): reduce Qwen3-MoE export OOM risk #821

Merged
quic-rishinr merged 5 commits into quic:main from jd316:fix-702-onnx-subfunctions-infer
Mar 11, 2026

@jd316 jd316 commented Mar 2, 2026

Summary

  • Keep use_onnx_subfunctions disabled by default in QEfficient.cloud.infer
  • Provide explicit opt-in via --use-onnx-subfunctions only
  • Remove --no-use-onnx-subfunctions
  • Update infer unit tests for explicit-enable and default-disabled behavior
  • Update quick-start and text-generation docs to reflect explicit opt-in behavior
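The flag behavior listed above can be sketched with `argparse`. This is a minimal illustration, not the actual parser in `QEfficient/cloud/infer.py`: the `build_parser` and `parse_flag` helpers and the `infer-sketch` program name are hypothetical, and only the `--use-onnx-subfunctions` flag name comes from this PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the infer CLI; only the flag under discussion is modeled."""
    parser = argparse.ArgumentParser(prog="infer-sketch")
    # store_true defaults to False and does not create a --no-... counterpart,
    # so the feature is off unless the user passes the flag explicitly.
    parser.add_argument(
        "--use-onnx-subfunctions",
        action="store_true",
        help="Opt in to ONNX subfunctions (disabled by default).",
    )
    return parser


def parse_flag(argv: list[str]) -> bool:
    """Return the resolved value of the flag for a given argument list."""
    return build_parser().parse_args(argv).use_onnx_subfunctions
```

With this shape, omitting the flag leaves the feature off, passing `--use-onnx-subfunctions` turns it on, and `--no-use-onnx-subfunctions` is rejected as an unrecognized argument.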

Why

  • Align infer behavior with reviewer feedback to keep defaults unchanged and avoid model-specific auto-enable behavior.

Fixes

  • Fixes quic#702

Validation

  • python -m py_compile QEfficient/cloud/infer.py tests/cloud/test_infer.py
  • ruff check QEfficient/cloud/infer.py tests/cloud/test_infer.py
  • pytest -q tests/cloud/test_infer.py -m "not on_qaic" (2 passed, 5 deselected)

Auto-enable ONNX subfunctions for qwen3_moe in cloud.infer when not explicitly set, while allowing explicit override via --use-onnx-subfunctions/--no-use-onnx-subfunctions.

Also add focused infer unit tests and update quick-start/text-generation docs for the new behavior.

Fixes quic#702

Signed-off-by: jd316 <jd316biswas@gmail.com>
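
For context, the auto-enable behavior described in this first commit message (later dropped after review) could look roughly like the sketch below. It is a reconstruction under assumptions, not the code from that revision: the helper names are hypothetical, and comparing a model type string against `"qwen3_moe"` is assumed from the commit message wording.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical tri-state parser: flag set, flag negated, or flag absent."""
    parser = argparse.ArgumentParser(prog="infer-sketch-v1")
    # BooleanOptionalAction (Python 3.9+) generates both --use-onnx-subfunctions
    # and --no-use-onnx-subfunctions; default=None marks "not explicitly set".
    parser.add_argument(
        "--use-onnx-subfunctions",
        action=argparse.BooleanOptionalAction,
        default=None,
    )
    return parser


def resolve_use_onnx_subfunctions(explicit: Optional[bool], model_type: str) -> bool:
    """An explicit choice always wins; otherwise auto-enable only for qwen3_moe."""
    if explicit is not None:
        return explicit
    return model_type == "qwen3_moe"
```

The merged version removes this model-specific default, so the flag is a plain opt-in.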
Comment thread QEfficient/cloud/infer.py Outdated
Comment thread QEfficient/cloud/infer.py Outdated
Signed-off-by: jd316 <jd316biswas@gmail.com>
@jd316 jd316 force-pushed the fix-702-onnx-subfunctions-infer branch from b09495a to 28f0a90 Compare March 4, 2026 14:02

jd316 commented Mar 4, 2026

@vbaddi

Addressed in 28f0a90: kept use_onnx_subfunctions disabled by default, removed --no-use-onnx-subfunctions, and updated tests/docs accordingly. DCO is also fixed. Could you please re-review? 🙏

jd316 commented Mar 4, 2026

PR description has been updated to match the latest implementation in 28f0a90 (explicit opt-in only for ONNX subfunctions). Remaining pending items are reviewer approval and maintainer workflow approval.

@jd316 jd316 requested a review from vbaddi March 5, 2026 10:52
@quic-rishinr

@jd316 there is a formatting issue in tests/cloud/test_infer.py. Could you get it resolved?

jd316 commented Mar 9, 2026

@quic-rishinr Updating!

Signed-off-by: jd316 <jd316biswas@gmail.com>
@jd316 jd316 force-pushed the fix-702-onnx-subfunctions-infer branch from f60cd06 to a6c59eb Compare March 9, 2026 11:28

jd316 commented Mar 9, 2026

@vbaddi @quic-rishinr pushed follow-up commit a6c59eb to address the test formatting issue and fix the missing DCO sign-off. Local validation passed for the affected files: py_compile, ruff format/check, and pytest -q tests/cloud/test_infer.py -m "not on_qaic". Could you please re-review once the maintainer workflow approval is in place?

@jd316 jd316 requested a review from quic-rishinr March 10, 2026 16:33
@quic-rishinr

Thanks @jd316 for adding the changes. Merging the PR.

@quic-rishinr quic-rishinr merged commit 815309e into quic:main Mar 11, 2026
1 check passed
quic-dhirajku pushed a commit to asmigosw/efficient-transformers that referenced this pull request Mar 13, 2026
Summary
- Keep `use_onnx_subfunctions` disabled by default in
`QEfficient.cloud.infer`
- Provide explicit opt-in via `--use-onnx-subfunctions` only
- Remove `--no-use-onnx-subfunctions`
- Update infer unit tests for explicit-enable and default-disabled
behavior
- Update quick-start and text-generation docs to reflect explicit opt-in
behavior

Why
- Align infer behavior with reviewer feedback to keep defaults unchanged
and avoid model-specific auto-enable behavior.

Fixes
- Fixes quic#702

Validation
- `python -m py_compile QEfficient/cloud/infer.py
tests/cloud/test_infer.py`
- `ruff check QEfficient/cloud/infer.py tests/cloud/test_infer.py`
- `pytest -q tests/cloud/test_infer.py -m "not on_qaic"` (2 passed, 5
deselected)

---------

Signed-off-by: jd316 <jd316biswas@gmail.com>
Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
qcdipankar pushed a commit to qcdipankar/efficient-transformers that referenced this pull request Mar 17, 2026
tv-karthikeya pushed a commit to tv-karthikeya/efficient-transformers that referenced this pull request Mar 25, 2026

Development

Successfully merging this pull request may close these issues.

Process killed (OOM?) when compiling Qwen3-30B-A3B