[ExecuTorch][WebGPU] Add et_vk.embedding_q4gsw (4-bit groupwise-symmetric quantized embedding) by pytorchbot · Pull Request #20414 · pytorch/executorch

pytorchbot · 2026-06-22T06:47:01Z

This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #20263 by @JulianCloudNTH
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/JulianCloudNTH/25/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/JulianCloudNTH/25/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/main
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/JulianCloudNTH/25/orig

@diff-train-skip-merge

…tric quantized embedding)

Pull Request resolved: #20263

Adds the WebGPU backend handler for `et_vk.embedding_q4gsw.default` (a 4-bit groupwise-symmetric quantized embedding gather) plus the host-side integer-input infra it requires.

The op is a single compute dispatch composed of one stage: one thread per 32-element block of each gathered row dequantizes the packed 4-bit table (`q = (nibble - 8) * scale`; even dim = high nibble, odd dim = low) into the fp32 output, mirroring the Vulkan `embedding_q4gsw` reference (flat buffer-backed weight; `is_linear_weight=true` is unsupported and throws).

The workgroup size is a `wg_size` pipeline-override constant clamped to the device limit via `WebGPUUtils::clamp_workgroup_size`, the 1D dispatch count goes through `WebGPUUtils::compute_1d_workgroup_count` (validated before any GPU-object allocation), and the embedded WGSL string header is generated by `gen_wgsl_headers.py`.

Embedding indices arrive as int64 at the program boundary but the serialized graph stores them as int32, so the shared input path is extended with a host-side `InputData` view (`{data, nbytes, host_is_int64}`) and `copy_inputs` gains three branches: a byte-for-byte fast path when host and GPU sizes match, an int64->int32 narrowing copy when the buffer is int32 and the host input is twice as wide (mirrors the Vulkan `kLong`->`kInt` staging cast), and a fail-loud throw otherwise.

`WebGPUTensor` gains `elem_size`/`is_int` to drive the narrowing decision, and `update_symints_from_inputs` takes the same `InputData` vector so `execute()` builds a single input list consumed by both.

ghstack-source-id: 395549280
@exported-using-ghexport

Differential Revision: [D108428753](https://our.internmc.facebook.com/intern/diff/D108428753/)
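
For readers who want to check the arithmetic, here is a minimal host-side sketch in Python of the two mechanisms the description names: the per-element dequantization (`q = (nibble - 8) * scale`, high nibble for even dims) and the three-way input copy. It is not the PR's WGSL or C++ code. The function names `dequantize_row` and `copy_input`, the `group_size` parameter, the one-scale-per-group indexing, and the little-endian byte order are all assumptions made for illustration.

```python
import struct


def dequantize_row(packed_row, scales, group_size):
    """Dequantize one packed 4-bit row into a list of floats.

    packed_row: bytes, two 4-bit values per byte.
    scales:     one scale per group of `group_size` dims (assumed layout).
    """
    out = []
    for byte_idx, byte in enumerate(packed_row):
        # Even dim = high nibble, odd dim = low nibble, per the description.
        for half, nibble in enumerate((byte >> 4, byte & 0x0F)):
            dim = 2 * byte_idx + half
            scale = scales[dim // group_size]
            out.append((nibble - 8) * scale)
    return out


def copy_input(host, gpu_nbytes, gpu_is_int32, host_is_int64):
    """Return the bytes that would be uploaded for one input (hypothetical helper)."""
    # Branch 1: sizes match, so copy byte for byte.
    if len(host) == gpu_nbytes:
        return bytes(host)
    # Branch 2: int32 buffer fed by an int64 host input that is twice as wide.
    if gpu_is_int32 and host_is_int64 and len(host) == 2 * gpu_nbytes:
        count = len(host) // 8
        values = struct.unpack(f"<{count}q", host)
        # Assumption: this sketch rejects values outside the int32 range
        # (struct.pack raises struct.error); the description does not say
        # how the real narrowing copy treats them.
        return struct.pack(f"<{count}i", *values)
    # Branch 3: anything else fails loudly.
    raise ValueError(
        f"input size mismatch: host={len(host)} bytes, gpu={gpu_nbytes} bytes"
    )


if __name__ == "__main__":
    # Two packed bytes hold four dims; a single group covers all of them.
    print(dequantize_row(bytes([0x8F, 0x07]), scales=[0.5], group_size=4))
```

Because the nibble is offset by 8, stored values 0 through 15 map to signed values -8 through 7 before scaling, which is what makes the scheme symmetric around zero.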

pytorch-bot · 2026-06-22T06:47:05Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20414

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

[ROCm] MI350 CI jobs will have longer queue times due to CI migration

❌ 3 New Failures, 3 Unrelated Failures

As of commit 9f1fb83 with merge base 0e65ba6:

NEW FAILURES - The following jobs have failed:

pull / unittest / linux / linux-job (gh)
RuntimeError: Command docker exec -t b0e6a2fa5a671297ced8ada15cf3c4009f4510fb4525a47d5c9b227311a83709 /exec failed with exit code 1
pull / unittest / macos / macos-job (gh)
export/tests/test_target_recipes.py::TestTargetRecipes::test_vit_model
pull / unittest-editable / linux / linux-job (gh)
RuntimeError: Command docker exec -t 7ed9944f3b630376b341a1b2993a1ed052d230f1a26d6a9a5adec10ba2cd32a7 /exec failed with exit code 1

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / test-llama-runner-qnn-linux (fp32, qnn_16a16w, qnn) / linux-job (gh) (trunk failure)
pull / test-qnn-testsuite-linux / test-backend-linux (qnn, models) / linux-job (gh) (trunk failure)
pull / unittest-editable / macos / macos-job (gh) (trunk failure)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-06-22T06:47:46Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

pytorchbot requested review from kirklandsign and larryliu0820 as code owners June 22, 2026 06:47

meta-cla Bot added the CLA Signed label Jun 22, 2026

pytorchbot temporarily deployed to cadence June 22, 2026 06:47 — with GitHub Actions Inactive

JulianCloudNTH self-requested a review June 22, 2026 16:26

JulianCloudNTH approved these changes Jun 22, 2026

pytorchbot wants to merge 1 commit into main from gh/JulianCloudNTH/25/orig

2 participants