[ExecuTorch][WebGPU] Add et_vk.embedding_q4gsw (4-bit groupwise-symmetric quantized embedding) #20414
pytorchbot wants to merge 1 commit.
Pull Request resolved: #20263

Adds the WebGPU backend handler for `et_vk.embedding_q4gsw.default` (a 4-bit groupwise-symmetric quantized embedding gather), plus the host-side integer-input infrastructure it requires.

The op is a single compute dispatch. Each thread handles one 32-element block of a gathered row, dequantizing the packed 4-bit table into the fp32 output as `q = (nibble - 8) * scale`, where even dims read the high nibble and odd dims the low nibble. This mirrors the Vulkan `embedding_q4gsw` reference with a flat buffer-backed weight; `is_linear_weight=true` is unsupported and throws. The workgroup size is a `wg_size` pipeline-override constant clamped to the device limit via `WebGPUUtils::clamp_workgroup_size`. The 1D dispatch count goes through `WebGPUUtils::compute_1d_workgroup_count` and is validated before any GPU object is allocated. The embedded WGSL string header is generated by `gen_wgsl_headers.py`.

Embedding indices arrive as int64 at the program boundary, but the serialized graph stores them as int32. The shared input path is therefore extended with a host-side `InputData` view (`{data, nbytes, host_is_int64}`), and `copy_inputs` gains three branches:

- a byte-for-byte fast path when the host and GPU sizes match;
- an int64->int32 narrowing copy when the GPU buffer is int32 and the host input is twice as wide (mirroring the Vulkan `kLong`->`kInt` staging cast);
- a fail-loud throw otherwise.

`WebGPUTensor` gains `elem_size`/`is_int` to drive the narrowing decision, and `update_symints_from_inputs` takes the same `InputData` vector, so `execute()` builds a single input list consumed by both.

ghstack-source-id: 395549280
@exported-using-ghexport

Differential Revision: [D108428753](https://our.internmc.facebook.com/intern/diff/D108428753/)
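For reviewers who want to check kernel output by hand, here is a minimal CPU reference sketch of the gather-and-dequantize rule (`(nibble - 8) * scale`, even dim = high nibble, odd dim = low nibble). It assumes a row-major layout where each embedding row stores `embed_dim / 2` bytes (two 4-bit codes per byte) and one fp32 scale per `group_size` elements; that layout and the name `embedding_q4gsw_ref` are illustrative assumptions, not taken from the ExecuTorch or Vulkan sources. The outer loop emulates the "one thread per 32-element block" mapping with a flattened thread id.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical CPU reference for the q4gsw embedding gather (not the real kernel).
// Assumed layout (illustrative only):
//   weight: [num_embeddings, embed_dim / 2] uint8, two 4-bit codes per byte
//   scales: [num_embeddings, embed_dim / group_size] fp32, one per group
std::vector<float> embedding_q4gsw_ref(const std::vector<uint8_t>& weight,
                                       const std::vector<float>& scales,
                                       const std::vector<int32_t>& indices,
                                       size_t num_embeddings, size_t embed_dim,
                                       size_t group_size) {
  if (embed_dim % 2 != 0 || group_size == 0 || embed_dim % group_size != 0) {
    throw std::invalid_argument("embed_dim must be even and divisible by group_size");
  }
  const size_t bytes_per_row = embed_dim / 2;
  const size_t groups_per_row = embed_dim / group_size;
  assert(weight.size() >= num_embeddings * bytes_per_row);
  assert(scales.size() >= num_embeddings * groups_per_row);

  constexpr size_t kBlock = 32;  // elements handled per emulated "thread"
  const size_t blocks_per_row = (embed_dim + kBlock - 1) / kBlock;
  std::vector<float> out(indices.size() * embed_dim);

  // Each iteration plays the role of one GPU invocation.
  for (size_t tid = 0; tid < indices.size() * blocks_per_row; ++tid) {
    const size_t i = tid / blocks_per_row;      // which gathered row
    const size_t block = tid % blocks_per_row;  // which 32-element block
    const int32_t row = indices[i];
    if (row < 0 || static_cast<size_t>(row) >= num_embeddings) {
      throw std::out_of_range("embedding index out of range");
    }
    const uint8_t* packed = weight.data() + static_cast<size_t>(row) * bytes_per_row;
    const float* row_scales = scales.data() + static_cast<size_t>(row) * groups_per_row;

    const size_t d_begin = block * kBlock;
    const size_t d_end = std::min(d_begin + kBlock, embed_dim);  // partial tail block
    for (size_t d = d_begin; d < d_end; ++d) {
      const uint8_t byte = packed[d / 2];
      // Even dim = high nibble, odd dim = low nibble.
      const int nibble = (d % 2 == 0) ? (byte >> 4) : (byte & 0x0F);
      // Symmetric 4-bit: codes 0..15 map to -8..7 before scaling.
      out[i * embed_dim + d] = static_cast<float>(nibble - 8) * row_scales[d / group_size];
    }
  }
  return out;
}
```

Unlike the kernel as described, this sketch also tolerates an `embed_dim` that is not a multiple of 32 by clamping the last block.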
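The workgroup sizing and dispatch-count validation can be sketched as below. In WGSL, a pipeline-overridable constant is declared with `override` and may be used in `@workgroup_size(...)`, so the host only has to pick a value within the device limits. `clamp_wg_size`, `workgroup_count_1d`, and `q4gsw_thread_count` are hypothetical stand-ins, not the real `WebGPUUtils` signatures; the limit names follow the WebGPU spec (`maxComputeWorkgroupSizeX`, `maxComputeInvocationsPerWorkgroup`, `maxComputeWorkgroupsPerDimension`).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-ins for WebGPUUtils::clamp_workgroup_size and
// WebGPUUtils::compute_1d_workgroup_count; the real signatures may differ.

// Clamp a requested 1D workgroup size to the device limits
// (maxComputeWorkgroupSizeX and maxComputeInvocationsPerWorkgroup).
uint32_t clamp_wg_size(uint32_t requested, uint32_t max_size_x,
                       uint32_t max_invocations) {
  const uint32_t limit = std::min(max_size_x, max_invocations);
  assert(limit > 0 && "device limits must be non-zero");
  return std::max<uint32_t>(1, std::min(requested, limit));
}

// Ceil-divide the thread count into workgroups, rejecting dispatches that
// exceed maxComputeWorkgroupsPerDimension before anything is allocated.
uint32_t workgroup_count_1d(uint64_t total_threads, uint32_t wg_size,
                            uint32_t max_groups_per_dim) {
  if (wg_size == 0) {
    throw std::invalid_argument("wg_size must be non-zero");
  }
  const uint64_t groups = (total_threads + wg_size - 1) / wg_size;
  if (groups > max_groups_per_dim) {
    throw std::runtime_error("1D dispatch exceeds maxComputeWorkgroupsPerDimension");
  }
  return static_cast<uint32_t>(groups);
}

// One thread per 32-element block of each gathered row.
uint64_t q4gsw_thread_count(uint64_t num_indices, uint64_t embed_dim) {
  return num_indices * ((embed_dim + 31) / 32);
}
```

With the WebGPU default limits (256 for both per-workgroup limits, 65535 workgroups per dimension), gathering 128 indices from a 2048-wide table needs 128 * 64 = 8192 threads. A requested `wg_size` of 512 clamps to 256, giving a dispatch of 32 workgroups.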
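The three `copy_inputs` branches can be illustrated with a minimal staging sketch. `InputData` reuses the `{data, nbytes, host_is_int64}` fields listed in the PR; `GpuBufferDesc` (echoing the new `elem_size`/`is_int` fields on `WebGPUTensor`) and `stage_input` are hypothetical names, and the real code uploads into WebGPU buffers rather than filling a `std::vector`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Host-side view of one program input (field names as listed in the PR).
struct InputData {
  const void* data;
  size_t nbytes;
  bool host_is_int64;
};

// Hypothetical description of the destination GPU buffer.
struct GpuBufferDesc {
  size_t nbytes;
  size_t elem_size;
  bool is_int;
};

// Produce the bytes that would be uploaded for one input.
void stage_input(const InputData& in, const GpuBufferDesc& dst,
                 std::vector<uint8_t>& staging) {
  assert(in.data != nullptr);
  if (in.nbytes == dst.nbytes) {
    // Fast path: sizes match, copy byte-for-byte.
    staging.resize(dst.nbytes);
    std::memcpy(staging.data(), in.data, in.nbytes);
    return;
  }
  if (dst.is_int && dst.elem_size == sizeof(int32_t) && in.host_is_int64 &&
      in.nbytes == 2 * dst.nbytes) {
    // Narrowing path: int64 on the host, int32 on the GPU.
    staging.resize(dst.nbytes);
    const size_t n = dst.nbytes / sizeof(int32_t);
    const int64_t* src = static_cast<const int64_t*>(in.data);
    for (size_t i = 0; i < n; ++i) {
      const int32_t v = static_cast<int32_t>(src[i]);  // no range check
      std::memcpy(staging.data() + i * sizeof(int32_t), &v, sizeof(v));
    }
    return;
  }
  // Anything else is a layout mismatch: fail loudly.
  throw std::runtime_error("copy_inputs: host/GPU input size mismatch");
}
```

The narrowing branch uses a plain `static_cast`, so this sketch does not detect indices outside the int32 range.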
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20414

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV
There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 3 New Failures, 3 Unrelated Failures
As of commit 9f1fb83 with merge base 0e65ba6:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR needs a
This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #20263 by @JulianCloudNTH
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/JulianCloudNTH/25/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/JulianCloudNTH/25/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/main
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/JulianCloudNTH/25/orig
@diff-train-skip-merge