[ExecuTorch][WebGPU] Add clone op (aten.clone.default) by pytorchbot · Pull Request #20568 · pytorch/executorch

pytorchbot · 2026-06-27T21:26:55Z

This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #20463 by @JulianCloudNTH
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/JulianCloudNTH/58/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/JulianCloudNTH/58/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/gh/JulianCloudNTH/50/orig
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/JulianCloudNTH/58/orig

@diff-train-skip-merge

Pull Request resolved: #20463

`aten.clone.default` is a pure flat copy on the buffer-only WebGPU backend, identical to `view_copy`: `clone_impl` reuses the existing `add_flat_copy` helper (`output[i] = input[i]`) and registers a handler under `aten.clone.default`. No new shader, generated WGSL header, or CMake source is needed; it shares the `view_copy` flat-copy compute pipeline.

Required for end-to-end Llama 3.2 1B (4-bit, KV cache): the exported model serializes 2 `aten.clone.default` ops into its runtime operator chain (the RoPE-frequency clones reused across all 16 transformer layers), so without a handler the partition graph-breaks at those nodes.

Mirrors the Vulkan delegate, which registers the same op and routes a buffer clone to a flat view-copy.

ghstack-source-id: 397534700
@exported-using-ghexport
@diff-train-skip-merge

Differential Revision: [D109477717](https://our.internmc.facebook.com/intern/diff/D109477717/)
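As a rough illustration of the description above, the sketch below models the flat copy (`output[i] = input[i]`) and the idea of registering one copy routine under two op names. It is plain Python, not the backend's C++ code: `OP_HANDLERS`, `flat_copy`, and the `"view_copy"` registry key are hypothetical names chosen for this example.

```python
from typing import Callable, Dict, List

# Hypothetical op registry for illustration only; the real backend
# registers C++ handlers, and these names are not its API.
OP_HANDLERS: Dict[str, Callable[[List[float]], List[float]]] = {}


def flat_copy(src: List[float]) -> List[float]:
    """Element-wise copy into a fresh buffer: output[i] = input[i]."""
    out = [0.0] * len(src)
    for i in range(len(src)):
        out[i] = src[i]
    return out


# Both ops map to the same copy routine, so clone needs no kernel of its own.
OP_HANDLERS["view_copy"] = flat_copy
OP_HANDLERS["aten.clone.default"] = flat_copy

if __name__ == "__main__":
    freqs = [0.5, 1.0, 2.0]
    cloned = OP_HANDLERS["aten.clone.default"](freqs)
    # The clone holds equal values in a distinct buffer.
    print(cloned == freqs, cloned is not freqs)
```

Sharing one routine is what lets the op land without a new shader: only the registration differs between the two op names.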

pytorch-bot · 2026-06-27T21:26:59Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20568

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Pull Request resolved: #20464

Adds the WebGPU delegate handler for `aten.index.Tensor`, the 1D-self advanced-index gather `out[i] = self[index[i]]` (output shape == index shape). This is the form the VulkanPartitioner delegates: it requires a 1D self and exactly one non-None index (op_registry.py); 2D mask/freqs gathers stay on CPU. It mirrors the Vulkan delegate's index_tensor op (IndexTensor.cpp + index_tensor_buffer.glsl) as a single compute dispatch over the output elements, each reading the int32 index and gathering the corresponding fp32 self element.

The op is composed as:
- index.wgsl: one workgroup-strided pass, `out[i] = self[u32(index[i])]`, guarded by a numel bound; buffer-only, fp32 self/out, int32 index, 1D dispatch via the shared WebGPUUtils helpers (clamp workgroup size + 1D count).
- Index.cpp: validates the args (self/out tensors; indices ValueList with exactly one index tensor; fp32 self/out; int32 index; out numel == index numel), failing loudly on any violation, then records the dispatch. row_width is dropped (always 1 for 1D self).

ghstack-source-id: 397756251
@exported-using-ghexport
@diff-train-skip-merge

Differential Revision: [D109478967](https://our.internmc.facebook.com/intern/diff/D109478967/)
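The gather and its numel guard can be sketched in plain Python as a reference model. This is not the WGSL shader or Index.cpp: `index_1d` and `WORKGROUP_SIZE = 64` are assumptions made for this example, and the up-front range check is an added safety choice that the commit message does not describe.

```python
from typing import List

WORKGROUP_SIZE = 64  # assumed value; the source does not state the real size


def index_1d(self_vals: List[float], index: List[int]) -> List[float]:
    """Reference gather: out[i] = self_vals[index[i]], out shape == index shape."""
    # Assumption of this sketch: indices are non-negative and in range.
    for idx in index:
        if not 0 <= idx < len(self_vals):
            raise IndexError(
                f"index {idx} out of range for self of length {len(self_vals)}"
            )

    numel = len(index)
    out = [0.0] * numel
    # 1D dispatch: enough workgroups to cover every output element.
    num_workgroups = (numel + WORKGROUP_SIZE - 1) // WORKGROUP_SIZE
    for wg in range(num_workgroups):
        for local in range(WORKGROUP_SIZE):
            i = wg * WORKGROUP_SIZE + local
            if i >= numel:
                # Numel guard: trailing invocations in the last workgroup do nothing.
                continue
            out[i] = self_vals[index[i]]
    return out


if __name__ == "__main__":
    print(index_1d([10.0, 20.0, 30.0, 40.0], [3, 0, 0, 2]))
```

The guard matters whenever the element count is not a multiple of the workgroup size, because the last workgroup then launches more invocations than there are output elements.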

…lden)

Pull Request resolved: #20465

Adds the test suite for the `aten.index.Tensor` op (stacked on the op diff):
- test/ops/index/test_index.py: exports a module computing `x[idx]` through VulkanPartitioner for four configs (reorder/repeat indices over distinct self values, so a wrong gather is visible), asserts a VulkanBackend delegate with index.Tensor absorbed (not a CPU fallback), and writes per-config .pte + .self/.idx/.golden.bin.
- test/native/test_index.cpp: a standalone Dawn binary that loads each .pte, feeds self (fp32) + index (int64 at the program boundary, narrowed to the int32 buffer) and compares the gather against the torch golden at 1e-3, with a single-output shape guard.
- Wired into CMake (webgpu_index_test), test/TARGETS (python_unittest test_index), and the Dawn native CI script.

ghstack-source-id: 397763261
@exported-using-ghexport
@diff-train-skip-merge

Differential Revision: [D109479000](https://our.internmc.facebook.com/intern/diff/D109479000/)
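The native test's checks (int64-to-int32 narrowing, a single-output shape guard, and the golden comparison) might look roughly like the following Python model. The helper names are hypothetical, and treating 1e-3 as an absolute tolerance is an assumption: the commit message gives only the number.

```python
from typing import List

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
ATOL = 1e-3  # assumed to be an absolute tolerance; the source only says "1e-3"


def narrow_to_int32(values: List[int]) -> List[int]:
    """Check that int64 indices fit in int32 before filling the int32 buffer."""
    for v in values:
        if not INT32_MIN <= v <= INT32_MAX:
            raise OverflowError(f"index {v} does not fit in int32")
    return list(values)


def matches_golden(
    outputs: List[List[float]], golden: List[float], atol: float = ATOL
) -> bool:
    """Shape guard (exactly one output, same numel), then element-wise compare."""
    if len(outputs) != 1 or len(outputs[0]) != len(golden):
        return False
    return all(abs(a - b) <= atol for a, b in zip(outputs[0], golden))


if __name__ == "__main__":
    idx = narrow_to_int32([3, 0, 0, 2])
    self_vals = [10.0, 20.0, 30.0, 40.0]
    gathered = [self_vals[i] for i in idx]
    print(matches_golden([gathered], [40.0, 10.0, 10.0, 30.0]))
```

Using distinct self values, as the test configs do, means a gather that reads the wrong element differs from the golden by far more than the tolerance.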

pytorchbot had a problem deploying to cadence June 27, 2026 21:27 — with GitHub Actions Error

meta-cla Bot added the CLA Signed label Jun 27, 2026 (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.)

JulianCloudNTH added 2 commits June 27, 2026 14:32

JulianCloudNTH requested review from kirklandsign and larryliu0820 as code owners June 27, 2026 21:32

JulianCloudNTH merged commit c6b60e8 into gh/JulianCloudNTH/50/orig Jun 27, 2026
11 checks passed

JulianCloudNTH deleted the gh/JulianCloudNTH/58/orig branch June 27, 2026 21:32

JulianCloudNTH temporarily deployed to cadence June 27, 2026 21:33 — with GitHub Actions Inactive

JulianCloudNTH merged 3 commits into gh/JulianCloudNTH/50/orig from gh/JulianCloudNTH/58/orig
