[ExecuTorch][WebGPU] cat op test suite (cases.py op-test framework) by JulianCloudNTH · Pull Request #20399 · pytorch/executorch

JulianCloudNTH · 2026-06-18T21:36:03Z

Stack from ghstack (oldest at bottom):

Registers aten.cat.default in the cases.py op-test framework: a _cat_suite of 4 configs (3 inputs along dim 0, 2 inputs along dim 1, 3 inputs along dim 2, uneven split along dim 1) that generate_op_tests exports via VulkanPartitioner and compares to a torch golden on Dawn. Also adds test/ops/cat/test_cat.py (CatModule + CONFIGS + _op_delegated smoke test, with distinct per-input value ranges to catch cross-slab contamination) and the aten.cat.default partitioner-allowlist entry in tester.py.
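Here is a minimal pure-Python sketch of the distinct-range idea. It assumes ramps offset by 1000.0 per input, matching the `base += 1000.0` ramps in `_det_inputs` that the review mentions. `det_ramps`, `source_input`, and `contaminated_positions` are hypothetical helpers, not the code in `test_cat.py`. Each input occupies its own value band, so any output element can be traced back to the input it was copied from.

```python
from math import prod

RANGE_STRIDE = 1000.0  # assumed gap between consecutive inputs' value bands


def det_ramps(shapes):
    """Hypothetical stand-in for _det_inputs: one flat ramp per input shape,
    each starting RANGE_STRIDE above the previous input's start."""
    ramps = []
    base = 0.0
    for shape in shapes:
        numel = prod(shape)
        if numel >= RANGE_STRIDE:
            raise ValueError("input too large for the chosen RANGE_STRIDE")
        ramps.append([base + i for i in range(numel)])
        base += RANGE_STRIDE
    return ramps


def source_input(value):
    """Index of the input whose value band contains `value`."""
    return int(value // RANGE_STRIDE)


def contaminated_positions(out_flat, expected_sources):
    """Flat output positions holding a value that came from the wrong input."""
    return [
        pos
        for pos, (value, src) in enumerate(zip(out_flat, expected_sources))
        if source_input(value) != src
    ]
```

The trick only works while each input has fewer than `RANGE_STRIDE` elements. Larger inputs would need a wider stride.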
@exported-using-ghexport

Differential Revision: D108793163

[ghstack-poisoned]

pytorch-bot · 2026-06-18T21:36:07Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20399

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 77 Pending

As of commit 84a44d0 with merge base 200f64a:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-06-18T21:36:52Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

JulianCloudNTH · 2026-06-18T21:54:44Z

@claude review

claude · 2026-06-18T21:55:28Z

Claude finished @JulianCloudNTH's task in 3m 25s.

Review: WebGPU `cat` op-test suite

Read the full diff and compared it against the sibling op suites (permute, squeeze, mul) and the generator (generate_op_tests.py). This is a clean, well-scoped addition that follows the established op-test conventions closely. No blocking issues.

What's good

CatModule + CONFIGS + smoke-test layout mirrors test_permute.py / test_squeeze.py exactly, and the cases.py registration matches the _squeeze_suite/_permute_suite shape (construct={"dim": dim}, golden_dtype="float32" with the copy-is-bit-identical rationale). Good consistency.
This is the first variadic / multi-input op in the framework. CatModule.forward(self, *xs) + inputs=tuple(shapes) correctly exercises the generator's module(*inputs) / torch.export.export(module, inputs) path with multiple differently-shaped inputs — a genuinely new code path through the suite. The uneven config ((2,1,4)+(2,3,4)+(2,2,4) along dim 1) is a nice touch for catching per-input offset bugs.
tester.py allowlist entry (aten.cat.default) is in the right place.
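The uneven config is a stronger check, and working out its expected output shape and per-input concat-dim offsets shows why. The sketch below does this in pure Python; its helpers are hypothetical and not part of `cases.py` or `generate_op_tests`. For `(2,1,4)+(2,3,4)+(2,2,4)` along dim 1 the correct offsets are `[0, 1, 4]`. A buggy `k * size_k` offset would give `[0, 3, 4]`. With equal-sized inputs the two formulas agree, so an even split can hide that bug.

```python
def _norm_dim(dim, rank):
    d = dim + rank if dim < 0 else dim
    if not 0 <= d < rank:
        raise ValueError(f"dim {dim} out of range for rank {rank}")
    return d


def cat_out_shape(shapes, dim):
    """Output shape of concatenating `shapes` along `dim`.
    Raises ValueError if ranks or non-concat dims disagree."""
    if not shapes:
        raise ValueError("cat needs at least one input")
    rank = len(shapes[0])
    d = _norm_dim(dim, rank)
    out = list(shapes[0])
    for s in shapes[1:]:
        if len(s) != rank:
            raise ValueError("all inputs must have the same rank")
        for k in range(rank):
            if k != d and s[k] != out[k]:
                raise ValueError(f"size mismatch on non-concat dim {k}")
        out[d] += s[d]
    return tuple(out)


def concat_offsets(shapes, dim):
    """Start coordinate of each input along the concat dim (running sum)."""
    d = _norm_dim(dim, len(shapes[0]))
    offsets, running = [], 0
    for s in shapes:
        offsets.append(running)
        running += s[d]
    return offsets
```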

Minor notes (non-blocking)

The "distinct per-input value range" contamination trick only applies to the local smoke test, not the Dawn suite. _det_inputs (the base += 1000.0 ramps) is used by TestCat in test_cat.py, but the _cat_suite cases in cases.py pass bare shape tuples, so generate_op_tests._materialize produces them via torch.randn (all ~N(0,1), overlapping ranges). Cross-slab contamination is still detectable against the golden, but the explicit distinct-range property described in the PR body doesn't carry into the generated Dawn comparison. Worth a one-line clarification in the description, or consider an InputSpec(gen=...) ramp for one suite case if you want the strong signal on Dawn too.
_op_delegated's substring check is fragile (backends/webgpu/test/ops/cat/test_cat.py:60). "cat" in str(target) will also match scatter, concat, etc. Harmless here (a cat module has no such ops), and it's the same pattern as the siblings — but "cat" is a shorter, more collision-prone needle than "permute". If you ever generalize this helper, matching the full edge-op target (e.g. aten.cat) would be safer.
Helper duplication (_lower / _delegated / _op_delegated) is copy-pasted across every ops/*/test_*.py. This PR follows the existing convention, so it's fine — just flagging that this family of files is accumulating identical boilerplate that a shared _smoke_helpers.py could collapse later.
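The safer check the review suggests can be sketched on target strings. The sketch assumes targets stringify as dotted `namespace.op.overload` names such as `aten.cat.default`; that edge-op string form is not verified here. Both helper names are hypothetical.

```python
def substring_match(target_str, needle="cat"):
    # The pattern the review flags: any target whose string contains the needle.
    return needle in target_str


def exact_op_match(target_str, op="cat", namespace="aten"):
    # Assumes dotted "namespace.op.overload" strings, e.g. "aten.cat.default".
    parts = target_str.split(".")
    return len(parts) == 3 and parts[0] == namespace and parts[1] == op


targets = ["aten.cat.default", "aten.scatter.src", "aten.concat.default", "aten.mul.Tensor"]
print([substring_match(t) for t in targets])  # [True, True, True, False]
print([exact_op_match(t) for t in targets])   # [True, False, False, False]
```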

Note: I was unable to run pytest backends/webgpu/test/ops/cat/test_cat.py in this environment (command needed approval), so correctness here is by inspection only. The export/delegation logic is identical to the already-green sibling tests, so I'd expect it to pass.
· branch gh/JulianCloudNTH/48/head

[ghstack-poisoned]

Pull Request resolved: #20399

Registers `aten.cat.default` in the `cases.py` op-test framework: a `_cat_suite` of 4 configs (3 inputs along dim 0, 2 inputs along dim 1, 3 inputs along dim 2, uneven split along dim 1) that `generate_op_tests` exports via `VulkanPartitioner` and compares to a torch golden on Dawn. Also adds `test/ops/cat/test_cat.py` (`CatModule` + `CONFIGS` + `_op_delegated` smoke test, with distinct per-input value ranges to catch cross-slab contamination) and the `aten.cat.default` partitioner-allowlist entry in `tester.py`.

ghstack-source-id: 395017932
@exported-using-ghexport

Differential Revision: [D108793163](https://our.internmc.facebook.com/intern/diff/D108793163/)

[ghstack-poisoned]

Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.15.0) (oldest at bottom): * #20465 * #20464 * #20463 * #20435 * #20399 * #20398 * #20397 * #20396 * #20395 * #20394 * #20393 * #20392 * #20391 * #20390 * #20363 * __->__ #20362 * #20361 * #20360 * #20359

Adds `aten.select_copy.int` to the WebGPU delegate as a gather: picks a fixed index along one dim, producing an output of rank (input rank - 1).

Composition (single dispatch):
- `select/Select.cpp`: reads `[self, dim, index, out]` (static `Int` via `read_scalar`; throws on dynamic `SymInt`), normalizes + bounds-checks dim/index, builds 2 `TensorMeta` UBOs + a `SelectParams{dim,index}`, fp32 guard, 1D-dispatch over `numel`, releases uniforms after the bind group.
- `select/select.wgsl`: seeds the input offset with `index * in.strides[dim]`, delinearizes the output index, maps each out dim to its in dim (shifted past the selected dim), relinearizes on input strides.

@exported-using-ghexport

Differential Revision: [D108793166](https://our.internmc.facebook.com/intern/diff/D108793166/)
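For reference, the select index math on a dense row-major buffer can be mirrored in plain Python. This is useful for hand-checking expected values. The sketch follows the steps listed above but is not the WGSL kernel, and `contiguous_strides` and `select_flat` are hypothetical names.

```python
from math import prod


def contiguous_strides(shape):
    """Row-major (C-contiguous) strides, in elements."""
    strides, acc = [], 1
    for size in reversed(shape):
        strides.append(acc)
        acc *= size
    return list(reversed(strides))


def select_flat(inp, in_shape, dim, index):
    """Gather the slice at `index` along `dim`; output rank is input rank - 1."""
    in_shape = tuple(in_shape)
    rank = len(in_shape)
    dim = dim + rank if dim < 0 else dim
    if not 0 <= dim < rank:
        raise IndexError("dim out of range")
    index = index + in_shape[dim] if index < 0 else index
    if not 0 <= index < in_shape[dim]:
        raise IndexError("index out of range")
    in_strides = contiguous_strides(in_shape)
    out_shape = in_shape[:dim] + in_shape[dim + 1:]
    out_strides = contiguous_strides(out_shape)
    out = []
    for o in range(prod(out_shape)):  # prod(()) == 1 for a rank-1 input
        offset = index * in_strides[dim]  # seed with the selected slice
        rem = o
        for od in range(len(out_shape)):
            coord = rem // out_strides[od]
            rem %= out_strides[od]
            in_d = od if od < dim else od + 1  # skip past the selected dim
            offset += coord * in_strides[in_d]
        out.append(inp[offset])
    return out, out_shape
```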

…ework) (#20363)

Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.15.0) (oldest at bottom): * #20465 * #20464 * #20463 * #20435 * #20399 * #20398 * #20397 * #20396 * #20395 * #20394 * #20393 * #20392 * #20391 * #20390 * __->__ #20363 * #20362 * #20361 * #20360 * #20359

Registers `aten.select_copy.int` in the `cases.py` op-test framework: a `_select_suite` of 4 configs (leading/middle/last dim + negative index) that `generate_op_tests` exports and compares to a torch golden on Dawn. Also adds `test/ops/select/test_select.py` (`SelectModule` + `CONFIGS` + an export-delegation/eager smoke test) and the `aten.select_copy.int` partitioner-allowlist entry in `tester.py`.

@exported-using-ghexport

Differential Revision: [D108793161](https://our.internmc.facebook.com/intern/diff/D108793161/)

Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.15.0) (oldest at bottom): * #20465 * #20464 * #20463 * #20435 * #20399 * #20398 * #20397 * #20396 * #20395 * #20394 * #20393 * #20392 * #20391 * __->__ #20390 * #20363 * #20362 * #20361 * #20360 * #20359

Adds `aten.sigmoid.default` to the WebGPU delegate: element-wise `1/(1+exp(-x))` over a flat fp32 buffer. On the Llama critical path (`F.silu` -> `sigmoid` + `mul`).

Composition (single dispatch):
- `sigmoid/UnaryOp.cpp`: binds input (storage, read-only) + output (storage) + a `Params{num_elements}` uniform, 1D-dispatches over `num_elements` with `override wg_size` (clamped to the device limit); mirrors the `add` op (uniform mapped-at-creation, released after the bind group).
- `sigmoid/sigmoid.wgsl`: guards `idx >= num_elements` and writes the logistic of each element.

@exported-using-ghexport

Differential Revision: [D108793157](https://our.internmc.facebook.com/intern/diff/D108793157/)

…k) (#20391)

Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.15.0) (oldest at bottom): * #20465 * #20464 * #20463 * #20435 * #20399 * #20398 * #20397 * #20396 * #20395 * #20394 * #20393 * #20392 * __->__ #20391 * #20390 * #20363 * #20362 * #20361 * #20360 * #20359

Registers `aten.sigmoid.default` in the `cases.py` op-test framework: a `_sigmoid_suite` (hard-coded shapes + a saturation case over a `linspace(-12, 12)` input) that `generate_op_tests` exports and compares to an fp64 torch golden on Dawn. Also adds `test/ops/sigmoid/test_sigmoid.py` (`SigmoidModule` + `N` + `_det_input` + an export-delegation/eager smoke test) and the `aten.sigmoid.default` partitioner-allowlist entry in `tester.py`.

@exported-using-ghexport

Differential Revision: [D108793159](https://our.internmc.facebook.com/intern/diff/D108793159/)

…20392)

Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.15.0) (oldest at bottom): * #20465 * #20464 * #20463 * #20435 * #20399 * #20398 * #20397 * #20396 * #20395 * #20394 * #20393 * __->__ #20392 * #20391 * #20390 * #20363 * #20362 * #20361 * #20360 * #20359

Adds `aten.squeeze_copy.dims` and `aten.unsqueeze_copy.default` to the WebGPU delegate. Both are numel-preserving shape ops; on a dense row-major buffer backend they are the same flat copy as `view_copy`, and only the shape metadata differs (mirrors the Vulkan delegate, which routes both through `add_view_copy_node`).

Composition (no new kernel):
- `squeeze/Squeeze.cpp`: reads `args = [self, dims, out]`, ignores the AOT-fixed `dims`, calls `add_flat_copy(graph, in, out)` from `runtime/ops/view_copy/view_copy.h`.
- `unsqueeze/Unsqueeze.cpp`: reads `args = [self, dim, out]`, ignores the AOT-fixed `dim`, calls `add_flat_copy(graph, in, out)`.

@exported-using-ghexport

Differential Revision: [D108793153](https://our.internmc.facebook.com/intern/diff/D108793153/)
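Only the shape metadata changes, so the op reduces to a shape computation plus a flat copy. Here is a hypothetical sketch of the shape side, following `torch.squeeze(x, dims)` / `torch.unsqueeze(x, dim)` semantics. In the delegate these shapes are fixed AOT, and the helper names are illustrative only.

```python
from math import prod


def squeeze_shape(shape, dims):
    """Drop the listed dims that have size 1; listed dims of any other size
    are kept, following torch.squeeze(x, dims) semantics."""
    rank = len(shape)
    norm = {d + rank if d < 0 else d for d in dims}
    return tuple(s for i, s in enumerate(shape) if not (i in norm and s == 1))


def unsqueeze_shape(shape, dim):
    """Insert a size-1 dim at `dim`; negative dims count from the output rank."""
    out_rank = len(shape) + 1
    d = dim + out_rank if dim < 0 else dim
    if not 0 <= d < out_rank:
        raise IndexError(f"dim {dim} out of range")
    return tuple(shape[:d]) + (1,) + tuple(shape[d:])


def same_numel(a, b):
    # Numel is preserved, so the flat buffer can be copied unchanged.
    return prod(a) == prod(b)
```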

….py op-test framework) (#20393)

Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.15.0) (oldest at bottom): * #20465 * #20464 * #20463 * #20435 * #20399 * #20398 * #20397 * #20396 * #20395 * #20394 * __->__ #20393 * #20392 * #20391 * #20390 * #20363 * #20362 * #20361 * #20360 * #20359

Registers `aten.squeeze_copy.dims` and `aten.unsqueeze_copy.default` in the `cases.py` op-test framework: a `_squeeze_suite` of 3 configs (squeeze leading/middle/multiple size-1 dims) and an `_unsqueeze_suite` of 3 configs (insert dim at front/middle/last) that `generate_op_tests` exports via `VulkanPartitioner` and compares to a torch golden on Dawn. Also adds `test/ops/squeeze/test_squeeze.py` (`SqueezeModule` + `CONFIGS` + `_op_delegated` smoke test), `test/ops/unsqueeze/test_unsqueeze.py` (`UnsqueezeModule` + `CONFIGS` + `_op_delegated` smoke test), and the two partitioner-allowlist entries in `tester.py`.

@exported-using-ghexport

Differential Revision: [D108793152](https://our.internmc.facebook.com/intern/diff/D108793152/)

Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.15.0) (oldest at bottom): * #20465 * #20464 * #20463 * #20435 * #20399 * #20398 * #20397 * #20396 * #20395 * __->__ #20394 * #20393 * #20392 * #20391 * #20390 * #20363 * #20362 * #20361 * #20360 * #20359

Adds `aten.slice_copy.Tensor` to the WebGPU delegate as a gather: each output element is mapped back to its source input element along the sliced dim via `start + coord * step`.

Composition (single compute dispatch):
- `runtime/ops/slice/Slice.cpp`: reads `args = [self, dim, start, end, step, out]` via `read_scalar` (static `Int`/`Null`-sentinel default; throws on dynamic `SymInt`); normalizes negative `dim`/`start`, clamps `start` to `[0, in_size]`; builds two `TensorMeta` UBOs + a `SliceParams{dim, start, step}` uniform; guards fp32; dispatches over `compute_1d_workgroup_count(out.numel)` with `override wg_size`; releases all uniforms after the bind group.
- `runtime/ops/slice/slice.wgsl`: delinearizes the output index over the contiguous output strides, maps the sliced-dim coordinate back to the input (`start + coord*step`), relinearizes over the input strides.

@exported-using-ghexport

Differential Revision: [D108793168](https://our.internmc.facebook.com/intern/diff/D108793168/)
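A plain-Python mirror of the `start + coord * step` gather can generate expected values for the slice configs. The output length is fixed AOT in the real op. Here it is recomputed assuming Python-style slicing, with negative `start`/`end` wrapped by the dim size and both clamped to `[0, size]`. `slice_params` and `slice_flat` are hypothetical names, and only positive steps are handled.

```python
from math import prod


def contiguous_strides(shape):
    """Row-major (C-contiguous) strides, in elements."""
    strides, acc = [], 1
    for size in reversed(shape):
        strides.append(acc)
        acc *= size
    return list(reversed(strides))


def slice_params(size, start, end, step):
    """Normalize start/end like Python slicing and return (start, out_len)."""
    if step <= 0:
        raise ValueError("step must be positive")
    start = start + size if start < 0 else start
    end = end + size if end < 0 else end
    start = min(max(start, 0), size)
    end = min(max(end, start), size)
    return start, (end - start + step - 1) // step


def slice_flat(inp, in_shape, dim, start, end, step=1):
    in_shape = tuple(in_shape)
    rank = len(in_shape)
    dim = dim + rank if dim < 0 else dim
    start, out_len = slice_params(in_shape[dim], start, end, step)
    out_shape = in_shape[:dim] + (out_len,) + in_shape[dim + 1:]
    in_strides = contiguous_strides(in_shape)
    out_strides = contiguous_strides(out_shape)
    out = []
    for o in range(prod(out_shape)):
        offset, rem = 0, o
        for d in range(rank):
            coord = rem // out_strides[d]  # delinearize over output strides
            rem %= out_strides[d]
            if d == dim:
                coord = start + coord * step  # map back to the input coordinate
            offset += coord * in_strides[d]  # relinearize over input strides
        out.append(inp[offset])
    return out, out_shape
```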

…work) (#20395)

Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.15.0) (oldest at bottom): * #20465 * #20464 * #20463 * #20435 * #20399 * #20398 * #20397 * #20396 * __->__ #20395 * #20394 * #20393 * #20392 * #20391 * #20390 * #20363 * #20362 * #20361 * #20360 * #20359

Registers `aten.slice_copy.Tensor` in the `cases.py` op-test framework: a `_slice_suite` of 4 configs (leading-dim slice `[:,1:5]`, last-dim slice `[...,1:3]`, step-2 `[:,0:8:2]`, negative-end `[:,1:-1]`) that `generate_op_tests` exports via `VulkanPartitioner` and compares to a torch golden on Dawn. Also adds `test/ops/slice/test_slice.py` (`SliceModule` + `CONFIGS` + export-delegation/eager smoke test) and the `aten.slice_copy.Tensor` partitioner-allowlist entry in `tester.py`.

@exported-using-ghexport

Differential Revision: [D108793151](https://our.internmc.facebook.com/intern/diff/D108793151/)

…ermute_copy.default) (#20396)

Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.15.0) (oldest at bottom): * #20465 * #20464 * #20463 * #20435 * #20399 * #20398 * #20397 * __->__ #20396 * #20395 * #20394 * #20393 * #20392 * #20391 * #20390 * #20363 * #20362 * #20361 * #20360 * #20359

Adds `aten.permute_copy.default` (a coordinate-reorder gather) to the WebGPU delegate, and the `IntList` graph value type it needs to read its `dims` argument.

Composition:
- `runtime/WebGPUGraph.{h,cpp}`: adds `ValueType::IntList` backed by `std::vector<std::vector<int64_t>> int_lists_` + `get_int_list(int)`; `build()` deserializes `vkgraph::GraphTypes::IntList` via `value_as_IntList()->items()` (int64, matching the FlatBuffer `[long]`); mirrors the existing scalar value plumbing.
- `runtime/ops/permute/Permute.cpp`: reads the permutation via `get_int_list`, normalizes negative dims, validates it is a permutation of `[0, ndim)`, builds two `TensorMeta` UBOs + a `PermuteParams{perm: vec4<u32>}` uniform, guards fp32 + rank≤4, dispatches over `compute_1d_workgroup_count(out.numel)` with `override wg_size`; releases all uniforms after the bind group.
- `runtime/ops/permute/permute.wgsl`: delinearizes the output index over the contiguous output strides, reads `input` at `in.strides[perm[d]]` per dim (mirrors Vulkan `permute_buffer.glsl`).
- Registers both `aten.permute_copy.default` and `aten.permute.default` to the same handler.

@exported-using-ghexport

Differential Revision: [D108793162](https://our.internmc.facebook.com/intern/diff/D108793162/)
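The permute gather described above can be mirrored in plain Python to produce expected values. Each output dim `d` reads input stride `perm[d]`. The sketch does not enforce the kernel's rank≤4 limit, and `permute_flat` is a hypothetical name.

```python
from math import prod


def contiguous_strides(shape):
    """Row-major (C-contiguous) strides, in elements."""
    strides, acc = [], 1
    for size in reversed(shape):
        strides.append(acc)
        acc *= size
    return list(reversed(strides))


def permute_flat(inp, in_shape, perm):
    rank = len(in_shape)
    perm = [p + rank if p < 0 else p for p in perm]  # normalize negative dims
    if sorted(perm) != list(range(rank)):
        raise ValueError(f"{perm} is not a permutation of range({rank})")
    in_strides = contiguous_strides(in_shape)
    out_shape = tuple(in_shape[p] for p in perm)
    out_strides = contiguous_strides(out_shape)
    out = []
    for o in range(prod(out_shape)):
        offset, rem = 0, o
        for d in range(rank):
            coord = rem // out_strides[d]  # delinearize over output strides
            rem %= out_strides[d]
            offset += coord * in_strides[perm[d]]  # out dim d reads input dim perm[d]
        out.append(inp[offset])
    return out, out_shape
```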

…mework) (#20397)

Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.15.0) (oldest at bottom): * #20465 * #20464 * #20463 * #20435 * #20399 * #20398 * __->__ #20397 * #20396 * #20395 * #20394 * #20393 * #20392 * #20391 * #20390 * #20363 * #20362 * #20361 * #20360 * #20359

Registers `aten.permute_copy.default` in the `cases.py` op-test framework: a `_permute_suite` of 4 configs (3D rotation, 4D middle-dim transpose, 2D transpose, full 4D shuffle) that `generate_op_tests` exports via `VulkanPartitioner` and compares to a torch golden on Dawn. Also adds `test/ops/permute/test_permute.py` (`PermuteModule` + `CONFIGS` + `_op_delegated` smoke test) and the `aten.permute_copy.default` partitioner-allowlist entry in `tester.py`.

@exported-using-ghexport

Differential Revision: [D108793156](https://our.internmc.facebook.com/intern/diff/D108793156/)

[ghstack-poisoned]

…efault) (#20398)

Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.15.0) (oldest at bottom): * #20465 * #20464 * #20463 * #20435 * #20399 * __->__ #20398

Adds `aten.cat.default` to the WebGPU delegate as an index-math scatter, and the `ValueList` graph value type it needs to read its tensor-list argument.

Composition (one dispatch per input):
- `runtime/WebGPUGraph.{h,cpp}`: adds `ValueType::ValueList` backed by `std::vector<std::vector<int>> value_lists_` + `get_value_list(int)`; `build()` deserializes `vkgraph::GraphTypes::ValueList` via `value_as_ValueList()->items()`; mirrors the existing scalar value plumbing.
- `runtime/ops/cat/Cat.cpp`: reads the input-tensor list via `get_value_list`, reads the static `Int` dim, validates each input rank + non-concat dims against the output; builds one shared `out_meta` `TensorMeta` uniform + a shared bind-group/pipeline layout; per input builds a fresh pipeline + bind group with `in_meta` + `CatParams{concat_dim, off_k}`, dispatches over `compute_1d_workgroup_count(in.numel)`; releases per-input uniforms after each bind group, the shared `out_meta` after the loop.
- `runtime/ops/cat/cat.wgsl`: delinearizes the input index over the input strides, shifts the concat-dim coordinate by the host-computed `off_k` (running sum of prior input sizes), relinearizes over the output strides to scatter `output[out] = input[in]`.
- The N disjoint-slab writes share one output buffer across separate dispatches and rely on the existing per-pass `execute()` ordering.

@exported-using-ghexport

Differential Revision: [D108793165](https://our.internmc.facebook.com/intern/diff/D108793165/)
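The per-input scatter can be mirrored in plain Python to produce expected outputs for a config. In this sketch the outer loop stands in for the one-dispatch-per-input sequence, and `off_k` is the running sum of the earlier inputs' concat-dim sizes. `cat_scatter` is a hypothetical name, not the WGSL, and it assumes the inputs already agree on every non-concat dim.

```python
from math import prod


def contiguous_strides(shape):
    """Row-major (C-contiguous) strides, in elements."""
    strides, acc = [], 1
    for size in reversed(shape):
        strides.append(acc)
        acc *= size
    return list(reversed(strides))


def cat_scatter(inputs, shapes, dim):
    rank = len(shapes[0])
    dim = dim + rank if dim < 0 else dim
    out_shape = list(shapes[0])
    out_shape[dim] = sum(s[dim] for s in shapes)
    out_strides = contiguous_strides(out_shape)
    out = [None] * prod(out_shape)
    off_k = 0
    for data, shape in zip(inputs, shapes):  # one "dispatch" per input
        in_strides = contiguous_strides(shape)
        for i in range(prod(shape)):
            rem, o = i, 0
            for d in range(rank):
                coord = rem // in_strides[d]  # delinearize over input strides
                rem %= in_strides[d]
                if d == dim:
                    coord += off_k  # shift into this input's slab
                o += coord * out_strides[d]  # relinearize over output strides
            out[o] = data[i]  # output[out] = input[in]
        off_k += shape[dim]  # running sum of prior concat-dim sizes
    return out, tuple(out_shape)
```

The slabs are disjoint, so no output element is written twice. Any `None` left in `out` would mean a slab was never written.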

…>.py (#20435)

Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.15.0) (oldest at bottom): * #20465 * #20464 * #20463 * __->__ #20435 * #20399 * #20398

Pure test-file relocation: moves the already-landed ops' tests from nested `test/ops/<op>/test_<op>.py` to flat `test/ops/test_<op>.py`, matching the ExecuTorch convention (XNNPACK uses flat `test/ops/test_<op>.py`; Vulkan uses flat `test/test_*.py`) and completing the flatten applied to the new ops in the stack below. Drops the per-op `__init__.py`; the parent `test/ops/__init__.py` is kept.

Ops: `add`, `rms_norm`, `sdpa` (`test_sdpa` + `test_update_cache`), `dispatch_order`, `quantized_linear`, `embedding_q4gsw`, `rope`, `prepack`.

No behavior change: the test modules and their export/golden functions are unchanged; only their path moves. Every reference to the old paths is updated: the `cases.py` op-test imports (`add`, `rms_norm`), `test/TARGETS` (`test_add` srcs), `test/ops/test_dispatch_order.py`'s internal `rms_norm` import, and the build/CI scripts that import the per-op export functions (`test/test_build_webgpu.sh`, `scripts/test_webgpu_native_ci.sh`).

Nothing required the per-op subdirectory: the codegen framework imports only `cases.py`, the one buck target uses a literal path, and the native-golden scripts import the modules by path; each resolves identically at the flat path.

@exported-using-ghexport

Differential Revision: [D109349894](https://our.internmc.facebook.com/intern/diff/D109349894/)

Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.15.0) (oldest at bottom): * #20465 * #20464 * __->__ #20463 * #20435 * #20399 * #20398

`aten.clone.default` is a pure flat copy on the buffer-only WebGPU backend, identical to `view_copy`: `clone_impl` reuses the existing `add_flat_copy` helper (`output[i] = input[i]`) and registers a handler under `aten.clone.default`. No new shader, generated WGSL header, or CMake source; it shares the `view_copy` flat-copy compute pipeline.

Required for end-to-end Llama 3.2 1B (4-bit, KV cache): the exported model serializes 2 `aten.clone.default` ops into its runtime operator chain (the RoPE-frequency clones reused across all 16 transformer layers), so without a handler the partition graph-breaks at those nodes. Mirrors the Vulkan delegate, which registers the same op and routes a buffer clone to a flat view-copy.

@exported-using-ghexport

Differential Revision: [D109477717](https://our.internmc.facebook.com/intern/diff/D109477717/)

Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.15.0) (oldest at bottom): * #20465 * __->__ #20464 * #20463 * #20435 * #20399 * #20398

Adds the WebGPU delegate handler for aten.index.Tensor, the 1D-self advanced-index gather out[i] = self[index[i]] (output shape == index shape). This is the form the VulkanPartitioner delegates -- it requires a 1D self and exactly one non-None index (op_registry.py); 2D mask/freqs gathers stay on CPU. It mirrors the Vulkan delegate's index_tensor op (IndexTensor.cpp + index_tensor_buffer.glsl) as a single compute dispatch over the output elements, each reading the int32 index and gathering the corresponding fp32 self element.

The op is composed as:
- index.wgsl: one workgroup-strided pass, out[i] = self[u32(index[i])], guarded by a numel bound; buffer-only, fp32 self/out, int32 index, 1D dispatch via the shared WebGPUUtils helpers (clamp workgroup size + 1D count).
- Index.cpp: validates the args (self/out tensors; indices ValueList with exactly one index tensor; fp32 self/out; int32 index; out numel == index numel), failing loud on any violation, then records the dispatch. row_width is dropped (always 1 for 1D self).

@exported-using-ghexport

Differential Revision: [D109478967](https://our.internmc.facebook.com/intern/diff/D109478967/)
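A pure-Python reference for the 1D gather `out[i] = self[index[i]]` is handy for hand-checking test configs. The description does not say how out-of-range or negative indices are handled, so this hypothetical `index_gather` simply rejects them. It is a sketch, not a statement of the kernel's behavior.

```python
def index_gather(self_vals, index):
    """out[i] = self[index[i]] for a 1D self; the output has len(index) elements."""
    n = len(self_vals)
    out = []
    for i, idx in enumerate(index):
        if not 0 <= idx < n:  # assumption: reject anything outside [0, n)
            raise IndexError(f"index[{i}] = {idx} out of range for self of size {n}")
        out.append(self_vals[idx])
    return out
```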

…lden) (#20465)

Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.15.0) (oldest at bottom): * __->__ #20465 * #20464 * #20463 * #20435 * #20399 * #20398

Adds the test suite for the aten.index.Tensor op (stacked on the op diff):
- test/ops/index/test_index.py: exports a module computing x[idx] through VulkanPartitioner for four configs (reorder/repeat indices over distinct self values, so a wrong-gather is visible), asserts a VulkanBackend delegate with index.Tensor absorbed (not a CPU fallback), and writes per-config .pte + .self/.idx/.golden.bin.
- test/native/test_index.cpp: a standalone Dawn binary that loads each .pte, feeds self (fp32) + index (int64 at the program boundary, narrowed to the int32 buffer) and compares the gather against the torch golden at 1e-3, with a single-output shape guard.
- Wired into CMake (webgpu_index_test), test/TARGETS (python_unittest test_index), and the Dawn native CI script.

@exported-using-ghexport

Differential Revision: [D109479000](https://our.internmc.facebook.com/intern/diff/D109479000/)
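The boundary narrowing and the golden comparison can be sketched in Python. The description does not say whether 1e-3 is an absolute or a relative tolerance. This sketch assumes absolute, and both helpers are hypothetical, not the code in `test_index.cpp`. The overflow check keeps narrowing loud rather than silently wrapping.

```python
from array import array

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def narrow_indices_to_int32(indices):
    """Narrow int64 program-boundary indices into an int32 buffer, failing loud on overflow."""
    for i, v in enumerate(indices):
        if not INT32_MIN <= v <= INT32_MAX:
            raise OverflowError(f"index[{i}] = {v} does not fit in int32")
    buf = array("i", indices)  # 'i' is a C int: 4 bytes on mainstream platforms
    if buf.itemsize != 4:
        raise RuntimeError("C int is not 32-bit on this platform")
    return buf


def all_close(actual, golden, atol=1e-3):
    """Element-wise absolute-tolerance comparison (assumed interpretation of 1e-3)."""
    return len(actual) == len(golden) and all(
        abs(a - g) <= atol for a, g in zip(actual, golden)
    )
```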

Pull Request resolved: #20399

Registers `aten.cat.default` in the `cases.py` op-test framework: a `_cat_suite` of 4 configs (3 inputs along dim 0, 2 inputs along dim 1, 3 inputs along dim 2, uneven split along dim 1) that `generate_op_tests` exports via `VulkanPartitioner` and compares to a torch golden on Dawn. Also adds `test/ops/cat/test_cat.py` (`CatModule` + `CONFIGS` + `_op_delegated` smoke test, with distinct per-input value ranges to catch cross-slab contamination) and the `aten.cat.default` partitioner-allowlist entry in `tester.py`.

ghstack-source-id: 397534688
@exported-using-ghexport
@diff-train-skip-merge

Differential Revision: [D108793163](https://our.internmc.facebook.com/intern/diff/D108793163/)

Update

14ba473

[ghstack-poisoned]

This was referenced Jun 18, 2026

[ExecuTorch][WebGPU] mul op test suite (cases.py op-test framework) #20359

Merged

[ExecuTorch][WebGPU] Add view_copy op (aten.view_copy.default) #20360

Merged

JulianCloudNTH temporarily deployed to cadence June 18, 2026 21:36 — with GitHub Actions Inactive

meta-cla Bot added the CLA Signed label Jun 18, 2026

Update

cb565be

[ghstack-poisoned]

JulianCloudNTH temporarily deployed to cadence June 18, 2026 22:25 — with GitHub Actions Inactive

meta-codesync Bot added the meta-exported label Jun 18, 2026

Update

ff781f2

[ghstack-poisoned]

JulianCloudNTH temporarily deployed to cadence June 22, 2026 20:40 — with GitHub Actions Inactive

Update

22ded57

[ghstack-poisoned]

JulianCloudNTH temporarily deployed to cadence June 25, 2026 17:24 — with GitHub Actions Inactive

psiddh approved these changes Jun 26, 2026


Update

c1baa4b

[ghstack-poisoned]

JulianCloudNTH temporarily deployed to cadence June 26, 2026 21:21 — with GitHub Actions Inactive

Update

84a44d0

[ghstack-poisoned]

JulianCloudNTH temporarily deployed to cadence June 27, 2026 20:36 — with GitHub Actions Inactive

JulianCloudNTH merged commit 3007317 into gh/JulianCloudNTH/48/base Jun 27, 2026
178 of 180 checks passed

JulianCloudNTH deleted the gh/JulianCloudNTH/48/head branch June 27, 2026 21:25

JulianCloudNTH temporarily deployed to cherry-pick-bot June 27, 2026 21:25 — with GitHub Actions Inactive

pytorchbot mentioned this pull request Jun 27, 2026

[ExecuTorch][WebGPU] cat op test suite (cases.py op-test framework) #20566

Merged

JulianCloudNTH merged 8 commits into gh/JulianCloudNTH/48/base from gh/JulianCloudNTH/48/head

2 participants
