Graph Safe Current Scaling Support for GroupedLinear Module/Ops + Fix CUBLAS GGEMM heuristics for Wgrad #3143
Signed-off-by: Varun Thumbe <vthumbe@nvidia.com>
for more information, see https://pre-commit.ci
Removed details about FP8 current scaling methods. Signed-off-by: vthumbe1503 <vthumbe@nvidia.com>
/te-ci pytorch
Greptile Summary
This PR extends the graph-safe
Confidence Score: 5/5
The changes are well-scoped and additive: the correctness fix for the `rowwise_data` guard is straightforward, the heuristics change only affects algorithm selection (not numerical output), and the test additions provide direct coverage of the new path.
The core logic change, which clears `rowwise_data`/`scale_inv` only when `columnwise_data is not None`, is clearly correct and resolves the previously flagged bug. The cuBLAS heuristic corrections are performance-only: wrong hints lead to suboptimal algorithm selection, not wrong results. No data-loss or correctness regressions are introduced by this PR.
tests/pytorch/test_grouped_mlp.py contains a stale skip message; all other files look correct.
Important Files Changed
Reviews (6): Last reviewed commit: "Merge branch 'main' into nvfp4_and_fp8_c..."
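To make the guard described in the Greptile summary concrete, here is a minimal sketch in plain Python. It is not the code from this PR: the `QuantizedBuffers` class and the `release_rowwise` helper are hypothetical stand-ins, and only the field names `rowwise_data`, `scale_inv`, and `columnwise_data` are taken from the summary. The idea it illustrates is that row-wise storage should be dropped only when a column-wise copy exists, so the data is never lost.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QuantizedBuffers:
    """Hypothetical stand-in for a quantized tensor's storage.

    This is NOT a Transformer Engine class; plain lists stand in for
    device buffers so the sketch runs without a GPU.
    """
    rowwise_data: Optional[List[float]] = None
    scale_inv: Optional[List[float]] = None
    columnwise_data: Optional[List[float]] = None


def release_rowwise(buf: QuantizedBuffers) -> bool:
    """Clear row-wise data and its scale only if column-wise data exists.

    Returns True when the row-wise buffers were cleared, False when they
    were kept because they are the only remaining copy.
    """
    if buf.columnwise_data is not None:
        buf.rowwise_data = None
        buf.scale_inv = None
        return True
    return False


# Both layouts present: the row-wise copy can be released safely.
both = QuantizedBuffers(rowwise_data=[1.0], scale_inv=[0.5], columnwise_data=[1.0])
cleared_both = release_rowwise(both)

# Only the row-wise layout present: the guard keeps it.
row_only = QuantizedBuffers(rowwise_data=[1.0], scale_inv=[0.5])
cleared_row_only = release_rowwise(row_only)
```

Without the `is not None` check, the second case would discard the only copy of the data, which is the kind of bug the summary says the guard resolves.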
… weight being cuda graphable Signed-off-by: Varun Thumbe <vthumbe@nvidia.com>
…3/TransformerEngine into nvfp4_and_fp8_current_scaling
Description
Please include a brief summary of the changes, relevant motivation and context.
Fixes # (issue)
Type of change
Changes
Please list the changes introduced in this PR:
Checklist: