variable search spaces for gemm autotuning #126220
Conversation
Add a switch to change the GEMM autotuning search space between the default (the current set of hardcoded configs) and an exhaustive search space that enumerates all block sizes in [16, 32, 64, 128, 256], stages in [1, 2, 3, 4, 5], and warps in [2, 4, 6].
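For illustration only, here is a minimal sketch of what that exhaustive space looks like when enumerated, assuming each of BLOCK_M, BLOCK_N, and BLOCK_K independently ranges over the block-size list (this is not the PR's implementation, just the Cartesian product the description implies):

```python
# Sketch: enumerate the exhaustive GEMM autotuning space described above.
# Assumes BLOCK_M/BLOCK_N/BLOCK_K each range over the same block-size list.
import itertools

BLOCK_SIZES = [16, 32, 64, 128, 256]
NUM_STAGES = [1, 2, 3, 4, 5]
NUM_WARPS = [2, 4, 6]

exhaustive_configs = [
    {
        "BLOCK_M": block_m,
        "BLOCK_N": block_n,
        "BLOCK_K": block_k,
        "num_stages": stages,
        "num_warps": warps,
    }
    for block_m, block_n, block_k, stages, warps in itertools.product(
        BLOCK_SIZES, BLOCK_SIZES, BLOCK_SIZES, NUM_STAGES, NUM_WARPS
    )
]

print(len(exhaustive_configs))  # 5 * 5 * 5 * 5 * 3 = 1875 candidate configs
```

At 1875 candidates per GEMM shape, exhaustive autotuning compiles far more slowly than the default hardcoded config set, which is presumably why it is gated behind a switch rather than made the default.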
@pytorchbot merge
Merge failed. Reason: This PR needs a `release notes:` label. If not, please add the `topic: not user facing` label. To add a label, you can comment to pytorchbot, for example `@pytorchbot label "topic: not user facing"`. For more information, see the PyTorch AutoLabel Bot wiki. Details for Dev Infra team: raised by workflow job.
@pytorchbot label "topic: not user facing"
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
add a switch to change the gemm autotuning search space between the default (the current set of hardcoded configs) and an exhaustive search space that enumerates all block sizes in [16, 32, 64, 128, 256], stages in [1, 2, 3, 4, 5], and warps in [2, 4, 6]

Pull Request resolved: pytorch#126220
Approved by: https://github.com/eellison
* Add exhaustive config option to intmm kernel

  Summary: Similar to pytorch/pytorch#126220, we added an exhaustive option for the int8mm and scaled_mm kernels in torchao. Note that there seems to be native int8mm and scaled_mm support in pytorch: https://github.com/pytorch/pytorch/blob/0610b9730e27d066e26396a2d655ba0d98c2012d/torch/_inductor/kernel/mm.py#L305 for int8mm and https://github.com/pytorch/pytorch/blob/0610b9730e27d066e26396a2d655ba0d98c2012d/torch/_inductor/kernel/mm_scaled.py#L575 for scaled mm; maybe we should use that at some point.

  Test Plan:
  ```
  cd benchmarks
  TORCHAO_AUTOTUNER_ENABLE=1 python intmm.py --file_path intmm_shapes.csv
  TORCHINDUCTOR_MAX_AUTOTUNE_GEMM_SEARCH_SPACE=EXHAUSTIVE TORCHAO_AUTOTUNER_ENABLE=1 python intmm.py --file_path intmm_shapes.csv
  ```

* remove unused
* enable all autoquant qtensor
* guard float8 qtensor subclass
* guard exhaustive config torch version
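As a usage sketch (the environment variable comes from the test plan above; the function, shapes, dtype, and device below are made up for illustration), the same switch can also be exercised directly from PyTorch by compiling a matmul in max-autotune mode:

```python
# Illustrative only: select the exhaustive GEMM search space via the env var
# from the test plan above, then trigger Inductor's GEMM autotuning with
# torch.compile in max-autotune mode. Shapes/dtype/device are arbitrary.
import os

# Set before importing torch so Inductor picks up the setting.
os.environ["TORCHINDUCTOR_MAX_AUTOTUNE_GEMM_SEARCH_SPACE"] = "EXHAUSTIVE"

import torch


@torch.compile(mode="max-autotune")
def mm(a, b):
    return a @ b


a = torch.randn(1024, 1024, device="cuda", dtype=torch.float16)
b = torch.randn(1024, 1024, device="cuda", dtype=torch.float16)
out = mm(a, b)  # first call autotunes over the much larger exhaustive config set
```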
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang