Conversation

@nmacchioni (Contributor) commented May 14, 2024

Add a switch to change the GEMM autotuning search space between the default (the current set of hardcoded configs) and an exhaustive search space that enumerates all block sizes in [16, 32, 64, 128, 256], stage counts in [1, 2, 3, 4, 5], and warp counts in [2, 4, 6].

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang
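
For a rough sense of what the exhaustive space looks like, here is a minimal sketch that enumerates it with `itertools.product`; `GemmConfig` and `exhaustive_configs` are illustrative names only, not the actual inductor implementation:

```python
# Sketch only: enumerate the exhaustive GEMM autotuning space described above.
# GemmConfig / exhaustive_configs are illustrative names, not inductor APIs.
import itertools
from collections import namedtuple

GemmConfig = namedtuple(
    "GemmConfig", ["block_m", "block_n", "block_k", "num_stages", "num_warps"]
)

BLOCK_SIZES = [16, 32, 64, 128, 256]
STAGES = [1, 2, 3, 4, 5]
WARPS = [2, 4, 6]

exhaustive_configs = [
    GemmConfig(bm, bn, bk, stages, warps)
    for bm, bn, bk, stages, warps in itertools.product(
        BLOCK_SIZES, BLOCK_SIZES, BLOCK_SIZES, STAGES, WARPS
    )
]

# 5 block sizes per dimension * 5 stage counts * 3 warp counts = 1875 candidates,
# versus the small hardcoded default set.
print(len(exhaustive_configs))  # 1875
```

The `TORCHINDUCTOR_MAX_AUTOTUNE_GEMM_SEARCH_SPACE=EXHAUSTIVE` setting used in the torchao test plans further down in this thread selects this search space.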

pytorch-bot bot commented May 14, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/126220

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (8 Unrelated Failures)

As of commit 2114dc0 with merge base b522e65:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following jobs failed but were likely due to flakiness present on trunk and have been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

nmacchioni marked this pull request as ready for review May 14, 2024 22:32
nmacchioni requested a review from eellison May 15, 2024 01:24
@nmacchioni (Contributor, Author) commented:

@pytorchbot merge

pytorch-bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label May 17, 2024
@pytorchmergebot (Collaborator) commented:

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Details for Dev Infra team: raised by workflow job.

@nmacchioni (Contributor, Author) commented:

@pytorchbot label "topic: not user facing"

@nmacchioni (Contributor, Author) commented:

@pytorchbot merge

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

ZelboK pushed a commit to ZelboK/pytorch that referenced this pull request May 19, 2024
Add a switch to change the GEMM autotuning search space between the default (the current set of hardcoded configs) and an exhaustive search space that enumerates all block sizes in [16, 32, 64, 128, 256], stage counts in [1, 2, 3, 4, 5], and warp counts in [2, 4, 6].

Pull Request resolved: pytorch#126220
Approved by: https://github.com/eellison
github-actions bot deleted the nmacchioni-patch-2 branch June 17, 2024 01:57
jerryzh168 added a commit to jerryzh168/ao that referenced this pull request Dec 9, 2024
Summary:
Similar to pytorch/pytorch#126220, we added an exhaustive autotuning option for the int8mm and scaled_mm kernels in torchao.

Note that there seems to be native int8mm and scaled_mm support in PyTorch:
https://github.com/pytorch/pytorch/blob/0610b9730e27d066e26396a2d655ba0d98c2012d/torch/_inductor/kernel/mm.py#L305 for int8mm and https://github.com/pytorch/pytorch/blob/0610b9730e27d066e26396a2d655ba0d98c2012d/torch/_inductor/kernel/mm_scaled.py#L575 for scaled_mm.
Maybe we should use that at some point.

Test Plan:
```
cd benchmarks
TORCHAO_AUTOTUNER_ENABLE=1 python intmm.py --file_path intmm_shapes.csv
TORCHINDUCTOR_MAX_AUTOTUNE_GEMM_SEARCH_SPACE=EXHAUSTIVE TORCHAO_AUTOTUNER_ENABLE=1 python intmm.py --file_path intmm_shapes.csv
```
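
For illustration, a minimal sketch of how an autotuner could honor such environment-variable switches; `pick_gemm_configs`, `DEFAULT_CONFIGS`, and `EXHAUSTIVE_CONFIGS` are hypothetical names, and only the two environment-variable names come from the test plan above:

```python
# Sketch only: select a config list based on the environment-variable switches
# shown in the test plan. All function and list names here are hypothetical.
import os

DEFAULT_CONFIGS = ["<small hardcoded config set>"]
EXHAUSTIVE_CONFIGS = ["<full block/stage/warp product>"]

def autotuner_enabled() -> bool:
    # Mirrors the TORCHAO_AUTOTUNER_ENABLE=1 flag used above.
    return os.environ.get("TORCHAO_AUTOTUNER_ENABLE", "0") == "1"

def pick_gemm_configs():
    # Mirrors TORCHINDUCTOR_MAX_AUTOTUNE_GEMM_SEARCH_SPACE=EXHAUSTIVE used above.
    mode = os.environ.get("TORCHINDUCTOR_MAX_AUTOTUNE_GEMM_SEARCH_SPACE", "DEFAULT")
    return EXHAUSTIVE_CONFIGS if mode.upper() == "EXHAUSTIVE" else DEFAULT_CONFIGS
```
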
jerryzh168 added a commit to pytorch/ao that referenced this pull request Dec 11, 2024
* Add exhaustive config option to intmm kernel

* remove unused

* enable all autoquant qtensor

* guard float8 qtensor subclass

* guard exhaustive config torch version
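
For the last bullet, a minimal sketch of one way to guard the exhaustive-config path on the installed torch version; the `2.5.0` threshold and the `exhaustive_configs_supported` helper are purely illustrative (not the actual torchao code), and it assumes the `packaging` package is available:

```python
# Sketch only: gate a feature on the installed torch version.
# The threshold and helper name are illustrative, not the actual torchao code.
import torch
from packaging import version

def exhaustive_configs_supported(min_version: str = "2.5.0") -> bool:
    # torch.__version__ may carry a local suffix such as "+cu121";
    # base_version drops it before comparing.
    installed = version.parse(torch.__version__).base_version
    return version.parse(installed) >= version.parse(min_version)
```
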
amdfaa pushed a commit to pytorch/ao that referenced this pull request Jan 10, 2025