Benchmarking #1353
Merged
@@ -0,0 +1,43 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

import glob
import subprocess
import tempfile
import torch

def cmake_build_torchao_ops(cmake_lists_path, temp_build_dir):
    from distutils.sysconfig import get_python_lib
    print("Building torchao ops for ATen target")
    cmake_prefix_path = get_python_lib()
    subprocess.run(
        [
            "cmake",
            "-DCMAKE_PREFIX_PATH=" + cmake_prefix_path,
            "-DCMAKE_INSTALL_PREFIX=" + temp_build_dir.name,
            "-S " + cmake_lists_path,
            "-B " + temp_build_dir.name,
        ]
    )
    subprocess.run(
        [
            "cmake",
            "--build",
            temp_build_dir.name,
            "-j 16",
            "--target install",
            "--config Release",
        ]
    )

def temp_build_and_load_torchao_ops(cmake_lists_path):
    temp_build_dir = tempfile.TemporaryDirectory()
    cmake_build_torchao_ops(cmake_lists_path, temp_build_dir)
    libs = glob.glob(f"{temp_build_dir.name}/lib/libtorchao_ops_aten.*")
    libs = list(filter(lambda l: (l.endswith("so") or l.endswith("dylib")), libs))
    assert len(libs) == 1
    torch.ops.load_library(libs[0])
    print(f"TorchAO ops are loaded from {libs[0]}")
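One possible variant of the build helper in this diff, shown only as a sketch and not as part of the PR: it passes each CMake flag and its value as separate list elements, sets `check=True` so a failed configure or build step raises instead of continuing silently, and uses `sysconfig.get_path("purelib")` in place of `distutils`, which was removed in Python 3.12. The names `configure_cmd`, `build_cmd`, `build_ops` and the `jobs` parameter are hypothetical.

```python
import subprocess
import sysconfig


def configure_cmd(cmake_lists_path, build_dir, prefix_path=None):
    """Return the CMake configure command as an argument list."""
    if prefix_path is None:
        # site-packages directory, so CMake can locate the installed torch package
        prefix_path = sysconfig.get_path("purelib")
    return [
        "cmake",
        f"-DCMAKE_PREFIX_PATH={prefix_path}",
        f"-DCMAKE_INSTALL_PREFIX={build_dir}",
        "-S", cmake_lists_path,
        "-B", build_dir,
    ]


def build_cmd(build_dir, jobs=16):
    """Return the CMake build-and-install command as an argument list."""
    return [
        "cmake", "--build", build_dir,
        "-j", str(jobs),
        "--target", "install",
        "--config", "Release",
    ]


def build_ops(cmake_lists_path, build_dir):
    """Configure, build, and install; raises CalledProcessError on failure."""
    subprocess.run(configure_cmd(cmake_lists_path, build_dir), check=True)
    subprocess.run(build_cmd(build_dir), check=True)
```

Keeping the command construction in pure functions makes it possible to inspect or test the arguments without invoking CMake.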
I don't follow why we are doing it this way. Why can't we just try to load the ops and, if they are not found, raise an exception listing the required build steps? Ideally we would detect the platform and build these deps as a prerequisite for running benchmarks under the _model directory. Once we can ship these kernels as part of the pip package, we may not need this.
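A minimal sketch of the "try to load, otherwise raise with build steps" idea from this comment. The function name `load_ops_or_explain`, its parameters, and the error wording are hypothetical; only the library name pattern `libtorchao_ops_aten.*` and the `torch.ops.load_library` call come from the diff. The loader is injectable so the lookup logic can be exercised without torch installed.

```python
import glob
import os


def load_ops_or_explain(lib_dir, load_library=None):
    """Load the single ops library found in lib_dir, or raise with a build hint."""
    if load_library is None:
        import torch  # imported lazily; only needed for the real loader

        load_library = torch.ops.load_library

    # Accept only shared-library suffixes, as the diff does.
    candidates = [
        path
        for path in glob.glob(os.path.join(lib_dir, "libtorchao_ops_aten.*"))
        if path.endswith((".so", ".dylib"))
    ]
    if len(candidates) != 1:
        raise RuntimeError(
            f"Expected exactly one libtorchao_ops_aten library in {lib_dir!r}, "
            f"found {len(candidates)}. Build the experimental kernels first, "
            "then rerun the benchmark."
        )
    load_library(candidates[0])
    return candidates[0]
```

Compared with a bare `assert len(libs) == 1`, the explicit exception tells the user what to do next.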
What is the blocker for this?
We need to move out of experimental. I think we need to follow up on this. I believe we have sufficient evidence now, @supriyar?
I guess by "just try to load the ops" you mean something like this: https://github.com/pytorch/executorch/blob/main/extension/llm/custom_ops/sdpa_with_kv_cache.py#L21-L34 (but with the except block replaced by install instructions)
We could do that, but in the current setup, won't the try block always fail when running this benchmarking script (unless we build/load the ops in setup.py)? I'm also not sure what the instructions would say to make the script runnable without telling the user to modify the script by adding a torch.ops.load_library line. I guess we could ask them to define an environment variable with the library location?
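The environment-variable option floated here could look roughly like the following. This is a sketch under assumptions: the variable name `TORCHAO_OPS_LIBRARY` and the helper `resolve_ops_library` are invented for illustration and are not defined by torchao.

```python
import os

# Hypothetical variable name; torchao does not define it.
ENV_VAR = "TORCHAO_OPS_LIBRARY"


def resolve_ops_library(environ=None):
    """Return the ops library path named by ENV_VAR, or raise with a hint."""
    environ = os.environ if environ is None else environ
    path = environ.get(ENV_VAR)
    if not path:
        raise RuntimeError(
            f"{ENV_VAR} is not set. Build the ops library and set {ENV_VAR} "
            "to the path of the resulting .so or .dylib file."
        )
    if not os.path.isfile(path):
        raise FileNotFoundError(f"{ENV_VAR} points to a missing file: {path}")
    return path
```

The benchmarking script would then call `torch.ops.load_library(resolve_ops_library())` without the user having to edit the script itself.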
@metascroy Yes, but I don't follow the second part. Why would the user need to modify the script? Can we not just import torchao.experimental.lowbit_ops, which internally does the try/except? But I do kind of get what you are doing here, because any build script will have to figure out where to put the build artifact (.so), and then we need to load it from there. Ideally it should be installed as part of the setup instructions or the pip package. So we could also follow something like https://github.com/pytorch/ao/blob/main/setup.py#L53 and add an extra option to build with the experimental low-bit quant features. Then, if the user invokes the benchmarking script without building the experimental kernels, you can suggest running
python setup.py --use_experimental_low_bit_kernels
or
pip install . --use_experimental_low_bit_kernels
This feels a bit cleaner to me, but I'm curious to know your thoughts, and also those of @msaroufim.
The concern is whether we can reuse the cmake setup stuff we already have (e.g., this function that sets up the parallel compute: https://github.com/pytorch/ao/blob/main/torchao/experimental/Utils.cmake#L7)? If we bring in KleidiAI/CPUInfo via CMake, that will be more stuff to worry about.
I haven't used the torch CppExtension in setup.py, but it looks fairly simplistic compared to cmake. Perhaps we could do something like what this blog does, if it does not already exist in PyTorch: https://martinopilia.com/posts/2018/09/15/building-python-extension.html
Maybe I am missing something, but ExecuTorch's pybinding extension with XNNPACK builds XNNPACK, which in turn builds with cpuinfo, pthreadpool, and everything else. Granted, that is a whole lot to build, but it does work in ET. So there is likely a nuance you are worried about that I am not understanding.
I guess this goes back to one of my earlier comments to @jerryzh168: the current setup.py in torchao does not really support cmake, and it would require a good amount of refactoring to support it. Currently setup.py in torchao is built on utilities in torch.utils.cpp_extension, which look somewhat simplistic and as far as I can tell, do not support cmake.
ET's setup.py defines a custom extension to support cmake, and doing something similar in torchao looks like a sizable refactor of its setup?
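For context, a custom extension of the kind described here is commonly written as a `setuptools` `build_ext` subclass that shells out to CMake. The following is a minimal sketch of that general pattern, not ExecuTorch's or torchao's actual setup.py; the class names `CMakeExtension` and `CMakeBuild` are illustrative.

```python
import os
import subprocess

from setuptools import Extension
from setuptools.command.build_ext import build_ext


class CMakeExtension(Extension):
    """An extension with no Python-level sources; CMake builds it instead."""

    def __init__(self, name, source_dir="."):
        super().__init__(name, sources=[])
        self.source_dir = os.path.abspath(source_dir)


class CMakeBuild(build_ext):
    """Build each CMakeExtension by running CMake configure and build."""

    def build_extension(self, ext):
        # Directory where setuptools expects the built library to appear.
        out_dir = os.path.abspath(os.path.dirname(self.get_ext_fullpath(ext.name)))
        build_dir = os.path.join(self.build_temp, ext.name)
        os.makedirs(build_dir, exist_ok=True)
        subprocess.run(
            [
                "cmake",
                "-S", ext.source_dir,
                "-B", build_dir,
                f"-DCMAKE_LIBRARY_OUTPUT_DIRECTORY={out_dir}",
                "-DCMAKE_BUILD_TYPE=Release",
            ],
            check=True,
        )
        subprocess.run(
            ["cmake", "--build", build_dir, "--config", "Release"],
            check=True,
        )
```

A setup.py using this would pass `ext_modules=[CMakeExtension(...)]` and `cmdclass={"build_ext": CMakeBuild}` to `setup()`. Because the existing CMakeLists.txt drives the build, helpers such as those in Utils.cmake could in principle be reused, which is the concern raised earlier in the thread.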
OK, I will have to rely on your answer for this, since I haven't looked at all the details of cpp_extension. I do remember it being simple, but I don't know whether there are ways to add deps as part of cpp_extension. I doubt it, but please check whether it is possible.
If not, I think it is worth proposing this for the sake of making the ARM kernels available in Mac builds. @drisspg, any thoughts? The discussion here is largely about a more complex setup.py that would let us build C++ extensions that package the CPU kernels and make them available as part of the Python package.
I edited setup.py to build the experimental kernels here: D67777662
Need feedback from torchao folks on whether the changes are acceptable.