Add easyconfigs for ROCm 6.4.1 using ROCm-LLVM toolchain #23542
base: develop
Updated software
There seem to be a few things missing still (like
easybuild/easyconfigs/r/ROCmInfo/ROCmInfo-6.4.1-LLVMtc-ROCm-6.4.1.eb
    'dirs': [],
}
sanity_check_commands = [
    'rocminfo --help',
--help does not exist, and the check will likely fail if no GPU is present (to be verified).
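One possible way to address this, sketched under the assumption that the binary is installed as `bin/rocminfo` (not verified against this PR): rely on `sanity_check_paths` alone and leave out the command check, so the sanity check needs neither a `--help` option nor a GPU on the build host.

```python
# Sketch only: the 'bin/rocminfo' path is an assumption about the install layout.
sanity_check_paths = {
    'files': ['bin/rocminfo'],
    'dirs': [],
}

# No command check: running rocminfo may fail on hosts without a GPU.
sanity_check_commands = []
```

If a command check is wanted later, it would first have to be confirmed that it also succeeds on nodes without a GPU.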
Generally, I think trying to sync the integration would make sense as well. I might try to update my ECs to ROCm 6.4.2 (or add them to 14.3.0 to make them available on our AMD nodes), but I don't know when I'll find the time for that.
You'll need to port the patch for ROCm-LLVM from #23304 as well, or else building will fail after the merge of easybuilders/easybuild-easyblocks#3799. This fixes both the build failure and another issue when using the installation later on, e.g. for OpenMP offload.
easybuild/easyconfigs/a/AMDSMI/AMDSMI-6.4.1-LLVMtc-ROCm-6.4.1.eb
monitor performance, and retrieve information about the system's drivers and GPUs."""
docurls = ['https://rocmdocs.amd.com/projects/amdsmi/']

toolchain = {'name': 'LLVMtc', 'version': 'ROCm-6.4.1'}
Suggested change:
- toolchain = {'name': 'LLVMtc', 'version': 'ROCm-6.4.1'}
+ toolchain = {'name': 'GCCcore', 'version': '13.3.0'}
There should be nothing that requires building amdsmi with ROCm-LLVM, see
Also note the esmi components, which are required (building may fail otherwise) and should not be pulled in during the build. We can provide them ourselves.
easybuild/easyconfigs/c/CppHeaderParser/CppHeaderParser-2.7.4-GCCcore-13.3.0.eb
I feel like all components should use the actual version instead of simply specifying We might need to figure out if all components provide a tarball / tag for "their" releases as well, or only for ROCm releases though.
Short discussion results from the EESSI slack with @Zeldhoron:
easyblock = 'Bundle'

name = 'LLVMtc'
version = 'ROCm-6.4.1'
I am wondering how this interacts with the logic in the toolchain that filters out flags which are not supported by flang:
https://github.com/easybuilders/easybuild-framework/blob/7b14ab77662dbead8f4eb0e08d925373ed6135db/easybuild/toolchains/compiler/llvm.py#L59-L66
https://github.com/easybuilders/easybuild-framework/blob/bb62f27df2d3873480b1afe1eba81d7c00db101a/easybuild/toolchains/compiler/llvm.py#L162-L181
We should check that this does not cause LooseVersion to fail, and possibly implement some way to keep track of the LLVM version so we know what to filter (or add a list of filters for ROCm versions).
The first check is basically for flags which are not understood until LLVM 21? If so, then we probably have to wait for a first ROCm version based on LLVM 21.
We could also have a ROCmtc, based on LLVMtc. That's yet another toolchain, but would make handling such versioning things easier.
I put 21 there because it was not out yet when I made the toolchain, but I still have to test whether the flang in 21 actually supports those flags or not.
> The first check is basically for flags which are not understood until LLVM 21? If so, then we probably have to wait for a first ROCm version based on LLVM 21.

Not necessarily. Those checks are there so that general flags can be passed to both clang and flang, and are then filtered out if flang does not support them.
The other way would be to split the flags from generic into C and F _OPTIONS, but that would still require some level of version control.
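To make the filtering approach concrete, here is a minimal sketch. It is not the code in `llvm.py`: the table `FLANG_FLAG_MIN_LLVM`, the helper `flags_for_flang`, the flag names and the version numbers are all placeholders invented for illustration.

```python
# Hypothetical table: flag -> first LLVM major version whose flang accepts it.
# Flag names and version numbers are placeholders, not taken from llvm.py.
FLANG_FLAG_MIN_LLVM = {
    '-fplaceholder-a': 21,
    '-fplaceholder-b': 21,
}


def flags_for_flang(generic_flags, llvm_major):
    """Return the generic flags that flang is assumed to accept for this LLVM major version."""
    kept = []
    for flag in generic_flags:
        min_major = FLANG_FLAG_MIN_LLVM.get(flag)
        # Flags without an entry are assumed to be understood by every flang version.
        if min_major is None or llvm_major >= min_major:
            kept.append(flag)
    return kept


generic = ['-O2', '-fplaceholder-a']
print(flags_for_flang(generic, 19))  # ['-O2']
print(flags_for_flang(generic, 21))  # ['-O2', '-fplaceholder-a']
```

Either way, the filter needs a comparable LLVM version as input, which a toolchain version of `ROCm-6.4.1` does not provide directly.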
I think the better approach may be to use the LLVM version as the version and then suffix it with ROCm-<rocm_version>. That way we can reuse the logic that checks which options are allowed.
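A rough sketch of what that naming scheme would allow, assuming a version string of the form `<llvm_version>-ROCm-<rocm_version>`. The helper `llvm_version_tuple` and the example version numbers are hypothetical and only illustrate the idea; the framework's own LooseVersion handling is not reproduced here.

```python
import re

# Assumed scheme: '<llvm_version>' optionally followed by '-ROCm-<rocm_version>'.
VERSION_RE = re.compile(r'^(?P<llvm>\d+(?:\.\d+)*)(?:-ROCm-(?P<rocm>\d+(?:\.\d+)*))?$')


def llvm_version_tuple(toolchain_version):
    """Extract the leading LLVM version as a tuple of ints (hypothetical helper)."""
    match = VERSION_RE.match(toolchain_version)
    if match is None:
        raise ValueError("cannot extract an LLVM version from %r" % toolchain_version)
    return tuple(int(part) for part in match.group('llvm').split('.'))


# Placeholder version numbers, used only to show the comparison.
print(llvm_version_tuple('19.0.0-ROCm-6.4.1') >= (21,))  # False
print(llvm_version_tuple('21.1.0') >= (21,))             # True
```

With the current `ROCm-6.4.1` version there is no leading LLVM version to extract, so this helper raises a `ValueError` for it.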
I've finished updating everything to match the results of the discussion we had. I didn't have the time to test all the changes I made, but there shouldn't be any major difficulties. There are still a couple of easyconfigs that need to be added (
The problem with excessive memory usage is also being looked at by Gentoo (see https://bugs.gentoo.org/960513). In a recent commit made in the context of that issue, they introduced using
If I remember correctly, @Zeldhoron was already using this variable, as I also have been doing for hipBLASLt. It just looks like some GPU targets consume excessive amounts of memory compared to others. Building for I've not tried building for other targets so far. That will probably come once we're a bit further with the ROCm stack, as I don't want to open draft PRs for all components until we at least have the ROCm LLVM itself merged...
This PR adds easyconfigs to EB for ROCm support.
An overview of everything that's added:
What's left to be added:
This PR builds further upon the following PRs:
- `amdgcn-capabilities` configuration option and `amdgcn_capabilities` easyconfig parameter + related templates, similar to `cuda-compute-capabilities` easybuild-framework#4860
- `amdgcn_capabilities` build option easybuild-easyblocks#3824