@Thyre Thyre commented Apr 25, 2025

Summary

This PR aims to implement a similar option to cuda-compute-capabilities (and related options) for AMD GPUs.
The option can then replace the manual handling done in some EasyBlocks, e.g. Clang & LLVM, making it possible to enable (some) GPU builds without altering the EasyConfig.

Most of the handling was copied from CUDA, while some options were skipped as they don't make much sense, e.g. cuda_cc_space_sep_no_period.

The regex used should support all GPU architectures starting from gfx600, including the more recent generic targets.
Actual compiler support then needs to be present in the compiler consuming these architectures. Both GCC and LLVM accept the same naming, i.e. gfx[...], including generic targets.
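
As a rough illustration of how a consumer of the option might turn the requested targets into build flags, here is a minimal sketch. The helper names are hypothetical and not part of EasyBuild; `--offload-arch` (Clang) and `CMAKE_HIP_ARCHITECTURES` (CMake) are existing interfaces, but whether a given EasyBlock uses them in this way is an assumption.

```python
def parse_amdgcn_capabilities(value):
    """Split a comma-separated option value into a list of targets (hypothetical helper)."""
    return [item.strip() for item in value.split(',') if item.strip()]


def clang_offload_flags(targets):
    """Build one --offload-arch flag per target, the form Clang accepts for offloading."""
    return ['--offload-arch=%s' % target for target in targets]


def cmake_hip_architectures(targets):
    """Build a semicolon-separated CMAKE_HIP_ARCHITECTURES definition for CMake."""
    if not targets:
        return None  # no targets requested: leave the CMake default untouched
    return '-DCMAKE_HIP_ARCHITECTURES=%s' % ';'.join(targets)


targets = parse_amdgcn_capabilities('gfx90a,gfx11-generic')
print(clang_offload_flags(targets))
print(cmake_hip_architectures(targets))
```

An empty value yields no flags at all, which matches the idea that software with optional GPU support can still be installed without it.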


Missing features compared to CUDA

  • cuda_cache_dir option is missing. I haven't found something similar for HIP yet, but may simply have missed it
  • "int only" options are missing, though they are hard to provide with generic targets and targets like gfx90a
    • Maybe a target without gfx?
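
To illustrate why an "int only" variant is awkward, the following sketch strips the `gfx` prefix and checks whether the remainder is purely numeric. The function name is hypothetical and only meant to show the problem, not a proposed implementation.

```python
def numeric_suffix(target):
    """Return the part after 'gfx' as an int, or None if it is not purely numeric."""
    suffix = target[len('gfx'):] if target.startswith('gfx') else target
    return int(suffix) if suffix.isdecimal() else None


for target in ['gfx1101', 'gfx90a', 'gfx11-generic']:
    print(target, '->', numeric_suffix(target))
```

Only targets such as `gfx1101` convert cleanly; `gfx90a` and the generic targets have no plain integer form.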

More to be determined.

Known issues

  • The regex for generic targets is not perfect, allowing e.g. gfx10--generic to pass, even though it is not allowed.

Resolves #4829

@Thyre Thyre force-pushed the support-passing-amdgcn branch from 958ad0a to bff1bfb Compare April 25, 2025 21:39
@Thyre Thyre force-pushed the support-passing-amdgcn branch from bff1bfb to 0e7aaf3 Compare April 25, 2025 22:55
@Thyre Thyre changed the title Add AMDGCN options similar to cuda-compute-capabilities Add AMDGCN option similar to cuda-compute-capabilities Apr 25, 2025
@boegel boegel added this to the 5.x milestone May 7, 2025
@Thyre Thyre force-pushed the support-passing-amdgcn branch from 0e7aaf3 to d4ba387 Compare May 10, 2025 12:14

Thyre commented May 10, 2025

Started to create a test set of EasyConfig & EasyBlock changes to test the option, starting with LLVM & CMake...
The next logical step would be to build some HIP application with CMake, and maybe try something more special like AdaptiveCpp. I'll use a system ROCm for this, but at the end, everything should also work with an EB built ROCm.

Let's see if this works the way I expect.

https://github.com/Thyre/easybuild-custom/tree/support-passing-amdgcn

Thyre and others added 4 commits July 7, 2025 06:28
AMD doesn't name this compute capabilities, and amdhsa is only used when
lowering to HSA (but amdpal & mesa3d are also possible). Therefore,
simply name the option 'amdgcn-capabilities'.

Signed-off-by: Jan Andre Reuter <[email protected]>
This allows users to handle cases like LLVM, where building with GPU
support is optional, but users might still want to install the software
without GPU support.

Signed-off-by: Jan Andre Reuter <[email protected]>
Signed-off-by: Jan André Reuter <[email protected]>
@Thyre Thyre force-pushed the support-passing-amdgcn branch from db9a681 to 4af19e3 Compare July 7, 2025 04:32
@Thyre Thyre force-pushed the support-passing-amdgcn branch from 6e32eac to afa6558 Compare July 7, 2025 04:44
Micket previously approved these changes Jul 15, 2025

@Micket Micket left a comment

lgtm

I really don't have any hardware to test any of this on. I trust you have tested this quite a bit?


Micket commented Jul 15, 2025

We are hitting rate limits (again?)
We need to rethink those framework tests. There are a bunch of issues like this:

ERROR: test_fetch_easyconfigs_from_commit (test.framework.github.GithubTest)
Test fetch_easyconfigs_from_commit function.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/runner/a2a179c7e5ef5c3a44bda1281d10211f5940d494/lib/python3.8/site-packages/test/framework/github.py", line 561, in test_fetch_easyconfigs_from_commit
    res = fetch_easyconfigs_from_commit(test_commit)
  File "/tmp/runner/a2a179c7e5ef5c3a44bda1281d10211f5940d494/lib/python3.8/site-packages/easybuild/tools/github.py", line 807, in fetch_easyconfigs_from_commit
    return fetch_files_from_commit(commit, files=files, path=path, github_repo=GITHUB_EASYCONFIGS_REPO)
  File "/tmp/runner/a2a179c7e5ef5c3a44bda1281d10211f5940d494/lib/python3.8/site-packages/easybuild/tools/github.py", line 748, in fetch_files_from_commit
    raise EasyBuildError(error_msg, exit_code=EasyBuildExit.FAIL_GITHUB)
easybuild.tools.build_log.EasyBuildError: 'Failed to download diff for easybuilders/easybuild-easyconfigs commit 6515b44cd84a20fe7876cb4bdaf3c0080e688566! (HTTP Error 403: rate limit exceeded)'


Thyre commented Jul 15, 2025

> lgtm
>
> I really don't have any hardware to test any of this on. I trust you have tested this quite a bit?

I've basically used this to build all of the ROCm software on two separate machines which I'm trying to bring to EasyBuild (after my vacation).

You'll find quite a few test reports from my Arch Linux machine (or jrc0850) where the option is set in the configuration.

Some test reports:

What I haven't explicitly tested (again) is using the generic targets, also because they're still quite new in ROCm.
Let me check that this works as expected, along with explicitly passing nothing (to ensure that e.g. LLVM 19 works with `gfx1201` in the config file). That will have to wait until next week though.


Micket commented Jul 15, 2025

OK, so I'll let you test that as well before merging then? I'll also be away traveling after this week, so if anyone else wants to hit merge, please go ahead.


Thyre commented Jul 15, 2025

Yeah, I'll test those things once I'm back home. If everything works, I'll ping in our merge-sprint channel 😄

Do not allow patterns like `gfx10--generic`

Co-authored-by: Davide Grassano <[email protected]>

Thyre commented Jul 24, 2025

I don't know how we could test this in the test suite itself, but this is the result with our regex pattern now:

$ eb --amdgcn-capabilities= GCC-14.3.0.eb --rebuild 1>/dev/null
$ eb --amdgcn-capabilities=gfx1101,gfx1201 GCC-14.3.0.eb --rebuild 1>/dev/null
$ eb --amdgcn-capabilities=gfx11-generic,gfx1201 GCC-14.3.0.eb --rebuild 1>/dev/null
$ eb --amdgcn-capabilities=gfx10-4-generic,gfx1201 GCC-14.3.0.eb --rebuild 1>/dev/null
$ eb --amdgcn-capabilities=gfx10--generic,gfx1201 GCC-14.3.0.eb --rebuild 1>/dev/null
ERROR: Failed to parse configuration options: "Found problems validating the options: Incorrect values in --amdgcn-capabilities (expected pattern: 'gfx[0-9]+[a-z]?$' or 'gfx[0-9]+(\\-[0-9])?-generic$'): gfx10--generic"
$ eb --amdgcn-capabilities=gfx--generic,gfx1201 GCC-14.3.0.eb --rebuild 1>/dev/null
ERROR: Failed to parse configuration options: "Found problems validating the options: Incorrect values in --amdgcn-capabilities (expected pattern: 'gfx[0-9]+[a-z]?$' or 'gfx[0-9]+(\\-[0-9])?-generic$'): gfx--generic"
$ eb --amdgcn-capabilities=foobar GCC-14.3.0.eb --rebuild 1>/dev/null
ERROR: Failed to parse configuration options: "Found problems validating the options: Incorrect values in --amdgcn-capabilities (expected pattern: 'gfx[0-9]+[a-z]?$' or 'gfx[0-9]+(\\-[0-9])?-generic$'): foobar"
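
The behaviour shown above can be reproduced outside of EasyBuild with a small standalone check. This is only a sketch: the two patterns are copied from the error message, while the constant and function names are hypothetical, and treating an empty value as valid is an assumption based on the first command succeeding.

```python
import re

# Patterns copied from the error message; the constant names are hypothetical.
AMDGCN_TARGET = re.compile(r'gfx[0-9]+[a-z]?$')
AMDGCN_GENERIC_TARGET = re.compile(r'gfx[0-9]+(\-[0-9])?-generic$')


def invalid_amdgcn_capabilities(value):
    """Return the entries of a comma-separated value that match neither pattern."""
    entries = [entry for entry in value.split(',') if entry]
    return [entry for entry in entries
            if not (AMDGCN_TARGET.match(entry) or AMDGCN_GENERIC_TARGET.match(entry))]


for value in ['', 'gfx1101,gfx1201', 'gfx10-4-generic,gfx1201',
              'gfx10--generic,gfx1201', 'gfx--generic,gfx1201', 'foobar']:
    print(repr(value), '->', invalid_amdgcn_capabilities(value))
```

A function along these lines could also be called directly from a unit test, which sidesteps the difficulty of triggering option validation through the full option parser.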

@Crivella

> I don't know how we could test this in the test suite itself, but this is the result with our regex pattern now:

Yeah, I played around with it a little bit, but every function I tried really does not seem to want to run the validate method 😅


@Crivella Crivella left a comment

LGTM

@Crivella Crivella modified the milestones: 5.x, release after 5.1.1 Jul 25, 2025
@Crivella

Going in, thanks @Thyre!

@Crivella Crivella merged commit c5019c1 into easybuilders:develop Jul 25, 2025
37 checks passed
@boegel boegel changed the title Add AMDGCN option similar to cuda-compute-capabilities Add support for amdgcn-capabilities configuration option and amdgcn_capabilities easyconfig parameter + related templates, similar to cuda-compute-capabilities Jul 30, 2025
Development

Successfully merging this pull request may close these issues.

Feature Request: Introduce cuda_compute_capabilities (and related) options for AMD GPU architectures