@Thyre
Collaborator

@Thyre Thyre commented Jul 3, 2025

(created using eb --new-pr)

Requires:


This is the first major step towards building ROCm and its components. ROCm-LLVM is a fork of upstream LLVM with several major additions. This complicates the build process a lot, e.g. ROCm-LLVM basically has to be built twice just to get some offload components working correctly.

Most of the heavy lifting was done by @bedroge; most of my work was bringing this to GCCcore/14.2.0, figuring out build issues, and adding the patch to get the OpenMP Tools Interface working.

We skip building and running tests, as the build already takes ages.

One important difference to "upstream" (i.e. AMD's fork) is that we build amdflang from LLVM. We therefore choose to adopt the new flang earlier than AMD does for their official packages. I don't see a reason to still try to go with the old flang.

@Thyre Thyre marked this pull request as draft July 3, 2025 18:56
@github-actions github-actions bot added the new label Jul 3, 2025
…Cm-LLVM-6.4.1_llvm-project-19.0.0_fix-offload-build.patch

Co-Authored-By: Bob Dröge <[email protected]>
@Thyre Thyre force-pushed the 20250703205604_new_pr_ROCm-LLVM641 branch from 2bf841b to 7c46968 Compare July 3, 2025 19:03
@Thyre
Collaborator Author

Thyre commented Jul 28, 2025

MI250X node with --amdgcn-capabilities=gfx90a and --cuda-compute-capabilities=7.5,8.0

Test report by @Thyre
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#3823
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
jrc0850.jureca - Linux Rocky Linux 9.5 (Blue Onyx), x86_64, AMD EPYC 7443 24-Core Processor (zen3), Python 3.9.21
See https://gist.github.com/Thyre/7779db041e343ac7d2cd3408787be6c4 for a full test report.

@Thyre
Collaborator Author

Thyre commented Jul 28, 2025

@boegelbot please test @ jsc-zen3
EB_ARGS="--include-easyblocks-from-pr=3823 --amdgcn-capabilities=gfx90a --installpath=/tmp/$USER/pr23304"

@boegelbot
Collaborator

@Thyre: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=23304 EB_ARGS="--include-easyblocks-from-pr=3823 --amdgcn-capabilities=gfx90a --installpath=/tmp/$USER/pr23304" EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_23304 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 7376

Test results coming soon (I hope)...

- notification for comment with ID 3126806295 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#3823
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
jsczen3c2.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.5, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.21
See https://gist.github.com/boegelbot/532c6af9a7fd099de71a95e064894054 for a full test report.

@bedroge
Contributor

bedroge commented Jul 29, 2025

MI100+MI210 node with --amdgcn-capabilities=gfx90a,gfx908.

Test report by @bedroge
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#3823
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
il-c04 - Linux Rocky Linux 9.5 (Blue Onyx), x86_64, AMD EPYC 7542 32-Core Processor, 1 x AMD Arcturus GL-XL [Instinct MI100] (model: 0x738c, driver: "6.8.5"), 1 x AMD Instinct MI210 (model: 0x740f, driver: "6.8.5"), Python 3.9.21
See https://gist.github.com/bedroge/e7b726fa933f4f7f83065786d5fe22b8 for a full test report.

@Thyre Thyre marked this pull request as ready for review July 29, 2025 11:42
@Thyre Thyre added the 2025a issues & PRs related to 2025a common toolchains label Aug 1, 2025
@boegel boegel added this to the 5.x milestone Aug 18, 2025
@Thyre Thyre changed the title {tools}[GCCcore/14.2.0] ROCm-LLVM v6.4.1 {tools}[GCCcore/14.2.0] ROCm-LLVM v19.0.0 w/ ROCm 6.4.1 Oct 22, 2025
@Thyre
Collaborator Author

Thyre commented Oct 24, 2025

@boegelbot please test @ jsc-zen3
EB_ARGS="--include-easyblocks-from-pr=3823 --amdgcn-capabilities=gfx90a --installpath=/tmp/$USER/pr23304"

@boegelbot
Collaborator

@Thyre: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=23304 EB_ARGS="--include-easyblocks-from-pr=3823 --amdgcn-capabilities=gfx90a --installpath=/tmp/$USER/pr23304" EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_23304 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 8540

Test results coming soon (I hope)...

- notification for comment with ID 3444812706 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#3823
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.6, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.21
See https://gist.github.com/boegelbot/d3dbf1e3a2ecff938dd662e159b5a75a for a full test report.

Signed-off-by: Jan André Reuter <[email protected]>
@Thyre
Collaborator Author

Thyre commented Oct 24, 2025

@boegelbot please test @ jsc-zen3
EB_ARGS="--include-easyblocks-from-pr=3823 --amdgcn-capabilities=gfx90a --installpath=/tmp/$USER/pr23304"

@boegelbot
Collaborator

@Thyre: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=23304 EB_ARGS="--include-easyblocks-from-pr=3823 --amdgcn-capabilities=gfx90a --installpath=/tmp/$USER/pr23304" EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_23304 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 8541

Test results coming soon (I hope)...

- notification for comment with ID 3444883950 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#3823
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.6, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.21
See https://gist.github.com/boegelbot/2da3b69bcbe235dce38526c83c693e02 for a full test report.

@Thyre
Collaborator Author

Thyre commented Oct 25, 2025

Forgot --amdgcn-capabilities


Test report by @Thyre
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#3823
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
jrc0850.jureca - Linux Rocky Linux 9.6 (Blue Onyx), x86_64, AMD EPYC 7443 24-Core Processor (zen3), 8 x AMD AMD Instinct MI250X / MI250 (model: 0x740c, driver: "6.12.12"), Python 3.9.21
See https://gist.github.com/Thyre/5f6f0a29cf2fc5189f637b206be35782 for a full test report.

@Thyre
Collaborator Author

Thyre commented Oct 25, 2025

Test report by @Thyre
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#3823
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
jrc0850.jureca - Linux Rocky Linux 9.6 (Blue Onyx), x86_64, AMD EPYC 7443 24-Core Processor (zen3), 8 x AMD AMD Instinct MI250X / MI250 (model: 0x740c, driver: "6.12.12"), Python 3.9.21
See https://gist.github.com/Thyre/c3ff50c5996dcdfa061653141483dbdb for a full test report.

@Thyre
Collaborator Author

Thyre commented Oct 25, 2025

@boegelbot please test @ jsc-zen3
EB_ARGS="--include-easyblocks-from-pr=3823 --amdgcn-capabilities=gfx90a --installpath=/tmp/$USER/pr23304"

@boegelbot
Collaborator

@Thyre: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=23304 EB_ARGS="--include-easyblocks-from-pr=3823 --amdgcn-capabilities=gfx90a --installpath=/tmp/$USER/pr23304" EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_23304 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 8551

Test results coming soon (I hope)...

- notification for comment with ID 3447807268 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#3823
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
jsczen3c2.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.6, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.21
See https://gist.github.com/boegelbot/4b0c7aae9e7538ba53397155d8a510c8 for a full test report.

@Thyre
Collaborator Author

Thyre commented Oct 26, 2025

Test report by boegelbot
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#3823
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
jsczen3c2.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.6, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.21
See https://gist.github.com/boegelbot/4b0c7aae9e7538ba53397155d8a510c8 for a full test report.

== 2025-10-26 04:05:04,861 build_log.py:226 ERROR EasyBuild encountered an error (at easybuild/easybuild-framework/easybuild/base/exceptions.py:126 in __init__): Sanity check failed: sanity check command cd /tmp/eb-b4mnuj_0/tmp39chs30w && g++ minimal.cpp -o minimal_cpp $(llvm-config --link-static --system-libs all) failed with exit code 1 (output: /project/def-maintainers/boegelbot/rocky9/zen3/software/binutils/2.42-GCCcore-14.2.0/bin/ld: cannot find -lxml2: No such file or directory
collect2: error: ld returned 1 exit status
) (at easybuild/easybuild-framework/easybuild/framework/easyblock.py:4407 in _sanity_check_step)

Hm, we already have libxml2 as a dependency, but the linker still couldn't find it? I'll check on jsc-zen3 whether the libxml2 module looks okay.


The module looks entirely fine, and manually linking a similar program with -lxml2 also works. I don't know what happened here. I'll try another build which includes libxml2...
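
To narrow down which of the `-l` flags the linker cannot resolve, a small helper along these lines could be run inside the build environment on jsc-zen3. This is only a sketch under assumptions: the function names are made up for illustration, it only understands plain `-lfoo` and `-L/dir` tokens, and it only checks the directories passed in (for example the entries of `LIBRARY_PATH`), not the linker's built-in default directories.

```python
import shlex
import subprocess
from pathlib import Path


def llvm_config_system_libs():
    """Run the same llvm-config call as the sanity check and return its output.

    Assumes llvm-config from the ROCm-LLVM installation is first in PATH.
    """
    cmd = ["llvm-config", "--link-static", "--system-libs", "all"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def split_link_flags(flags):
    """Split a flag string into (names given via -l, directories given via -L)."""
    libs, dirs = [], []
    for token in shlex.split(flags):
        if token.startswith("-l") and len(token) > 2:
            libs.append(token[2:])
        elif token.startswith("-L") and len(token) > 2:
            dirs.append(token[2:])
    return libs, dirs


def find_missing_libs(flags, search_dirs):
    """Return -l names for which no lib<name>.so or lib<name>.a exists.

    Only -L directories from the flags and the given search_dirs are checked;
    empty entries (as produced by splitting an unset variable) are ignored.
    """
    libs, flag_dirs = split_link_flags(flags)
    all_dirs = [Path(d) for d in list(flag_dirs) + list(search_dirs) if d]
    missing = []
    for name in libs:
        candidates = (f"lib{name}.so", f"lib{name}.a")
        if not any((d / c).exists() for d in all_dirs for c in candidates):
            missing.append(name)
    return missing
```

After loading the modules, `find_missing_libs(llvm_config_system_libs(), os.environ.get("LIBRARY_PATH", "").split(os.pathsep))` (with `import os`) would list the names that none of those directories provide. Libraries that live in default system directories such as `/usr/lib64` will also show up unless those directories are added to the list, so the interesting question is only whether `xml2` is among the results.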

@Thyre
Collaborator Author

Thyre commented Oct 26, 2025

@boegelbot please test @ jsc-zen3
EB_ARGS="--include-easyblocks-from-pr=3823 --amdgcn-capabilities=gfx90a --installpath=/tmp/$USER/pr23304 libxml2-2.13.4-GCCcore-14.2.0.eb ROCm-LLVM-19.0.0-GCCcore-14.2.0-ROCm-6.4.1.eb"

@boegelbot
Collaborator

@Thyre: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=23304 EB_ARGS="--include-easyblocks-from-pr=3823 --amdgcn-capabilities=gfx90a --installpath=/tmp/$USER/pr23304 libxml2-2.13.4-GCCcore-14.2.0.eb ROCm-LLVM-19.0.0-GCCcore-14.2.0-ROCm-6.4.1.eb" EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_23304 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 8553

Test results coming soon (I hope)...

- notification for comment with ID 3448276629 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@bedroge
Contributor

bedroge commented Oct 26, 2025

The test build on my machine got stuck; there seems to be an issue with the GPU (even amd-smi/rocm-smi gets stuck). Once that's fixed I'll retry.

@boegelbot
Collaborator

Test report by @boegelbot
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#3823
FAILED
Build succeeded for 1 out of 2 (2 easyconfigs in total)
jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.6, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.21
See https://gist.github.com/boegelbot/a9c2f5116cc20ba4dc7c5c5b61b8cc16 for a full test report.

@bedroge
Contributor

bedroge commented Oct 26, 2025

Test report by @bedroge
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#3823
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
il-c04 - Linux Rocky Linux 9.5 (Blue Onyx), x86_64, AMD EPYC 7542 32-Core Processor, 1 x AMD Arcturus GL-XL [Instinct MI100] (model: 0x738c, driver: "6.8.5"), 1 x AMD Instinct MI210 (model: 0x740f, driver: "6.8.5"), Python 3.9.21
See https://gist.github.com/bedroge/0cc5be3e80d9f5117bf790d8fbac9377 for a full test report.

@Thyre
Collaborator Author

Thyre commented Oct 26, 2025

Test report by @boegelbot
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#3823
FAILED
Build succeeded for 1 out of 2 (2 easyconfigs in total)
jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.6, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.21
See https://gist.github.com/boegelbot/a9c2f5116cc20ba4dc7c5c5b61b8cc16 for a full test report.

Exactly the same failure...
Since the logs are unfortunately gone, I can't really figure out why this happens.
The module is there and the existing logs at least give the impression that libxml2 is loaded. So why does this still fail? Maybe llvm-config returns something bogus on this machine?

I guess we need to do a "manual" build on jsc-zen3 to debug this further. Builds on other machines seem to work...
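
One way to test the "llvm-config returns something bogus" idea during such a manual build would be to capture the output of `llvm-config --link-static --system-libs all` on a machine where the sanity check passes and on jsc-zen3, then compare the two. The following is a rough sketch; `compare_flags` is a hypothetical helper name, and the flag strings in the comment are made-up examples, not real output from either machine.

```python
import shlex


def compare_flags(working_output, failing_output):
    """Compare two llvm-config outputs as sets of flags.

    Returns (only_in_working, only_in_failing), each as a sorted list.
    Order and duplicates of flags are deliberately ignored.
    """
    working = set(shlex.split(working_output))
    failing = set(shlex.split(failing_output))
    return sorted(working - failing), sorted(failing - working)


# Illustrative (invented) inputs:
# compare_flags("-lrt -ldl -lm -lz", "-lrt -ldl -lm -lz -lxml2")
# would report "-lxml2" as present only in the second output.
```

If both machines print the same flags, the difference is more likely in the linker search path than in llvm-config itself.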
