Accelerator support
In 2021.03 we added the machinery to enable GPU support in EESSI by adding a new trusted directory (where the GPU runtime libraries are expected to be found) to the loader of the compatibility layer. This approach doesn't do any sanity checking though, and leaves the door open to things not working (for example, because the host driver is too old for the CUDA version shipped with EESSI). I think we can do better than this:
- Add a single variant symlink to the CVMFS config that points (by default) to `/opt/eessi` (see the configuration sketch after this list)
  - Since the pilot version number occurs right at the root, this will have to be something like `/cvmfs/pilot.eessi-hpc.org/host_injections`
- Create a script that controls the structure underneath this folder (a sketch is included after this list)
  - Driver libraries are related to the compatibility layer, so I would suggest, e.g., `/cvmfs/pilot.eessi-hpc.org/host_injections/2021.03/compat/linux/x86_64/lib` (and this will be added as a trusted glibc dir for the compatibility layer)
  - The script will check that the driver libraries are adequate to support the requested stack (and tell you what to do if they are not)
  - It will search for all the libraries required to use the CUDA-enabled EESSI stack (see the recent OpenSSL easyblock PR for the machinery to do this)
  - It will place symlinks to the libraries in the designated subfolder
  - It will perform sanity checks to verify that the setup works, e.g., `LD_LIBRARY_PATH=/cvmfs/pilot.eessi-hpc.org/host_injections/2021.03/compat/linux/x86_64/lib my_cuda_verification_exec`
  - The script can tell the user how to add all the CUDA-enabled modules to their environment
- I think CUDA-enabled modules should be installed in a separate path; for a flat naming scheme, making the modules available then just means an additional `module use ...`. For a hierarchical scheme, it should mean setting an additional environment variable.
  - EasyBuild has all the machinery needed to support this, we just need to build this configuration into our hooks.
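To make the variant symlink idea concrete, here is a rough sketch of what the client-side configuration could look like. CVMFS variant symlinks resolve `$(VARIABLE)` targets from the client configuration; the variable name (`EESSI_HOST_INJECTIONS`), the default target and the config file location below are illustrative assumptions, not settled choices.

```bash
# Illustrative sketch only; variable name and paths are assumptions.
# In the repository, host_injections would be a variant symlink with a default,
# created during a cvmfs_server transaction, e.g.:
#   ln -s '$(EESSI_HOST_INJECTIONS:-/opt/eessi)' /cvmfs/pilot.eessi-hpc.org/host_injections
#
# A site that wants the injections on a shared filesystem would then override
# the default in its CVMFS client configuration and reload:
echo 'EESSI_HOST_INJECTIONS=/shared/eessi/host_injections' \
  | sudo tee -a /etc/cvmfs/domain.d/eessi-hpc.org.local
sudo cvmfs_config reload pilot.eessi-hpc.org
```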
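And here is a minimal sketch of what the setup script could do, assuming `nvidia-smi` is available on the host and that a simple minimum-driver-version check is good enough. The version threshold, the library selection, the module path hint and `my_cuda_verification_exec` are placeholders for illustration, not a definitive implementation.

```bash
#!/bin/bash
# Sketch of a host_injections setup script; thresholds and paths are assumptions.
set -euo pipefail

EESSI_VERSION="2021.03"
HOST_INJECTIONS="/cvmfs/pilot.eessi-hpc.org/host_injections"
DRIVER_LIB_DIR="${HOST_INJECTIONS}/${EESSI_VERSION}/compat/linux/x86_64/lib"
MIN_DRIVER_VERSION="450.80.02"  # assumed minimum for the CUDA shipped with EESSI

# 1. Check that the host driver is new enough for the CUDA in the stack
driver_version=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)
if [ "$(printf '%s\n' "${MIN_DRIVER_VERSION}" "${driver_version}" | sort -V | head -n1)" != "${MIN_DRIVER_VERSION}" ]; then
  echo "Driver ${driver_version} is too old (need >= ${MIN_DRIVER_VERSION}); please upgrade the host driver." >&2
  exit 1
fi

# 2. Symlink the driver user-space libraries into the trusted directory
mkdir -p "${DRIVER_LIB_DIR}"
for lib in $(ldconfig -p | awk '/libcuda\.so|libnvidia-ml\.so/ {print $NF}' | sort -u); do
  ln -sf "${lib}" "${DRIVER_LIB_DIR}/$(basename "${lib}")"
done

# 3. Sanity check: run a small CUDA executable against the injected libraries
LD_LIBRARY_PATH="${DRIVER_LIB_DIR}" my_cuda_verification_exec \
  && echo "GPU support for EESSI ${EESSI_VERSION} looks functional."

# 4. Tell the user how to pick up the CUDA-enabled modules (flat naming scheme assumed)
echo "To use the CUDA-enabled installations, run:"
echo "  module use ${HOST_INJECTIONS}/${EESSI_VERSION}/software/.../modules/all"
```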
RPATH overrides
We can also use this approach to allow for controlled ABI-compatible overrides of the software EESSI provides:
- For RPATH injections, this is related to the software installations themselves, so it should follow the existing subdirectory structure of EESSI underneath the symlink, e.g., `/cvmfs/pilot.eessi-hpc.org/host_injections/2021.03/software/linux/x86_64/intel/skylake_avx512/rpath_overrides/{software}/{version?}/lib`
- We then follow a similar approach as for the accelerator support: find the libraries required for the override, symlink them, and sanity check that executables use the override libraries and not the EESSI installations (again, see the recent OpenSSL easyblock); a sketch is included after this list.
- For security, we should limit the ability to override the installation and support overrides on a case-by-case basis. A definite use case is MPI; IO is another likely candidate.
- We can control the use of overrides via our EESSI EasyBuild hook. For example, for OpenMPI we only use the override if OpenMPI is part of the toolchain being used or is listed as a dependency.
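As a rough illustration of the override workflow for MPI, the sketch below symlinks a host MPI into the override location and checks that a binary from the stack resolves `libmpi` from there. The host MPI path, the checked binary (`osu_latency`, assumed to be on the `PATH` from an EESSI module) and the exact override subdirectory layout (the `{version?}` question is left open above) are assumptions for illustration.

```bash
#!/bin/bash
# Sketch of populating an RPATH override for the host MPI; paths are assumptions.
set -euo pipefail

EESSI_PREFIX="/cvmfs/pilot.eessi-hpc.org/host_injections/2021.03/software/linux/x86_64/intel/skylake_avx512"
OVERRIDE_DIR="${EESSI_PREFIX}/rpath_overrides/OpenMPI/lib"   # {version?} level omitted here
HOST_MPI_LIB="/opt/host-mpi/lib"                             # wherever the site's ABI-compatible MPI lives

# Symlink the host MPI libraries into the override location
mkdir -p "${OVERRIDE_DIR}"
for lib in "${HOST_MPI_LIB}"/libmpi*.so*; do
  ln -sf "${lib}" "${OVERRIDE_DIR}/$(basename "${lib}")"
done

# Sanity check: an EESSI binary should resolve libmpi from the override dir,
# not from the EESSI software installations
CHECK_BINARY="$(command -v osu_latency)"
if ldd "${CHECK_BINARY}" | grep 'libmpi\.so' | grep -q "${OVERRIDE_DIR}"; then
  echo "MPI override is picked up."
else
  echo "MPI override is NOT being used; check ABI compatibility and the RPATH injection." >&2
fi
```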