Conversation


@tetron tetron commented Dec 22, 2021

Express that a CommandLineTool requires NVIDIA CUDA GPU support. Includes support for Docker and Singularity. Supports requesting multiple GPUs, but does not currently attempt to allocate GPUs among multiple concurrent processes.

Example command line tool:

cwlVersion: v1.2
class: CommandLineTool
baseCommand: nvidia-smi
inputs: []
outputs: []
$namespaces:
  cwltool: http://commonwl.org/cwltool#
hints:
  cwltool:CUDARequirement:
    cudaVersionMin: "11.4"
    cudaComputeCapabilityMin: "3.0"
    deviceCountMin: 1
    deviceCountMax: 8
  DockerRequirement:
    dockerPull: nvidia/cuda:11.4.0-base-ubuntu20.04


codecov bot commented Dec 22, 2021

Codecov Report

Merging #1581 (6e4f428) into main (62fe629) will decrease coverage by 0.22%.
The diff coverage is 39.21%.


@@            Coverage Diff             @@
##             main    #1581      +/-   ##
==========================================
- Coverage   66.12%   65.90%   -0.22%     
==========================================
  Files          91       93       +2     
  Lines       16345    16447     +102     
  Branches     4344     4358      +14     
==========================================
+ Hits        10808    10840      +32     
- Misses       4391     4455      +64     
- Partials     1146     1152       +6     
Impacted Files Coverage Δ
cwltool/process.py 86.08% <ø> (ø)
cwltool/job.py 76.37% <33.33%> (-0.92%) ⬇️
cwltool/cuda.py 35.29% <35.29%> (ø)
cwltool/docker.py 68.37% <40.00%> (-0.62%) ⬇️
cwltool/singularity.py 58.60% <50.00%> (-0.17%) ⬇️
cwltool/main.py 66.97% <100.00%> (+0.08%) ⬆️
job.py 61.53% <0.00%> (-0.34%) ⬇️
docker.py 49.57% <0.00%> (-0.21%) ⬇️
singularity.py 52.55% <0.00%> (-0.05%) ⬇️
... and 3 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@tetron tetron marked this pull request as ready for review December 22, 2021 23:20

@mr-c mr-c left a comment


Huzzah! Can we have some tests? I'll try to find a CUDA box someplace to run this.


@mr-c mr-c left a comment


The singularity version of the tests passed for me on the BAZIS cluster: https://libguides.vu.nl/rdm/high-performance-computing


tetron commented Jan 3, 2022

> Maybe add a comment that nvidia-smi returns non-zero if there is no GPU detected?

I'm not entirely sure what it does, but it seems reasonable to assume that either it will exit non-zero or it will return XML with <attached_gpus>0</attached_gpus>, and in either case it will be detected as CUDA not available.
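A detection routine along those lines could parse the XML from `nvidia-smi -q -x` and treat a non-zero exit, a missing binary, or zero attached GPUs as "CUDA not available". This is a sketch under those assumptions, not the actual cwltool implementation; the function names are hypothetical:

```python
import subprocess
import xml.etree.ElementTree as ET


def cuda_device_count(xml_text: str) -> int:
    """Parse `nvidia-smi -q -x` output and return the attached GPU count.

    Returns 0 (CUDA unavailable) on malformed XML or a missing
    <attached_gpus> element.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return 0
    node = root.find("attached_gpus")
    if node is None or node.text is None:
        return 0
    return int(node.text)


def cuda_available() -> bool:
    """Run nvidia-smi; non-zero exit or zero attached GPUs means no CUDA."""
    try:
        proc = subprocess.run(
            ["nvidia-smi", "-q", "-x"], capture_output=True, text=True
        )
    except FileNotFoundError:
        # nvidia-smi not on PATH at all
        return False
    if proc.returncode != 0:
        return False
    return cuda_device_count(proc.stdout) > 0
```

Either failure mode (non-zero exit or `<attached_gpus>0</attached_gpus>`) funnels into the same "unavailable" result, matching the assumption above.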


lgtm-com bot commented Jan 3, 2022

This pull request introduces 1 alert when merging 06992d9 into 62fe629 - view on LGTM.com

new alerts:

  • 1 for Unused import


mr-c commented Jan 3, 2022

> Maybe add a comment that nvidia-smi returns non-zero if there is no GPU detected?
>
> I'm not entirely sure what it does, but it seems reasonable to assume that either it will exit non-zero or it will return XML with <attached_gpus>0</attached_gpus>, and in either case it will be detected as CUDA not available.

Yes. I was referring to the nvidia-smi invocation within the tests themselves: https://github.com/common-workflow-language/cwltool/pull/1581/files#diff-3dac8debea16928fd49923af94fa8e1af78788a1c44d86eb3b6112bbc7516993R11


tetron commented Jan 3, 2022

Oh, I see what you mean. Yes, I'm trying to confirm what it does if it has the libraries but doesn't find any devices.


tetron commented Jan 3, 2022

It's annoying because it turns out nvidia-smi is one of the things that gets injected from the host system into the container, so the standard CUDA runtime container doesn't actually have it.

@tetron tetron enabled auto-merge (squash) January 3, 2022 18:55

lgtm-com bot commented Jan 3, 2022

This pull request introduces 1 alert when merging 6e4f428 into 62fe629 - view on LGTM.com

new alerts:

  • 1 for Unused import


mr-c commented Aug 29, 2022

FYI, the proposal here was updated in #1629

The differences are:

  1. cudaComputeCapabilityMin is now cudaComputeCapability: a single value is a minimum capability, or if a list of values is specified then only GPUs with compute capabilities in the list should be considered.
  2. deviceCountMin was renamed cudaDeviceCountMin and can now be an expression as well as an int; likewise deviceCountMax became cudaDeviceCountMax.
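Under the renamed fields from #1629, the hint from the example tool above might be written like this (illustrative only; the list form and the `$(inputs.gpus)` expression assume a hypothetical `gpus` input parameter):

```yaml
hints:
  cwltool:CUDARequirement:
    cudaVersionMin: "11.4"
    cudaComputeCapability: ["3.0", "3.5"]  # list form: only these capabilities qualify
    cudaDeviceCountMin: $(inputs.gpus)     # expressions are now allowed
    cudaDeviceCountMax: 8
```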
