
Conversation

Contributor

@abhilash1910 abhilash1910 commented Aug 26, 2025

Description

Abstract: The cuda.bindings backend of CUDA Python has NVVM support through the libNVVM API. However, the frontend of CUDA Python does not accept NVVM IR as an input source. Since CUDA Python lets users write host code in a "pythonic DSL" (taking care of launch parameters, etc.), it makes sense to also allow NVVM IR as an alternative input alongside the already supported formats (PTX, C++, LTO IR, etc.).
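As a rough illustration of what this enables, here is a hedged sketch (not code from this PR) of passing NVVM IR to cuda.core's Program; the "nvvm" code-type string and the exact IR accepted depend on the installed CUDA toolkit, and the snippet is guarded so it degrades gracefully without a CUDA environment:

```python
# Hedged sketch: feed NVVM IR (the LLVM IR dialect accepted by libNVVM)
# to cuda.core's Program frontend. The code_type value "nvvm" and the
# minimal IR below are assumptions for illustration; real IR must match
# the NVVM IR version of the installed libNVVM.
nvvm_ir = r"""
target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64"
target triple = "nvptx64-nvidia-cuda"

define void @kernel() {
  ret void
}

!nvvm.annotations = !{!0}
!0 = !{void ()* @kernel, !"kernel", i32 1}
"""

try:
    from cuda.core.experimental import Program

    prog = Program(nvvm_ir, code_type="nvvm")  # assumed code-type name
    ptx = prog.compile("ptx")  # lower NVVM IR to PTX via libNVVM
except Exception:
    # cuda.core not installed, or no compatible CUDA toolchain available;
    # the sketch is only meant to show the intended call shape.
    pass
```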

Discussion Link: #906

Fix #452

Changes made (or to be made) in this PR:

  • Added cuda.core linkage to the NVVM counterpart in cuda.bindings
  • Cosmetic changes in the user interface to use the existing NVVM backend of cuda.bindings.

Checklist

  • [ TBD ] New tests need to be added to cover these changes.
  • [ TBD ] The documentation needs to be updated with these changes.

cc @leofang

Contributor

copy-pr-bot bot commented Aug 26, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@abhilash1910 abhilash1910 marked this pull request as draft August 26, 2025 13:51
Collaborator

@rwgk rwgk left a comment


Low-level review: Apart from the bare except, this looks good to me.

I defer to @leofang for the high-level take.

@leofang leofang self-requested a review August 26, 2025 17:48
@leofang leofang added P0 High priority - Must do! feature New feature or request cuda.core Everything related to the cuda.core module labels Aug 26, 2025
@leofang leofang added this to the cuda.core beta 7 milestone Aug 26, 2025
Member

@leofang leofang left a comment


Thanks, @abhilash1910, left some quick comments, will circle back later.

@abhilash1910 abhilash1910 marked this pull request as ready for review September 1, 2025 17:34
Member

@leofang leofang left a comment


Thanks a lot @abhilash1910! I have reviewed the PR including the tests.

btw please also fix the linter errors. You can check them locally via pre-commit run -a.

Member

@leofang leofang left a comment


Thanks @abhilash1910! Looks very good! A few minor comments for completeness. Let me trigger the CI in the meanwhile.

@abhilash1910
Contributor Author

pre-commit.ci autofix

@leofang
Member

leofang commented Sep 17, 2025

/ok to test e5b5ea4

@leofang leofang enabled auto-merge (squash) September 17, 2025 21:54
Collaborator

@rwgk rwgk left a comment


I only looked at the high-level structure; based on that, and given that we want to do #980: looks good to me.

@leofang
Member

leofang commented Sep 17, 2025

All CI is green except for H100, which is known to have an unusually long queue currently (see the nv-gha-runners discussion). I am impatient, so let me admin-merge before calling it a day.

@leofang leofang disabled auto-merge September 17, 2025 22:38
@leofang leofang merged commit fcfeba0 into NVIDIA:main Sep 17, 2025
48 checks passed
@leofang
Member

leofang commented Sep 17, 2025

Thanks a lot, @abhilash1910, and also @gmarkall @kkraus14 @rwgk !

@abhilash1910
Contributor Author

Thanks a lot @leofang , and @gmarkall @rwgk @kkraus14 for all the reviews. Will follow-up on #981.

@joker-eph
Member

Using textual LLVM IR as an input to libNVVM is documented as deprecated, so I'm quite concerned that cuda-python is adding a new usage of this.

Another issue is that the LLVM textual assembly format is more unstable than the bitcode and has no backward-compatibility guarantee (contrary to LLVM bitcode), which is also likely why this was all deprecated in libNVVM.
I would think this should be documented as, and restricted to, LLVM bitcode input only here.

Even with LLVM bitcode, there is quite a large issue of underlying compatibility with the libNVVM version: contrary to the analogy with C++ and NVRTC or PTX, the LLVM IR isn't versioned in the same way across CUDA versions.

@leofang
Member

leofang commented Sep 25, 2025

Hi @joker-eph:

Using textual LLVM IR as an input to libNVVM is documented as deprecated, so I'm quite concerned that cuda-python is adding a new usage of this.

  1. libNVVM has been stating for several years that the text IR is deprecated, but we have not received any notice that it will actually be removed
  2. if it is removed now, numba-cuda will break right away
  3. I think both bc and text formats are already supported by Program through this PR, because the underlying binding does not care which format a sequence of Python bytes contains. I did tell Abhilash offline that it'd be better to get bc tested as well. I think this ball was dropped along the way.

I would think that this would be documented as, and restricted, to LLVM bitcode input only here.

I am unable to parse this sentence 😛 Could you elaborate?

Even with LLVM bitcode, there is a quite large issue of underlying compatibility with the libNVVM version: contrary to the analogy with C++ and NVRTC or PTX: the LLVM IR isn't versioned in the same way across cuda versions.

This is understood. See how we generate compatible IR in the test and also this thread.
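For context, generating compatible IR starts with asking the installed libNVVM which IR version it accepts. A hedged sketch using the cuda.bindings NVVM module (assuming the binding returns the four components of nvvmIRVersion as a tuple; guarded so it degrades without the bindings installed):

```python
# Hedged sketch: query the NVVM IR version accepted by the installed
# libNVVM (wraps nvvmIRVersion), so IR generators can target it.
try:
    from cuda.bindings import nvvm

    # Assumed to return (majorIR, minorIR, majorDbg, minorDbg).
    major_ir, minor_ir, major_dbg, minor_dbg = nvvm.ir_version()
    print(f"libNVVM accepts IR {major_ir}.{minor_ir}, "
          f"debug metadata {major_dbg}.{minor_dbg}")
except Exception:
    # cuda.bindings not installed or libNVVM not loadable in this
    # environment; the sketch only shows the intended query.
    major_ir = None
```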

@joker-eph
Member

libNVVM has been stating that the text IR is deprecated for several years, but we have not received any notice that it'd be actually removed

I know that, I'm not sure how that addresses my comment though.

if it is removed now, numba-cuda will break right away

Why is numba-cuda using textual IR instead of encoding to bitcode?

I am unable to parse this sentence 😛 Could you elaborate?

Two parts to my sentence:

  1. The documentation for this API should say "This is expected to be used with bitcode."
  2. The code could enforce that we only accept bitcode.

This is understood. See how we generate compatible IR in the test and also #907 (comment).

This is understood by you, maybe, but what about the user who gets exposed to some unsafe APIs that we may break in very subtle ways with future updates?
My concern is that there is a huge footgun hidden in there, and that it isn't a good API to add at all.

@leofang
Member

leofang commented Oct 2, 2025

if it is removed now, numba-cuda will break right away

Why is numba-cuda using textual IR instead of encoding to bitcode?

The short answer is that it needs to patch LLVM text IR. It'd be better if we move this conversation to either the NVIDIA/numba-cuda repo or the internal numba dev channel; happy to continue elsewhere. It is irrelevant here.

  1. The documentation for this API should be "This is expected to use with bitcode"
  2. The code could enforce that we only use bitcode.

Ah ok, thanks. 1 can be added, I think, with a note that the text IR is deprecated upstream. 2 is not possible, as already explained earlier (we can't tell whether the user provides text or bitcode IR without leaking the magic header in public).

This is understood by you maybe, what about the user that gets exposed to some unsafe APIs and that we may break in very subtle ways with future updates? My concerns is that there is huge footgun hidden in there, and that it isn't a good API to add at all.

Our mission is to offer pythonic access to all CUDA components so that whatever users can do in C/C++, they can also do without leaving Python. Unless I misunderstood what you meant, it sounds to me like the concern is "we should not make it easy to access libNVVM in Python"; if so, I'd wholeheartedly disagree 🙂

@joker-eph
Member

with a note that the text IR is deprecated upstream.

Why "upstream"? I generally use "upstream" to refer to the LLVM codebase, but are you referring to libNVVM? It seems weird to me to refer to the underlying library exposed here as "upstream": it is the same product we ship, and cuda-python should just expose it to Python, IMO.

(we can't tell if the user provides text or bitcode IR, without leaking the magic header in the public).

I don't quite understand what you mean by "leaking the magic header"? Checking whether the input is bitcode seems like a trivial check to me: https://github.com/llvm/llvm-project/blob/04c01ff144a172230c053d73eb15831a4120db81/llvm/include/llvm/Bitcode/BitcodeReader.h#L244-L274
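The check being referred to can indeed be sketched in a few lines of pure Python. Hedged: the magic values come from the LLVM bitcode format (raw magic 'BC' 0xC0 0xDE, and the bitcode-wrapper magic 0x0B17C0DE); the function name and structure are mine, not from the PR:

```python
# Hedged sketch: distinguish LLVM bitcode from textual IR by magic number,
# mirroring LLVM's isBitcode/isBitcodeWrapper checks.
def looks_like_bitcode(buf: bytes) -> bool:
    # Raw bitcode starts with 'B' 'C' 0xC0 0xDE.
    if buf[:4] == b"BC\xc0\xde":
        return True
    # Bitcode wrapper files start with the little-endian magic 0x0B17C0DE.
    if len(buf) >= 4 and int.from_bytes(buf[:4], "little") == 0x0B17C0DE:
        return True
    return False
```

Anything that fails this check could then be treated (or rejected) as textual IR.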

Our mission is to offer pythonic access to all CUDA components such that whatever users can do in C/C++, then can also do so without leaving Python. Unless I misunderstood what you meant, to me it sounds like the concern is "we should not make it easy to access libNVVM in Python," if so I'd wholeheartedly disagree 🙂

You are clearly misrepresenting what I wrote: there is a difference between "exposing Python access to all CUDA components" and handing users direct footguns. An important part of API design is to understand these footguns and shape the API to avoid them. Just the fact that you use "pythonic" shows that you're already prepared to deviate from directly binding and exposing "raw" access to everything: this should be about "features" instead.
More importantly, you're again putting aside the wrinkle that this is deprecated (and it was introduced at a time when libNVVM had a single version of LLVM as input; the fact that it is now a moving target is entirely new).


Labels

cuda.core Everything related to the cuda.core module feature New feature or request P0 High priority - Must do!

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support NVVM IRs as input to Program

6 participants