[NVVM IR] NVVM IR Integration #907
Conversation
Low-level review: Apart from the bare except, this looks good to me.
I defer to @leofang for the high-level take.
Thanks, @abhilash1910, left some quick comments, will circle back later.
Thanks a lot @abhilash1910! I have reviewed the PR including the tests.
btw please also fix the linter errors. You can check them locally via pre-commit run -a.
Thanks @abhilash1910! Looks very good! A few minor comments for completeness. Let me trigger the CI in the meanwhile.
pre-commit.ci autofix
/ok to test e5b5ea4
I only looked at the high-level structure; based on that, and given that we want to do #980, this looks good to me.
All CI is green except for H100, which is known to have an unusually long queue currently (see the nv-gha-runners discussion). I am impatient, so let me admin-merge before calling it a day.
Thanks a lot, @abhilash1910, and also @gmarkall @kkraus14 @rwgk !
Using textual LLVM IR as an input to libNVVM is documented as deprecated, so I'm quite concerned that cuda-python is adding a new usage of it. Another issue is that the LLVM textual assembly format is less stable than the bitcode and has no backward-compatibility guarantee (contrary to LLVM bitcode), which is also likely why this was deprecated in libNVVM. Even with LLVM bitcode, there is a sizable compatibility issue with the libNVVM version: contrary to the analogy with C++ and NVRTC or PTX, the LLVM IR isn't versioned in the same way across CUDA versions.
Hi @joker-eph:
I am unable to parse this sentence 😛 Could you elaborate?
This is understood. See how we generate compatible IR in the test and also this thread.
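For reference, here is a minimal sketch of what a version-pinned NVVM IR module can look like, embedded as a Python string. This is illustrative only: the required target datalayout is omitted for brevity, and the !nvvmir.version values must match the NVVM IR spec of the libNVVM release you target.

```python
# A minimal NVVM IR module as textual IR. Illustrative, not authoritative:
# consult the libNVVM IR spec for the exact datalayout and version metadata
# required by your CUDA toolkit.
MINIMAL_NVVM_IR = r"""
target triple = "nvptx64-nvidia-cuda"

define void @kernel() {
entry:
  ret void
}

; Mark @kernel as a kernel entry point.
!nvvm.annotations = !{!0}
!0 = !{void ()* @kernel, !"kernel", i32 1}

; Pin the NVVM IR version (major, minor); values here are an example.
!nvvmir.version = !{!1}
!1 = !{i32 2, i32 0}
"""
```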
I know that, but I'm not sure how that addresses my comment.
Why is numba-cuda using textual IR instead of encoding to bitcode?
Two parts to my sentence:
This is understood by you, maybe, but what about the user who gets exposed to unsafe APIs that we may break in very subtle ways with future updates?
The short answer is that it needs to patch LLVM text IR. It'd be better if we moved this conversation to either the NVIDIA/numba-cuda repo or the internal numba dev channel; happy to continue elsewhere. It is not relevant here.
Ah ok, thanks. 1 can be added, I think, with a note that the text IR is deprecated upstream. 2 is not possible, as already explained earlier (we can't tell whether the user provides text or bitcode IR without leaking the magic header into the public API).
Our mission is to offer Pythonic access to all CUDA components, such that whatever users can do in C/C++, they can also do without leaving Python. Unless I misunderstood what you meant, it sounds to me like the concern is "we should not make it easy to access libNVVM in Python"; if so, I'd wholeheartedly disagree 🙂
Why "upstream"? In general I use "upstream" to refer to LLVM codebase, but are you referring to libNVVM? This is weird to me to refer to the underlying library exposed here as "upstream": it the same product we ship and code-python should just expose it to python IMO.
I don't quite understand what you mean by "leaking the magic header". Checking whether the input is bitcode seems like a trivial check to me: https://github.com/llvm/llvm-project/blob/04c01ff144a172230c053d73eb15831a4120db81/llvm/include/llvm/Bitcode/BitcodeReader.h#L244-L274
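For what it's worth, the linked BitcodeReader.h check boils down to inspecting the first few bytes of the buffer. A rough Python equivalent, using the magic values from the LLVM bitcode format documentation:

```python
# Raw LLVM bitcode starts with the magic 'BC' 0xC0 0xDE; a bitcode wrapper
# header starts with 0x0B17C0DE stored little-endian. Anything else can be
# treated as textual IR.
RAW_BITCODE_MAGIC = b"BC\xc0\xde"
WRAPPER_MAGIC = b"\xde\xc0\x17\x0b"

def is_bitcode(buf: bytes) -> bool:
    return buf.startswith(RAW_BITCODE_MAGIC) or buf.startswith(WRAPPER_MAGIC)
```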
You are clearly misrepresenting what I wrote: there is a difference between "exposing Python access to all CUDA components" and handing users direct footguns. An important part of API design is to understand these footguns and design the API to avoid them. The fact that you use "pythonic" shows that you're already ready to deviate from directly binding and exposing "raw" access to anything: this should be about features instead.
Description
Abstract: The cuda/bindings backend of CUDA Python has NVVM support through the libNVVM API. However, the frontend of CUDA Python does not support NVVM IR as an input source. Since CUDA Python allows users to leverage a "pythonic DSL" for writing the host code (taking care of launch parameters, etc.), it makes sense to also allow NVVM IR as an alternative input alongside the already supported inputs {PTX, C++, LTO IR}.
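A usage sketch of the proposed frontend path, assuming the existing cuda.core Program API; the "nvvm" code_type string and the option values below are assumptions for illustration, not a confirmed final interface:

```python
from cuda.core.experimental import Program, ProgramOptions

# nvvm_ir would hold a textual NVVM IR module (or bitcode), e.g. the
# minimal module sketched earlier in this thread.
nvvm_ir = "..."  # placeholder

# Assumption: this PR adds "nvvm" as a recognized code_type alongside
# "c++", "ptx", and LTO IR; compiling to "ptx" mirrors the NVRTC path.
prog = Program(nvvm_ir, code_type="nvvm", options=ProgramOptions(arch="sm_80"))
ptx = prog.compile("ptx")
```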
Discussion Link: #906
Fix #452
Changes made {to be made} in this PR:
Checklist
cc @leofang