Skip to content

Conversation

ZuseZ4
Copy link
Member

@ZuseZ4 ZuseZ4 commented Aug 22, 2025

LLVM's offload functionality usually expects an extra dyn_ptr argument. We could avoid it,b ut likely gonna need it very soon in one of the follow-up PRs (e.g. to request shared memory). So we might as well already add it.

This PR adds a %dyn_ptr ptr to GPUKernel ABI functions, if the offload feature is enabled.

WIP

r? @ghost

@rustbot rustbot added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Aug 22, 2025
@rust-log-analyzer

This comment has been minimized.

@ZuseZ4
Copy link
Member Author

ZuseZ4 commented Aug 22, 2025

@oli-obk Probably stupid question, but where do set parameter names?
When lowering any GPUKernel Function (including amdgpu_kernel) I add one ptr argument, so for a one-ptr function I instead generate

; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: write)
define amdgpu_kernel void @kernel_1(ptr noundef writeonly captures(none) %x, ptr noundef readnone captures(none) %0) unnamed_addr #0 {
start:
  %1 = addrspacecast ptr %x to ptr addrspace(1)
  store double 2.100000e+01, ptr addrspace(1) %1, align 8
  ret void
}

but want to generate
define amdgpu_kernel void @kernel_1(ptr noundef writeonly captures(none) %0, ptr noundef readnone captures(none) %x) (aka ignore the first argument). My PR (asdf commit) just copies the first arg and the last arg get's a generic name, so I assume we have a list of names on the rust side, which we match starting at the first arg. However, I couldn't find a relevant location where we're calling LLVM functions on the rust side to set those names. Do you have any ideas?

Edit: I decided I'll just do the full rewrite of the module on llvm-ir level instead of MIR, since that's what I know best (and seathlin mentioned MIR is probably too low eithr way). I'll add more details later

@rust-log-analyzer
Copy link
Collaborator

The job tidy failed! Check out the build log: (web) (plain enhanced) (plain)

Click to see the possible cause of the failure (guessed by this bot)
[TIMING:end] tool::Tidy { compiler: Compiler { stage: 0, host: x86_64-unknown-linux-gnu, forced_compiler: false }, target: x86_64-unknown-linux-gnu } -- 0.000
fmt check
Diff in /checkout/compiler/rustc_target/src/callconv/mod.rs:577:
 ///
 /// The signature represented by this type may not match the MIR function signature.
 /// Certain attributes, like `#[track_caller]` can introduce additional arguments, which are present in [`FnAbi`], but not in `FnSig`.
-/// The std::offload module also adds an addition dyn_ptr argument to the GpuKernel ABI. 
+/// The std::offload module also adds an addition dyn_ptr argument to the GpuKernel ABI.
 /// While this difference is rarely relevant, it should still be kept in mind.
 ///
 /// I will do my best to describe this structure, but these
Diff in /checkout/compiler/rustc_codegen_llvm/src/mono_item.rs:5:
 use rustc_middle::bug;
 use rustc_middle::mir::mono::Visibility;
 use rustc_middle::ty::layout::{FnAbiOf, HasTypingEnv, LayoutOf};
-use rustc_middle::ty::{self, Ty, TyCtxt,Instance, TypeVisitableExt};
+use rustc_middle::ty::{self, Instance, Ty, TyCtxt, TypeVisitableExt};
 use rustc_session::config::CrateType;
+use rustc_target::callconv::FnAbi;
 use rustc_target::spec::RelocModel;
 use tracing::debug;
 
Diff in /checkout/compiler/rustc_codegen_llvm/src/mono_item.rs:13:
-use rustc_target::callconv::FnAbi;
-
 use crate::context::CodegenCx;
 use crate::errors::SymbolAlreadyDefined;
 use crate::type_of::LayoutLlvmExt;
Diff in /checkout/compiler/rustc_codegen_llvm/src/mono_item.rs:18:
 use crate::{base, llvm};
 
-fn my_fn_abi<'tcx>(abi: &FnAbi<'tcx, Ty<'tcx>> ) -> FnAbi<'tcx, Ty<'tcx>> {
+fn my_fn_abi<'tcx>(abi: &FnAbi<'tcx, Ty<'tcx>>) -> FnAbi<'tcx, Ty<'tcx>> {
     let mut myABI = abi.clone();
     dbg!(&myABI.args);
     let val = myABI.clone().args[0].clone();
Diff in /checkout/compiler/rustc_codegen_llvm/src/mono_item.rs:25:
     dbg!(&myABI.args);
     myABI
 }
-
 
 impl<'tcx> PreDefineCodegenMethods<'tcx> for CodegenCx<'_, 'tcx> {
     fn predefine_static(
Diff in /checkout/compiler/rustc_codegen_llvm/src/back/write.rs:655:
 
     fn handle_offload(m: &llvm::Module, llcx: &llvm::Context, old_fn: &llvm::Value) {
         dbg!(&old_fn);
-        unsafe {llvm::LLVMRustOffloadWrapper(m, old_fn)};
+        unsafe { llvm::LLVMRustOffloadWrapper(m, old_fn) };
         //unsafe {llvm::LLVMDumpModule(m);}
         //unsafe {
         //    // Get the old function type
Diff in /checkout/compiler/rustc_codegen_llvm/src/back/write.rs:690:
         //    let new_fn_cstr = CString::new(new_fn_name).unwrap();
         //    let new_fn = llvm::LLVMAddFunction(m, new_fn_cstr.as_ptr(), new_fn_ty);
         //    dbg!(&new_fn);
-        //    let a0 = llvm::LLVMGetParam(new_fn, 0); 
+        //    let a0 = llvm::LLVMGetParam(new_fn, 0);
         //    llvm::LLVMSetValueName2(a0, b"dyn_ptr\0".as_ptr().cast(), "dyn_ptr".len());
         //    dbg!(&new_fn);
 
Diff in /checkout/compiler/rustc_codegen_llvm/src/back/write.rs:725:
     for num in 0..9 {
         let name = format!("kernel_{num}");
         let c_name = CString::new(name).unwrap();
-        if let Some(kernel) = unsafe {llvm::LLVMGetNamedFunction(module.module_llvm.llmod()   , c_name.as_ptr())} {
+        if let Some(kernel) =
+            unsafe { llvm::LLVMGetNamedFunction(module.module_llvm.llmod(), c_name.as_ptr()) }
+        {
             dbg!("found offload kernel asfd");
-            handle_offload(module.module_llvm.llmod(), module.module_llvm.llcx,    kernel);
+            handle_offload(module.module_llvm.llmod(), module.module_llvm.llcx, kernel);
         }
     }
 
Diff in /checkout/compiler/rustc_codegen_llvm/src/llvm/ffi.rs:1249:
 
     // Operations on functions
     pub(crate) fn LLVMSetFunctionCallConv(Fn: &Value, CC: c_uint);
-    pub(crate) fn LLVMAddFunction<'a>(Mod: &'a Module, Name: *const c_char, FunctionTy: &'a Type) -> &'a Value;
+    pub(crate) fn LLVMAddFunction<'a>(
+        Mod: &'a Module,
+        Name: *const c_char,
+        FunctionTy: &'a Type,
+    ) -> &'a Value;
     pub(crate) fn LLVMDeleteFunction(Fn: &Value);
 
     // Operations about llvm intrinsics
Diff in /checkout/compiler/rustc_codegen_llvm/src/llvm/ffi.rs:1948:
     ) -> &Attribute;
 
     // Operations on functions
-    pub(crate) fn LLVMRustOffloadWrapper<'a>(
-        M: &'a Module,
-        Fn: &'a Value,
-    );
+    pub(crate) fn LLVMRustOffloadWrapper<'a>(M: &'a Module, Fn: &'a Value);
     pub(crate) fn LLVMRustGetOrInsertFunction<'a>(
         M: &'a Module,
         Name: *const c_char,
Diff in /checkout/compiler/rustc_codegen_llvm/src/callee.rs:75:
             llfn
         } else {
             //if Some(fn_abi) = fn_abi {
-                if fn_abi.conv == rustc_abi::CanonAbi::GpuKernel {
-                    dbg!("found gpu fn!");
-                }
+            if fn_abi.conv == rustc_abi::CanonAbi::GpuKernel {
+                dbg!("found gpu fn!");
+            }
             //}
             cx.declare_fn(sym, fn_abi, Some(instance))
         };
fmt: checked 6340 files
Bootstrap failed while executing `test --stage 0 src/tools/tidy tidyselftest --extra-checks=py,cpp,js,spellcheck`
Build completed unsuccessfully in 0:00:48
  local time: Wed Aug 27 00:37:33 UTC 2025

@ZuseZ4
Copy link
Member Author

ZuseZ4 commented Aug 27, 2025

I'll clean this up further later to minimze the amount of c++, but I lost a bit of patience with LLVM's C API, so I just did 100% of the work with the C++ API. With this and the previous (review ready) patch, Rust's amdgcn target runs on a GPU, without manual LLVM-IR rewriting. I'll port it back from C++ to Rust.
This should also run on Nvidia or Intel GPUs, I just didn't got to test it yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants