Description
I was trying out SnoopPrecompile.jl with CUDA.jl, on Julia 1.9, doing some minimal kernel compilation during precompilation:
@precompile_setup let
    @precompile_all_calls begin
        target = PTXCompilerTarget(; cap=v"7.5.0")
        params = CUDACompilerParams()
        job = CompilerJob(target, FunctionSpec(identity, Tuple{Nothing}, true), params)
        GPUCompiler.code_native(devnull, job)
    end
end
This results in an LLVM-related abort when Julia writes out the package image:
LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.membar.sys
[53782] signal (6.-6): Aborted
in expression starting at none:0
unknown function (ip: 0x7fd1bcc9564c)
gsignal at /usr/lib/libc.so.6 (unknown line)
abort at /usr/lib/libc.so.6 (unknown line)
_ZN4llvm18report_fatal_errorERKNS_5TwineEb at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm16SelectionDAGISel15CannotYetSelectEPNS_6SDNodeE at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm16SelectionDAGISel16SelectCodeCommonEPNS_6SDNodeEPKhj at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN12_GLOBAL__N_115X86DAGToDAGISel6SelectEPN4llvm6SDNodeE at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm16SelectionDAGISel22DoInstructionSelectionEv at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm16SelectionDAGISel17CodeGenAndEmitDAGEv at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm16SelectionDAGISel20SelectAllBasicBlocksERKNS_8FunctionE at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm16SelectionDAGISel20runOnMachineFunctionERNS_15MachineFunctionE.part.975 at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN12_GLOBAL__N_115X86DAGToDAGISel20runOnMachineFunctionERN4llvm15MachineFunctionE at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
operator() at /home/tim/Julia/src/julia/src/aotcompile.cpp:698
jl_dump_native_impl at /home/tim/Julia/src/julia/src/aotcompile.cpp:710
ijl_write_compiler_output at /home/tim/Julia/src/julia/src/precompile.c:126
ijl_atexit_hook at /home/tim/Julia/src/julia/src/init.c:258
jl_repl_entrypoint at /home/tim/Julia/src/julia/src/jlapi.c:718
main at /home/tim/Julia/src/julia/cli/loader_exe.c:59
unknown function (ip: 0x7fd1bcc3028f)
__libc_start_main at /usr/lib/libc.so.6 (unknown line)
_start at /build/glibc/src/glibc/csu/../sysdeps/x86_64/start.S:115
Allocations: 66463518 (Pool: 66446875; Big: 16643); GC: 87
It looks like Julia is trying to generate host-native code for GPU-only functionality here: the backtrace goes through the X86 instruction selector, which cannot lower an NVPTX intrinsic. After discussing this with @vchuravy, we think this happens because SnoopPrecompile.jl tracks all code that gets inferred, which includes GPU code, and queues it up for precompilation. Setting verbose[] = true does indeed show that it compiles GPU-only functionality:
MethodInstance for CUDA.signal_exception()
That function is implemented at https://github.com/JuliaGPU/CUDA.jl/blob/3d1670c9fe0bd12fb5d44e8427ab50d5f85a3d6a/src/device/runtime.jl#L35-L47 and calls threadfence_system(), which in turn is implemented using the llvm.nvvm.membar.sys intrinsic.
I guess that, for now, we should somehow keep this code out of the pkgimage. Normally we avoid polluting host caches with GPU code by using a custom AbstractInterpreter and registering it with codegen through the lookup codegen parameter. Maybe some property derived from this needs to be added to the data in Core.Compiler.Timings._timings so that SnoopPrecompile can decide to skip this code?
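To make the proposal concrete, here is a minimal sketch of the kind of filtering SnoopPrecompile could do if each timing entry carried a tag saying which interpreter inferred it. Everything here is hypothetical: the `InferenceRecord` type, its `interp` field, and the `host_only` helper are illustrative names only, and none of them exist in Core.Compiler.Timings or SnoopPrecompile today.

```julia
# Hypothetical record of one inferred method instance, as a precompile
# workload tracker might see it. This type and its fields are assumptions
# made for illustration; they are not part of Julia or SnoopPrecompile.
struct InferenceRecord
    name::String    # printable name of the method instance
    interp::Symbol  # assumed tag: which interpreter inferred it
end

# Keep only the records inferred by the host (native) interpreter, so that
# GPU-only code is never queued for host precompilation.
host_only(records) = filter(r -> r.interp === :native, records)

records = [
    InferenceRecord("GPUCompiler.code_native(...)", :native),
    InferenceRecord("CUDA.signal_exception()", :gpu),
]

to_precompile = host_only(records)
foreach(r -> println("precompile: ", r.name), to_precompile)
```

With such a tag available, `CUDA.signal_exception()` would be dropped before it reaches the pkgimage, while the host-side compiler entry points would still be cached.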