
SnoopPrecompile with pkgimages chokes on non-native code #338

@maleadt

I was trying out SnoopPrecompile.jl with CUDA.jl, on Julia 1.9, doing some minimal kernel compilation during precompilation:

@precompile_setup let
    @precompile_all_calls begin
        target = PTXCompilerTarget(; cap=v"7.5.0")
        params = CUDACompilerParams()
        job = CompilerJob(target, FunctionSpec(identity, Tuple{Nothing}, true), params)
        GPUCompiler.code_native(devnull, job)
    end
end

This results in an LLVM-related abort when Julia writes out the package image:

LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.membar.sys

[53782] signal (6.-6): Aborted
in expression starting at none:0
unknown function (ip: 0x7fd1bcc9564c)
gsignal at /usr/lib/libc.so.6 (unknown line)
abort at /usr/lib/libc.so.6 (unknown line)
_ZN4llvm18report_fatal_errorERKNS_5TwineEb at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm16SelectionDAGISel15CannotYetSelectEPNS_6SDNodeE at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm16SelectionDAGISel16SelectCodeCommonEPNS_6SDNodeEPKhj at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN12_GLOBAL__N_115X86DAGToDAGISel6SelectEPN4llvm6SDNodeE at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm16SelectionDAGISel22DoInstructionSelectionEv at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm16SelectionDAGISel17CodeGenAndEmitDAGEv at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm16SelectionDAGISel20SelectAllBasicBlocksERKNS_8FunctionE at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm16SelectionDAGISel20runOnMachineFunctionERNS_15MachineFunctionE.part.975 at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN12_GLOBAL__N_115X86DAGToDAGISel20runOnMachineFunctionERN4llvm15MachineFunctionE at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
operator() at /home/tim/Julia/src/julia/src/aotcompile.cpp:698
jl_dump_native_impl at /home/tim/Julia/src/julia/src/aotcompile.cpp:710
ijl_write_compiler_output at /home/tim/Julia/src/julia/src/precompile.c:126
ijl_atexit_hook at /home/tim/Julia/src/julia/src/init.c:258
jl_repl_entrypoint at /home/tim/Julia/src/julia/src/jlapi.c:718
main at /home/tim/Julia/src/julia/cli/loader_exe.c:59
unknown function (ip: 0x7fd1bcc3028f)
__libc_start_main at /usr/lib/libc.so.6 (unknown line)
_start at /build/glibc/src/glibc/csu/../sysdeps/x86_64/start.S:115
Allocations: 66463518 (Pool: 66446875; Big: 16643); GC: 87

It looks like Julia is trying to generate host-native code for GPU-only functionality here: the backtrace shows the X86 instruction selector (X86DAGToDAGISel) failing on an NVVM intrinsic. After discussing this with @vchuravy, we think this happens because SnoopPrecompile.jl tracks all code that gets inferred, which includes GPU code, and queues it up for precompilation. Setting verbose[] = true does indeed show that it compiles GPU-only functionality:

MethodInstance for CUDA.signal_exception()

That function is implemented at https://github.com/JuliaGPU/CUDA.jl/blob/3d1670c9fe0bd12fb5d44e8427ab50d5f85a3d6a/src/device/runtime.jl#L35-L47. It calls threadfence_system(), which in turn is implemented using the llvm.nvvm.membar.sys intrinsic, the same intrinsic named in the LLVM error.

I guess we should somehow keep this code out of the pkgimage, for now. Normally we avoid polluting host caches with GPU code by using a custom AbstractInterpreter and registering it with codegen through the lookup codegen parameter. Maybe some property derived from this needs to be added to the data in Core.Compiler.Timings._timings so that SnoopPrecompile can decide to skip this code?
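
To make the idea concrete, here is a minimal sketch of what such filtering could look like. Everything in it is hypothetical: the InferenceRecord type, its native_interp field, and the stand-in functions are invented for illustration and are not part of SnoopPrecompile.jl, CUDA.jl, or Core.Compiler.Timings. It only assumes that each recorded entry could carry a flag saying whether it was inferred by the native (host) interpreter.

```julia
# Hypothetical record type: what an inference-timing entry might look like
# if it also stored whether inference ran under the native (host) interpreter.
struct InferenceRecord
    method::Method        # the method that was inferred
    native_interp::Bool   # false if a custom AbstractInterpreter (e.g. GPU) inferred it
end

# Stand-ins for illustration only (assumptions, not real CUDA.jl code).
device_only() = nothing          # plays the role of CUDA.signal_exception
host_function(x::Int) = x + 1    # ordinary host code that is safe to cache

# Pretend these entries were collected while running the precompile workload.
records = [
    InferenceRecord(which(device_only, Tuple{}), false),
    InferenceRecord(which(host_function, Tuple{Int}), true),
]

# Queue only the entries inferred by the native interpreter, so that
# device-only code never reaches host code generation.
queued = [r.method for r in records if r.native_interp]
```

With a flag like this, the device-only entry would be dropped before anything is queued for the package image, and only host_function would remain.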
