Open
Labels
bug (Something isn't working), triage (Needs the team's attention)
Description
Is this a duplicate?
- I confirmed there appear to be no duplicate issues for this bug and that I agree to the Code of Conduct
Type of Bug
Runtime Error
Component
cuda.bindings
Describe the bug
Hi team,
I tested cuda.bindings.runtime.cudaLaunchHostFunc, and it does not seem to work correctly with CUDA Graphs, whereas the C++ cudaLaunchHostFunc does.
Similar to cupy/cupy#9274, the host function can be captured by a CUDA graph, but when replaying the graph, only the first replay calls the host function. The second replay does not; the process crashes with a segmentation fault instead. Is this expected?
Please see the repro steps below. Thanks!
How to Reproduce
import ctypes

import cuda.bindings.runtime as cudart
import torch


class Struct(ctypes.Structure):
    _fields_ = [
        ("a", ctypes.c_int),
    ]


def hostfunc(userData):
    data = Struct.from_address(userData)
    print(f"Hello, World! {data.a}")
    return 0


HostFn_t = ctypes.PYFUNCTYPE(ctypes.c_int, ctypes.c_void_p)


def main():
    data = Struct(a=1)
    # ctypes is managing the pointer value for us
    c_hostfunc = HostFn_t(hostfunc)
    cuda_hostfunc = cudart.cudaHostFn_t(_ptr=ctypes.addressof(c_hostfunc))
    # Run
    stream = torch.cuda.Stream()
    cudart_stream = cudart.cudaStream_t(stream.cuda_stream)
    g = torch.cuda.CUDAGraph()
    with torch.cuda.graph(g, stream=stream):
        (err, ) = cudart.cudaLaunchHostFunc(cudart_stream, cuda_hostfunc,
                                            ctypes.addressof(data))
        assert err == cudart.cudaError_t.cudaSuccess
    torch.cuda.synchronize()
    print("Graph captured", flush=True)
    with torch.cuda.stream(stream):
        for i in range(10):
            g.replay()
    torch.cuda.synchronize()


if __name__ == "__main__":
    main()
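For readers less familiar with ctypes, here is a small CUDA-free sketch of what the repro's ctypes.addressof(c_hostfunc) expression yields. The names Fn_t and callback are illustrative and not part of the repro. In ctypes, addressof on a callback object returns the address of the object's internal buffer, and that buffer holds the native function pointer. Whether cudaHostFn_t(_ptr=...) expects exactly this "address of a slot holding the pointer" is an assumption inferred from the repro's comment, not something verified here.

```python
import ctypes

# Illustrative prototype: void (*)(void *), the shape of a CUDA host callback.
Fn_t = ctypes.CFUNCTYPE(None, ctypes.c_void_p)


def callback(user_data):
    pass


# Keep a reference: the native thunk is freed once this object is collected.
c_callback = Fn_t(callback)

# Address of the ctypes object's buffer (what the repro passes as _ptr).
slot_address = ctypes.addressof(c_callback)
# The native function pointer itself.
fn_pointer = ctypes.cast(c_callback, ctypes.c_void_p).value
# Reading one pointer-sized value at slot_address recovers the function pointer.
stored_pointer = ctypes.c_void_p.from_address(slot_address).value

print(hex(slot_address), hex(fn_pointer), stored_pointer == fn_pointer)
```

The two addresses differ, so passing one where the other is expected would hand the consumer the wrong pointer. This sketch only shows the distinction and does not claim to identify the cause of the crash.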
Expected behavior
The hostfunc should be called 10 times. However, only the first call succeeds; the second replay crashes:
Graph captured
Hello, World! 1
Caught signal 11 (Segmentation fault: address not mapped to object at address 0x2e906)
Segmentation fault (core dumped)
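One way to narrow this down is to check, without CUDA, that the ctypes callback can be invoked repeatedly through its raw function pointer. The sketch below is a diagnostic aid under stated assumptions, not a fix. It swaps in ctypes.CFUNCTYPE with a None return type, matching the C declaration of cudaHostFn_t as a function returning void, where the repro uses PYFUNCTYPE with c_int. The calls list and the raw_fn name are hypothetical helpers added for illustration.

```python
import ctypes


class Struct(ctypes.Structure):
    _fields_ = [("a", ctypes.c_int)]


calls = []  # hypothetical helper: records each invocation instead of printing


def hostfunc(user_data):
    data = Struct.from_address(user_data)
    calls.append(data.a)


# Assumption: a void-returning prototype, mirroring void (*)(void *userData).
HostFn_t = ctypes.CFUNCTYPE(None, ctypes.c_void_p)
c_hostfunc = HostFn_t(hostfunc)  # must stay alive while the pointer is in use
data = Struct(a=1)               # must stay alive while its address is in use

# Re-enter the callback through its native function pointer, roughly the way
# a C caller would, ten times in a row.
fn_pointer = ctypes.cast(c_hostfunc, ctypes.c_void_p).value
raw_fn = HostFn_t(fn_pointer)
for _ in range(10):
    raw_fn(ctypes.addressof(data))

print(len(calls), "calls recorded")
```

If this loop completes, repeated invocation of the thunk from the calling thread is not itself the problem, which points the investigation toward how the pointer, calling convention, or thread state is handled during graph replay.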
Operating System
Ubuntu 24.04.2 LTS
nvidia-smi output
Mon Aug 4 10:10:28 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.57.08              Driver Version: 575.57.08      CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H20                     On  |   00000000:D1:00.0 Off |                    0 |
| N/A   37C    P0             73W /  500W |       0MiB /  97871MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA H20                     On  |   00000000:DF:00.0 Off |                    0 |
| N/A   39C    P0             76W /  500W |       0MiB /  97871MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+