From 9780a34fda4a1a3647ba9b6902f3fee4a70b3af3 Mon Sep 17 00:00:00 2001
From: Leo Fang
Date: Fri, 13 Dec 2024 16:10:28 +0000
Subject: [PATCH 1/2] add an interoperability page

---
 cuda_core/docs/source/index.rst            |  1 +
 cuda_core/docs/source/interoperability.rst | 83 ++++++++++++++++++++++
 2 files changed, 84 insertions(+)
 create mode 100644 cuda_core/docs/source/interoperability.rst

diff --git a/cuda_core/docs/source/index.rst b/cuda_core/docs/source/index.rst
index 15bde57285..19c14c2aec 100644
--- a/cuda_core/docs/source/index.rst
+++ b/cuda_core/docs/source/index.rst
@@ -10,6 +10,7 @@ and other functionalities.
 
    release.md
    install.md
+   interoperability.rst
    api.rst
 

diff --git a/cuda_core/docs/source/interoperability.rst b/cuda_core/docs/source/interoperability.rst
new file mode 100644
index 0000000000..42434b9054
--- /dev/null
+++ b/cuda_core/docs/source/interoperability.rst
@@ -0,0 +1,83 @@
+.. currentmodule:: cuda.core.experimental
+
+Interoperability
+================
+
+``cuda.core`` is designed to be interoperable with other Python GPU libraries. Below
+we cover a number of such scenarios.
+
+
+Current device/context
+----------------------
+
+The :meth:`Device.set_current` method ensures that the calling host thread has
+an active CUDA context set to current. This CUDA context can be seen and accessed
+by other GPU libraries without any code change. For libraries built on top of
+the CUDA runtime (``cudart``), this is as if ``cudaSetDevice`` is called.
+
+Since CUDA contexts are per-thread constructs, in a multi-threaded program each
+host thread should call this method.
+
+Conversely, if any GPU library already set a device (or context) to current, this
+method ensures that the same device/context is picked up by and shared with
+``cuda.core``.
+
+
+``__cuda_stream__`` protocol
+----------------------------
+
+The :class:`~_stream.Stream` class is a vocabulary type representing CUDA streams
+in Python.
+While we encourage new Python projects to start using streams (and other
+CUDA types) from ``cuda.core``, we understand that there are already several projects
+exposing their own stream types.
+
+To address this issue, we propose the ``__cuda_stream__`` protocol (currently version
+0) as follows: For any Python objects that are meant to be interpreted as a stream, they
+should add a ``__cuda_stream__`` attribute that returns a 2-tuple: The version number
+(``0``) and the address of ``cudaStream_t``:
+
+.. code-block:: python
+
+    class MyStream:
+
+        @property
+        def __cuda_stream__(self):
+            return (0, self.ptr)
+
+        ...
+
+Then such objects can be understood by ``cuda.core`` anywhere a stream-like object
+is needed.
+
+We suggest that all existing Python projects exposing a stream class also support this
+protocol wherever a function takes a stream.
+
+
+Memory view utilities for CPU/GPU buffers
+-----------------------------------------
+
+The Python community has defined protocols such as CUDA Array Interface (CAI) [1]_ and DLPack
+[2]_ (part of the Python array API standard [3]_) to facilitate zero-copy data exchange
+between two GPU projects. In particular, performance considerations prompted protocol
+designs geared toward *stream-ordered* operations so as to avoid unnecessary synchronizations.
+While the designs are robust, *implementing* such protocols can be tricky and often requires
+a few iterations to ensure correctness.
+
+``cuda.core`` offers a :func:`~utils.args_viewable_as_strided_memory` decorator for
+extracting the metadata (such as pointer address, shape, strides, and dtype) from any
+Python object supporting either CAI or DLPack and returning a
+:class:`~utils.StridedMemoryView` object; see the
+`strided_memory_view.py `_
+example. Alternatively, a :class:`~utils.StridedMemoryView` object can be explicitly
+constructed without using the decorator.
+:class:`~utils.StridedMemoryView` thus provides a *concrete implementation* of both
+protocols that is **array-library-agnostic**: Python projects can rely on it without
+re-implementing (the consumer side of) either protocol or tying themselves to any
+particular array library.
+
+The :attr:`~utils.StridedMemoryView.is_device_accessible` attribute can be used to check
+whether the underlying buffer is accessible on the GPU.
+
+.. rubric:: Footnotes
+
+.. [1] https://numba.readthedocs.io/en/stable/cuda/cuda_array_interface.html
+.. [2] https://dmlc.github.io/dlpack/latest/python_spec.html
+.. [3] https://data-apis.org/array-api/latest/design_topics/data_interchange.html

From c06270c92b4a779314a029ac75c1645ab242b00c Mon Sep 17 00:00:00 2001
From: Leo Fang
Date: Fri, 13 Dec 2024 13:36:47 -0500
Subject: [PATCH 2/2] apply review suggestions

---
 cuda_core/docs/source/interoperability.rst | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/cuda_core/docs/source/interoperability.rst b/cuda_core/docs/source/interoperability.rst
index 42434b9054..3bcdbe6807 100644
--- a/cuda_core/docs/source/interoperability.rst
+++ b/cuda_core/docs/source/interoperability.rst
@@ -13,12 +13,13 @@ Current device/context
 
 The :meth:`Device.set_current` method ensures that the calling host thread has
 an active CUDA context set to current. This CUDA context can be seen and accessed
 by other GPU libraries without any code change. For libraries built on top of
-the CUDA runtime (``cudart``), this is as if ``cudaSetDevice`` is called.
+the `CUDA runtime `_,
+this is as if ``cudaSetDevice`` is called.
 
 Since CUDA contexts are per-thread constructs, in a multi-threaded program each
 host thread should call this method.
 
-Conversely, if any GPU library already set a device (or context) to current, this
+Conversely, if any GPU library already sets a device (or context) to current, this
 method ensures that the same device/context is picked up by and shared with
 ``cuda.core``.
@@ -34,7 +35,7 @@ exposing their own stream types.
 
 To address this issue, we propose the ``__cuda_stream__`` protocol (currently version
 0) as follows: For any Python objects that are meant to be interpreted as a stream, they
 should add a ``__cuda_stream__`` attribute that returns a 2-tuple: The version number
-(``0``) and the address of ``cudaStream_t``:
+(``0``) and the address of ``cudaStream_t`` (both as Python ``int``):
 
 .. code-block:: python
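An aside for reviewers of this series: the consumer side of the proposed ``__cuda_stream__`` protocol can be exercised without a GPU, since it only involves Python-level attribute access and validation. The sketch below illustrates the checks a consumer might perform; ``FakeStream`` and ``extract_stream_ptr`` are hypothetical names for illustration only, not part of the ``cuda.core`` API.

```python
class FakeStream:
    """A third-party stream type exposing the proposed protocol."""

    def __init__(self, ptr: int):
        self.ptr = ptr

    @property
    def __cuda_stream__(self):
        # Protocol version 0: a 2-tuple of (version, cudaStream_t address),
        # both as Python ints.
        return (0, self.ptr)


def extract_stream_ptr(obj) -> int:
    """Hypothetical consumer-side helper: validate the protocol and
    return the raw cudaStream_t address as a Python int."""
    info = obj.__cuda_stream__
    if not (isinstance(info, tuple) and len(info) == 2):
        raise TypeError("__cuda_stream__ must return a 2-tuple")
    version, ptr = info
    if version != 0:
        raise NotImplementedError(f"unsupported __cuda_stream__ version: {version}")
    if not isinstance(ptr, int):
        raise TypeError("stream address must be a Python int")
    return ptr


# No GPU is needed to exercise the protocol itself:
s = FakeStream(0xDEADBEEF)
assert extract_stream_ptr(s) == 0xDEADBEEF
```

A real consumer would additionally decide how to interpret address ``0`` (the legacy default stream) and wrap the pointer in its own stream type; those details are deliberately out of scope here.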