# Thoughts on OpenGL/CUDA interop #16

*Replies: 1 comment*

---
So, I did implement this, and the results really speak for themselves. I'll post a new Twitter video soon (after I get instant previews working), but the "stickiness" of the panning-around interaction during previews in Blender is instantly remedied. Check out these excerpts:

- The new `CPURenderBuffer` makes use of page-locked memory via …
- This copies data from GPU to CPU on a background thread, and prepares it for use by OpenGL: …
- Now, in OpenGL we can call a …

Unfortunately, we are doing a round trip from GPU -> CPU -> GPU, and there are bandwidth limitations to this process. However, the user experience is much better than with … This could be further improved by only copying data for regions that have changed since the last frame. And beyond this, hopefully Vulkan will let us just write data into a texture directly with CUDA...
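Since the code excerpts above did not survive as links, here is a rough sketch of the shape of the approach — the member names and the final GL call are stand-ins, not the actual code in the repo:

```cpp
// Rough sketch only: member names and the GL upload call are assumptions,
// not the actual CPURenderBuffer code.
#include <cuda_runtime.h>

struct CPURenderBuffer {
    float* rgba = nullptr; // page-locked ("pinned") host memory
    size_t n_bytes = 0;

    void allocate(size_t bytes) {
        n_bytes = bytes;
        // cudaMallocHost returns page-locked memory, which is what allows
        // cudaMemcpyAsync to run as a true asynchronous DMA transfer.
        cudaMallocHost(reinterpret_cast<void**>(&rgba), bytes);
    }

    // Runs on a background thread with its own stream, so the wait here
    // never blocks Blender's UI thread.
    void copy_from_device(const float* device_rgba, cudaStream_t stream) {
        cudaMemcpyAsync(rgba, device_rgba, n_bytes, cudaMemcpyDeviceToHost, stream);
        cudaStreamSynchronize(stream);
    }

    ~CPURenderBuffer() { cudaFreeHost(rgba); }
};

// Later, on the OpenGL/UI thread, the pinned pointer can be uploaded with
// something like (assumed -- the actual call in the excerpt was elided):
//   glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, w, h, GL_RGBA, GL_FLOAT, buf.rgba);
```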
---
Currently, in the `open_for_cuda_access` method of `OpenGLRenderSurface`, we can see the CUDA/OpenGL interop happening. Notably, these lines run once every time `draw()` is called by the custom Blender Render Engine:

https://github.com/JamesPerlman/NeRFRenderCore/blob/050ddf33f5577d6a25e9c23c2a994699b0dd080b/src/render-targets/opengl-render-surface.cuh#L102-L107
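For anyone unfamiliar, the per-frame pattern at that permalink is presumably close to the standard one; a sketch with illustrative names (not the exact code at the link):

```cpp
#include <cuda_runtime.h>

// The texture is registered once at startup, e.g. via
// cudaGraphicsGLRegisterImage(&gl_resource, tex_id, GL_TEXTURE_2D,
//                             cudaGraphicsRegisterFlagsWriteDiscard);
cudaGraphicsResource_t gl_resource;

void copy_render_to_gl_texture(const float* device_rgba,
                               int width, int height, cudaStream_t stream) {
    // This is the call that blocks: CUDA waits for the GL driver to
    // release the resource before the map completes.
    cudaGraphicsMapResources(1, &gl_resource, stream);

    cudaArray_t array;
    cudaGraphicsSubResourceGetMappedArray(&array, gl_resource, 0, 0);

    const size_t row_bytes = width * 4 * sizeof(float); // RGBA32F rows
    cudaMemcpy2DToArrayAsync(array, 0, 0, device_rgba, row_bytes,
                             row_bytes, height,
                             cudaMemcpyDeviceToDevice, stream);

    // Unmapping hands the resource back to OpenGL (and synchronizes again).
    cudaGraphicsUnmapResources(1, &gl_resource, stream);
}
```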
I have noticed that these specific calls to `cudaGraphicsMapResources` and `cudaGraphicsUnmapResources` cause noticeable lag, which is partially alleviated by calling `glFlush()` and `glFinish()` beforehand, but it is still a rather obnoxious thing that I'd like to get rid of. This is consistent with the CUDA docs, which note that mapping synchronizes with outstanding graphics work: any GL calls issued before `cudaGraphicsMapResources()` must complete before subsequent CUDA work in the stream begins.

Perhaps this is just a limitation of CUDA/OpenGL interoperability, perhaps I've set up my OpenGL buffers wrong, or perhaps CUDA is simply waiting on Blender to finish some OpenGL work before the buffer can be mapped. There are alternative approaches I have tried, like setting up a surface to write to the texture directly (still laggy), and approaches I have not tried yet, like rendering into an offscreen renderbuffer. I have also observed that the smaller the buffer, the lower the lag, though I'm not sure what conclusions to draw from that.
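For reference, the surface-object variant looks roughly like this — a sketch with assumed names, not my exact code. It assumes the texture is a surface-compatible format like `GL_RGBA32F`, registered with `cudaGraphicsRegisterFlagsSurfaceLoadStore`, and it still pays the same map/unmap cost every frame:

```cpp
#include <cuda_runtime.h>

// Kernel that writes straight into the mapped GL texture. A real renderer
// would compute the NeRF output here instead of a placeholder color.
__global__ void write_pixels_kernel(cudaSurfaceObject_t surf, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    float4 rgba = make_float4(1.0f, 0.0f, 1.0f, 1.0f); // placeholder
    surf2Dwrite(rgba, surf, x * (int)sizeof(float4), y); // x offset is in bytes
}

// Assumes the texture's cudaArray_t was obtained between the map/unmap
// calls, exactly as in the snippet above.
void render_into_mapped_texture(cudaArray_t mapped_array,
                                int width, int height, cudaStream_t stream) {
    cudaResourceDesc desc = {};
    desc.resType = cudaResourceTypeArray;
    desc.res.array.array = mapped_array;

    cudaSurfaceObject_t surf;
    cudaCreateSurfaceObject(&surf, &desc);

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    write_pixels_kernel<<<grid, block, 0, stream>>>(surf, width, height);

    cudaDestroySurfaceObject(surf);
}
```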
Anyway, I have some ideas and I'm going to use this thread as a journal. Please feel free to pitch in if you have any ideas or experience with this.
My intention in using CUDA/OpenGL interop was to avoid doing a round trip from the GPU to the CPU and back to the GPU. I assumed that CUDA could simply blit the data buffer into another buffer managed by OpenGL, keeping all the data on the GPU. Perhaps it can, through some obscure and undocumented mechanism, but with Blender deprecating OpenGL soon, I might not need to worry about that for much longer.
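One candidate for keeping everything on the GPU is the pixel-buffer-object route: register a GL PBO with CUDA, blit device-to-device into it, and let GL do the PBO-to-texture transfer. The calls below are standard CUDA interop API, but I haven't verified that this avoids the stall, since it still goes through the same map/unmap pair. A sketch:

```cpp
#include <cuda_gl_interop.h> // cudaGraphicsGLRegisterBuffer lives here

// Registered once at startup from an existing GL pixel buffer object:
//   cudaGraphicsGLRegisterBuffer(&pbo_resource, pbo_id,
//                                cudaGraphicsRegisterFlagsWriteDiscard);
cudaGraphicsResource_t pbo_resource;

void blit_into_pbo(const float* device_rgba, size_t n_bytes, cudaStream_t stream) {
    cudaGraphicsMapResources(1, &pbo_resource, stream);

    void* mapped_ptr = nullptr;
    size_t mapped_size = 0;
    cudaGraphicsResourceGetMappedPointer(&mapped_ptr, &mapped_size, pbo_resource);

    // Device-to-device: the pixels never leave the GPU.
    cudaMemcpyAsync(mapped_ptr, device_rgba, n_bytes,
                    cudaMemcpyDeviceToDevice, stream);

    cudaGraphicsUnmapResources(1, &pbo_resource, stream);

    // OpenGL then copies PBO -> texture entirely on the GPU:
    //   glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo_id);
    //   glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, w, h, GL_RGBA, GL_FLOAT, nullptr);
}
```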
For now, since we can't write to an OpenGL buffer from a background thread, there is a potential approach that may work: copy the data from CUDA to the host on a background thread, and when the data is ready, call `tag_redraw()` to trigger OpenGL to draw and reupload the image to the GPU. There should be no lag from CUDA, and OpenGL should take care of the copying without noticeably locking the UI like it currently does.

The downside is that this is inefficient from an architectural standpoint: it does a round trip from GPU to CPU and back to GPU, which seems like it should be entirely avoidable. But I just don't know how much effort I want to spend learning the finer points of OpenGL/CUDA interop when OpenGL is being deprecated faster than it takes to train a NeRF. The upside is that it will probably work, and it's a stopgap measure to reduce visual lag in Blender previews until Vulkan becomes available to use instead.
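A minimal sketch of that handoff, with hypothetical names (the `tag_redraw()` call itself happens on the Blender Python side, so it appears only as a comment here):

```cpp
#include <atomic>
#include <cuda_runtime.h>
#include <GL/gl.h>

std::atomic<bool> frame_ready{false};
float* pinned_rgba;  // page-locked host memory from cudaMallocHost
size_t n_bytes;

// Background thread: copy device -> pinned host, then ask for a redraw.
void background_copy(const float* device_rgba, cudaStream_t stream) {
    cudaMemcpyAsync(pinned_rgba, device_rgba, n_bytes,
                    cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream); // blocks this thread only, not the UI
    frame_ready.store(true);
    // ...then notify Blender, which calls tag_redraw() on the Python side.
}

// UI thread, inside draw(): upload only when a fresh frame is waiting.
void maybe_upload(GLuint texture, int width, int height) {
    if (!frame_ready.exchange(false)) return;
    glBindTexture(GL_TEXTURE_2D, texture);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                    GL_RGBA, GL_FLOAT, pinned_rgba);
}
```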