
Conversation

Contributor

@Spiri0 Spiri0 commented Jul 23, 2025

Related issue: #30982

This small extension allows you to control the dispatchSize from the GPU.
The benefit of GPU-side dispatchSize control shows in recursive, multi-pass algorithms.

Recursion isn't feasible on the GPU; it has to be unrolled into multiple computes. Here's a clear example: my voxelizer. Here I'm voxelizing the BlackPearl, which is incredibly fast with compute shaders. Surface voxels are green. I also voxelize the volume with yellow voxels (important for buoyancy), and the flood-fill mechanism requires 52 iterations for this. With an adaptive dispatchSize, the dispatch can always be adjusted to the number of remaining voxels.
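To make the adaptive-dispatch idea concrete, here is a CPU-side sketch in plain JavaScript (illustrative only; the voxel count, workgroup size, and shrink factor are assumptions, not numbers from the voxelizer). Each iteration dispatches only enough workgroups for the voxels still unresolved, instead of covering the full grid every time:

```javascript
// CPU-side analogue of adaptive dispatch (illustrative only):
// each flood-fill iteration sizes its dispatch to the remaining work.
function floodFillDispatches( initialVoxels, workgroupSize, shrinkFactor ) {

	const dispatches = [];
	let remaining = initialVoxels;

	while ( remaining > 0 ) {

		// only as many workgroups as the remaining voxels need
		dispatches.push( Math.ceil( remaining / workgroupSize ) );

		// assume each iteration resolves a fraction of the voxels
		remaining = Math.floor( remaining * shrinkFactor );

	}

	return dispatches;

}

console.log( floodFillDispatches( 100000, 64, 0.5 ) );
```

With these assumed numbers the dispatch count shrinks geometrically, so later iterations cost almost nothing compared to re-dispatching the full grid.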

[Screenshots: the voxelized BlackPearl, with green surface voxels and yellow volume voxels]

This is also important for building a BVH on the GPU, because that is recursive as well.

The extension was pleasingly simple. You just have to follow the WebGPU guidelines: no reading and writing of the same buffer within one pass. It follows that a compute shader can't compute its own dispatchSize; that requires a separate compute that runs with a normal dispatch. Since this usually only needs a count of 1, it's definitely worth it for recursive passes, like in Unreal and Unity.

// Setup
dispatchBuffer = new IndirectStorageBufferAttribute( new Uint32Array( 3 ), 1 );

// computeDispatchShader is a TSL Fn or a wgslFn
computeDispatchSize = computeDispatchShader( computeDispatchShaderParams ).compute( 1 );

// In the render loop
renderer.compute( computeDispatchSize );             // writes the dispatchBuffer
renderer.compute( someComputeNode, dispatchBuffer ); // dispatched indirectly
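As a hedged illustration of what the tiny dispatch compute does on the GPU (a plain-JavaScript stand-in, not the PR's API): it essentially performs a ceil division of the remaining work by the workgroup size and writes the result into the three uints of the dispatch buffer.

```javascript
// Illustrative stand-in for the GPU-side dispatch compute: it writes
// ceil( remaining / workgroupSize ) into the x slot of the 3-uint
// dispatch buffer; y and z stay 1.
function writeDispatch( dispatchBuffer, remainingItems, workgroupSize ) {

	dispatchBuffer[ 0 ] = Math.ceil( remainingItems / workgroupSize );
	dispatchBuffer[ 1 ] = 1;
	dispatchBuffer[ 2 ] = 1;

	return dispatchBuffer;

}

const buffer = writeDispatch( new Uint32Array( 3 ), 10000, 64 );
console.log( Array.from( buffer ) ); // → [ 157, 1, 1 ]
```

On the GPU this runs as a single workgroup, which is why the one extra pass is cheap.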


github-actions bot commented Jul 23, 2025

📦 Bundle size

Full ESM build, minified and gzipped.

| Build | Before (min / gzip) | After (min / gzip) | Diff (min / gzip) |
| --- | --- | --- | --- |
| WebGL | 350.11 / 84.81 | 350.11 / 84.81 | +0 B / +0 B |
| WebGPU | 604.76 / 169.55 | 604.78 / 169.56 | +18 B / +10 B |
| WebGPU Nodes | 603.37 / 169.31 | 603.39 / 169.32 | +18 B / +13 B |

🌳 Bundle size after tree-shaking

Minimal build including a renderer, camera, empty scene, and dependencies.

| Build | Before (min / gzip) | After (min / gzip) | Diff (min / gzip) |
| --- | --- | --- | --- |
| WebGL | 481.77 / 119.56 | 481.77 / 119.56 | +0 B / +0 B |
| WebGPU | 674.14 / 185.02 | 674.15 / 185.03 | +17 B / +11 B |
| WebGPU Nodes | 616.13 / 168.25 | 616.15 / 168.26 | +17 B / +12 B |

Collaborator

sunag commented Jul 24, 2025

The dispatchSizeOrCount property no longer seems compatible with its parameter overloading; it would be better to rename it, perhaps to just dispatchSize, and update the JSDoc.

Could you include it for compute( dispatchSize ) too?

@sunag sunag changed the title Introduce dispatchWorkgroupsIndirect WebGPURenderer: Introduce dispatchWorkgroupsIndirect Jul 24, 2025
Contributor Author

Spiri0 commented Jul 25, 2025

The idea of calling it dispatchSize again also occurred to me; it's simply more general for everything that conveys the size, whether array or buffer.

Could you include for compute( dispatchSize) too?

Please excuse me, sometimes I'm a little slow on the uptake.
Do you mean whether I can implement it so that a compute writes its own dispatchSize?
That would lead to a read/write conflict, since dispatchWorkgroupsIndirect reads the buffer in the same pass.
So a compute cannot write its own dispatch. However, it could write the dispatch for its next iteration into a separate buffer, which can then be copied into the indirect dispatch buffer using copyBufferToBuffer before the next compute iteration starts.

Collaborator

sunag commented Jul 25, 2025

Hmm... I was referring to this: #31026

Maybe we can replace .compute( count ) with .compute( dispatchSize ) too, since the parameter is overloaded.

Contributor Author

Spiri0 commented Jul 29, 2025

The red line in the left image is the intersection of the dynamic ocean geometry with the camera's near plane. This intersection cannot be determined in one compute, but it can be computed precisely and efficiently using 4 different computes. You can, of course, also work with early returns in the individual shaders, but 10,000 returns when there's nothing left to do isn't as elegant as the shader knowing from the start exactly how often it needs to run, via a dispatchBuffer written by its predecessor compute shader.

[Screenshots: ocean surface with the camera near-plane intersection marked as a red line]

I've been wanting to create an underwater world for my ocean repo for a while, but I've always put it off because I simply didn't know how to efficiently implement the transition from above water to underwater. It's easy with the depth shader, but if you're exactly on the waterline in calm water, you won't see any depth. You can use tricks, or you can calculate it precisely via the intersection of the ocean triangles with the camera near plane. dispatchWorkgroupsIndirect is perfect for compute shader dispatch dependency chains and recursive passes.

Contributor Author

Spiri0 commented Aug 8, 2025

Hi Sunag, I'm sorry for the delay. I've been very preoccupied with refraction the last few days.
I took a look at the count topic, because I initially liked the idea of replacing it with dispatchSize. But meanwhile I think differently because count is not the same as dispatchSize.

What do you think of the following idea?
Specifying the workgroupSize would become the default, and a new node could be created for the dispatchSize, something like dispatchSize( count ). This node would then build the dispatchSize from the workgroupSize specified previously in compute( workgroupSize ) and the count specified in the dispatchSize node, so that the computeNode and the backend are free of the count.
This would also resolve all the to-dos in the computeNode, because from my point of view it is the degree of freedom in the distribution between workgroupSize and dispatchSize that leads to those to-dos.

} )().compute( workgroupSize );

// ...

await renderer.computeAsync( computeParticles, dispatchSize( particleCount ) );

So we would have the option to specify the dispatchSize directly as a number, array, or buffer, or to use the dispatchSize node, which builds the dispatchSize from the count and the workgroupSize of the compute entity. I realize this means users would have to adjust their code over time, but it seems like the cleanest solution to me.
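A hedged sketch of the arithmetic such a dispatchSize( count ) node could perform (the name and behavior follow the proposal above; this is not an existing three.js API): divide the element count by the volume of the workgroupSize that was passed to compute( workgroupSize ).

```javascript
// Hypothetical helper mirroring the proposed dispatchSize( count ) node:
// dividing the element count by the workgroup volume gives the number
// of workgroups to dispatch along x.
function dispatchSizeFromCount( count, workgroupSize ) {

	const [ wx = 1, wy = 1, wz = 1 ] = workgroupSize;
	return [ Math.ceil( count / ( wx * wy * wz ) ), 1, 1 ];

}

console.log( dispatchSizeFromCount( 100000, [ 8, 8, 1 ] ) ); // → [ 1563, 1, 1 ]
```

This keeps the count out of the computeNode and the backend, as proposed; only the node needs to know both values.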

Contributor

cmhhelgeson commented Oct 9, 2025

It would be really nice to have this in r182. I'd obviously be hesitant to change the existing compute syntax, but I think workgroupSize acting as the main compute parameter makes sense: the ComputeNode effectively constructs the compute template, while the render loop specifies how many times the compute shader is run, rather than having to, say, construct the same compute node multiple times for different dispatch sizes or element counts.

Contributor Author

Spiri0 commented Oct 9, 2025

I'm currently very busy with other things, so I haven't continued with this. The extension works wonderfully as is, though.
However, I see that there's a conflict with the current version. I'll have to take a look at it.

It's very easy to use. The buffer is passed directly to dispatchWorkgroupsIndirect in the backend. This makes things straightforward and very robust.

renderer.compute( someComputeNode, [ x, y, z ] );    // set by array
renderer.compute( someComputeNode, dispatchBuffer ); // set by buffer

I'll update and see where the conflict with the current version lies.

@sunag, can this be included in the new version once I've updated it?
I would like to expand the additional features you mentioned, but I have a lot to do for my professional future at the moment.
With the existing computeKernel for the workgroupSize, the whole thing is already very well rounded.

Contributor

cmhhelgeson commented Oct 9, 2025

My primary thoughts are:

A. In the examples from August, only the workgroupSize is passed to the computeNode rather than the count. I'm not sure if that is still the case in this PR or if there is an intermediate solution, but none of the examples have been updated to use this syntax. If the PR is compatible with the existing compute syntax, then there should at minimum be an example demonstrating the functionality and/or benefits of computeIndirect.

The simplest idea I can think of is quickly updating one of the particle examples to dynamically update the number of particles that are being computed/rendered, with the count being read from the buffer passed to the dispatchWorkgroupsIndirect.

B. I think it would be better if there were some sort of specific flag or declaration that a compute program will be executed indirectly, rather than switching based on the arguments passed to renderer.compute. That way it's more explicitly apparent to the user that the renderer or the shader program is being reconfigured to run indirectly, the same way that setting an indirect flag on geometry does in webgpu_struct_draw_indirect. Either that, or there should be a separate computeIndirect function that takes a buffer as an argument, with the compute function remaining as is. I think either solution would be better than overloading the function definition, and would align more closely with the WebGPU backend (dispatchWorkgroups + dispatchWorkgroupsIndirect versus a single dispatchWorkgroups function).

Contributor

TheBlek commented Oct 24, 2025

That way it's more explicitly apparent to the user that the renderer or the shader program is being reconfigured to run indirectly, the same way that setting an indirect flag on geometry does in webgpu_struct_draw_indirect.

I'm not sure this is a great idea as the same compute shader could be executed directly and indirectly without modifying it.

Either that or there should be a separate computeIndirect function that takes a buffer as an argument with the compute function remaining as is.

Almost all the other work in those functions would be the same, thus creating duplication if implemented separately. I don't think function overloading is much of a problem.

Anyway, looking forward to this being released!

Contributor Author

Spiri0 commented Oct 25, 2025

I will write tomorrow at the latest

Collaborator

sunag commented Oct 26, 2025

I think the contributions of @TheBlek in Spiri0#10 are a great addition too; let's try to merge this month 🙏.

Contributor Author

Spiri0 commented Oct 26, 2025

@cmhhelgeson I'm not usually someone who insists on implementing things according to my own ideas. In this case, I consider the explicit introduction of extras specifically named indirect to be unnecessary. The only thing the passEncoder needs in the backend is an array (direct) or a buffer (indirect), and these can be passed directly to the renderer using:

renderer.compute( someComputeNode, dispatchArray );  // CPU
renderer.compute( someComputeNode, dispatchBuffer ); // GPU

The indirectness is already apparent and clear when creating the buffer, since it must be an indirect buffer, as in @TheBlek's example.
I personally find this very elegant. The nice thing about the buffer is that you can also update it on the CPU side if you want. This gives you complete freedom with your workgroups and dispatches with little effort.

Please excuse my lack of activity lately. I'm currently working on a demo app that I'd like to present to the University of Berlin, as they like what I'm doing with Threejs.

@TheBlek Thanks for your effort with the example. I already use my extension myself, but my applications aren't suitable as examples due to their scope.

Contributor

Please excuse my lack of activity lately. I'm currently working on a demo app that I'd like to present to the University of Berlin, as they like what I'm doing with Threejs.

No need to excuse your inactivity 👍 and apologies for maybe coming on too strong with my own syntactical preferences.

@sunag sunag merged commit 8bcb114 into mrdoob:dev Oct 27, 2025
9 checks passed
@sunag sunag added this to the r181 milestone Oct 27, 2025
Collaborator

sunag commented Oct 27, 2025

@TheBlek I think it would be good to have your PR separately as a new one now.
