
Conversation

@andyleiserson
Contributor

@andyleiserson andyleiserson commented Sep 12, 2025

Resolves #7679. Fixes #7854.

This moves command encoding so that it happens within CommandEncoder.finish, rather than within the individual methods for each command. The motivation is to allow generating better barriers in the command stream in the future (this PR does not change the barriers), and it also fixes a validation error that could occur in certain cases.
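To illustrate the general pattern (this is a hypothetical sketch with made-up types, not wgpu's actual API): commands are only recorded when their methods are called, and all backend encoding is deferred until finish, where the full command stream is visible.

```rust
// Hypothetical sketch of encode-on-finish. `Command`, `CommandEncoder`,
// and the string "backend" output are illustrative, not wgpu's real types.
#[derive(Debug)]
enum Command {
    CopyBuffer { src: u32, dst: u32, size: u64 },
    Clear { buffer: u32 },
}

struct CommandEncoder {
    // Commands are only recorded here; no backend work happens yet.
    recorded: Vec<Command>,
}

impl CommandEncoder {
    fn new() -> Self {
        Self { recorded: Vec::new() }
    }

    // Each command method just pushes onto the list.
    fn copy_buffer(&mut self, src: u32, dst: u32, size: u64) {
        self.recorded.push(Command::CopyBuffer { src, dst, size });
    }

    fn clear(&mut self, buffer: u32) {
        self.recorded.push(Command::Clear { buffer });
    }

    // All backend encoding happens here, with the whole command stream
    // visible at once, which is what makes better barrier placement possible.
    fn finish(self) -> Vec<String> {
        self.recorded
            .iter()
            .map(|cmd| format!("encode {:?}", cmd))
            .collect()
    }
}

fn main() {
    let mut enc = CommandEncoder::new();
    enc.clear(1);
    enc.copy_buffer(1, 2, 64);
    let encoded = enc.finish();
    assert_eq!(encoded.len(), 2);
    println!("{}", encoded.join("\n"));
}
```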

The change is divided into commits, but I'm not sure that makes reviewing easier other than by providing a way to bookmark review progress. (Which I think could equally well be done by noting which files you have reviewed.)

A lot of code has changed, but many of the changes are rote -- e.g. referencing things via state: EncodingState rather than via cmd_buf_data. Some particular things to focus on might be the comment on struct EncodingState, everything in command/mod.rs, and handling of compute / render passes.

With this PR, two copies of every command are saved when tracing -- one with Ids for the trace, and one with Arcs for encoding. I am still working on resolving that.

Testing
Mostly via existing tests, but this enables the tests from #7854 that are sensitive to when command encoding happens, which provides some directed coverage.

I did some performance testing, which had somewhat inconclusive results. When I ran the entire wgpu benchmark (which takes hours), it looked like things might be significantly (10-40%) slower. But since it took hours, I was doing other things on my system at the same time. When I isolated a specific test case and adjusted it to a more reasonable runtime, things were within a few percent before and after the changes.

Squash or Rebase? Probably squash. Things mostly work at the intermediate stages, but there are also weird failures when commands are encoded out of order, which will not necessarily be obvious without context.

Checklist

  • Need to add a changelog entry.

@JMS55
Collaborator

JMS55 commented Sep 13, 2025

Would be nice to run Bevy and see what the performance difference is.

@teoxoy teoxoy self-assigned this Sep 18, 2025
Member

@teoxoy teoxoy left a comment


Looks great!

@andyleiserson
Contributor Author

I ran the bevy_render benchmark before and after this change; it seems like it may be very slightly slower, but the difference is below criterion's threshold for reporting a change on any single benchmark:

layers_intersect        time:   [2.0196 ns 2.0215 ns 2.0242 ns]
                        change: [−1.0243% −0.8111% −0.5820%] (p = 0.00 < 0.05)
                        Change within noise threshold.

smooth_normals          time:   [3.3578 ms 3.3621 ms 3.3667 ms]
                        change: [−1.4260% −1.0745% −0.7489%] (p = 0.00 < 0.05)
                        Change within noise threshold.

angle_weighted_normals  time:   [3.3651 ms 3.3690 ms 3.3731 ms]
                        change: [−0.4215% −0.0445% +0.2781%] (p = 0.80 > 0.05)
                        No change in performance detected.

face_weighted_normals   time:   [1.1781 ms 1.1843 ms 1.1909 ms]
                        change: [−1.0657% −0.2984% +0.5387%] (p = 0.47 > 0.05)
                        No change in performance detected.

flat_normals            time:   [559.91 µs 561.46 µs 563.11 µs]
                        change: [−1.1946% −0.6108% −0.0669%] (p = 0.04 < 0.05)
                        Change within noise threshold.

build_torus             time:   [2.2850 ns 2.3552 ns 2.4392 ns]
                        change: [−6.4441% −3.2043% +0.2574%] (p = 0.07 > 0.05)
                        No change in performance detected.

@Wumpf
Member

Wumpf commented Sep 21, 2025

With this PR, two copies of every command are saved when tracing -- one with Ids for the trace, and one with Arcs for encoding. I am still working on resolving that.

Interesting, we've been in that state before transiently :)

@jimblandy
Member

For what it's worth: this PR is important to Firefox.

Member

@cwfitzgerald cwfitzgerald left a comment


I think this is good progress and we should land it.

Just so that I understand correctly, this doesn't yet resolve the issue of us needing multiple underlying command buffers in renderpasses - that would be a future pass?

@andyleiserson
Contributor Author

Just so that I understand correctly, this doesn't yet resolve the issue of us needing multiple underlying command buffers in renderpasses - that would be a future pass?

Correct. This PR does change the behavior for non-pass operations directly on a command encoder -- previously each was given its own command buffer; now they will reuse a single command buffer. But each pass will still have two command buffers, one for the "pre-pass" and one for the actual pass.
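The buffer layout described above can be sketched roughly as follows (hypothetical types and names, not wgpu's actual internals): consecutive direct commands are coalesced into one buffer, while each pass still contributes a "pre-pass" buffer plus a buffer for the pass itself.

```rust
// Hypothetical sketch of how an encoder's command stream maps onto
// backend command buffers. `Recorded` and `split_into_hal_buffers`
// are illustrative names, not wgpu's real API.
#[derive(Debug)]
enum Recorded {
    Direct(&'static str), // e.g. a copy issued directly on the encoder
    Pass(&'static str),   // a whole render/compute pass
}

/// Split the recorded stream into per-backend command buffers.
fn split_into_hal_buffers(commands: &[Recorded]) -> Vec<String> {
    let mut buffers = Vec::new();
    let mut direct_run: Vec<&'static str> = Vec::new();
    for cmd in commands {
        match cmd {
            // Direct commands accumulate into a single shared buffer.
            Recorded::Direct(name) => direct_run.push(*name),
            // A pass flushes the shared buffer, then adds two of its own.
            Recorded::Pass(name) => {
                if !direct_run.is_empty() {
                    buffers.push(format!("direct: {}", direct_run.join(", ")));
                    direct_run.clear();
                }
                buffers.push(format!("pre-pass for {}", name));
                buffers.push(format!("pass {}", name));
            }
        }
    }
    if !direct_run.is_empty() {
        buffers.push(format!("direct: {}", direct_run.join(", ")));
    }
    buffers
}

fn main() {
    let cmds = [
        Recorded::Direct("copy A"),
        Recorded::Direct("copy B"),
        Recorded::Pass("render"),
    ];
    let buffers = split_into_hal_buffers(&cmds);
    // Two direct copies share one buffer; the pass adds two more.
    assert_eq!(buffers.len(), 3);
    for b in &buffers {
        println!("{}", b);
    }
}
```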

@cwfitzgerald
Member

I'm not too concerned about performance here as we can always optimize later, plus this is far from its final form.

@cwfitzgerald
Member

Merging.

@cwfitzgerald cwfitzgerald merged commit 1967900 into gfx-rs:trunk Sep 25, 2025
41 checks passed
@andyleiserson andyleiserson deleted the encode-on-finish branch September 25, 2025 18:56
sharmajai pushed a commit to sharmajai/wgpu that referenced this pull request Oct 12, 2025
