[AUDIO_WORKLET] Optimised output buffer copy #24891

Draft
wants to merge 14 commits into
base: main
from

Conversation

cwoffenden
Contributor

@cwoffenden cwoffenden commented Aug 8, 2025

A reworking of #22753, which "improves the copy back from the audio worklet's heap to JS by 7-12x depending on the browser." From the previous description:

Since we pass in the stack for the worklet from the caller's heap, its address doesn't change. And since the render quantum size doesn't change after the audio worklet creation, the stack positions for the audio buffers do not change either. This optimisation adds one-time subarray views and replaces the float-by-float copy with a simple set() per channel (per output).
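As a rough illustration of the change described above (not the PR's actual code), the sketch below contrasts a float-by-float copy with a single set() from a pre-created subarray view. The `heap` array stands in for HEAPF32, and `stackBase`, `copySlow` and `copyFast` are hypothetical names chosen for the example.

```javascript
// Stand-in for the wasm heap; in Emscripten this would be HEAPF32.
const heap = new Float32Array(1024);
const samplesPerChannel = 128; // render quantum size, assumed fixed
const numChannels = 2;
const stackBase = 256; // hypothetical float index of the output data

// One-time setup: one subarray view per channel over the fixed region.
const outputViews = [];
for (let ch = 0; ch < numChannels; ch++) {
  const start = stackBase + ch * samplesPerChannel;
  outputViews.push(heap.subarray(start, start + samplesPerChannel));
}

// Float-by-float copy (the approach being replaced).
function copySlow(dest, heapIdx) {
  for (let i = 0; i < dest.length; i++) {
    dest[i] = heap[heapIdx + i];
  }
}

// Bulk copy through the pre-created view.
function copyFast(dest, view) {
  dest.set(view);
}

// Fill the region with recognisable data, then copy channel 1 both ways.
for (let i = 0; i < numChannels * samplesPerChannel; i++) {
  heap[stackBase + i] = i / 256;
}
const slow = new Float32Array(samplesPerChannel);
const fast = new Float32Array(samplesPerChannel);
copySlow(slow, stackBase + samplesPerChannel);
copyFast(fast, outputViews[1]);
```

Both paths produce the same samples; the difference is that the views are built once and each render call then does one bulk copy per channel.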

The existing interactive tests (written for the original PR) can be run for comparison:

test/runner interactive.test_audio_worklet_stereo_io
test/runner interactive.test_audio_worklet_2x_stereo_io
test/runner interactive.test_audio_worklet_mono_io
test/runner interactive.test_audio_worklet_2x_hard_pan_io
test/runner interactive.test_audio_worklet_params_mixing

These test various input/output arrangements as well as parameters (parameters are interesting because, depending on the browser, the sizes change as the params move from static to varying).

The original benchmark of the extracted copy is still valid:

https://wip.numfum.com/cw/2024-10-29/index.html

This is tested with 32- and 64-bit wasm (which required a reordering of how structs and data were stored to avoid alignment issues).

Some explanations:

  • Fixed-position output buffer views are created once in the WasmAudioWorkletProcessor constructor
  • Stack allocations for the process() call are split into aligned struct data (see the comments) and audio/param data
  • The struct writes are simplified by this splitting of data
  • ASSERTIONS are used to ensure everything fits and correctly aligns
  • The tests account for size changes in the params, which can vary from a single float to 128 floats (a single float nicely showing up any 8-byte alignment issues for wasm64)
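To show why a single-float param buffer exposes 8-byte alignment problems, here is a minimal sketch of the arithmetic. The layout (float data followed by pointer-holding structs) and the names `alignUp` and `layoutFor` are assumptions for illustration only and do not reflect the PR's actual stack layout.

```javascript
// Round a byte offset up to a power-of-two alignment.
// Uses 32-bit bitwise ops, so it is only suitable for small offsets.
function alignUp(offset, alignment) {
  return (offset + alignment - 1) & ~(alignment - 1);
}

// Hypothetical layout: param float data first, then structs holding pointers.
// pointerSize is 4 for wasm32 and 8 for wasm64.
function layoutFor(paramFloats, pointerSize) {
  const dataBytes = paramFloats * 4; // 32-bit floats
  const structStart = alignUp(dataBytes, pointerSize);
  return { dataBytes, structStart, padding: structStart - dataBytes };
}

const singleFloat64 = layoutFor(1, 8);   // one float, wasm64
const fullQuantum64 = layoutFor(128, 8); // 128 floats, wasm64
const singleFloat32 = layoutFor(1, 4);   // one float, wasm32
```

In this sketch a single float leaves the following struct 4 bytes short of an 8-byte boundary on wasm64, whereas 128 floats (512 bytes) happen to land on a boundary, which is why the smaller size is the more revealing test case.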

Future improvements: the output views are sequential, so instead of being individual views covering each channel, the views could each cover anywhere from one channel to however many are needed, with a single set() being enough for all outputs.

@cwoffenden cwoffenden force-pushed the cw-aw-optimised-copy branch from 5a70474 to 99f4c12 Compare August 8, 2025 18:23
@cwoffenden cwoffenden marked this pull request as draft August 8, 2025 18:23
@cwoffenden cwoffenden force-pushed the cw-aw-optimised-copy branch 2 times, most recently from e19d4fd to 348932f Compare August 12, 2025 16:17
@cwoffenden cwoffenden marked this pull request as ready for review August 12, 2025 19:54
@cwoffenden cwoffenden force-pushed the cw-aw-optimised-copy branch from 2313240 to 748c167 Compare August 13, 2025 08:30
@sbc100 sbc100 requested a review from juj August 13, 2025 15:16
for (var n = this.maxBuffers; n > 0; n--) {
  // Added in reverse so the lowest indices are closest to the stack top
  this.outputViews.unshift(
    HEAPF32.subarray(viewDataIdx, viewDataIdx += this.samplesPerChannel)
Collaborator

This line of pre-creating the views does not seem to be safe.

The problem is that if the WebAssembly.Memory object is grown after creating these views, then these outputViews objects become null (zero-length) typed array views.

Consider:

[image]

So in order to be able to precreate the views, they would have to be updated when the buffer is grown.


Note that the above behavior is super subtle. The "neutering" occurs only if it is the AudioWorklet context that grows the WebAssembly Memory. If another Worker context grows the memory, then this.outputViews will actually remain valid to view the old heap size.
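The basic effect can be reproduced outside a worklet. The sketch below assumes a non-shared WebAssembly.Memory purely for demonstration (shared memory behaves differently, as noted above), and the variable names are made up for the example.

```javascript
// Non-shared memory with one initial 64 KiB page (an assumption for this demo).
const memory = new WebAssembly.Memory({ initial: 1, maximum: 4 });
const heapF32 = new Float32Array(memory.buffer);

// A pre-created view, similar in spirit to one entry of outputViews.
const view = heapF32.subarray(0, 128);
const lengthBefore = view.length;

// Growing the memory detaches the old ArrayBuffer...
memory.grow(1);

// ...so the existing view now reports a length of zero,
// and memory.buffer is a different object from view.buffer.
const lengthAfter = view.length;
const bufferChanged = memory.buffer !== view.buffer;
```

Here `lengthBefore` is 128 and `lengthAfter` is 0, so a set() from the stale view would copy nothing.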

Collaborator

There has been recent work to fix this issue: #24684

We cannot turn it on by default yet, but in the meantime you should be able to do something like this:

// Support for growable heap + pthreads, where the buffer may change, so JS views
// must be updated.
function growMemViews() {
  // `updateMemoryViews` updates all the views simultaneously, so it's enough to check any of them.
  if (wasmMemory.buffer != HEAP8.buffer) {
    updateMemoryViews();
  }
}

i.e. you can recreate the views (only) when needed.
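Applied to the output views, the same check might look something like the sketch below. It is a standalone approximation that assumes non-shared memory. `createOutputViews`, `ensureOutputViews` and `viewStart` are hypothetical names, and `heapF32` is a local stand-in for HEAPF32 rather than the Emscripten global.

```javascript
const memory = new WebAssembly.Memory({ initial: 1, maximum: 4 });
let heapF32 = new Float32Array(memory.buffer); // stand-in for HEAPF32

const samplesPerChannel = 128;
const viewStart = 256; // hypothetical float index of the first channel
let outputViews = [];

// Build one view per channel over the current heap.
function createOutputViews(numChannels) {
  outputViews = [];
  for (let ch = 0; ch < numChannels; ch++) {
    const start = viewStart + ch * samplesPerChannel;
    outputViews.push(heapF32.subarray(start, start + samplesPerChannel));
  }
}

// Recreate the views only if the underlying buffer has changed.
// Returns true when the views were rebuilt.
function ensureOutputViews(numChannels) {
  if (heapF32.buffer !== memory.buffer) {
    heapF32 = new Float32Array(memory.buffer);
    createOutputViews(numChannels);
    return true;
  }
  return false;
}

createOutputViews(2);
heapF32[viewStart] = 1.5; // a sample that should survive the grow
```

Calling `ensureOutputViews(2)` at the start of each render call costs one identity comparison in the common case and only rebuilds the views after a grow.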

Contributor Author

@cwoffenden cwoffenden Aug 13, 2025

Thanks both for the feedback. I'll look at:

  • Splitting out the view creation ✅
  • Looking for changes and recreating views ✅
  • At the same time I might as well copy all channels at once with a single set()
  • Add an audio test that grows the heap ✅

Contributor Author

Added #24931 to test the heap growing (next will fix the breakage).

Contributor Author

With #24931 this PR as-is plays back fine on Mac Chrome, Firefox and Safari (even though HEAPF32.buffer != this.outputViews[0].buffer). Next I'll get the AW to grow the memory to repro juj's example.

Regardless, I'll catch the change and recreate the views.

@cwoffenden cwoffenden marked this pull request as draft August 13, 2025 17:31
@cwoffenden cwoffenden force-pushed the cw-aw-optimised-copy branch 2 times, most recently from 4cba21c to 998ac6b Compare August 14, 2025 14:45
@cwoffenden cwoffenden force-pushed the cw-aw-optimised-copy branch from b57d103 to 5186686 Compare August 14, 2025 18:37
3 participants