
Conversation

@galenlynch (Member) commented May 24, 2020

Convolutions in DSP currently rely on FFTW.jl, and a recent change in FFTW.jl
(JuliaMath/FFTW.jl#105) has introduced a large performance regression in `conv`
whenever Julia is started with more than one thread. Since v1, FFTW.jl uses
multi-threaded FFTW transforms by default whenever Julia has more than one
thread. This new default causes small FFT problems to run much more slowly and
use much more memory. Since the overlap-save method of `conv` in DSP breaks a
convolution into many small convolutions, and therefore performs a large number
of small FFTW transforms, this change can make convolutions slower by two
orders of magnitude, and likewise use two orders of magnitude more memory.
While FFTW.jl does not provide an explicit way to set the number of threads
used by an FFTW plan without changing a global variable, generating the plans
with the planning flag set to `FFTW.PATIENT` (instead of the default
`FFTW.MEASURE`) allows the planner to consider changing the number of threads.
Adding this flag to the plans generated by the overlap-save convolution method
appears to resolve the performance regression on multi-threaded instances of
Julia.

Fixes #339
Also see JuliaMath/FFTW.jl#121
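
For illustration, a minimal sketch of creating the forward and inverse real-FFT plans for one overlap-save block with the `FFTW.PATIENT` flag. This is not the exact DSP.jl code; the 256-sample block length and the `p_fwd`/`p_inv` names are assumptions for the example:

using FFTW

# Hypothetical per-block buffer; 256 is an assumed overlap-save block length.
block = zeros(Float64, 256)

# PATIENT planning lets FFTW also tune the number of threads used by each plan,
# so small transforms are free to run single-threaded.
p_fwd = plan_rfft(block; flags = FFTW.PATIENT)
p_inv = plan_brfft(p_fwd * block, length(block); flags = FFTW.PATIENT)

Whether FFTW actually settles on fewer threads with `PATIENT` depends on the FFTW build and the transform size.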

@martinholters (Member)

This seems to take significant time when running the tests, enough for Travis to cancel the jobs after 10 minutes without output.

@martinholters (Member)

The planning time seems to become excessive in the higher-dimensional case:

julia> x = zeros(128, 128, 128);

julia> out = zeros(255, 255, 255);

# master
julia> @time DSP.unsafe_conv_kern_os!(out, x, x, size(x), size(x), size(out), (256, 256, 256));
  9.859362 seconds (13.47 M allocations: 1.152 GiB, 3.15% gc time)

julia> @time DSP.unsafe_conv_kern_os!(out, x, x, size(x), size(x), size(out), (256, 256, 256));
  5.168003 seconds (2.27 k allocations: 515.127 MiB, 2.97% gc time)

# this PR
julia> @time DSP.unsafe_conv_kern_os!(out, x, x, size(x), size(x), size(out), (256, 256, 256));
751.674809 seconds (13.06 M allocations: 1.381 GiB, 0.04% gc time)

julia> @time DSP.unsafe_conv_kern_os!(out, x, x, size(x), size(x), size(out), (256, 256, 256));
  3.693583 seconds (2.25 k allocations: 772.126 MiB, 4.53% gc time)

(Note: this is without multiple Julia threads.)

This looks prohibitive, I'm afraid, but I don't have an alternative idea.
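
For context, a minimal sketch that isolates the planning cost dominating the first call above. The 256^3 size is taken from the example; this is not the DSP.jl internals:

using FFTW

buf = zeros(Float64, 256, 256, 256)
# Planning with MEASURE (the flag used before this PR, per the description above)
@time plan_rfft(buf; flags = FFTW.MEASURE)
# PATIENT planning searches a much larger space and can take far longer
@time plan_rfft(buf; flags = FFTW.PATIENT)

The second call in the timings above is fast because FFTW reuses the wisdom accumulated while planning during the first call.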

@galenlynch (Member, Author)

Ah, bummer :(

I guess we'll have to wait to see if JuliaMath/FFTW.jl#150 gets merged...
