
Conversation

@galenlynch (Member) commented May 24, 2020

Convolutions in DSP currently rely on FFTW.jl, and a recent change in FFTW.jl
(JuliaMath/FFTW.jl#105) has introduced a large performance regression in `conv`
whenever Julia is started with more than one thread. Since v1, FFTW.jl uses
multi-threaded FFTW transforms by default whenever Julia has more than one
thread. This new default causes small FFT problems to run much more slowly and
use much more memory. Since the overlap-save method of `conv` in DSP breaks a
convolution into many small convolutions, and therefore performs a large number
of small FFTW transforms, this change can make convolutions slower by two
orders of magnitude, and likewise use two orders of magnitude more memory.
While FFTW.jl does not provide an explicit way to set the number of threads
used by an FFTW plan without changing a global variable, generating the plans
with the planning flag set to `FFTW.PATIENT` (instead of the default
`FFTW.MEASURE`) allows the planner to consider changing the number of threads.
Adding this flag to the plans generated by the overlap-save convolution method
appears to resolve the performance regression on multi-threaded instances of
Julia.

Fixes #339
Also see JuliaMath/FFTW.jl#121
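
For illustration, a minimal sketch of creating the forward and inverse real-FFT plans for one overlap-save block with the `FFTW.PATIENT` flag. This is not the exact DSP.jl code; the 256-sample block length and the `p_fwd`/`p_inv` names are assumptions for the example:

using FFTW

# Hypothetical per-block buffer; 256 is an assumed overlap-save block length.
block = zeros(Float64, 256)

# PATIENT planning lets FFTW also tune the number of threads used by each plan,
# so small transforms are free to run single-threaded.
p_fwd = plan_rfft(block; flags = FFTW.PATIENT)
p_inv = plan_brfft(p_fwd * block, length(block); flags = FFTW.PATIENT)

Whether FFTW actually settles on fewer threads with `PATIENT` depends on the FFTW build and the transform size.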

@martinholters (Member)

This seems to take significant time when running the tests, enough for Travis to cancel the jobs after 10 minutes without output.

@martinholters (Member)

The planning time seems to become excessive in the higher-dimensional case:

julia> x = zeros(128, 128, 128);

julia> out = zeros(255, 255, 255);

# master
julia> @time DSP.unsafe_conv_kern_os!(out, x, x, size(x), size(x), size(out), (256, 256, 256));
  9.859362 seconds (13.47 M allocations: 1.152 GiB, 3.15% gc time)

julia> @time DSP.unsafe_conv_kern_os!(out, x, x, size(x), size(x), size(out), (256, 256, 256));
  5.168003 seconds (2.27 k allocations: 515.127 MiB, 2.97% gc time)

# this PR
julia> @time DSP.unsafe_conv_kern_os!(out, x, x, size(x), size(x), size(out), (256, 256, 256));
751.674809 seconds (13.06 M allocations: 1.381 GiB, 0.04% gc time)

julia> @time DSP.unsafe_conv_kern_os!(out, x, x, size(x), size(x), size(out), (256, 256, 256));
  3.693583 seconds (2.25 k allocations: 772.126 MiB, 4.53% gc time)

(Note: this is without multiple Julia threads.)

This looks prohibitive, I'm afraid, but I don't have an alternative idea.
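
For context, a minimal sketch that isolates the planning cost dominating the first call above. The 256^3 size is taken from the example; this is not the DSP.jl internals:

using FFTW

buf = zeros(Float64, 256, 256, 256)
# Planning with MEASURE (the flag used before this PR, per the description above)
@time plan_rfft(buf; flags = FFTW.MEASURE)
# PATIENT planning searches a much larger space and can take far longer
@time plan_rfft(buf; flags = FFTW.PATIENT)

The second call in the timings above is fast because FFTW reuses the wisdom accumulated while planning during the first call.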

@galenlynch (Member, Author)

Ah, bummer :(

I guess we'll have to wait to see if JuliaMath/FFTW.jl#150 gets merged...
