FFTW can use 4x available threads #223

navidcy · 2020-11-16T00:24:15Z

See discussion in

CliMA/Oceananigans.jl#1113

JuliaMath/FFTW.jl#105

JuliaMath/FFTW.jl#151
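A minimal sketch of what the PR title describes. It assumes that "available threads" means the number of Julia threads, and that the change amounts to calling `FFTW.set_num_threads` with four times that number. The actual diff in commit 0f06869 is not reproduced here.

```julia
using FFTW  # provides FFTW.set_num_threads

# Assumption: "available threads" = number of Julia threads
# (set with `julia -t N` or the JULIA_NUM_THREADS environment variable).
nthreads_julia = Threads.nthreads()

# Let FFTW use 4x that many threads. This must be called before any FFT plans
# are created (e.g. before constructing a FourierFlows grid), because FFTW only
# applies the thread count to plans created after the call.
FFTW.set_num_threads(4 * nthreads_julia)

println("Julia threads: ", nthreads_julia, ", FFTW threads requested: ", 4 * nthreads_julia)
```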

navidcy · 2020-12-16T20:46:06Z

@glwagner, shall I merge this? Is there a verdict on whether it makes any difference?

glwagner · 2020-12-16T22:10:03Z

only one way to find that out...

navidcy · 2020-12-22T00:00:53Z

I made a "clean" decaying 2D turbulence script.

using FourierFlows, Printf, Random
 
using Random: seed!
using FFTW: rfft, irfft

import GeophysicalFlows.TwoDNavierStokes
import GeophysicalFlows.TwoDNavierStokes: energy, enstrophy
import GeophysicalFlows: peakedisotropicspectrum


dev = CPU()     # Device (CPU/GPU)
n, L  = 1024, 2π             # grid resolution and domain length

    dt = 2e-3  # timestep
nsteps = 4000  # total number of steps
 nsubs = 20    # number of steps between each plot

prob = TwoDNavierStokes.Problem(dev; nx=n, Lx=L, ny=n, Ly=L, dt=dt, stepper="FilteredRK4")

sol, clock, vars, grid = prob.sol, prob.clock, prob.vars, prob.grid
x, y = grid.x, grid.y

seed!(1234)
k₀, E₀ = 6, 0.5
ζ₀ = peakedisotropicspectrum(grid, k₀, E₀, mask=prob.timestepper.filter)
TwoDNavierStokes.set_zeta!(prob, ζ₀)

startwalltime = time()

for j = 0:Int(nsteps/nsubs)
  if j % (1000 / nsubs) == 0
    # use |u| and |v| so that large negative velocities also count toward the CFL number
    cfl = clock.dt * maximum([maximum(abs.(vars.u)) / grid.dx, maximum(abs.(vars.v)) / grid.dy])
    
    logmsg = @sprintf("step: %04d, t: %d, cfl: %.2f, walltime: %.2f min",
        clock.step, clock.t, cfl, (time()-startwalltime)/60)

    println(logmsg)
  end  

  stepforward!(prob, nsubs)
  TwoDNavierStokes.updatevars!(prob)  
end

println(@sprintf("walltime: %.2f min", (time()-startwalltime)/60))

Running with n=256, I got the following.

With the current setup:

step: 0000, t: 0, cfl: 0.46, walltime: 0.00 min
step: 1000, t: 5, cfl: 0.51, walltime: 0.16 min
step: 2000, t: 10, cfl: 0.46, walltime: 0.33 min
step: 3000, t: 15, cfl: 0.56, walltime: 0.51 min
step: 4000, t: 20, cfl: 0.41, walltime: 0.69 min
walltime: 0.70 min

and with FFTW.set_num_threads(4*threads):

step: 0000, t: 0, cfl: 0.46, walltime: 0.00 min
step: 1000, t: 5, cfl: 0.51, walltime: 0.18 min
step: 2000, t: 10, cfl: 0.46, walltime: 0.35 min
step: 3000, t: 15, cfl: 0.56, walltime: 0.51 min
step: 4000, t: 20, cfl: 0.41, walltime: 0.67 min
walltime: 0.67 min

Hm... Then I cranked it up to n=1024. The results are:

step: 0000, t: 0, cfl: 0.79, walltime: 0.00 min
step: 1000, t: 2, cfl: 0.87, walltime: 2.38 min
step: 2000, t: 4, cfl: 0.70, walltime: 4.71 min
step: 3000, t: 6, cfl: 0.74, walltime: 7.10 min
step: 4000, t: 8, cfl: 0.78, walltime: 9.53 min
walltime: 9.58 min

and with FFTW.set_num_threads(4*threads):

step: 0000, t: 0, cfl: 0.79, walltime: 0.00 min
step: 1000, t: 2, cfl: 0.87, walltime: 2.34 min
step: 2000, t: 4, cfl: 0.70, walltime: 4.76 min
step: 3000, t: 6, cfl: 0.74, walltime: 7.26 min
step: 4000, t: 8, cfl: 0.78, walltime: 9.68 min
walltime: 9.72 min

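So, @glwagner, based on the above I conclude that this PR makes no measurable difference: the total walltimes agree to within a few percent at both n=256 (0.70 vs 0.67 min) and n=1024 (9.58 vs 9.72 min). I'm closing it; feel free to reopen if you think otherwise.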
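The script above times the whole time-stepping loop, of which the FFTs are only one part. To check whether the FFTW thread count matters for the transforms themselves, one could time an FFTW plan directly. A minimal sketch, assuming FFTW.jl; `time_rfft` is a hypothetical helper (not part of FourierFlows), and each plan is created after `FFTW.set_num_threads` because FFTW applies the thread count at planning time.

```julia
using FFTW                        # plan_rfft, FFTW.set_num_threads, FFTW.MEASURE
using LinearAlgebra: mul!         # in-place application of an FFTW plan
using Random: rand!

# Hypothetical helper: average walltime (seconds) of one forward real-to-complex
# FFT of an n×n Float64 array, using a plan created with `nfftw` FFTW threads.
function time_rfft(n, nfftw; nreps = 20)
    FFTW.set_num_threads(nfftw)              # must precede planning
    A = Array{Float64}(undef, n, n)
    P = plan_rfft(A; flags = FFTW.MEASURE)   # MEASURE may overwrite A, so fill it afterwards
    rand!(A)
    Ahat = P * A                             # allocates the output and warms up the plan
    t = @elapsed for _ in 1:nreps
        mul!(Ahat, P, A)                     # in-place transform into Ahat
    end
    return t / nreps
end

nt = Threads.nthreads()
for n in (256, 1024), nfftw in (nt, 4nt)
    t = time_rfft(n, nfftw)
    println("n = $n, FFTW threads = $nfftw: ", round(1e3 * t; digits = 3), " ms per rfft")
end
```

Comparing the `nt` and `4nt` rows isolates the effect of oversubscribing FFTW threads from the cost of the rest of the solver.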

fftw can use 4x available threads

0f06869

navidcy requested a review from glwagner November 16, 2020 00:24

navidcy closed this Dec 22, 2020

navidcy deleted the fftw-4x branch February 25, 2021 21:43

3 participants