Implement mapreduce #561
Could these things maybe live in https://github.com/anicusan/AcceleratedKernels.jl in the future?
There is a dependency ordering issue: GPUArrays is the common infrastructure, and this would be the fallback implementation for a common implementation. So GPUArrays would need to take a dependency on something like AcceleratedKernels.jl.
Of course JLArrays doesn't work. That uses the CPU backend and this is 
I was considering it as "leave it to AcceleratedKernels" to implement these. Well, it's a very young package, but I was wondering if it could be a path towards the future ;)
Just to write down my current understanding of the JLArray issue: Is not valid for the CPU in KA right now due to the synchronization within a  GPU execution on all vendors should still work, and Arrays should have their own implementation somewhere else. It's just that the JLArray tests will fail for a bit here.
        
          
                src/host/mapreduce.jl
              
                Outdated
          
        
      | # reduce_items = launch_configuration(kernel) | ||
| reduce_items = 512 | 
Suggested change:
    - # reduce_items = launch_configuration(kernel)
    - reduce_items = 512
    + # reduce_items = compute_items(launch_configuration(kernel))
    + reduce_items = compute_items(512)
But this also has to become dynamic, of course.
        
          
                src/host/mapreduce.jl
              
                Outdated
          
        
      | # we need multiple steps to cover all values to reduce | ||
| partial = similar(R, (size(R)..., reduce_groups)) | ||
| if init === nothing | ||
| # without an explicit initializer we need to copy from the output container | ||
| partial .= R | ||
| end | ||
| reduce_kernel(f, op, init, Val(items), Rreduce, Rother, partial, A; ndrange) | ||
|  | ||
| GPUArrays.mapreducedim!(identity, op, R′, partial; init=init) | 
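For readers following the review, the two-step scheme in the snippet above can be modelled on the CPU in plain Julia. This is only a sketch of the idea, not the PR's kernel: `two_step_reduce` is a hypothetical helper, the "groups" are ordinary loop iterations rather than GPU workgroups, and `init` is assumed to be a neutral element for `op`, since it is applied once per group and once more in the final pass.

```julia
# Hypothetical CPU model of a two-step reduction (not part of GPUArrays).
# Assumes 1-based indexing and that `init` is neutral for `op`.
function two_step_reduce(f, op, init, A::AbstractVector; reduce_groups::Int=4)
    n = length(A)
    chunk = cld(n, reduce_groups)          # elements handled per group

    # Step 1: each "group" reduces its own contiguous chunk to one partial value.
    partial = map(1:reduce_groups) do g
        lo = (g - 1) * chunk + 1
        hi = min(g * chunk, n)
        acc = init
        for i in lo:hi
            acc = op(acc, f(A[i]))
        end
        acc
    end

    # Step 2: a second pass combines the per-group partial results.
    return reduce(op, partial; init=init)
end
```

The second step here plays the role of the recursive `mapreducedim!` call over `partial`: it exists only because no single group sees all of the input.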
This may be a good time to add support for grid stride loops to KA.jl and handle this with a single kernel launch + atomic writes to global memory?
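To make the suggestion concrete, here is a minimal CPU sketch of a grid-stride loop that finishes with one atomic write per worker. It uses only `Base.Threads` and does not use any KernelAbstractions.jl API; `grid_stride_sum` and `nworkers` are hypothetical names, the reduction is fixed to integer addition, and `f` is assumed to return a value convertible to `Int`.

```julia
using Base.Threads: Atomic, atomic_add!, @threads

# Hypothetical CPU model of "grid-stride loop + atomic write" (not KA.jl code).
# Assumes 1-based indexing and an integer-valued `f`.
function grid_stride_sum(f, A::AbstractVector{<:Integer}; nworkers::Int=8)
    total = Atomic{Int}(0)                 # stands in for the global-memory result
    n = length(A)
    @threads for w in 1:nworkers
        acc = 0
        i = w
        # Worker w visits indices w, w + nworkers, w + 2nworkers, ...
        # so a fixed number of workers covers an input of any length.
        while i <= n
            acc += f(A[i])
            i += nworkers
        end
        # One atomic update per worker replaces the second reduction pass.
        atomic_add!(total, Int(acc))
    end
    return total[]
end
```

Compared with the two-step approach, the intermediate `partial` buffer and the second launch disappear, at the cost of requiring an atomic version of `op`, which is not available for arbitrary reduction operators.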
7348bba to f418d7a
4974a5e to 2314e24
If we continue this, see JuliaGPU/CUDA.jl#2778: The  
I thought the idea was to move towards depending on AK.jl for these kernels?
Ideally it would be. I pushed the feedback so it would be easier to benchmark between existing implementations and the equivalent KA port (for Metal at least; CUDA has some differences, as previously discussed).
Ported from oneAPI.jl