@rocking5566 rocking5566 commented Feb 10, 2023

This PR improves the performance of the normalization kernel.
fp32 performance improves by at least 15%.

What I have done in this PR:

  1. Add more instances (tuned kernel parameters)
  2. Separate the sweep-once pipeline from the normal pipeline, so that buffer_load(gamma) can be issued earlier to hide the latency of the Welford computation
  3. Support naive variance for normalization
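
For readers unfamiliar with the two variance methods named above, here is a minimal host-side sketch in Python. It is an illustration only, not the kernel code from this PR: the function names `naive_variance` and `welford_variance` are hypothetical, and both compute the population (biased) variance on a plain list of floats.

```python
def naive_variance(xs):
    """Single-pass mean and population variance via E[x^2] - E[x]^2."""
    if not xs:
        raise ValueError("xs must be non-empty")
    total = 0.0
    total_sq = 0.0
    for x in xs:
        total += x
        total_sq += x * x
    n = len(xs)
    mean = total / n
    # Subtracting two large, close values here can lose precision.
    return mean, total_sq / n - mean * mean


def welford_variance(xs):
    """Single-pass mean and population variance via Welford's update."""
    if not xs:
        raise ValueError("xs must be non-empty")
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    count = 0
    for x in xs:
        count += 1
        delta = x - mean
        mean += delta / count      # one division per element
        m2 += delta * (x - mean)   # uses the updated mean
    return mean, m2 / count
```

Calling either function on `[1.0, 2.0, 3.0, 4.0]` gives a mean of 2.5 and a variance of 1.25. The naive form only accumulates two sums per element and divides once at the end, while Welford's update has a dependent division per element but is generally more robust to cancellation when the mean is large relative to the spread.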

@rocking5566 rocking5566 removed the WIP label Feb 15, 2023
@rocking5566 rocking5566 changed the title Improve layernorm Improve normalization Feb 15, 2023
@asroy asroy merged commit 6a6163a into develop Feb 15, 2023
@illsilin illsilin deleted the improve_layernorm branch December 7, 2023 18:44