-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Add Span.Reverse() intrinsic for Arm64 #72780
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
adamsitnik
merged 5 commits into
dotnet:main
from
SwapnilGaikwad:github-span-reverse-byte-intrinsic
Aug 10, 2022
Merged
Changes from all commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
6f9008a
Add Span.Reverse() intrinsic for AArch64
SwapnilGaikwad f8e22e4
Use Vector128.Shuffle() to reverse vector elements
SwapnilGaikwad c673a31
Use Vector256.Shuffle() for AVX2 implementations of Span.Reverse()
SwapnilGaikwad 998d7e8
Remove AVX2 refactoring using Vector256.Shuffle()
SwapnilGaikwad 1298d3f
Apply suggestions from code review: use Vector128.IsHardwareAccelerated
adamsitnik File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue with when you tried to replace
Avx2.ShufflewithVector256.Shuffleis that you didn't adjust thereverseMask.You should change this:
To:
The
Vector256APIs operate as if it is1x256-bitvector rather than as2x128-bitvector lanes. This is consistent with howAVX-512,Arm64,WASM,Vector64,Vector128, and other types all operate.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you do this, then
Vector256.Shuffle(tempFirst, Vector256.Create(...))will work as expected and still be performant on AVX2 hardware where you don't want to cross lanes.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, shouldn't we adjust the mask to
I noticed the issue the reverse mask but couldn't reproduce the failure yet. Does the
runfotool expects to use Windows only? The steps usingrunfoto reproduce the pipeline failure seem create a batch file to run on Windows.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be used on Linux as well. You just need to pass the right job-id and work item which you can find it in AzDo.
The
RunTests.cmd/shis present in the zip folder you download.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ooh, right. Thanks Kunal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we leave the AVX2 changes out of this patch. The patch is now self contained to changes in the 128bit variants.
I'm happy for 256bit part to be changed if it's obvious, but clearly it's going to require some extra debugging to get right and not break performance (given #72793). Let's avoid feature creeping this PR.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine to leave it untouched, crossplatform Vector256 apis are mostly for consistency, they're not crossplatform and unlikely to be ever so