Avoid large table as part of TensorPrimitives remainder mask #92765
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1404,22 +1404,25 @@ private static float GetFirstNaN(Vector512<float> vector) => | |
|
|
||
| [MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
| private static unsafe Vector128<float> LoadRemainderMaskSingleVector128(int validItems) => | ||
| - Vector128.LoadUnsafe( | ||
| - ref Unsafe.As<uint, float>(ref MemoryMarshal.GetReference(RemainderUInt32Mask_16x16)), | ||
| - (uint)((validItems * 16) + 12)); // last four floats in the row | ||
| + Vector128.ConditionalSelect( | ||
| + Vector128.LessThan(Vector128.Create(3, 2, 1, 0), Vector128.Create(validItems)).AsSingle(), | ||
| + Vector128<float>.AllBitsSet, | ||
| + Vector128<float>.Zero); | ||
|
|
||
| [MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
| private static unsafe Vector256<float> LoadRemainderMaskSingleVector256(int validItems) => | ||
| - Vector256.LoadUnsafe( | ||
| - ref Unsafe.As<uint, float>(ref MemoryMarshal.GetReference(RemainderUInt32Mask_16x16)), | ||
| - (uint)((validItems * 16) + 8)); // last eight floats in the row | ||
| + Vector256.ConditionalSelect( | ||
| + Vector256.LessThan(Vector256.Create(7, 6, 5, 4, 3, 2, 1, 0), Vector256.Create(validItems)).AsSingle(), | ||
| + Vector256<float>.AllBitsSet, | ||
| + Vector256<float>.Zero); | ||
|
|
||
| #if NET8_0_OR_GREATER | ||
| [MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
| private static unsafe Vector512<float> LoadRemainderMaskSingleVector512(int validItems) => | ||
| - Vector512.LoadUnsafe( | ||
| - ref Unsafe.As<uint, float>(ref MemoryMarshal.GetReference(RemainderUInt32Mask_16x16)), | ||
| - (uint)(validItems * 16)); // all sixteen floats in the row | ||
| + Vector512.ConditionalSelect( | ||
|
Member
So for AVX512 it would look like `var mmask16 mask = (1 << n) - 1; var maskedAnd = Avx512.And(vec1, vec2, mask);`, but we decided not to expose mask registers. If we expose these as a public API, we can intrinsify at least the AVX512 version to be cheap 🙂, if it matters. Presumably, the performance of handling trailing elements is not that important, especially for large data. |
||
| + Vector512.LessThan(Vector512.Create(15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0), Vector512.Create(validItems)).AsSingle(), | ||
| + Vector512<float>.AllBitsSet, | ||
| + Vector512<float>.Zero); | ||
| #endif | ||
|
|
||
| private readonly struct AddOperator : IAggregationOperator | ||
|
|
||
This is going to be slower and result in a 128-bit constant (or more for other sizes) emitted at runtime per method this gets inlined into.
I think the table is ultimately better and we can get rid of it longer term using a proper JIT intrinsic.
If we really don't want the table, then achieving this with just a broadcast + comparison should be sufficient, since `LessThan` already produces a per-element mask of `AllBitsSet` (true) and `Zero` (false). You can use `GreaterThanOrEqual` if you need the mask inverted.
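The broadcast + comparison suggestion can be sketched as below. This is only an illustration, not code from the PR: the helper names `TrailingMask128` and `LeadingMask128` are hypothetical, and the sketch assumes .NET 7 or later, where `Vector128.LessThan` and `Vector128.GreaterThanOrEqual` are available.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical helper (name not from the PR): all bits set in the last
// `validItems` lanes, zero elsewhere. LessThan already returns a per-lane
// mask, so no ConditionalSelect is needed on top of it.
static Vector128<float> TrailingMask128(int validItems) =>
    Vector128.LessThan(
        Vector128.Create(3, 2, 1, 0),      // descending lane indices
        Vector128.Create(validItems))      // broadcast of the remaining count
    .AsSingle();

// Hypothetical inverted variant: all bits set in the leading lanes instead.
static Vector128<float> LeadingMask128(int validItems) =>
    Vector128.GreaterThanOrEqual(
        Vector128.Create(3, 2, 1, 0),
        Vector128.Create(validItems))
    .AsSingle();

// Print the raw lane bits for two remaining items.
Console.WriteLine(TrailingMask128(2).AsUInt32());
Console.WriteLine(LeadingMask128(2).AsUInt32());
```

The same pattern would extend to `Vector256` and `Vector512` by widening the descending index constant.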
Ok, I'll stick with the table. I was on the fence anyway. Just don't like its bulkiness.
That is, since `TrailingMask` wants to handle only trailing elements, it will skip processing the first `n` items that have already been processed. So if `2` items remain, you have `3 < 2, 2 < 2, 1 < 2, 0 < 2`, which already produces `Zero, Zero, AllBitsSet, AllBitsSet`.
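To make the "skip what was already processed" point concrete, here is a minimal sketch of a sum that handles its tail with one overlapping, masked vector. It is an assumption-laden illustration rather than the TensorPrimitives implementation: `SumWithTrailingMask` is a hypothetical name, the input is assumed to hold at least four floats, and .NET 7 or later is assumed.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Hypothetical helper: sums a span, handling the tail with a single
// overlapping load whose already-processed lanes are masked to zero.
static float SumWithTrailingMask(ReadOnlySpan<float> values)
{
    if (values.Length < Vector128<float>.Count)
        throw new ArgumentException("Sketch assumes at least one full vector.");

    ref float start = ref MemoryMarshal.GetReference(values);
    Vector128<float> acc = Vector128<float>.Zero;
    int i = 0;

    // Main loop over full vectors.
    for (; i <= values.Length - Vector128<float>.Count; i += Vector128<float>.Count)
        acc += Vector128.LoadUnsafe(ref start, (nuint)i);

    int remaining = values.Length - i; // 0..3 items not yet summed
    if (remaining != 0)
    {
        // Reload the last four floats; the leading lanes overlap data that
        // the main loop already summed, so the mask zeroes them out.
        Vector128<float> last = Vector128.LoadUnsafe(
            ref start, (nuint)(values.Length - Vector128<float>.Count));
        Vector128<float> mask = Vector128.LessThan(
            Vector128.Create(3, 2, 1, 0),
            Vector128.Create(remaining)).AsSingle();
        acc += last & mask;
    }

    return Vector128.Sum(acc);
}

// Six items: one full vector (1, 2, 3, 4), then a tail of two. The reload
// reads (3, 4, 5, 6) and the mask keeps only 5 and 6.
Console.WriteLine(SumWithTrailingMask(new float[] { 1, 2, 3, 4, 5, 6 })); // prints 21
```

Masking with a bitwise AND also means a NaN in an already-processed lane of the reloaded vector cannot leak into the result a second time.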