JIT ARM64-SVE: Add TrueMask and LoadVector #98218
```diff
@@ -58,6 +58,7 @@ enum HWIntrinsicCategory : uint8_t
     HW_Category_ShiftLeftByImmediate,
     HW_Category_ShiftRightByImmediate,
     HW_Category_SIMDByIndexedElement,
+    HW_Category_EnumPattern,

     // Helper intrinsics
     // - do not directly correspond to a instruction, such as Vector64.AllBitsSet
```
```diff
@@ -175,6 +176,21 @@ enum HWIntrinsicFlag : unsigned int
+    // The intrinsic needs consecutive registers
+    HW_Flag_NeedsConsecutiveRegisters = 0x4000,
+
+    // The intrinsic uses scalable registers
+    HW_Flag_Scalable = 0x8000,
+
+    // Returns Per-Element Mask
+    // the intrinsic returns a vector containing elements that are either "all bits set" or "all bits clear"
+    // this output can be used as a per-element mask
+    HW_Flag_ReturnsPerElementMask = 0x10000,
+
+    // The intrinsic uses a mask in arg1 to select elements present in the result
+    HW_Flag_MaskedOperation = 0x20000,
+
+    // The intrinsic uses a mask in arg1 to select elements present in the result, and must use a low register.
+    HW_Flag_LowMaskedOperation = 0x40000,
+
 #else
 #error Unsupported platform
 #endif
```
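As a small standalone illustration of the `HW_Flag_ReturnsPerElementMask` comment above (editorial, not JIT code; the values are made up): each lane of a per-element mask result is either all bits set or all bits clear, so the result can be consumed directly as a lane-wise predicate.

```cpp
#include <cstdint>
#include <cstdio>

int main()
{
    uint32_t a[4] = {1, 2, 3, 4};
    uint32_t b[4] = {1, 0, 3, 0};
    uint32_t mask[4];

    // A per-element comparison writes all-bits-set for lanes where the
    // predicate holds and all-bits-clear elsewhere.
    for (int i = 0; i < 4; i++)
    {
        mask[i] = (a[i] == b[i]) ? 0xFFFFFFFFu : 0x00000000u;
    }

    for (int i = 0; i < 4; i++)
    {
        printf("lane %d: 0x%08X\n", i, (unsigned)mask[i]); // FFFFFFFF, 00000000, ...
    }
    return 0;
}
```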
**Review thread** on the new mask flags *(kunalspathak marked an earlier conversation here as resolved)*:

> Can we not just check for […]?
> Yes, that's the SVE convention: result, then mask, then inputs.
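For concreteness (an editorial note, not from the PR): the predicated SVE add is written `add z0.s, p0/m, z0.s, z1.s`, i.e. destination, then governing predicate, then sources. A one-lane scalar model of that merging predication:

```cpp
#include <cstdint>

// Scalar model of one lane of a merging-predicated ("/m") SVE op.
// Hypothetical helper for illustration only.
uint32_t maskedAddLane(bool predLane, uint32_t destLane, uint32_t srcLane)
{
    // Active lanes compute the operation; inactive lanes keep the old
    // destination value, which is what "merging" predication means.
    return predLane ? (destLane + srcLane) : destLane;
}
```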
> OK, that sounds better. I can look and see how this would be done.
> @tannergooding - Looking closer at this, I'm not quite sure what this would entail. In […] I can't see any obvious way for the JIT to know that the first arg of the method is expected to be a predicate mask, other than to use the enum or hardcode it with case statements somewhere. The JIT can check the type of the actual arg1 child node, but that only tells us what the type actually is, and not what the expected type is. I imagine I'll have to write code that says: if the actual type and expected type don't match, then somehow convert arg1 to the expected type.
> Yes, basically. Most intrinsics support masking optionally, and so you'll have something similar to this: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/gentree.cpp#L19988-L20008. That is, you'll have some […]
>
> There are then a handful of intrinsics which require masking. For example, SVE comparison intrinsics may always return a TYP_MASK, in which case you could either add a new entry to the table such as […]
>
> There are then a handful of intrinsics which require mask inputs and which aren't recognized via pattern matching. You would likewise add a flag or manually handle the few of them like this: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/hwintrinsicxarch.cpp#L3970-L3983
>
> The insertion of the […]
>
> We then make this efficient in morph (see https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/morph.cpp#L10775-L10827), where we ensure that we aren't unnecessarily converting from mask to vector and back to mask, or vice versa. This allows things that take a mask to consume a produced mask directly and gives the optimal codegen expected in most scenarios.

> That was the comment around […]
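To make the shape of the suggestion above concrete, here is a rough editorial sketch of import-time wrapping for an intrinsic that requires a mask in arg1. The helper `gtNewSimdCvtVectorToMaskNode` and its signature are assumptions modeled on the xarch code linked above, not code from this PR:

```cpp
// Sketch: if an intrinsic is flagged as a masked operation but the incoming
// arg1 is a regular vector, insert an explicit vector->mask conversion so
// downstream phases see the expected TYP_MASK operand.
if (HWIntrinsicInfo::IsMaskedOperation(intrinsic) && (op1->TypeGet() != TYP_MASK))
{
    // Assumed helper, mirroring the xarch vector<->mask conversion nodes.
    op1 = gtNewSimdCvtVectorToMaskNode(TYP_MASK, op1, simdBaseJitType, simdSize);
}
```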
> Right. That feels like it might touch quite a few files. Given the size of this PR, do you think it's worth keeping this PR as is, and then putting the […] work in a follow-up?
> Yes, I think this would even be the preferred route, given it's not required and is its own isolated change really.
> Which lowering code is this? In general I think it's fine for this PR to be the basic plumbing of TYP_MASK support into the Arm64 side of the JIT. As long as […]
> I added some code to remove the mask->vector->mask and vector->mask->vector conversions. But nothing in this PR uses it because of the lcl var, so I decided not to push it. Will mark this as ready now.
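A minimal sketch of what such a round-trip elimination could look like (editorial, since the actual code was not pushed with this PR; `OperIsConvertVectorToMask`/`OperIsConvertMaskToVector` are assumed helper names). Note that only the mask→vector→mask direction is unconditionally safe; vector→mask→vector is only safe when the vector is known to be a per-element mask:

```cpp
// Fold ConvertVectorToMask(ConvertMaskToVector(mask)) back to mask, so a
// produced mask is consumed directly, as described in the morph link above.
GenTree* foldMaskRoundTrip(GenTreeHWIntrinsic* node)
{
    if (node->OperIsConvertVectorToMask()) // assumed helper
    {
        GenTree* op1 = node->Op(1);
        if (op1->OperIsHWIntrinsic() && op1->AsHWIntrinsic()->OperIsConvertMaskToVector())
        {
            // mask -> vector -> mask: the original mask is already what we need.
            return op1->AsHWIntrinsic()->Op(1);
        }
    }
    return node;
}
```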
> ... but not quite yet, as I need #99049 to merge so I can remove it from this PR.
```diff
@@ -654,10 +670,8 @@ struct HWIntrinsicInfo
     static bool ReturnsPerElementMask(NamedIntrinsic id)
     {
         HWIntrinsicFlag flags = lookupFlags(id);
-#if defined(TARGET_XARCH)
+#if defined(TARGET_XARCH) || defined(TARGET_ARM64)
         return (flags & HW_Flag_ReturnsPerElementMask) != 0;
-#elif defined(TARGET_ARM64)
-        unreached();
 #else
 #error Unsupported platform
 #endif
```
|
|
```diff
@@ -848,6 +862,25 @@ struct HWIntrinsicInfo
         const HWIntrinsicFlag flags = lookupFlags(id);
         return (flags & HW_Flag_HasImmediateOperand) != 0;
     }
+
+    static bool IsScalable(NamedIntrinsic id)
+    {
+        const HWIntrinsicFlag flags = lookupFlags(id);
+        return (flags & HW_Flag_Scalable) != 0;
+    }
+
+    static bool IsMaskedOperation(NamedIntrinsic id)
+    {
+        const HWIntrinsicFlag flags = lookupFlags(id);
+        return ((flags & HW_Flag_MaskedOperation) != 0) || IsLowMaskedOperation(id);
+    }
+
+    static bool IsLowMaskedOperation(NamedIntrinsic id)
+    {
+        const HWIntrinsicFlag flags = lookupFlags(id);
+        return (flags & HW_Flag_LowMaskedOperation) != 0;
+    }
+
 #endif // TARGET_ARM64

     static bool HasSpecialSideEffect(NamedIntrinsic id)
```
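As a usage note (editorial, not part of the diff): a plausible consumer of these accessors is register allocation, where a low-masked operation must keep its governing predicate in the low predicate registers P0-P7 required by SVE encodings with a 3-bit predicate field. The register-mask names below are assumptions:

```cpp
// Hypothetical LSRA-style use of the new queries.
if (HWIntrinsicInfo::IsMaskedOperation(intrin.id))
{
    // arg1 carries the governing mask rather than vector data; constrain
    // it to the low predicate registers when the encoding demands it.
    regMaskTP candidates = HWIntrinsicInfo::IsLowMaskedOperation(intrin.id)
                               ? RBM_LOWMASK   // assumed name: P0-P7
                               : RBM_ALLMASK;  // assumed name: all predicate regs
    srcCount += BuildOperandUses(intrin.op1, candidates);
}
```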
|
|
```diff
@@ -907,7 +940,7 @@ struct HWIntrinsic final
         InitializeBaseType(node);
     }

-    bool IsTableDriven() const
+    bool codeGenIsTableDriven() const
     {
         // TODO-Arm64-Cleanup - make more categories to the table-driven framework
         bool isTableDrivenCategory = category != HW_Category_Helper;
```