Commit f92d72a

kunalspathak, tannergooding, lambdageek, and vargaz authored
Arm64: Implement VectorTableLookup/VectorTableLookupExtension intrinsics + Consecutive registers support (#80297)
* Add VectorTableLookup 2/3/4 in hwintrinsiclistarm64.h
* Add VectorTableLookup
* fixes to libraries
* Prototype of simple tbl
* Some progress
* Some more updates
* working model
* Vector64<byte> support
* Add VectorTableLookup_3
* Add VectorTableLookup_4
* cleanup
* Remove regCount from LclVarDsc
* Some more cleanup
* setNextConsecutiveRegisterAssignment
* Some more cleanup
* TARGET_ARM64
* Use getNextConsecutiveRefPositions instead of nextConsecutiveRefPosition field
* jit format
* Move getNextConsecutiveRefPosition
* SA1141: Use tuple syntax
* Remove the unwanted field list code
* revert the flag that was mistakenly changed
* Add test cases
* FIELD_LIST
* Use FIELD_LIST approach
* jit format and fix arm build
* fix assert failure
* Add summary docs

  Add summary docs in all the required files.
* Make APIs public again
* cleanup
* Handle case for reg mod 32
* Remove references from ref until API is approved
* Use generic getFreeCandidates()
* Add entries in ExtraAPis
* Set CLSCompliant=false
* Move in inner class
* Remove CLSCompliant flag
* Add a suppression file for System.Runtime.Intrinsics on the new APIs until they go through API review
* Review feedback
* Add workaround for building tests
* review feedback
* TP: remove needsConsecutive parameter from BuildUse()
* TP: Remove pseudo intrinsic entries
* More fixes
* Add the missing csproj
* Fix test cases
* Add fake lib for AdvSimd.Arm64* as well
* Remove the workaround
* Use template to control if consecutive registers is needed or not
* jit format
* fix the workaround
* Revert "fix the workaround"

  This reverts commit 1cb22d0.
* Revert "Remove the workaround"

  This reverts commit b0b6a5e.
* Add VectorTableLookupExtensions in libraries
* Add support for VectorTableLookupExtension
* WIP: available regs
* WIP: Remove test hacks
* Update getFreeCandidates() for consecutive registers
* Add missing resetRegState()
* Do not assume the current assigned register for consecutiveRegisters refposition is good.

  If a refposition is marked as needConsecutive, then do not just assume that the existing register assigned is good. We still go through the allocation for it to make sure that we allocate it a register such that the consecutive registers are also free.
* Handle case for copyReg

  For copyReg, if we assigned a different register, do not forget to free the existing register it was holding
* Update setNextConsecutiveRegister() with UPPER_VECTOR_RESTORE
* Update code around copyReg

  Updated code such that if the refPosition is already assigned a register, then check if assignedRegister satisfies our needs (for first / non-first refposition). If not, performs copyReg.

  TODO: Extract the code surrounding and including copyReg until where we `continue`.
* Create the VectorTableLookup fake CoreLib as a reference assembly

  Make the AdvSimd.Arm64 tests reference the VectorTableLookup fake CoreLib as reference assembly; and ensure that it is not included as a ProjectReference by the toplevel HardwareIntrinsics merged test runners.

  The upshot is that the AdvSimd.Arm64 tests can call the extra APIs via a direct reference to CoreLib (instead of through System.Runtime), but the fake library is not copied into any test artifact directories, and the Mono AOT compiler never sees it.

  That said, after applying this, the test fails during AOT compilation of the *real* CoreLib

  ```
  Mono Ahead of Time compiler - compiling assembly /Users/alklig/work/dotnet-runtime/runtime-bugs2/artifacts/tests/coreclr/osx.arm64.Release/Tests/Core_Root/System.Private.CoreLib.dll
  AOTID EA8D702E-9736-3BD5-435B-A9D5EEADCC78
  %"System.ValueTuple`2<System.Runtime.Intrinsics.Vector128`1<byte>, System.Runtime.Intrinsics.Vector128`1<byte>>"* %arg_table
  <16 x i8>
  * Assertion: should not be reached at /Users/alklig/work/dotnet-runtime/runtime-bugs2/src/mono/mono/mini/mini-llvm.c:1455
  ```
* Rename VectorTableLookup to VectorTableLookup.RefOnly
* Start consecutive refpositions with RefTypeUse and never with RefTypeUpperVectorSave
* Add test cases for VectorTableLookupExtension
* Pass the missing defaultValues
* Use platform neutral BitScanForward
* jit format
* Remove the fake testlib workaround
* Fix mono failures
* Fix x64 TP regression
* Fix test cases
* fix some more tp regression
* Fix test build
* misc. changes
* Fix the bug where we were not freeing copyReg causing an assert in tier0
* Refactor little bit to reduce checks for VectorTableLookup
* Add template parameter for allocateReg/copyReg/select
* Comments
* Fix mono failures
* Added some more comments
* Call allocateReg/assignCopyReg/select methods only for refpositions that need consecutive registers
* Add heuristics to pick best possible set of registers which will need less spilling
* setNextConsecutiveRegisterAssignment() no longer checks for areNextConsecutiveRegistersFree()
* Rename getFreeCandidates() -> getConsecutiveCandidates()
* fix parameters to areNextConsecutiveRegistersFree()
* Rename and update canAssignNextConsecutiveRegisters()
* Add the missing setNextConsecutiveRegisterAssignment() calls
* Fix a condition for upperVector
* Update spill heuristic to handle cases for jitstressregs
* Misc. remove popcount() check from getConsecutiveRegisters()
* jit format
* Fix a bug in canAssignNextConsecutiveRegisters()
* Add filterConsecutiveCandidates() and perform free/busy candidates scan
* Consume the new free/busy consecutive candidates method
* Handle case where 'copyReg == assignedReg'
* Misc. cleanup
* Include LsraExtraFPSetForConsecutive for stress regs
* handle case where 'assignedInterval == nullptr' for try_SPILL_COST()
* fix build error
* Call consecutiveCandidates() only for first refposition
* Only perform special handling for non-uppervectorrestore
* jit format
* Add impVectorTableLookup/impVectorTableLookupExtension
* Add the missing break
* Update assert
* Move definitions in GenTree, fix assert
* fix arm issue
* Remove common functions
* Rename info.needsConsecutiveRegisters to info.compNeedsConsecutiveRegisters
* Use needsConsecutiveRegisters template parameter for all configurations
* Handle case of round-robin in getConsecutiveRegisters()
* Disable tests for Mono
* Initialize outArray in test
* Add IsSupported checks for VectorLookup/VectorLookupExtension
* Fix the test cases for RunReflectionScenario_UnsafeRead()
* Review feedback
* wip
* fix a typo in test case
* Add filterConsecutiveCandidatesForSpill() to select range that needs fewer register spilling
* Add mono support.
* Delay free the registers for VectorTableLookupExtension
* fix mono build error

---------

Co-authored-by: Tanner Gooding
Co-authored-by: Aleksey Kliger
Co-authored-by: Zoltan Varga
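Several entries above ("Handle case for reg mod 32", "Add filterConsecutiveCandidates() and perform free/busy candidates scan") concern finding runs of adjacent free vector registers. The following is only a minimal sketch of that idea, not the JIT's implementation: `consecutiveStarts` is a hypothetical helper, the 32-bit mask layout (bit r stands for register v<r>) is an assumption, and wrapping from v31 back to v0 is modeled because the commit message mentions a mod-32 case.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not the JIT's getConsecutiveCandidates()):
// returns a mask whose set bits mark every register r such that
// r, r+1, ..., r+count-1 (taken modulo 32) are all free in freeMask.
uint32_t consecutiveStarts(uint32_t freeMask, unsigned count)
{
    assert(count >= 1 && count <= 4);

    uint32_t starts = freeMask;
    for (unsigned i = 1; i < count; i++)
    {
        // Rotate right by i so that bit r reflects register (r + i) % 32.
        uint32_t rotated = (freeMask >> i) | (freeMask << (32 - i));
        starts &= rotated;
    }
    return starts;
}
```

With v0, v1 and v2 free (mask `0x7`), a two-register table could start at v0 or v1, so the result is `0x3`; with only v31 and v0 free, the sole two-register candidate starts at v31.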
1 parent 4271678 commit f92d72a

27 files changed, with 4,431 additions and 53 deletions.

src/coreclr/jit/codegenlinear.cpp

Lines changed: 8 additions & 0 deletions
@@ -1624,6 +1624,14 @@ void CodeGen::genConsumeRegs(GenTree* tree)
         genConsumeRegs(tree->gtGetOp1());
         genConsumeRegs(tree->gtGetOp2());
     }
+    else if (tree->OperIsFieldList())
+    {
+        for (GenTreeFieldList::Use& use : tree->AsFieldList()->Uses())
+        {
+            GenTree* fieldNode = use.GetNode();
+            genConsumeRegs(fieldNode);
+        }
+    }
 #endif
     else if (tree->OperIsLocalRead())
     {

src/coreclr/jit/compiler.cpp

Lines changed: 4 additions & 0 deletions
@@ -6718,6 +6718,10 @@ int Compiler::compCompileHelper(CORINFO_MODULE_HANDLE classPtr,
     compBasicBlockID = 0;
 #endif
 
+#ifdef TARGET_ARM64
+    info.compNeedsConsecutiveRegisters = false;
+#endif
+
     /* Initialize emitter */
 
     if (!compIsForInlining())

src/coreclr/jit/compiler.h

Lines changed: 8 additions & 0 deletions
@@ -2809,6 +2809,10 @@ class Compiler
         CORINFO_CLASS_HANDLE clsHnd,
         CORINFO_SIG_INFO* sig,
         CorInfoType simdBaseJitType);
+
+#ifdef TARGET_ARM64
+    GenTreeFieldList* gtConvertTableOpToFieldList(GenTree* op, unsigned fieldCount);
+#endif
 #endif // FEATURE_HW_INTRINSICS
 
     GenTree* gtNewMustThrowException(unsigned helper, var_types type, CORINFO_CLASS_HANDLE clsHnd);
@@ -10061,6 +10065,10 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
         // Number of class profile probes in this method
         unsigned compHandleHistogramProbeCount;
 
+#ifdef TARGET_ARM64
+        bool compNeedsConsecutiveRegisters;
+#endif
+
     } info;
 
     ReturnTypeDesc compRetTypeDesc; // ABI return type descriptor for the method

src/coreclr/jit/fginline.cpp

Lines changed: 4 additions & 0 deletions
@@ -1447,6 +1447,10 @@ void Compiler::fgInsertInlineeBlocks(InlineInfo* pInlineInfo)
 
     lvaGenericsContextInUse |= InlineeCompiler->lvaGenericsContextInUse;
 
+#ifdef TARGET_ARM64
+    info.compNeedsConsecutiveRegisters |= InlineeCompiler->info.compNeedsConsecutiveRegisters;
+#endif
+
     // If the inlinee compiler encounters switch tables, disable hot/cold splitting in the root compiler.
     // TODO-CQ: Implement hot/cold splitting of methods with switch tables.
     if (InlineeCompiler->fgHasSwitch && opts.compProcedureSplitting)

src/coreclr/jit/gentree.cpp

Lines changed: 33 additions & 0 deletions
@@ -23010,6 +23010,7 @@ GenTree* Compiler::gtNewSimdShuffleNode(var_types type,
     op2->AsVecCon()->gtSimdVal = vecCns;
 
     return gtNewSimdHWIntrinsicNode(type, op1, op2, lookupIntrinsic, simdBaseJitType, simdSize, isSimdAsHWIntrinsic);
+
 #else
 #error Unsupported platform
 #endif // !TARGET_XARCH && !TARGET_ARM64
@@ -23879,6 +23880,38 @@ GenTree* Compiler::gtNewSimdWithElementNode(var_types type,
     return gtNewSimdHWIntrinsicNode(type, op1, op2, op3, hwIntrinsicID, simdBaseJitType, simdSize, isSimdAsHWIntrinsic);
 }
 
+#ifdef TARGET_ARM64
+//------------------------------------------------------------------------
+// gtConvertTableOpToFieldList: Convert a operand that represents table of rows into
+// field list, where each field represents a row in the table.
+//
+// Arguments:
+//    op -- Operand to convert.
+//    fieldCount -- Number of fields or rows present.
+//
+// Return Value:
+//    The GenTreeFieldList node.
+//
+GenTreeFieldList* Compiler::gtConvertTableOpToFieldList(GenTree* op, unsigned fieldCount)
+{
+    LclVarDsc* opVarDsc = lvaGetDesc(op->AsLclVar());
+    unsigned lclNum = lvaGetLclNum(opVarDsc);
+    unsigned fieldSize = opVarDsc->lvSize() / fieldCount;
+    var_types fieldType = TYP_SIMD16;
+
+    GenTreeFieldList* fieldList = new (this, GT_FIELD_LIST) GenTreeFieldList();
+    int offset = 0;
+    for (unsigned fieldId = 0; fieldId < fieldCount; fieldId++)
+    {
+        GenTreeLclFld* fldNode = gtNewLclFldNode(lclNum, fieldType, offset);
+        fieldList->AddField(this, fldNode, offset, fieldType);
+
+        offset += fieldSize;
+    }
+    return fieldList;
+}
+#endif // TARGET_ARM64
+
 GenTree* Compiler::gtNewSimdWithLowerNode(var_types type,
                                           GenTree* op1,
                                           GenTree* op2,
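The offset arithmetic in `gtConvertTableOpToFieldList` can be illustrated outside the JIT. The sketch below assumes a table struct whose size divides evenly by the number of rows; `FieldSlot` and `splitTableIntoFields` are hypothetical names introduced only for this illustration and do not exist in the runtime sources.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record describing one row of the table: where it starts
// inside the struct and how many bytes it covers.
struct FieldSlot
{
    unsigned offset;
    unsigned size;
};

// Hypothetical helper mirroring the loop in the diff: the struct is cut
// into fieldCount equally sized rows at increasing byte offsets.
std::vector<FieldSlot> splitTableIntoFields(unsigned tableSize, unsigned fieldCount)
{
    assert(fieldCount > 0 && tableSize % fieldCount == 0);

    const unsigned fieldSize = tableSize / fieldCount;
    std::vector<FieldSlot> slots;
    unsigned offset = 0;
    for (unsigned fieldId = 0; fieldId < fieldCount; fieldId++)
    {
        slots.push_back({offset, fieldSize});
        offset += fieldSize;
    }
    return slots;
}
```

For a 48-byte tuple of three 16-byte vectors, this yields rows at offsets 0, 16 and 32, which is the shape of the field list the importer hands to the register allocator.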

src/coreclr/jit/hwintrinsic.h

Lines changed: 11 additions & 0 deletions
@@ -176,6 +176,9 @@ enum HWIntrinsicFlag : unsigned int
 
     // The intrinsic supports some sort of containment analysis
     HW_Flag_SupportsContainment = 0x2000,
+
+    // The intrinsic needs consecutive registers
+    HW_Flag_NeedsConsecutiveRegisters = 0x4000,
 #else
 #error Unsupported platform
 #endif
@@ -751,6 +754,14 @@ struct HWIntrinsicInfo
         return (flags & HW_Flag_SpecialCodeGen) != 0;
     }
 
+#ifdef TARGET_ARM64
+    static bool NeedsConsecutiveRegisters(NamedIntrinsic id)
+    {
+        HWIntrinsicFlag flags = lookupFlags(id);
+        return (flags & HW_Flag_NeedsConsecutiveRegisters) != 0;
+    }
+#endif
+
     static bool HasRMWSemantics(NamedIntrinsic id)
     {
         HWIntrinsicFlag flags = lookupFlags(id);

src/coreclr/jit/hwintrinsicarm64.cpp

Lines changed: 77 additions & 0 deletions
@@ -1900,7 +1900,84 @@ GenTree* Compiler::impSpecialIntrinsic(NamedIntrinsic intrinsic,
             retNode = impAssignMultiRegTypeToVar(op1, sig->retTypeSigClass DEBUGARG(CorInfoCallConvExtension::Managed));
             break;
         }
+        case NI_AdvSimd_VectorTableLookup:
+        case NI_AdvSimd_Arm64_VectorTableLookup:
+        {
+            assert(sig->numArgs == 2);
+
+            CORINFO_ARG_LIST_HANDLE arg1 = sig->args;
+            CORINFO_ARG_LIST_HANDLE arg2 = info.compCompHnd->getArgNext(arg1);
+            var_types argType = TYP_UNKNOWN;
+            CORINFO_CLASS_HANDLE argClass = NO_CLASS_HANDLE;
 
+            argType = JITtype2varType(strip(info.compCompHnd->getArgType(sig, arg2, &argClass)));
+            op2 = getArgForHWIntrinsic(argType, argClass);
+            argType = JITtype2varType(strip(info.compCompHnd->getArgType(sig, arg1, &argClass)));
+            op1 = impPopStack().val;
+
+            if (op1->TypeGet() == TYP_STRUCT)
+            {
+                info.compNeedsConsecutiveRegisters = true;
+                unsigned fieldCount = info.compCompHnd->getClassNumInstanceFields(argClass);
+
+                if (!op1->OperIs(GT_LCL_VAR))
+                {
+                    unsigned tmp = lvaGrabTemp(true DEBUGARG("VectorTableLookup temp tree"));
+
+                    impAssignTempGen(tmp, op1, CHECK_SPILL_NONE);
+                    op1 = gtNewLclvNode(tmp, argType);
+                }
+
+                op1 = gtConvertTableOpToFieldList(op1, fieldCount);
+            }
+            else
+            {
+                assert(varTypeIsSIMD(op1->TypeGet()));
+            }
+
+            retNode = gtNewSimdHWIntrinsicNode(retType, op1, op2, intrinsic, simdBaseJitType, simdSize);
+            break;
+        }
+        case NI_AdvSimd_VectorTableLookupExtension:
+        case NI_AdvSimd_Arm64_VectorTableLookupExtension:
+        {
+            assert(sig->numArgs == 3);
+
+            CORINFO_ARG_LIST_HANDLE arg1 = sig->args;
+            CORINFO_ARG_LIST_HANDLE arg2 = info.compCompHnd->getArgNext(arg1);
+            CORINFO_ARG_LIST_HANDLE arg3 = info.compCompHnd->getArgNext(arg2);
+            var_types argType = TYP_UNKNOWN;
+            CORINFO_CLASS_HANDLE argClass = NO_CLASS_HANDLE;
+
+            argType = JITtype2varType(strip(info.compCompHnd->getArgType(sig, arg3, &argClass)));
+            op3 = getArgForHWIntrinsic(argType, argClass);
+            argType = JITtype2varType(strip(info.compCompHnd->getArgType(sig, arg2, &argClass)));
+            op2 = impPopStack().val;
+            op1 = impPopStack().val;
+
+            if (op2->TypeGet() == TYP_STRUCT)
+            {
+                info.compNeedsConsecutiveRegisters = true;
+                unsigned fieldCount = info.compCompHnd->getClassNumInstanceFields(argClass);
+
+                if (!op2->OperIs(GT_LCL_VAR))
+                {
+                    unsigned tmp = lvaGrabTemp(true DEBUGARG("VectorTableLookupExtension temp tree"));
+
+                    impAssignTempGen(tmp, op2, CHECK_SPILL_NONE);
+                    op2 = gtNewLclvNode(tmp, argType);
+                }
+
+                op2 = gtConvertTableOpToFieldList(op2, fieldCount);
+            }
+            else
+            {
+                assert(varTypeIsSIMD(op1->TypeGet()));
+            }
+
+            retNode = gtNewSimdHWIntrinsicNode(retType, op1, op2, op3, intrinsic, simdBaseJitType, simdSize);
+            break;
+        }
         default:
         {
             return nullptr;

src/coreclr/jit/hwintrinsiccodegenarm64.cpp

Lines changed: 104 additions & 0 deletions
@@ -1002,6 +1002,110 @@ void CodeGen::genHWIntrinsic(GenTreeHWIntrinsic* node)
                                             (emitSize == EA_8BYTE) ? INS_OPTS_8B : INS_OPTS_16B);
                 break;
 
+            case NI_AdvSimd_VectorTableLookup:
+            case NI_AdvSimd_Arm64_VectorTableLookup:
+            {
+                unsigned regCount = 0;
+                if (intrin.op1->OperIsFieldList())
+                {
+                    GenTreeFieldList* fieldList = intrin.op1->AsFieldList();
+                    GenTree* firstField = fieldList->Uses().GetHead()->GetNode();
+                    op1Reg = firstField->GetRegNum();
+                    INDEBUG(regNumber argReg = op1Reg);
+                    for (GenTreeFieldList::Use& use : fieldList->Uses())
+                    {
+                        regCount++;
+#ifdef DEBUG
+
+                        GenTree* argNode = use.GetNode();
+                        assert(argReg == argNode->GetRegNum());
+                        argReg = REG_NEXT(argReg);
+#endif
+                    }
+                }
+                else
+                {
+                    regCount = 1;
+                    op1Reg = intrin.op1->GetRegNum();
+                }
+
+                switch (regCount)
+                {
+                    case 2:
+                        ins = INS_tbl_2regs;
+                        break;
+                    case 3:
+                        ins = INS_tbl_3regs;
+                        break;
+                    case 4:
+                        ins = INS_tbl_4regs;
+                        break;
+                    default:
+                        assert(regCount == 1);
+                        assert(ins == INS_tbl);
+                        break;
+                }
+
+                GetEmitter()->emitIns_R_R_R(ins, emitSize, targetReg, op1Reg, op2Reg, opt);
+                break;
+            }
+
+            case NI_AdvSimd_VectorTableLookupExtension:
+            case NI_AdvSimd_Arm64_VectorTableLookupExtension:
+            {
+                assert(isRMW);
+                unsigned regCount = 0;
+                op1Reg = intrin.op1->GetRegNum();
+                op3Reg = intrin.op3->GetRegNum();
+                assert(targetReg != op3Reg);
+                if (intrin.op2->OperIsFieldList())
+                {
+                    GenTreeFieldList* fieldList = intrin.op2->AsFieldList();
+                    GenTree* firstField = fieldList->Uses().GetHead()->GetNode();
+                    op2Reg = firstField->GetRegNum();
+                    INDEBUG(regNumber argReg = op2Reg);
+                    for (GenTreeFieldList::Use& use : fieldList->Uses())
+                    {
+                        regCount++;
+#ifdef DEBUG
+
+                        GenTree* argNode = use.GetNode();
+
+                        // registers should be consecutive
+                        assert(argReg == argNode->GetRegNum());
+                        // and they should not interfere with targetReg
+                        assert(targetReg != argReg);
+                        argReg = REG_NEXT(argReg);
+#endif
+                    }
+                }
+                else
+                {
+                    regCount = 1;
+                    op2Reg = intrin.op2->GetRegNum();
+                }
+
+                switch (regCount)
+                {
+                    case 2:
+                        ins = INS_tbx_2regs;
+                        break;
+                    case 3:
+                        ins = INS_tbx_3regs;
+                        break;
+                    case 4:
+                        ins = INS_tbx_4regs;
+                        break;
+                    default:
+                        assert(regCount == 1);
+                        assert(ins == INS_tbx);
+                        break;
+                }
+
+                GetEmitter()->emitIns_Mov(INS_mov, emitTypeSize(node), targetReg, op1Reg, /* canSkip */ true);
+                GetEmitter()->emitIns_R_R_R(ins, emitSize, targetReg, op2Reg, op3Reg, opt);
+                break;
+            }
             default:
                 unreached();
         }
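The codegen above only names the first table register when emitting `tbl`/`tbx` and asserts in debug builds that the remaining rows sit in the following registers. As a rough illustration of that contract, the sketch below checks consecutiveness and prints the resulting assembly text. Both functions are hypothetical and are not part of the JIT emitter; the wrap from v31 to v0 is an assumption based on the "reg mod 32" note in the commit message, and only the 16-byte arrangement is modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical check: true when each register number follows the previous
// one, wrapping from 31 back to 0 (assumed, see the lead-in).
bool areConsecutive(const std::vector<unsigned>& regs)
{
    for (std::size_t i = 1; i < regs.size(); i++)
    {
        if (regs[i] != (regs[i - 1] + 1) % 32)
        {
            return false;
        }
    }
    return true;
}

// Hypothetical formatter: picks tbl or tbx and lists 1 to 4 table registers.
std::string formatTableLookup(bool                         isExtension,
                              unsigned                     targetReg,
                              const std::vector<unsigned>& tableRegs,
                              unsigned                     indexReg)
{
    assert(tableRegs.size() >= 1 && tableRegs.size() <= 4);
    assert(areConsecutive(tableRegs));

    std::string text = isExtension ? "tbx" : "tbl";
    text += " v" + std::to_string(targetReg) + ".16b, {";
    for (std::size_t i = 0; i < tableRegs.size(); i++)
    {
        if (i != 0)
        {
            text += ", ";
        }
        text += "v" + std::to_string(tableRegs[i]) + ".16b";
    }
    text += "}, v" + std::to_string(indexReg) + ".16b";
    return text;
}
```

Calling `formatTableLookup(false, 0, {4, 5}, 1)` produces `tbl v0.16b, {v4.16b, v5.16b}, v1.16b`, the two-register form that the diff selects with `INS_tbl_2regs`.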

src/coreclr/jit/hwintrinsiclistarm64.h

Lines changed: 4 additions & 4 deletions
@@ -477,8 +477,8 @@ HARDWARE_INTRINSIC(AdvSimd, SubtractSaturateScalar,
 HARDWARE_INTRINSIC(AdvSimd, SubtractScalar, 8, 2, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_sub, INS_sub, INS_fsub, INS_fsub}, HW_Category_SIMD, HW_Flag_SIMDScalar)
 HARDWARE_INTRINSIC(AdvSimd, SubtractWideningLower, 8, 2, {INS_ssubl, INS_usubl, INS_ssubl, INS_usubl, INS_ssubl, INS_usubl, INS_ssubw, INS_usubw, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_BaseTypeFromSecondArg|HW_Flag_SpecialCodeGen)
 HARDWARE_INTRINSIC(AdvSimd, SubtractWideningUpper, 16, 2, {INS_ssubl2, INS_usubl2, INS_ssubl2, INS_usubl2, INS_ssubl2, INS_usubl2, INS_ssubw2, INS_usubw2, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_BaseTypeFromSecondArg|HW_Flag_SpecialCodeGen)
-HARDWARE_INTRINSIC(AdvSimd, VectorTableLookup, 8, 2, {INS_tbl, INS_tbl, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_NoFlag)
-HARDWARE_INTRINSIC(AdvSimd, VectorTableLookupExtension, 8, 3, {INS_tbx, INS_tbx, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_HasRMWSemantics)
+HARDWARE_INTRINSIC(AdvSimd, VectorTableLookup, 8, 2, {INS_tbl, INS_tbl, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_SpecialImport|HW_Flag_SpecialCodeGen|HW_Flag_NeedsConsecutiveRegisters)
+HARDWARE_INTRINSIC(AdvSimd, VectorTableLookupExtension, 8, 3, {INS_tbx, INS_tbx, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_SpecialImport|HW_Flag_SpecialCodeGen|HW_Flag_HasRMWSemantics|HW_Flag_NeedsConsecutiveRegisters)
 HARDWARE_INTRINSIC(AdvSimd, Xor, -1, 2, {INS_eor, INS_eor, INS_eor, INS_eor, INS_eor, INS_eor, INS_eor, INS_eor, INS_eor, INS_eor}, HW_Category_SIMD, HW_Flag_Commutative)
 HARDWARE_INTRINSIC(AdvSimd, ZeroExtendWideningLower, 8, 1, {INS_uxtl, INS_uxtl, INS_uxtl, INS_uxtl, INS_uxtl, INS_uxtl, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_BaseTypeFromFirstArg)
 HARDWARE_INTRINSIC(AdvSimd, ZeroExtendWideningUpper, 16, 1, {INS_uxtl2, INS_uxtl2, INS_uxtl2, INS_uxtl2, INS_uxtl2, INS_uxtl2, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_BaseTypeFromFirstArg)
@@ -651,8 +651,8 @@ HARDWARE_INTRINSIC(AdvSimd_Arm64, TransposeEven,
 HARDWARE_INTRINSIC(AdvSimd_Arm64, TransposeOdd, -1, 2, {INS_trn2, INS_trn2, INS_trn2, INS_trn2, INS_trn2, INS_trn2, INS_trn2, INS_trn2, INS_trn2, INS_trn2}, HW_Category_SIMD, HW_Flag_NoFlag)
 HARDWARE_INTRINSIC(AdvSimd_Arm64, UnzipEven, -1, 2, {INS_uzp1, INS_uzp1, INS_uzp1, INS_uzp1, INS_uzp1, INS_uzp1, INS_uzp1, INS_uzp1, INS_uzp1, INS_uzp1}, HW_Category_SIMD, HW_Flag_NoFlag)
 HARDWARE_INTRINSIC(AdvSimd_Arm64, UnzipOdd, -1, 2, {INS_uzp2, INS_uzp2, INS_uzp2, INS_uzp2, INS_uzp2, INS_uzp2, INS_uzp2, INS_uzp2, INS_uzp2, INS_uzp2}, HW_Category_SIMD, HW_Flag_NoFlag)
-HARDWARE_INTRINSIC(AdvSimd_Arm64, VectorTableLookup, 16, 2, {INS_tbl, INS_tbl, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_NoFlag)
-HARDWARE_INTRINSIC(AdvSimd_Arm64, VectorTableLookupExtension, 16, 3, {INS_tbx, INS_tbx, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_HasRMWSemantics)
+HARDWARE_INTRINSIC(AdvSimd_Arm64, VectorTableLookup, 16, 2, {INS_tbl, INS_tbl, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_SpecialImport|HW_Flag_SpecialCodeGen|HW_Flag_NeedsConsecutiveRegisters)
+HARDWARE_INTRINSIC(AdvSimd_Arm64, VectorTableLookupExtension, 16, 3, {INS_tbx, INS_tbx, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_SpecialImport|HW_Flag_SpecialCodeGen|HW_Flag_HasRMWSemantics|HW_Flag_NeedsConsecutiveRegisters)
 HARDWARE_INTRINSIC(AdvSimd_Arm64, ZipHigh, -1, 2, {INS_zip2, INS_zip2, INS_zip2, INS_zip2, INS_zip2, INS_zip2, INS_zip2, INS_zip2, INS_zip2, INS_zip2}, HW_Category_SIMD, HW_Flag_NoFlag)
 HARDWARE_INTRINSIC(AdvSimd_Arm64, ZipLow, -1, 2, {INS_zip1, INS_zip1, INS_zip1, INS_zip1, INS_zip1, INS_zip1, INS_zip1, INS_zip1, INS_zip1, INS_zip1}, HW_Category_SIMD, HW_Flag_NoFlag)
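For readers unfamiliar with the two instructions these entries map to, the following is a minimal software model of their byte-lookup behavior, restricted to the 16-byte index form. It is a sketch for illustration only: `Vec16`, `tableLookup` and `tableLookupExtension` are hypothetical names, not the managed API or the JIT's code. The one behavioral difference modeled is that `tbl` writes zero for an out-of-range index while `tbx` leaves the destination byte unchanged, which is why the extension variant carries `HW_Flag_HasRMWSemantics`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Vec16 = std::array<uint8_t, 16>;

// Models tbl: the table is 1 to 4 rows of 16 bytes, indexed as one flat
// byte array; an out-of-range index produces 0.
Vec16 tableLookup(const std::vector<Vec16>& table, const Vec16& indices)
{
    assert(!table.empty() && table.size() <= 4);

    Vec16 result{};
    for (std::size_t i = 0; i < result.size(); i++)
    {
        const std::size_t index = indices[i];
        if (index < table.size() * 16)
        {
            result[i] = table[index / 16][index % 16];
        }
    }
    return result;
}

// Models tbx: same lookup, but an out-of-range index keeps the byte that
// was already in the destination (passed here as defaults).
Vec16 tableLookupExtension(const Vec16& defaults, const std::vector<Vec16>& table, const Vec16& indices)
{
    assert(!table.empty() && table.size() <= 4);

    Vec16 result = defaults;
    for (std::size_t i = 0; i < result.size(); i++)
    {
        const std::size_t index = indices[i];
        if (index < table.size() * 16)
        {
            result[i] = table[index / 16][index % 16];
        }
    }
    return result;
}
```

With a two-row table, index 17 selects byte 1 of the second row, and index 32 is out of range for both functions: the first returns 0 in that lane, the second returns the default.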
