Commit 1e029d0

Enable EVEX feature: embedded broadcast for Vector128/256/512.Add() in limited cases (#84821)
* Enable EVEX feature: embedded broadcast
  Embedded broadcast is enabled in Vector256<float>.Add() in the following limited cases:
  1. Vector256.Add(Vec, Vector256.Create(DCon));
  2. Vector256<float> VecCns = Vector256.Create(DCon); Vector256.Add(Vec, VecCns);
  3. Vector256.Add(Vec, Vector256.Create(LCL_VAR));
  4. Vector256<float> VecCns = Vector256.Create(LCL_VAR); Vector256.Add(Vec, VecCns);
  Note: cases 2 and 4 can only be optimized when DOTNET_TieredCompilation = 0.
* Remove some irrelevant changes from the previous main.
* Enable containment at the Broadcast intrinsic to improve the embedded broadcast enabling work.
* Convert the checking logic for broadcast into a flag.
* Bug fixes:
  1. Fixed the containment logic in lowering to accommodate the situation where both operands of an EB-compatible node are EB candidates.
  2. Fixed an unexpected EVEX.b bit being set on some non-EVEX instructions on x86.
* Apply format patch.
* Add the "insOpts" data structure to xarch: insOpts may carry information about the EVEX.b bit; currently this is only embedded broadcast.
* Add the "OperIsBroadcastScalar" check: it ensures the intrinsic is actually a broadcast-scalar intrinsic. The check is needed because GenTree flags use overlapping definitions and GTF_BROADCAST_EMBEDDED conflicts with some of them, so we must ensure the flag we check does not come from another overlapping flag.
* Rebase the branch and resolve conflicts.
* Changes based on the reviews:
  1. Removed the GenTree flag GTF_EMBEDDED_BROADCAST.
  2. Mark the embedded broadcast node by making it contained.
  3. Improved the logic in GetMemOpSize() to return the correct pointer size when embedded broadcast is enabled.
  4. Improved the logic in genOperandDesc() to emit a scalar when a constant vector operand is found to be created from a scalar.
* Apply format patch.
* Bug fixes.
* Bug fixes.
* Apply format patch.
* Enable embedded broadcast for Vector128<float>.Add.
* Enable embedded broadcast for Vector512<float>.Add.
* Make double supported for embedded broadcast.
* Add EB support to AVX_BroadcastScalarToVector*.
* Apply format patch.
* Enable embedded broadcast for double const vectors.
* Enable embedded broadcast for integer Add.
* Changes based on the review:
  1. Change GenTreeHWIntrinsic::OperIsEmbBroadcastHWIntrinsic to OperIsEmbBroadcastCompatible.
  2. Removed OperIsBroadcastScalar.
  3. Formatting.
  4. Corrected errors in the comments.
* Removed the GenTree flag GTF_VECCON_FROMSCALAR.
* Bug fixes for embedded broadcast with AVX_Broadcast.
* Enable embedded broadcast in the R_R_A path.
* Apply format patch.
* Bug fixes: re-introduce "OperIsBroadcastScalar". In some cases a non-broadcast node (e.g. Load, Read) is contained by an embedded broadcast node and embedded broadcast is enabled unexpectedly; this method filters out those cases.
* Changes based on reviews:
  1. Code style improvements.
  2. Fixed typos and errors in the comments.
  3. Extracted the operand-swap logic used when lowering a Create node into a function: TryCanonizeEmbBroadcastCandicate().
* Unfold a VecCon node during lowering if the node is eligible for embedded broadcast.
* Apply format patch.
* Bug fixes:
  1. Added a missing default branch.
  2. Filtered out some possible embedded broadcast cases in favor of better optimizations.
* Resolve the mishandling of the previous conflict.
* Move the unfolding logic to ContainChecks.
* Code changes based on the review.
* Apply format patch.
* Support embedded broadcast for GT_IND as the operand of a broadcast node.
* Bug fixes: the long type should only be used on 64-bit systems.
* Apply format patch.
* Introduce MakeHWIntrinsicSrcContained(): this function handles the case where a constant vector is the operand of an embedded broadcast op. If the constant vector is eligible for embedded broadcast, it unfolds the constant vector into the corresponding broadcast intrinsic form.
* Code changes based on reviews:
  1. A helper function to detect the embedded-broadcast-compatible flag.
  2. Containment logic improvements.
  3. Typo fixes.
* Code changes based on review.
* Apply format patch.
* Code changes based on review:
  1. Deleted irrelevant comments.
  2. Moved the containment check up to cover more cases.
* Code changes based on review:
  1. Updated a comment to keep up with the changes in InstrDesc.
  2. Removed an unneeded argument from an unrelated method.
1 parent e126ca3 commit 1e029d0
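All four cases in the commit message end up as an add whose second operand is one scalar replicated to every lane. With EVEX embedded broadcast (the `m32bcst`/`m64bcst` memory-operand forms that the Intel SDM lists for instructions such as `VADDPS`/`VADDPD`), that scalar can be read straight from memory and replicated by the add itself, instead of first being materialized as a full vector. The C++ sketch below models only the lane-level result, assuming a 256-bit vector of 8 floats; the type and function names are hypothetical and this is not JIT code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model: a 256-bit vector of floats has 8 lanes.
using Vec256f = std::array<float, 8>;

// Without embedded broadcast: build a full vector of copies first
// (analogous to a separate broadcast instruction), then add lane by lane.
Vec256f addWithExplicitBroadcast(const Vec256f& v, float scalar)
{
    Vec256f splat{};
    splat.fill(scalar);
    Vec256f result{};
    for (std::size_t i = 0; i < v.size(); ++i)
    {
        result[i] = v[i] + splat[i];
    }
    return result;
}

// With embedded broadcast: the single scalar "memory operand" is read once
// and reused for every lane, so no splatted vector is ever built.
Vec256f addWithEmbeddedBroadcast(const Vec256f& v, const float* scalarInMemory)
{
    const float s = *scalarInMemory; // one 4-byte load
    Vec256f result{};
    for (std::size_t i = 0; i < v.size(); ++i)
    {
        result[i] = v[i] + s;
    }
    return result;
}
```

Both functions produce identical lanes; the difference the optimization targets is that the second form never needs the intermediate 32-byte splatted vector.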

13 files changed (+566 / -62 lines)

src/coreclr/jit/codegeninterface.h

Lines changed: 7 additions & 1 deletion
@@ -127,7 +127,9 @@ class CodeGenInterface
 #define INST_FP 0x01 // is it a FP instruction?
 public:
     static bool instIsFP(instruction ins);
-
+#if defined(TARGET_XARCH)
+    static bool instIsEmbeddedBroadcastCompatible(instruction ins);
+#endif // TARGET_XARCH
     //-------------------------------------------------------------------------
     // Liveness-related fields & methods
 public:
@@ -764,6 +766,10 @@ class CodeGenInterface
 
     virtual const char* siStackVarName(size_t offs, size_t size, unsigned reg, unsigned stkOffs) = 0;
 #endif // LATE_DISASM
+
+#if defined(TARGET_XARCH)
+    bool IsEmbeddedBroadcastEnabled(instruction ins, GenTree* op);
+#endif
 };
 
 #endif // _CODEGEN_INTERFACE_H_
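According to the commit message, an operand is marked for embedded broadcast by being made contained during lowering, and `OperIsBroadcastScalar` is used to filter out contained nodes that are not really broadcasts (e.g. a plain Load). The sketch below shows one way a predicate shaped like `IsEmbeddedBroadcastEnabled(ins, op)` could combine those signals. All types and names here are hypothetical simplifications, not the real GenTree API, and the actual implementation may check more (for example, EVEX availability).

```cpp
#include <cassert>

// Hypothetical instruction subset; not the JIT's `instruction` enum.
enum class Ins { vaddps, vaddpd, vpaddd, movaps };

// Hypothetical, simplified view of the second source operand.
struct OperandSketch
{
    bool isContained;       // lowering marked the operand as contained
    bool isBroadcastScalar; // the contained node really is a broadcast of a scalar
};

// Stand-in for instIsEmbeddedBroadcastCompatible(): only instructions that
// have an m32bcst/m64bcst operand form qualify.
bool instIsEmbeddedBroadcastCompatibleSketch(Ins ins)
{
    switch (ins)
    {
        case Ins::vaddps:
        case Ins::vaddpd:
        case Ins::vpaddd:
            return true;
        default:
            return false;
    }
}

// All three conditions must hold before EVEX.b would be requested.
bool isEmbeddedBroadcastEnabledSketch(Ins ins, const OperandSketch& op)
{
    return instIsEmbeddedBroadcastCompatibleSketch(ins) && op.isContained && op.isBroadcastScalar;
}
```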

src/coreclr/jit/emit.h

Lines changed: 39 additions & 5 deletions
@@ -781,6 +781,9 @@ class emitter
     unsigned _idCallRegPtr : 1; // IL indirect calls: addr in reg
     unsigned _idCallAddr : 1; // IL indirect calls: can make a direct call to iiaAddr
     unsigned _idNoGC : 1; // Some helpers don't get recorded in GC tables
+#if defined(TARGET_XARCH)
+    unsigned _idEvexbContext : 1; // does EVEX.b need to be set.
+#endif // TARGET_XARCH
 
 #ifdef TARGET_ARM64
     opSize _idOpSize : 3; // operand size: 0=1 , 1=2 , 2=4 , 3=8, 4=16
@@ -814,8 +817,8 @@ class emitter
 
     ////////////////////////////////////////////////////////////////////////
     // Space taken up to here:
-    // x86: 46 bits
-    // amd64: 46 bits
+    // x86: 47 bits
+    // amd64: 47 bits
     // arm: 48 bits
     // arm64: 50 bits
     // loongarch64: 46 bits
@@ -830,8 +833,10 @@ class emitter
 #define ID_EXTRA_BITFIELD_BITS (16)
 #elif defined(TARGET_ARM64)
 #define ID_EXTRA_BITFIELD_BITS (18)
-#elif defined(TARGET_XARCH) || defined(TARGET_LOONGARCH64) || defined(TARGET_RISCV64)
+#elif defined(TARGET_LOONGARCH64) || defined(TARGET_RISCV64)
 #define ID_EXTRA_BITFIELD_BITS (14)
+#elif defined(TARGET_XARCH)
+#define ID_EXTRA_BITFIELD_BITS (15)
 #else
 #error Unsupported or unset target architecture
 #endif
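Reading the "Space taken up to here" comments alongside the `ID_EXTRA_BITFIELD_BITS` values in this diff, each target's value equals its bit count minus 32 (arm: 48 bits gives 16, arm64: 50 gives 18, x86/amd64 and loongarch64: 46 gives 14). That relationship is an inference from these numbers, not a documented rule, but it explains why adding the 1-bit `_idEvexbContext` field moves xarch from 14 to 15 and out of the branch it previously shared with LoongArch64 and RISC-V64. The compile-time checks below only restate that arithmetic; the helper name is hypothetical.

```cpp
#include <cassert>

// Assumption inferred from the diff: the "extra" bitfield count is the number
// of instrDesc bitfield bits in use beyond the first 32-bit word.
constexpr int kFirstWordBits = 32;

constexpr int extraBitfieldBits(int bitsUsed)
{
    return bitsUsed - kFirstWordBits;
}

static_assert(extraBitfieldBits(48) == 16, "arm: 48 bits -> 16");
static_assert(extraBitfieldBits(50) == 18, "arm64: 50 bits -> 18");
static_assert(extraBitfieldBits(46) == 14, "loongarch64, and x86/amd64 before: 46 bits -> 14");
static_assert(extraBitfieldBits(47) == 15, "x86/amd64 with _idEvexbContext: 47 bits -> 15");
```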
@@ -866,8 +871,8 @@ class emitter
 
     ////////////////////////////////////////////////////////////////////////
     // Space taken up to here (with/without prev offset, assuming host==target):
-    // x86: 52/48 bits
-    // amd64: 53/48 bits
+    // x86: 53/49 bits
+    // amd64: 54/49 bits
     // arm: 54/50 bits
     // arm64: 57/52 bits
     // loongarch64: 53/48 bits
@@ -1529,6 +1534,19 @@ class emitter
         _idNoGC = val;
     }
 
+#ifdef TARGET_XARCH
+    bool idIsEvexbContext() const
+    {
+        return _idEvexbContext != 0;
+    }
+    void idSetEvexbContext()
+    {
+        assert(_idEvexbContext == 0);
+        _idEvexbContext = 1;
+        assert(_idEvexbContext == 1);
+    }
+#endif
+
 #ifdef TARGET_ARMARCH
     bool idIsLclVar() const
     {
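`idSetEvexbContext` asserts that the bit is still clear before setting it, so an instruction descriptor can be marked at most once, and the trailing assert confirms the 1-bit field actually holds the value. A self-contained sketch of the same set-once flag pattern on a hypothetical struct (not the real `instrDesc`):

```cpp
#include <cassert>

// Hypothetical stand-in for an instruction descriptor with packed 1-bit flags.
struct InstrDescSketch
{
    unsigned noGC : 1;
    unsigned evexbContext : 1; // 1 => encode with EVEX.b set

    bool isEvexbContext() const
    {
        return evexbContext != 0;
    }

    void setEvexbContext()
    {
        assert(evexbContext == 0); // must not be marked twice
        evexbContext = 1;
        assert(evexbContext == 1); // the 1-bit field holds the value
    }
};
```

Usage: a value-initialized descriptor (`InstrDescSketch id{};`) starts with every flag clear, and setting one flag leaves its neighbours untouched.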
@@ -3655,9 +3673,25 @@ inline unsigned emitter::emitGetInsCIargs(instrDesc* id)
 //
 emitAttr emitter::emitGetMemOpSize(instrDesc* id) const
 {
+
     emitAttr defaultSize = id->idOpSize();
     instruction ins = id->idIns();
+    if (id->idIsEvexbContext())
+    {
+        // should have the assumption that Evex.b now stands for the embedded broadcast context.
+        // reference: Section 2.7.5 in Intel 64 and ia-32 architectures software developer's manual volume 2.
+        ssize_t inputSize = GetInputSizeInBytes(id);
+        switch (inputSize)
+        {
+            case 4:
+                return EA_4BYTE;
+            case 8:
+                return EA_8BYTE;
 
+            default:
+                unreached();
+        }
+    }
     switch (ins)
     {
         case INS_pextrb:
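When EVEX.b is set on an instruction with a memory operand, that operand is a single element rather than a full vector, which is why `emitGetMemOpSize` now returns the element size (4 or 8 bytes) instead of the vector size. The number of lanes the element is replicated into (the N in `{1toN}`) is the vector length divided by the element size. A sketch of that arithmetic; the function names are hypothetical, and only the 4- and 8-byte element sizes handled in the diff are accepted.

```cpp
#include <cassert>
#include <stdexcept>

// Size in bytes of the memory operand actually read when embedded broadcast
// (EVEX.b with a memory operand) is in effect: one element, not a full vector.
int broadcastMemOpSize(int elementSizeInBytes)
{
    switch (elementSizeInBytes)
    {
        case 4: // e.g. float / int32 elements
        case 8: // e.g. double / int64 elements
            return elementSizeInBytes;
        default:
            // Plays the role of the diff's unreached(): other sizes are not handled.
            throw std::invalid_argument("unsupported broadcast element size");
    }
}

// Number of lanes the single element is replicated into (the N in {1toN}).
int broadcastFactor(int vectorSizeInBytes, int elementSizeInBytes)
{
    return vectorSizeInBytes / broadcastMemOpSize(elementSizeInBytes);
}
```

For example, a 512-bit (64-byte) add of floats reads 4 bytes and replicates them 16 times (`{1to16}`), while a 256-bit add of doubles reads 8 bytes and replicates them 4 times.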

src/coreclr/jit/emitxarch.cpp

Lines changed: 60 additions & 13 deletions
@@ -1231,9 +1231,10 @@ bool emitter::TakesEvexPrefix(const instrDesc* id) const
 #define DEFAULT_BYTE_EVEX_PREFIX_MASK 0xFFFFFFFF00000000ULL
 #define LBIT_IN_BYTE_EVEX_PREFIX 0x0000002000000000ULL
 #define LPRIMEBIT_IN_BYTE_EVEX_PREFIX 0x0000004000000000ULL
+#define EVEX_B_BIT 0x0000001000000000ULL
 
 //------------------------------------------------------------------------
-// AddEvexPrefix: Add default EVEX perfix with only LL' bits set.
+// AddEvexPrefix: Add default EVEX prefix with only LL' bits set.
 //
 // Arguments:
 //    ins -- processor instruction to check.
@@ -1268,6 +1269,22 @@ emitter::code_t emitter::AddEvexPrefix(instruction ins, code_t code, emitAttr at
     return code;
 }
 
+//------------------------------------------------------------------------
+// AddEvexPrefix: set Evex.b bit if EvexbContext is set in instruction descritor.
+//
+// Arguments:
+//    code -- opcode bits.
+//
+// Return Value:
+//    encoded code with Evex.b set if needed.
+//
+emitter::code_t emitter::AddEvexbBit(code_t code)
+{
+    assert(hasEvexPrefix(code));
+    code |= EVEX_B_BIT;
+    return code;
+}
+
 // Returns true if this instruction requires a VEX prefix
 // All AVX instructions require a VEX prefix
 bool emitter::TakesVexPrefix(instruction ins) const
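The three masks are consistent with the layout of the EVEX P2 payload byte (z, L', L, b, V', aaa from bit 7 down to bit 0) held in bits 32 to 39 of the 64-bit `code_t`: L' is 0x40, L is 0x20 and b is 0x10, each shifted left by 32, and `AddEvexbBit` simply ORs in the b bit. The sketch below checks that reading at compile time and mimics `AddEvexbBit`. Placing the EVEX escape byte 0x62 in the top byte of `code_t` is an assumption made only so the sketch can assert that a prefix is present; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

using code_t = std::uint64_t;

// Mask values copied from the diff.
constexpr code_t LBIT_IN_BYTE_EVEX_PREFIX      = 0x0000002000000000ULL;
constexpr code_t LPRIMEBIT_IN_BYTE_EVEX_PREFIX = 0x0000004000000000ULL;
constexpr code_t EVEX_B_BIT                    = 0x0000001000000000ULL;

// Bit `bitInP2` of the P2 byte, assuming P2 occupies bits 32..39.
constexpr code_t p2Bit(unsigned bitInP2)
{
    return code_t{1} << (32 + bitInP2);
}

static_assert(LPRIMEBIT_IN_BYTE_EVEX_PREFIX == p2Bit(6), "L' is P2 bit 6");
static_assert(LBIT_IN_BYTE_EVEX_PREFIX == p2Bit(5), "L is P2 bit 5");
static_assert(EVEX_B_BIT == p2Bit(4), "b is P2 bit 4");

// Assumption for this sketch: the 0x62 escape byte sits in the top byte.
constexpr bool hasEvexPrefixSketch(code_t code)
{
    return (code >> 56) == 0x62;
}

// Mirrors AddEvexbBit: only valid on an EVEX-prefixed encoding.
code_t addEvexbBitSketch(code_t code)
{
    assert(hasEvexPrefixSketch(code));
    return code | EVEX_B_BIT;
}
```

Because the operation is a plain OR, applying it twice is harmless, and it leaves the L/L' vector-length bits untouched.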
@@ -6667,7 +6684,8 @@ void emitter::emitIns_R_S_I(instruction ins, emitAttr attr, regNumber reg1, int
     emitCurIGsize += sz;
 }
 
-void emitter::emitIns_R_R_A(instruction ins, emitAttr attr, regNumber reg1, regNumber reg2, GenTreeIndir* indir)
+void emitter::emitIns_R_R_A(
+    instruction ins, emitAttr attr, regNumber reg1, regNumber reg2, GenTreeIndir* indir, insOpts instOptions)
 {
     assert(IsAvx512OrPriorInstruction(ins));
     assert(IsThreeOperandAVXInstruction(ins));
@@ -6678,6 +6696,11 @@ void emitter::emitIns_R_R_A(instruction ins, emitAttr attr, regNumber reg1, regN
     id->idIns(ins);
     id->idReg1(reg1);
     id->idReg2(reg2);
+    if (instOptions == INS_OPTS_EVEX_b)
+    {
+        assert(UseEvexEncoding());
+        id->idSetEvexbContext();
+    }
 
     emitHandleMemOp(indir, id, (ins == INS_mulx) ? IF_RWR_RWR_ARD : emitInsModeFormat(ins, IF_RRD_RRD_ARD), ins);
 
@@ -6778,8 +6801,13 @@ void emitter::emitIns_R_AR_R(instruction ins,
     emitCurIGsize += sz;
 }
 
-void emitter::emitIns_R_R_C(
-    instruction ins, emitAttr attr, regNumber reg1, regNumber reg2, CORINFO_FIELD_HANDLE fldHnd, int offs)
+void emitter::emitIns_R_R_C(instruction ins,
+                            emitAttr attr,
+                            regNumber reg1,
+                            regNumber reg2,
+                            CORINFO_FIELD_HANDLE fldHnd,
+                            int offs,
+                            insOpts instOptions)
 {
     assert(IsAvx512OrPriorInstruction(ins));
     assert(IsThreeOperandAVXInstruction(ins));
@@ -6797,6 +6825,11 @@ void emitter::emitIns_R_R_C(
     id->idReg1(reg1);
     id->idReg2(reg2);
     id->idAddr()->iiaFieldHnd = fldHnd;
+    if (instOptions == INS_OPTS_EVEX_b)
+    {
+        assert(UseEvexEncoding());
+        id->idSetEvexbContext();
+    }
 
     UNATIVE_OFFSET sz = emitInsSizeCV(id, insCodeRM(ins));
     id->idCodeSize(sz);
@@ -6829,7 +6862,8 @@ void emitter::emitIns_R_R_R(instruction ins, emitAttr attr, regNumber targetReg,
     emitCurIGsize += sz;
 }
 
-void emitter::emitIns_R_R_S(instruction ins, emitAttr attr, regNumber reg1, regNumber reg2, int varx, int offs)
+void emitter::emitIns_R_R_S(
+    instruction ins, emitAttr attr, regNumber reg1, regNumber reg2, int varx, int offs, insOpts instOptions)
 {
     assert(IsAvx512OrPriorInstruction(ins));
     assert(IsThreeOperandAVXInstruction(ins));
@@ -6842,6 +6876,11 @@ void emitter::emitIns_R_R_S(instruction ins, emitAttr attr, regNumber reg1, regN
     id->idReg2(reg2);
     id->idAddr()->iiaLclVar.initLclVarAddr(varx, offs);
 
+    if (instOptions == INS_OPTS_EVEX_b)
+    {
+        assert(UseEvexEncoding());
+        id->idSetEvexbContext();
+    }
 #ifdef DEBUG
     id->idDebugOnlyInfo()->idVarRefOffs = emitVarRefOffs;
 #endif
@@ -8134,14 +8173,15 @@ void emitter::emitIns_SIMD_R_R_I(instruction ins, emitAttr attr, regNumber targe
 //    indir -- The GenTreeIndir used for the memory address
 //
 void emitter::emitIns_SIMD_R_R_A(
-    instruction ins, emitAttr attr, regNumber targetReg, regNumber op1Reg, GenTreeIndir* indir)
+    instruction ins, emitAttr attr, regNumber targetReg, regNumber op1Reg, GenTreeIndir* indir, insOpts instOptions)
 {
     if (UseSimdEncoding())
     {
-        emitIns_R_R_A(ins, attr, targetReg, op1Reg, indir);
+        emitIns_R_R_A(ins, attr, targetReg, op1Reg, indir, instOptions);
     }
     else
     {
+        assert(instOptions == INS_OPTS_NONE);
         emitIns_Mov(INS_movaps, attr, targetReg, op1Reg, /* canSkip */ true);
         emitIns_R_A(ins, attr, targetReg, indir);
     }
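The `emitIns_SIMD_R_R_*` wrappers choose between the three-operand VEX/EVEX form and a legacy SSE sequence (a register copy, skippable when the registers already match, followed by a destructive two-operand op). Only the EVEX path can carry EVEX.b, so the new `assert(instOptions == INS_OPTS_NONE)` guards against an embedded-broadcast request reaching the fallback. A minimal model of that dispatch, using a hypothetical enum and a string trace in place of real encodings:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mirror of the two insOpts values used in the diff.
enum class InsOptsSketch { None, EvexB };

// Records what would be emitted instead of encoding real instructions.
std::vector<std::string> emitSimdRRA(bool useSimdEncoding, InsOptsSketch opts)
{
    std::vector<std::string> trace;
    if (useSimdEncoding)
    {
        // Three-operand form: the option is forwarded, so EVEX.b can be set.
        trace.push_back(opts == InsOptsSketch::EvexB ? "op dst, src1, [mem]{bcst}" : "op dst, src1, [mem]");
    }
    else
    {
        // Legacy SSE encoding has no EVEX prefix, so no option may be requested.
        assert(opts == InsOptsSketch::None);
        trace.push_back("movaps dst, src1");
        trace.push_back("op dst, [mem]");
    }
    return trace;
}
```

Asserting rather than silently ignoring the option keeps a lowering bug (containing a broadcast on a target without EVEX) from turning into a wrong memory operand size at encode time.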
@@ -8159,15 +8199,21 @@ void emitter::emitIns_SIMD_R_R_A(
 //    fldHnd -- The CORINFO_FIELD_HANDLE used for the memory address
 //    offs -- The offset added to the memory address from fldHnd
 //
-void emitter::emitIns_SIMD_R_R_C(
-    instruction ins, emitAttr attr, regNumber targetReg, regNumber op1Reg, CORINFO_FIELD_HANDLE fldHnd, int offs)
+void emitter::emitIns_SIMD_R_R_C(instruction ins,
+                                 emitAttr attr,
+                                 regNumber targetReg,
+                                 regNumber op1Reg,
+                                 CORINFO_FIELD_HANDLE fldHnd,
+                                 int offs,
+                                 insOpts instOptions)
 {
     if (UseSimdEncoding())
     {
-        emitIns_R_R_C(ins, attr, targetReg, op1Reg, fldHnd, offs);
+        emitIns_R_R_C(ins, attr, targetReg, op1Reg, fldHnd, offs, instOptions);
     }
     else
     {
+        assert(instOptions == INS_OPTS_NONE);
         emitIns_Mov(INS_movaps, attr, targetReg, op1Reg, /* canSkip */ true);
         emitIns_R_C(ins, attr, targetReg, fldHnd, offs);
     }
@@ -8222,14 +8268,15 @@ void emitter::emitIns_SIMD_R_R_R(
 //    offs -- The offset added to the memory address from varx
 //
 void emitter::emitIns_SIMD_R_R_S(
-    instruction ins, emitAttr attr, regNumber targetReg, regNumber op1Reg, int varx, int offs)
+    instruction ins, emitAttr attr, regNumber targetReg, regNumber op1Reg, int varx, int offs, insOpts instOptions)
 {
     if (UseSimdEncoding())
     {
-        emitIns_R_R_S(ins, attr, targetReg, op1Reg, varx, offs);
+        emitIns_R_R_S(ins, attr, targetReg, op1Reg, varx, offs, instOptions);
     }
     else
     {
+        assert(instOptions == INS_OPTS_NONE);
         emitIns_Mov(INS_movaps, attr, targetReg, op1Reg, /* canSkip */ true);
         emitIns_R_S(ins, attr, targetReg, varx, offs);
     }
@@ -15717,7 +15764,7 @@ BYTE* emitter::emitOutputLJ(insGroup* ig, BYTE* dst, instrDesc* i)
 // Return Value:
 //    size in bytes.
 //
-ssize_t emitter::GetInputSizeInBytes(instrDesc* id)
+ssize_t emitter::GetInputSizeInBytes(instrDesc* id) const
 {
     insFlags inputSize = static_cast<insFlags>((CodeGenInterface::instInfo[id->idIns()] & Input_Mask));
 
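`GetInputSizeInBytes` gains a `const` qualifier because `emitGetMemOpSize`, itself a `const` member function, now calls it; inside a `const` member function `this` points to a const object, so only other `const` member functions can be called on it. A minimal illustration with hypothetical class and member names:

```cpp
#include <cassert>

class EmitterSketch
{
public:
    explicit EmitterSketch(long inputSize) : m_inputSize(inputSize) {}

    // Must be const: it is called from the const member function below.
    long getInputSizeInBytes() const
    {
        return m_inputSize;
    }

    long getMemOpSize() const
    {
        // `this` is `const EmitterSketch*` here; dropping the const from
        // getInputSizeInBytes() would make this call fail to compile.
        return getInputSizeInBytes();
    }

private:
    long m_inputSize;
};
```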