@snickolls-arm
Contributor

This branch introduces a new type (TYP_SIMDSV) to the JIT for supporting scalable vectors, registers whose size is determined by hardware at runtime but remains constant for the duration of a process. For ARM64, this means we have vectors sized in powers of 2 from 128 bits up to 2048 bits depending on hardware implementation, with an instruction available to query this size for compiler use. I've also adjusted the implementation of TYP_MASK to scale with the vector length on ARM64 in a similar manner.
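
To make the sizing rule concrete, here is a minimal C++ sketch (not code from this branch) of how byte sizes could be derived from a vector length that is only known at runtime. All function names are hypothetical. The mask calculation assumes the SVE architectural layout of one predicate bit per vector byte; how this branch actually sizes TYP_MASK is not shown in the description.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if `bits` is a power of two in [128, 2048],
// the range of vector lengths described above.
constexpr bool isValidSveVectorLengthBits(uint32_t bits)
{
    return bits >= 128 && bits <= 2048 && (bits & (bits - 1)) == 0;
}

// Size in bytes of one scalable vector register for a given vector length.
inline uint32_t vectorSizeBytes(uint32_t vectorLengthBits)
{
    assert(isValidSveVectorLengthBits(vectorLengthBits));
    return vectorLengthBits / 8;
}

// Assumption: an SVE predicate register holds one bit per vector byte,
// so a mask occupies one eighth of the vector's size.
inline uint32_t maskSizeBytes(uint32_t vectorLengthBits)
{
    return vectorSizeBytes(vectorLengthBits) / 8;
}

// Number of elements of a given size that fit in one vector,
// analogous to what Vector<T>.Count reports.
inline uint32_t elementCount(uint32_t vectorLengthBits, uint32_t elementSizeBytes)
{
    assert(elementSizeBytes != 0);
    return vectorSizeBytes(vectorLengthBits) / elementSizeBytes;
}
```

For example, under these assumptions a 256-bit implementation would give `elementCount(256, sizeof(float)) == 8`, while the 128-bit case matches today's fixed 16-byte vector.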

This builds on, and borrows much from, Kunal's work in #115948.

This PR focuses on enabling scalable type awareness as a foundation for future vector length agnostic code generation. I've refactored existing systems so that the size of a type is retrieved with access to the compiler's runtime state. This mainly involves changing areas that depend on genTypeSize to call a new instance method, Compiler::getSizeOfType. This allows the JIT to emit SVE register moves, loads and stores for Vector<T> etc., but it doesn't change the implementation of the Vector<T> API surface, which still emits NEON for arithmetic operations, logical operations, floating point operations and so on. The codegen is functionally equivalent while the vector length is set to 128 bits.
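
The shape of that refactoring can be sketched roughly as follows. This is an illustration only, not the code in this branch: `staticTypeSize` stands in for a table lookup in the spirit of genTypeSize, `CompilerSketch` stands in for the Compiler instance, and the cut-down `var_types` enum and the `m_vectorLengthBytes` field are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, cut-down type enum; the real JIT has many more entries.
enum var_types : uint8_t
{
    TYP_INT,
    TYP_SIMD16,
    TYP_SIMDSV,
    TYP_COUNT
};

// Static lookup that needs no compiler state.
// A size of 0 marks a type whose size is not statically known.
inline uint32_t staticTypeSize(var_types type)
{
    static const uint32_t sizes[TYP_COUNT] = {4, 16, 0};
    assert(type < TYP_COUNT);
    return sizes[type];
}

// Stand-in for the Compiler instance, which knows the vector length
// reported by the hardware for the current process.
class CompilerSketch
{
public:
    explicit CompilerSketch(uint32_t vectorLengthBytes)
        : m_vectorLengthBytes(vectorLengthBytes)
    {
    }

    // Instance method: consults runtime state for the scalable type and
    // falls back to the static table for everything else.
    uint32_t getSizeOfType(var_types type) const
    {
        if (type == TYP_SIMDSV)
        {
            return m_vectorLengthBytes;
        }
        return staticTypeSize(type);
    }

private:
    uint32_t m_vectorLengthBytes; // assumed to be queried once at startup
};
```

With a 16-byte vector length, `getSizeOfType(TYP_SIMDSV)` and `getSizeOfType(TYP_SIMD16)` agree, which is consistent with the codegen being functionally equivalent at 128 bits.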

With this type being passed around, we can begin implementing vector length agnostic code in subsequent work, as we can now test for a TYP_SIMDSV and distinguish it from a fixed size TYP_SIMD16.
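
A small sketch of why the distinction has to be carried by the type rather than recovered from a size: at a 128-bit vector length, Vector<T> and Vector128<T> are both 16 bytes, so a size-first mapping cannot tell them apart. The enums and function names below are hypothetical stand-ins, not the JIT's real definitions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, cut-down stand-ins for JIT concepts.
enum class SimdClass { Vector64, Vector128, VectorT };
enum class NodeType { SIMD8, SIMD16, SIMDSV };

// Size-first derivation: a 16-byte Vector<T> and a Vector128<T>
// both collapse to the same fixed-size node type.
inline NodeType nodeTypeFromSize(uint32_t sizeBytes)
{
    assert(sizeBytes == 8 || sizeBytes == 16);
    return sizeBytes == 8 ? NodeType::SIMD8 : NodeType::SIMD16;
}

// Type-first derivation: the identified SIMD class picks the node type
// directly, so Vector<T> can stay scalable even when it happens to be 16 bytes.
inline NodeType nodeTypeFromClass(SimdClass cls, bool useScalableVectorT)
{
    switch (cls)
    {
        case SimdClass::Vector64:
            return NodeType::SIMD8;
        case SimdClass::Vector128:
            return NodeType::SIMD16;
        case SimdClass::VectorT:
            return useScalableVectorT ? NodeType::SIMDSV : NodeType::SIMD16;
    }
    assert(!"unknown SIMD class");
    return NodeType::SIMD16;
}
```

Later phases can then branch on the node type alone, without needing to know the vector length.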

Testing

Importing Vector<T> as the new type is gated behind DOTNET_JitUseScalableVectorT, meaning TYP_SIMDSV will not appear in compilation unless that variable is set. Likewise, the VM will not report use of the HFA type CORINFO_HFA_ELEM_VECTORT unless this variable is set. With the variable set, testing validates the implementation of the new type. Without it, testing validates that TYP_SIMD16 behavior remains stable under these changes.
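
The gating described above amounts to something like the following sketch. It is not how the JIT reads its configuration (that goes through the JIT's own config machinery); `std::getenv` is used here only to keep the example self-contained, and the function names, the enum, and the hexadecimal parsing of the value are all assumptions.

```cpp
#include <cstdlib>

// Hypothetical stand-in for the two ways Vector<T> can be imported.
enum VectorTType
{
    SIMD16_FIXED,    // existing behavior: Vector<T> imported as TYP_SIMD16
    SIMDSV_SCALABLE  // new behavior: Vector<T> imported as TYP_SIMDSV
};

// Assumption: the switch counts as enabled when its value parses
// (as hexadecimal here) to a non-zero integer.
inline bool isSwitchEnabled(const char* value)
{
    if (value == nullptr)
    {
        return false; // variable not set: keep the existing behavior
    }
    return std::strtoul(value, nullptr, 16) != 0;
}

// Pure function so that both settings can be exercised by the same tests.
inline VectorTType importVectorTAs(const char* switchValue)
{
    return isSwitchEnabled(switchValue) ? SIMDSV_SCALABLE : SIMD16_FIXED;
}

// Convenience wrapper that reads the process environment.
inline VectorTType importVectorTFromEnvironment()
{
    return importVectorTAs(std::getenv("DOTNET_JitUseScalableVectorT"));
}
```

Keeping the decision in a pure function mirrors the two test modes: the same suite can run once with the switch on and once with it off.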

SuperPMI method contexts are currently out of sync with the updated JIT-EE interface, which causes mismatches in return values between the JIT and EE. I don't expect these mismatches to go away until DOTNET_JitUseScalableVectorT is removed and the new behavior is standardized, so this feature will need to be tested with specially generated MCH files while it is being developed.

Future

We will be able to remove DOTNET_JitUseScalableVectorT once Vector<T> is working in AOT compilation. This will require every phase to be aware that TYP_SIMDSV is dynamically sized and to decide, on that basis, whether it can run.

For a transitional approach, we could allow specifying a fixed target vector length for AOT compilation while broader vector-length agnostic support is implemented. In some cases we might be able to take advantage of knowing the vector size at compilation time, so having both approaches available might be advantageous for JIT mode.

Code Example

static void SimdAdd(ReadOnlySpan<float> a, ReadOnlySpan<float> b, Span<float> c)
{
    int len = Math.Min(Math.Min(a.Length, b.Length), c.Length);
    int i = 0;

    if (Vector.IsHardwareAccelerated)
    {
        int width = Vector<float>.Count;
        for (; i <= len - width; i += width)
        {
            var va = new Vector<float>(a.Slice(i, width));
            var vb = new Vector<float>(b.Slice(i, width));
            (va + vb).CopyTo(c.Slice(i, width));
        }
    }

    // Scalar tail (or full path if no HW acceleration)
    for (; i < len; i++)
        c[i] = a[i] + b[i];
}

With DOTNET_JitUseScalableVectorT=1, the main vector loop body compiles to:

G_M23455_IG03:        ; offs=0x000020, size=0x0044, bbWeight=4, PerfScore 100.00, gcrefRegs=0000 {}, byrefRegs=0015 {x0 x2 x4}, BB03 [0002], BB20 [0018], BB32 [0033], BB44 [0048], byref, isz

IN000b: 000020      mov     w8, w7
IN000c: 000024      add     x9, x8, #4
IN000d: 000028      cmp     x9, w1, UXTW
IN000e: 00002C      bhi     G_M23455_IG10
IN000f: 000030      lsl     x8, x8, #2
IN0010: 000034      add     x10, x0, x8
IN0011: 000038      ldr     z24, [x10]                                 ;; was ldr q24, [x10]
IN0012: 00003C      cmp     x9, w3, UXTW
IN0013: 000040      bhi     G_M23455_IG10
IN0014: 000044      add     x10, x2, x8
IN0015: 000048      ldr     z25, [x10]                                 ;; was ldr q25, [x10]
IN0016: 00004C      fadd    v24.4s, v24.4s, v25.4s
IN0017: 000050      cmp     x9, w5, UXTW
IN0018: 000054      bhi     G_M23455_IG10
IN0019: 000058      add     x8, x4, x8
IN001a: 00005C      str     z24, [x8]                                  ;; was str q24, [x8]
IN001b: 000060      add     w7, w7, #4

G_M23455_IG04:        ; offs=0x000064, size=0x000C, bbWeight=8, PerfScore 16.00, gcrefRegs=0000 {}, byrefRegs=0015 {x0 x2 x4}, loop=IG03, BB04 [0003], byref, isz

IN001c: 000064      sub     w8, w6, #4
IN001d: 000068      cmp     w8, w7
IN001e: 00006C      bge     G_M23455_IG03

Contributing towards #120599

The current implementation derives a size based on the identified SIMD type,
and then uses the size to derive the node type. It should instead
directly derive the node type from the identified SIMD type, because
some SIMD types will not have a statically known size, and this size may
conflict with other SIMD types.

The function that determines the size of a local variable needs
to have access to compiler state at runtime to handle variable types
with sizes that depend on some runtime value, for example Vector<T>
when backed by ARM64 scalable vectors.

The function that determines the size of an indirection needs to
have access to compiler state at runtime to handle variable types
with sizes that depend on some runtime value, for example Vector<T>
when backed by ARM64 scalable vectors.

Create a new type designed to support Vector<T> with size evaluated at
runtime. Add a new HFA type to the VM to support passing Vector<T> as
a scalable vector register on ARM64. Both types are experimental and
gated behind the DOTNET_JitUseScalableVectorT configuration option.

This first patch implements SVE codegen for Vector<T>, mainly for managing
Vector<T> as a data structure that can be placed in a Z register. When
DOTNET_JitUseScalableVectorT=1, the JIT will move the type around using
SVE instructions operating on Z registers. It does not yet unlock longer
vector lengths or implement operations from the Vector<T> API surface using
SVE. This API still generates NEON code, which is functionally equivalent
so long as the size of Vector<T> is limited to 128 bits.

When DOTNET_JitUseScalableVectorT=0 the code generated for Vector<T> should
have zero functional difference but may have some cosmetic differences as some
refactoring has been done on general SIMD codegen to support the new type.
@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 27, 2025
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Oct 27, 2025
@snickolls-arm
Contributor Author

@dotnet/arm64-contrib @a74nh @tannergooding

This is the outcome of the investigation I've been doing into supporting scalable vector types in the JIT. I've worked through a few problems and test failures, but I still consider this early experimentation. Feedback or suggestions on direction are appreciated.

@jkotas jkotas added the arm-sve Work related to arm64 SVE/SVE2 support label Oct 27, 2025