Closed as not planned
Description
In dotnet/perf-autofiling-issues#33182, we discovered that MonoJIT generates two memcpy calls for the following struct copy:
[StructLayout(LayoutKind.Sequential, Size = 64)]
public struct Block64 {}
Unsafe.WriteUnaligned(ref dest, Unsafe.ReadUnaligned<Block64>(ref src));
il_seq_point intr il: 0x0
il_seq_point il: 0x1
load_membase R37 <- [fp + 0x18]
add_imm R38 <- fp [32]
move R40 <- R38
move R41 <- R37
iconst R42 <- [64]
voidcall [void string:memcpy (byte*,byte*,int)] [r0 <- R40] [r1 <- R41] [r2 <- R42] clobbers: c
il_seq_point il: 0x8, nonempty-stack
add_imm R43 <- fp [32]
load_membase R44 <- [fp + 0x10]
move R46 <- R44
move R47 <- R43
iconst R48 <- [64]
voidcall [void string:memcpy (byte*,byte*,int)] [r0 <- R46] [r1 <- R47] [r2 <- R48] clobbers: c
il_seq_point il: 0xd, nonempty-stack
il_seq_point il: 0xd
il_seq_point il: 0xe
In this scenario, MonoJIT takes the "safe" path in mini_emit_memcpy_internal, which emits a memcpy call because size / align > MAX_INLINE_COPIES holds, instead of using mini_emit_memcpy, which unrolls the copy.
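To make the threshold concrete, here is a minimal C sketch of the size / align > MAX_INLINE_COPIES check. It is not Mono's actual code: the helper name takes_memcpy_call_path is hypothetical, the value 16 for MAX_INLINE_COPIES is an assumption, and so is the idea that an unaligned access is treated as alignment 1.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed value, for illustration only; consult the Mono source for the
   real definition of MAX_INLINE_COPIES. */
#define MAX_INLINE_COPIES 16

/* Hypothetical helper (not a Mono function): returns true when the
   size / align > MAX_INLINE_COPIES check would select the memcpy call
   path rather than an unrolled inline copy. */
static bool takes_memcpy_call_path(size_t size, size_t align)
{
    if (align == 0)
        align = 1; /* assumption: unknown alignment is treated as byte alignment */
    return size / align > MAX_INLINE_COPIES;
}
```

Under these assumptions, a 64-byte block with alignment 1 gives 64 / 1 = 64, which exceeds the threshold and selects the call path, whereas the same block with alignment 8 gives 64 / 8 = 8 and would stay below it.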
In comparison, Unsafe.As<byte, Block64>(ref dest) = Unsafe.As<byte, Block64>(ref src); produces a single memcpy call:
il_seq_point intr il: 0x0
il_seq_point il: 0x1
il_seq_point il: 0x7, nonempty-stack
il_seq_point il: 0xd, nonempty-stack
load_membase R39 <- [fp + 0x10]
nop
load_membase R40 <- [fp + 0x18]
nop
iconst R41 <- [64]
voidcall [void string:memcpy (byte*,byte*,int)] [r0 <- R39] [r1 <- R40] [r2 <- R41] clobbers: c
il_seq_point il: 0x17
This causes serious regressions on MonoJIT (dotnet/perf-autofiling-issues#33182 and more). Fixing it would bring more than 400 microbenchmark improvements (dotnet/perf-autofiling-issues#41406 (comment)).
EgorBo