Remove List<T>.Enumerator.MoveNextRare #118425
Pull Request Overview
This PR optimizes the List<T>.Enumerator implementation by removing the MoveNextRare method to improve JIT stack allocation of enumerators. The change addresses a performance issue where the JIT's stack allocation optimization for enumerators fails when List<T> instances become sufficiently large (around 1000 elements) because the MoveNextRare method is considered cold and not inlined, causing the enumerator to be boxed instead of stack-allocated.
Key changes:
- Inlines the MoveNextRare logic directly into the MoveNext method
- Reorders field declarations and simplifies the enumerator state management
- Changes the end-of-enumeration index marker from _list._size + 1 to -1
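To make the key changes concrete, here is a minimal sketch of what an enumerator with the rare path folded into MoveNext could look like. It is not the CoreLib source: `SketchList<T>` is a hypothetical stand-in for List<T>, the exception is thrown directly rather than through a throw helper, and only the shape of MoveNext (version check, bounds check, `-1` end marker) follows the description above.

```csharp
using System;

// Hypothetical stand-in for List<T>; illustrative only, not the CoreLib implementation.
public sealed class SketchList<T>
{
    internal T[] _items = new T[4];
    internal int _size;
    internal int _version;

    public void Add(T item)
    {
        if (_size == _items.Length)
        {
            Array.Resize(ref _items, _items.Length * 2);
        }
        _items[_size++] = item;
        _version++; // any mutation invalidates live enumerators
    }

    public Enumerator GetEnumerator() => new Enumerator(this);

    public struct Enumerator
    {
        private readonly SketchList<T> _list;
        private readonly int _version;
        private int _index;
        private T _current;

        internal Enumerator(SketchList<T> list)
        {
            _list = list;
            _version = list._version;
            _index = 0;
            _current = default;
        }

        public T Current => _current;

        public bool MoveNext()
        {
            SketchList<T> localList = _list;

            // Version check done inline instead of in a separate cold helper.
            if (_version != localList._version)
            {
                throw new InvalidOperationException("Collection was modified during enumeration.");
            }

            if ((uint)_index < (uint)localList._size)
            {
                _current = localList._items[_index];
                _index++;
                return true;
            }

            // End of enumeration: -1 is the marker, and (uint)-1 fails the
            // bounds check above, so later calls keep returning false.
            _current = default;
            _index = -1;
            return false;
        }
    }
}
```

Because the whole method is a single small body, there is no separate call for the inliner to treat as cold on the final iteration, which is the property the PR is after.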
src/libraries/System.Private.CoreLib/src/System/Collections/Generic/List.cs
@EgorBot -arm -amd -intel

using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);

[MemoryDiagnoser(false)]
public class Bench
{
    // Enumerates through the concrete List<int> type (struct enumerator).
    [Benchmark]
    [ArgumentsSource(nameof(GetLists))]
    public int SumList(List<int> list)
    {
        int sum = 0;
        foreach (int item in list)
        {
            sum += item;
        }
        return sum;
    }

    // Enumerates through IEnumerable<int> (enumerator is boxed unless the JIT stack allocates it).
    [Benchmark]
    [ArgumentsSource(nameof(GetLists))]
    public int SumEnumerable(IEnumerable<int> list)
    {
        int sum = 0;
        foreach (int item in list)
        {
            sum += item;
        }
        return sum;
    }

    // Benchmark inputs: lists of 1, 10, and 1,000 elements.
    public static IEnumerable<List<int>> GetLists() =>
        from count in new int[] { 1, 10, 1_000 }
        select Enumerable.Range(0, count).ToList();
}
|
Related to #118420, cc: @AndyAyersMS |
|
FYI there is a similar pattern in |
|
@jkotas, any concerns? |
|
I assume that this will regress perf for common cases without PGO (NAOT and probably Mono too) since Should we go all the way and mark I do not have a strong opinion either way. This feels like a variant of the code size vs. microbenchmark perf trade-off we have faced a number of times. |
@EgorBo or @AndyAyersMS can comment more authoritatively, but it seems like it's still inlineable even without PGO: I can mark it with AggressiveInlining, though, if we want to be more sure it happens. |
Cool, LGTM then. |
In cases where the jit sees the struct enumerator in IL, it applies a fair number of inlining boosts even without PGO: |
Take 2 on #116150
The JIT work to stack allocate enumerators stops working with List<T> when List<T> gets sufficiently long, e.g. around 1000 elements. At that point, profiling sees the MoveNextRare method used for the last MoveNext as being cold and doesn't inline it. With it not inlined, the boxed enumerator escapes, and the enumerator is then not stack allocated.

(This change only moves the boundary significantly, from ~1000 elements to ~10000 elements. At ~10000, it starts hitting a new limit, due to OSR.)
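For readers less familiar with where the "boxed enumerator" comes from, the small sketch below shows that enumerating a List<int> through IEnumerable<int> hands back the struct List<int>.Enumerator as an interface reference, i.e. a boxed copy. The class and method names are hypothetical and exist only for illustration; the sketch shows the boxing itself, not whether the JIT manages to stack allocate it on any particular run.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper type, for illustration only.
public static class BoxedEnumeratorSketch
{
    // Returns the runtime type of the enumerator obtained through the interface.
    // For a List<int> source this is the struct List<int>.Enumerator, held
    // behind an IEnumerator<int> reference (a boxed value).
    public static Type EnumeratorTypeViaInterface(IEnumerable<int> source)
    {
        using IEnumerator<int> e = source.GetEnumerator();
        return e.GetType();
    }

    // Same loop shape as the SumEnumerable benchmark: every MoveNext/Current
    // call goes through the interface unless the JIT devirtualizes and inlines it.
    public static int SumViaInterface(IEnumerable<int> source)
    {
        int sum = 0;
        foreach (int item in source)
        {
            sum += item;
        }
        return sum;
    }
}
```

If any call on that boxed enumerator is left as a real call (for example, a helper judged cold and not inlined), the box is passed out of the method as `this`, which is the escape described above.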