Skip to content

Conversation

@stephentoub
Copy link
Member

@stephentoub stephentoub commented Aug 6, 2025

Take 2 on #116150

The JIT work to stack allocate enumerators stops working with List<T> when List<T> gets sufficiently long, e.g. around 1000 elements. At that point, profiling sees the MoveNextRare method used for the last MoveNext as being cold and doesn't inline it. With it not inlined, the boxed enumerator escapes, and the enumerator is then not stack allocated.

(This change only moves the boundary significantly, from ~1000 elements to ~10000 elements. At ~10000, it starts hitting a new limit, due to OSR.)

Copilot AI review requested due to automatic review settings August 6, 2025 02:22
@github-actions github-actions bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Aug 6, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR optimizes the List<T>.Enumerator implementation by removing the MoveNextRare method to improve JIT stack allocation of enumerators. The change addresses a performance issue where the JIT's stack allocation optimization for enumerators fails when List<T> instances become sufficiently large (around 1000 elements) because the MoveNextRare method is considered cold and not inlined, causing the enumerator to be boxed instead of stack-allocated.

Key changes:

  • Inlines the MoveNextRare logic directly into the MoveNext method
  • Reorders field declarations and simplifies the enumerator state management
  • Changes the end-of-enumeration index marker from _list._size + 1 to -1

@stephentoub
Copy link
Member Author

@EgorBot -arm -amd -intel

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);

[MemoryDiagnoser(false)]
public class Bench
{
    [Benchmark]
    [ArgumentsSource(nameof(GetLists))]
    public int SumList(List<int> list)
    {
        int sum = 0;
        foreach (int item in list)
        {
            sum += item;
        }
        return sum;
    }

    [Benchmark]
    [ArgumentsSource(nameof(GetLists))]
    public int SumEnumerable(IEnumerable<int> list)
    {
        int sum = 0;
        foreach (int item in list)
        {
            sum += item;
        }
        return sum;
    }

    public static IEnumerable<List<int>> GetLists() =>
        from count in new int[] { 1, 10, 1_000 }
        select Enumerable.Range(0, count).ToList();
}

@stephentoub stephentoub added this to the 10.0.0 milestone Aug 6, 2025
@stephentoub stephentoub added area-System.Collections and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Aug 6, 2025
@stephentoub
Copy link
Member Author

Related to #118420, cc: @AndyAyersMS

@stephentoub stephentoub force-pushed the listenumeratortake2 branch from da4eb06 to 507e605 Compare August 6, 2025 14:10
@AndyAyersMS
Copy link
Member

FYI there is a similar pattern in PriorityQueue

@stephentoub
Copy link
Member Author

@EgorBot -arm -amd -intel

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);

[MemoryDiagnoser(false)]
public class Bench
{
    [Benchmark]
    [ArgumentsSource(nameof(GetLists))]
    public int SumList(List<int> list)
    {
        int sum = 0;
        foreach (int item in list)
        {
            sum += item;
        }
        return sum;
    }

    [Benchmark]
    [ArgumentsSource(nameof(GetLists))]
    public int SumEnumerable(IEnumerable<int> list)
    {
        int sum = 0;
        foreach (int item in list)
        {
            sum += item;
        }
        return sum;
    }

    public static IEnumerable<List<int>> GetLists() =>
        from count in new int[] { 1, 10, 1_000 }
        select Enumerable.Range(0, count).ToList();
}

@stephentoub
Copy link
Member Author

@jkotas, any concerns?

@jkotas
Copy link
Member

jkotas commented Aug 6, 2025

I assume that this will regress perf for common cases without PGO (NAOT and probably Mono too) since MoveNext() hot path is not going to be inlined anymore. Is that correct?

Should we go all the way and mark MoveNext() with aggressive inlining?

I do not have a strong opinion either way. This feels like a variant of the code size vs. microbenchmark perf trade off we have faced number of times.

@stephentoub
Copy link
Member Author

I assume that this will regress perf for common cases without PGO (NAOT and probably Mono too) since MoveNext() hot path is not going to be inlined anymore. Is that correct?

@EgorBo or @AndyAyersMS can comment more authoritatively, but it seems like it's still inlineable even without PGO:
SharpLab

I can mark it with AggressiveInlining, though, if we want to be more sure it happens.

@jkotas
Copy link
Member

jkotas commented Aug 6, 2025

it seems like it's still inlineable even without PGO:

Cool, LGTM then.

@stephentoub stephentoub enabled auto-merge (squash) August 6, 2025 21:06
@AndyAyersMS
Copy link
Member

it seems like it's still inlineable even without PGO:

Cool, LGTM then.

In cases where the jit sees the struct enumerator in IL, it applies a fair number of inlining boosts even without PGO:

Invoking compiler for the inlinee method System.Collections.Generic.List`1+Enumerator[int]:MoveNext():bool:this :
...
multiplier in methods of struct increased to 3.
9 ldfld or stfld over arguments which are structs.  Multiplier increased to 4.
Inline candidate has arg that feeds range check.  Multiplier increased to 5.
Inline candidate is generic and caller is not.  Multiplier increased to 7.
Inline candidate has 1 foldable branches.  Multiplier increased to 11.
Inline has 1 foldable binary expressions.  Multiplier increased to 13.
Inline candidate has an arg that feeds a constant test.  Multiplier increased to 14.
Inline candidate callsite is in a loop.  Multiplier increased to 17.
calleeNativeSizeEstimate=829
callsiteNativeSizeEstimate=85
benefit multiplier=17
threshold=1445
Native estimate for function size is within threshold for inlining 82.9 <= 144.5 (multiplier = 17)

@stephentoub stephentoub merged commit 8b08265 into dotnet:main Aug 6, 2025
137 checks passed
@stephentoub stephentoub deleted the listenumeratortake2 branch August 6, 2025 22:24
@github-actions github-actions bot locked and limited conversation to collaborators Sep 6, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants