[NativeAOT] Using the same CastCache implementation as in CoreClr #84430
Conversation
Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas

Issue Details
Fixes: #75111
Reasons:
    #if TARGET_64BIT
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static ulong RotateLeft(ulong value, int offset)
This should be deleted and replaced by BitOperations.RotateLeft.
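For reference, a minimal sketch of the suggested replacement, assuming the helper is only used for hash mixing (the mixing shown here is illustrative, not the actual CastCache code):

    using System.Numerics;
    using System.Runtime.CompilerServices;

    internal static class HashMixing
    {
        // Illustrative only: uses the framework-provided rotate instead of a
        // private RotateLeft helper. BitOperations.RotateLeft compiles to a
        // single rotate instruction on x64/arm64.
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static ulong Mix(ulong source, ulong target)
        {
            return BitOperations.RotateLeft(source, 32) ^ target;
        }
    }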
I introduced this because NativeAOT was not building in the test configuration, which was missing a lot of things like Volatile, Interlocked, and BitOperations.
Later I moved the custom implementations under Test.CoreLib, but missed this one.
We may want to omit this cache for Test.CoreLib. Test.CoreLib has a simplistic implementation of a number of other subsystems.
At this point it would be easy to move these helpers to Test.CoreLib as custom BitOperations implementations. Then we can have the cache in Test.CoreLib and get extra test coverage.
However, if having a minimal implementation is the point, then I can instead add a Test.CoreLib-specific implementation of the cache that does not do anything.
> Then we can have the cache in Test.CoreLib and get extra test coverage.

There are no relevant tests running against Test.CoreLib. All tests that matter for this change run against the real CoreLib.
I reverted the additional API implementations for Test.CoreLib and added a trivial no-op cache instead.
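For context, a trivial no-op cache along these lines might look roughly like the sketch below (type and member names are illustrative, not the actual Test.CoreLib code): every lookup misses, so casting always takes the slow path.

    // Illustrative no-op stand-in: every lookup reports a miss and every
    // insert is dropped, so the minimal corelib never depends on the real cache.
    internal static class CastCache
    {
        public static bool TryGet(nuint source, nuint target, out bool result)
        {
            result = false;
            return false;   // always a miss; callers fall back to the slow path
        }

        public static void TrySet(nuint source, nuint target, bool result)
        {
            // intentionally empty: no caching in the minimal corelib
        }
    }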
    // - issue a load barrier before reading _version
    // benchmarks on available hardware (Jan 2020) show that use of a read barrier is cheaper.

    #if CORECLR
If this is required for correctness, it should not be under an ifdef.
Either a load fence is needed here, or the two loads above need to be acquires. When this was measured at the time it was implemented, one fence was noticeably cheaper.
We need to add Interlocked.ReadMemoryBarrier() to NativeAOT, and it needs to be an intrinsic; there is no point in implementing it as an internal call.
I will log an issue on Interlocked.ReadMemoryBarrier(). Once that is added, we can remove the ifdefs here and around the two acquires above.
Logged an issue for adding Interlocked.ReadMemoryBarrier() intrinsic - #84445
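A rough sketch of the trade-off being discussed (not the actual CastCache code): either both payload loads are acquire loads, or they stay plain loads and a single load barrier is issued before re-checking the version. Interlocked.MemoryBarrier() below is a conservative stand-in for the cheaper read-only barrier requested in #84445.

    using System.Threading;

    internal struct VersionedEntry
    {
        public int Version;
        public long Source;
        public long TargetAndResult;
    }

    internal static class VersionedRead
    {
        // Returns true only if the payload was read without a concurrent update.
        public static bool TryRead(ref VersionedEntry e, out long source, out long targetAndResult)
        {
            int version = Volatile.Read(ref e.Version);   // acquire: orders the loads below

            source = e.Source;                            // plain load
            targetAndResult = e.TargetAndResult;          // plain load

            // Stand-in for a read-only barrier: keeps the payload loads from
            // moving past the version re-check.
            Interlocked.MemoryBarrier();

            return version == Volatile.Read(ref e.Version);
        }
    }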
    }

    // the rest is the support for updating the cache.
    // in CoreClr the cache is only updated in the native code
Would it make sense for CoreCLR to just call this managed implementation instead?
Interesting idea. I think what could prevent that is:

- If we need to use the cache before we can run managed code. The JIT calls into the casting machinery, so it is hard to tell whether we may need the cache too early.
- If we need to use the cache before the managed cctor has run. I'd guess the cctor could be forced to run early enough.
- If we need to use the cache from some mode that does not allow managed code. Most likely this is not a requirement, but I am not sure.
- If calling managed code from native has high overhead. Adding to the cache is reasonably fast; we do not see the cost of adding entries even at startup, when we'd expect cache misses. I do not have a good sense of how expensive it is to call managed code from the native runtime in this context.

Would we want to call managed TryGet as well? That would be a lot more sensitive to the overhead, and TryGet is GC_NOTRIGGER; I vaguely remember there were some reasons for that.
I think this is worth considering, but it should be a separate change. It does not look like a straightforward tweak and may need some experimenting.
Also, the impact of such a change would be mostly in CoreClr, while this PR mostly affects NativeAOT, so in terms of watching for failures or stress/perf consequences, it would be better to have separate changes.
Logged an issue to follow up on switching CoreCLR to use managed TrySet and maybe TryGet - #84448
I have run some simple scenarios with this change to check that there are no obvious performance regressions. It looks like cached casting is slightly faster with the new cache, but not by much. The key point is to have a cache in the first place, and NativeAOT already had one.

I also noticed that CoreClr is faster than NativeAOT. I think the reason is the code that runs before we hit the cache, that is, the code that lives in … It may be worth looking at the cast entry points. Perhaps there are some opportunities there.

The microbenchmark that I used to see the impact of the cache in the "easy" case:

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;

    internal class Program
    {
        const int iters = 1000000;

        static void Main(string[] args)
        {
            for (; ; )
            {
                Time(TestLStringToIROCstring);
            }
        }

        static void Time(Action a)
        {
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < 100; i++)
            {
                a();
            }
            sw.Stop();
            System.Console.WriteLine(sw.ElapsedMilliseconds);
        }

        static object o = new List<string>();

        static void TestLStringToIROCstring()
        {
            for (int i = 0; i < iters; i++)
            {
                if (o as IReadOnlyCollection<object> == null)
                    throw null;
                if (o as IReadOnlyCollection<string> == null)
                    throw null;
                if (o as IEnumerable<object> == null)
                    throw null;
                if (o as IEnumerable<string> == null)
                    throw null;
            }
        }
    }

On my machine (x64), I see:
More code sharing. Yay!
IsCloned support is not used. I would not feel bad about deleting it. I think it is unlikely that we will ever use it for anything. CoreCLR has the MethodTable flags specifically shaped to make casting fast. We may want to copy/unify that shape.
Right. Also, the CoreClr entry points were carefully crafted to minimize branches and benefit from tail calling where possible. Perhaps at some cost to readability, but for that code it is acceptable. I will look at what we can borrow from there.
We also have some unnecessary checks, like "is this null" or "is this exactly the same type", for things that RyuJIT already tests for. The code was written for UTC, not RyuJIT. We just need to validate that all the non-codegen callees (i.e. when we call into casting from our regular C# code) also do the appropriate checks, and delete the ifs.
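To illustrate the shape of the redundancy (a hypothetical helper, not the actual entry points, which operate on MethodTable pointers rather than System.Type): when RyuJIT has already emitted the null and exact-type tests at the call site, the first two checks below are dead weight on the codegen path, while non-codegen callers still need them.

    using System;

    internal static class CastHelpersSketch
    {
        public static object IsInstanceOfIllustration(object obj, Type target)
        {
            // Redundant when the JIT already emitted a null check at the call site.
            if (obj == null)
                return null;

            // Also redundant for codegen callers: the JIT tests the exact type first.
            if (obj.GetType() == target)
                return obj;

            // Cache lookup and the slow path would go here.
            return target.IsInstanceOfType(obj) ? obj : null;
        }
    }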
Thanks!!