Skip to content

Conversation

@github-actions
Copy link
Contributor

@github-actions github-actions bot commented Dec 4, 2023

Backport of #95539 to release/8.0-staging

/cc @AndyAyersMS

Customer Impact

Issue #95394 was opened by a customer.

An application that contains a certain pattern of code may compute with the wrong values, or crash. The pattern is one where there is a copy of a local struct on the (IL) stack, then a virtual, interface, or delegate call that modifies that same local, and then a use of the copy after the call. In the repro case (here P.XY is virtual):

static int Problem(P p, int n, (double x, double y) tuple)
{
    ...
    (double x, double y) tupleTmp = tuple;
    tuple = p.XY();
    (_, double f) = tupleTmp;
    ... f ...
}

In such cases if the call is optimized by GDV (Guarded Devitualization)—as it is likely to be now that PGO is enabled by default in .NET 8—the subsequent read of tupleTemp may actually read the values of the tuple returned by the call, instead of the values that tuple had before the call. So, in the example above, f ends up with the wrong value.

The underlying bug in the JIT is pre-existing and goes back to at least .NET 6.0.

Whether or not the IL stack contains the problematic sequence depends on the options passed to csc, in particular the default options in release mode will encourage csc to rely more heavily on the stack.

Testing

Verified the fix on several version of the originally reported repro case.

Ran standard CI testing, plus additional PGO, outerloop, and jitstress. Fix did not lead to any diffs on SPMI collections.

Risk

Low.

A similar fix was made for boxes and GDV calls in #60355.

In the jit there are several factors that contribute to this being a tricky area:

  • inline candidates will split IR trees into a call part and a return part
  • PGO enables many more calls to be handled as more inline candidates
  • calls returning structs via "hidden return buffers" that get stored to locals are modified after they are first imported to "sink" the store into the call. When the call is an inline candidate then this modification is non-local as the logic locally has the return part and links back to the call part to make the modification
  • when pushing a local onto the IL stack the JIT will attempt to avoid making a copy. Subsequent operations that might modify the local require the jit to spill the entry from the stack. In this case the call triggers a spill but because of the tree splitting the split is placed just before the return and not just before the call.
  • Thus the spill IR is temporarily out of order. Once the inline candidate is resolved (inlined or abandoned) the temporary out of order spill ends up back in the proper order as the store done by the call is placed at the point of the return (so after the spill).
  • GDV introduces control flow at the point of the call that inhibits the reordering.

The fix here is to move the spill to the proper place immediately after it is created, so that the JIT no longer produces a temporarily out of order IR sequence.

If we have a call that returns a struct that is immediately assigned
to some local, the call is a GDV inline candidate, and the call is
invoked with a copy of the same local on the evaluation stack, the
spill of that local into a temp will appear in the IR stream between
the call and the ret expr, instead of before the call.

As part of our IR resolution the store to the local gets "sunk" into
the call as the hidden return buffer, so the importer ordering is
manifestly incorrect:
```
call &retbuf, ...
tmp = retbuf
...ret-expr
...tmp
```

For normal inline candidates this mis-ordering gets fixed up either
by swapping the call back into the ret expr's position, or for successful
inlines by swapping the return value store into the ret expr's position.
The JIT has behaved this way for a very long time, and the transient
mis-ordering has not lead to any noticble problem.

For GDV calls the return value stores are done earlier, just after
the call, and so the spill picks up the wrong value. GDV calls normally
only happen with PGO data. This persistent mis-ordering has been
the behavior since at least 6.0, but now that PGO is enabled by default
a much wider set of programs are at risk of running into it.

The fix here is to reorder the IR in the importer at the point the
store to the local is appended to the IR stream, as we are processing
spills for the local. If the source of the store is a GT_RET_EXPR we
keep track of these spills, find the associated GT_CALL statement, and
move the spills before the call.

There was a similar fix made for boxes in #60335, where once again the
splitting of the inline candidate call and the subsequent modification of
the call to write directly to the return buffer local led to similar
problems with GDV calls.

Fixes #95394.
@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 4, 2023
@ghost
Copy link

ghost commented Dec 4, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Backport of #95539 to release/8.0-staging

/cc @AndyAyersMS

Customer Impact

Testing

Risk

IMPORTANT: If this backport is for a servicing release, please verify that:

  • The PR target branch is release/X.0-staging, not release/X.0.

  • If the change touches code that ships in a NuGet package, you have added the necessary package authoring and gotten it explicitly reviewed.

Author: github-actions[bot]
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@AndyAyersMS
Copy link
Member

@dotnet/jit-contrib PTAL

Copy link
Member

@jeffschwMSFT jeffschwMSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approved. we will take for consideration in 8.0.x

@jeffschwMSFT jeffschwMSFT added the Servicing-consider Issue for next servicing release review label Dec 4, 2023
@jeffschwMSFT jeffschwMSFT added this to the 8.0.x milestone Dec 4, 2023
@jeffschwMSFT jeffschwMSFT added Servicing-approved Approved for servicing release and removed Servicing-consider Issue for next servicing release review labels Dec 6, 2023
@jeffschwMSFT jeffschwMSFT modified the milestones: 8.0.x, 8.0.2 Dec 6, 2023
@jeffschwMSFT jeffschwMSFT merged commit 14a32f8 into release/8.0-staging Jan 2, 2024
@jkotas jkotas deleted the backport/pr-95539-to-release/8.0-staging branch January 2, 2024 17:26
@github-actions github-actions bot locked and limited conversation to collaborators Feb 2, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI Servicing-approved Approved for servicing release

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants