To better support high-level optimizations, it makes sense to try to defer or encapsulate some of the more complex runtime lowerings in the JIT IR. Here are some thoughts on the matter.
Motivations:
- High-level optimizations would prefer to see logical operators (even if complex) rather than a complex tree or tree sequence
- Many times these operators become dead and cleaning up after them can be complex if they’ve been expanded
- Sometimes these operators can be abstractly simplified if they are partially dead. For instance a box used only to feed a type test can become a type lookup.
- Properties of these operators are not always evident from their expansions, and the expansions can vary considerably, making it problematic to "reparse" trees within the jit to recover information lost during expansion
- Often these operators have nice properties (invariant, non-faulting) and would be good candidates for hoisting, but their complex shape makes this difficult or costly.
- Often the equivalence of two such operators can be stated rather simply as equivalence of some abstract inputs, making CSE/value numbering simple.
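The last point can be illustrated with a toy sketch (hypothetical types, not actual RyuJIT code): two runtime-lookup operators are equivalent exactly when their abstract inputs match, namely the handle being looked up and the value number of the generic-context operand, no matter how large their eventual expansions would be.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Toy model of a runtime-lookup operator summarized by its abstract inputs.
// (Hypothetical types for illustration only.)
struct RuntimeLookup {
    uint64_t handle;     // e.g. the class handle being looked up
    unsigned contextVN;  // value number of the generic-context operand
};

// Two lookups are value-number equivalent exactly when their abstract
// inputs match, regardless of the size of their eventual expansions.
bool vnEqual(const RuntimeLookup& a, const RuntimeLookup& b) {
    return a.handle == b.handle && a.contextVN == b.contextVN;
}

// A hash over the same inputs lets CSE find duplicate lookups cheaply.
size_t vnHash(const RuntimeLookup& l) {
    return std::hash<uint64_t>{}(l.handle) ^ (std::hash<unsigned>{}(l.contextVN) << 1);
}
```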
Possible candidates for this kind of encapsulation include:
- Runtime lookup
- Static field access
- Box (already semi-encapsulated)
- Unbox
- Cast/Isinst
- Allocation (already encapsulated)
- Class initialization
The downside to encapsulation is that the subsequent expansion is context dependent. The jit would have to retain all the necessary bits of context so it could query the runtime when it is time to actually expand the operation. This becomes complicated when these runtime operators are created during inlining, as sometimes inlining must be abandoned when the runtime operator expansions become complex. So this approach could become somewhat costly in space (given the amount of retained context per operator) or in time (since we would likely have to simulate enough of the expansion during inlining to see if problematic cases arise).
We’d also have more kinds of operations flowing around in the IR and would need to decide when to remove/expand them. This can be done organically, removing the operations just after the last point at which some optimization is able to reason about them. Initially perhaps they’d all vanish after inlining or we could repurpose the object allocation lowering to become a more general runtime lowering.
Instead of full encapsulation, we might consider relying initially on partial encapsulation like we do now for box: introduce a “thin” unary encapsulation wrapper over a fully expanded tree that identifies the tree as an instance of some particular runtime operation (and possibly, as in box, keeping tabs on related upstream statements) with enough information to identify the key properties. Expansion would be simple: the wrapper would disappear at a suitable downstream phase, simply replaced by its content. These thin wrappers would not need to capture all the context, but just add a small amount of additional state. Current logic for abandoning inlines in the face of complex expansion would apply, so no new logic would be needed.
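As a toy sketch of the thin-wrapper idea (hypothetical types; real RyuJIT trees are GenTree nodes), the wrapper sits above an already fully expanded subtree, carries just enough extra state to identify the key properties of the runtime operation, and "expansion" simply replaces the wrapper with its content:

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a fully expanded subtree (hypothetical; not RyuJIT code).
struct Tree {
    int oper;
    Tree* child = nullptr;
};

// A "thin" unary wrapper: identifies its content as an instance of a
// particular runtime operation and records the key property the
// optimizer wants to see (here, the class handle being looked up).
struct RuntimeLookupWrapper : Tree {
    uint64_t classHandle;
    RuntimeLookupWrapper(Tree* expansion, uint64_t handle)
        : Tree{/*oper*/ 1, expansion}, classHandle(handle) {}
};

// Expansion at a suitable downstream phase: the wrapper evaporates,
// simply replaced by the content that was built up front.
Tree* expand(RuntimeLookupWrapper* w) { return w->child; }
```

Because the expanded subtree is built eagerly, the current logic for abandoning inlines in the face of complex expansions continues to apply unchanged.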
As opportunities arise we can then gradually convert the thin wrappers to full encapsulations; most “upstream” logic should not care that much since presumably the expanded subtrees, once built, do not play any significant role in high-level optimization, so their creation could be deferred.
So I’m tempted to say that thin encapsulation gives us the right set of tradeoffs, and to start building on that.
The likely first target is the runtime lookups feeding type equality and eventually type cast operations. Then probably static field accesses feeding devirtualization opportunities.
If you’re curious what this would look like, here’s a prototype: master..AndyAyersMS:WrapRuntimeLookup
And here’s an example using the prototype. In this case the lookup tree is split off into an earlier statement, but at the point of use we can still see some information about what type the tree intends to look up. A new jit interface call (not present in the fork above) can use this to determine if the types are possibly equal or not equal, even with runtime lookups for one or both inputs.
```
fgMorphTree BB01, stmt 3 (before)
[000037] --C-G------- * JTRUE void
[000035] ------------ | /--* CNS_INT int 0
[000036] --C-G------- \--* EQ int
[000032] --C-G------- \--* CALL int System.Type.op_Equality
[000028] --C-G------- arg0 +--* CALL help ref HELPER.CORINFO_HELP_TYPEHANDLE_TO_RUNTIMETYPE
[000026] ------------ arg0 | \--* RUNTIMELOOKUP long 0x7ffcc5769428 class
[000025] ------------ | \--* LCL_VAR long V03 loc1
[000031] --C-G------- arg1 \--* CALL help ref HELPER.CORINFO_HELP_TYPEHANDLE_TO_RUNTIMETYPE
[000029] ------------ arg0 \--* CNS_INT(h) long 0x7ffcc5769530 class
```
By default the wrapper just evaporates in morph:
```
Optimizing call to Type:op_Equality to simple compare via EQ
Optimizing compare of types-from-handles to instead compare handles

fgMorphTree BB01, stmt 3 (after)
[000037] ----G+------ * JTRUE void
[000029] -----+------ | /--* CNS_INT(h) long 0x7ffcc5769530 class
[000170] J----+-N---- \--* NE int
[000025] -----+------ \--* LCL_VAR long V03 loc1
```
But in morph and upstream it can be used to trigger new optimizations.
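The kind of jit interface query described above might look like the following toy sketch (hypothetical names and semantics; the actual interface method and its rules for generic contexts would be more involved). Given what the jit knows about each input, an exact handle or just the handle keying a runtime lookup, the runtime can sometimes prove the types must be equal or must not be equal, and otherwise answers "maybe":

```cpp
#include <cassert>
#include <cstdint>

// Three-valued answer for a type-equality query (hypothetical sketch).
enum class TypeCompareState { MustNotEqual, MustEqual, May };

// What the jit can see about each input tree: the class handle the tree
// intends to look up, and whether the type is known exactly at jit time
// or only via a runtime lookup. (Generic context is omitted in this toy.)
struct TypeInfo {
    uint64_t handle;
    bool isExact;
};

TypeCompareState compareTypes(const TypeInfo& a, const TypeInfo& b) {
    if (a.isExact && b.isExact)
        return a.handle == b.handle ? TypeCompareState::MustEqual
                                    : TypeCompareState::MustNotEqual;
    // With a runtime lookup on either side, identical lookup keys still
    // imply equality in this simplified model; otherwise we can't tell.
    if (a.handle == b.handle)
        return TypeCompareState::MustEqual;
    return TypeCompareState::May;
}
```

The useful property is the asymmetry: a definite answer lets the jit fold the comparison (as in the dump above), while "maybe" simply leaves the expanded tree in place.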
category:implementation
theme:ir
skill-level:expert
cost:medium