Disable adaptive loop alignment for known hot methods #82635
No hits
Ah, no hits because SPMI disables loop alignment
|
That will be a lot more padding, sometimes more than the loop code itself, and it ignores most of the heuristics we have in place. I think the better strategy is to add more padding only in places where we can hide it behind a jump. Continuing on this, we should also see if we can enable loop alignment for non-innermost loops as well.
It is definitely the case that small/hot loops may end up with more padding, but it can equally make a significant difference to the throughput of the method. We should likely balance it so that we don't "waste padding" for things like crossgen, but are more forgiving on "padding" when we're compiling a known hot method via rejit. If we can skip the "large padding" with an unconditional jump, that's even better.
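To make the trade-off in the two comments above concrete, here is a minimal C++ sketch of the arithmetic involved. It is not the JIT's implementation: `paddingToAlign` and `acceptPadding` are hypothetical names, and the policy "accept padding if it is hidden behind an unconditional jump, or if it is no larger than the loop itself" is only an assumption drawn from the discussion.

```cpp
#include <cassert>
#include <cstdint>

// Bytes of padding needed so that code currently at `offset` starts on an
// `alignment`-byte boundary. `alignment` must be a power of two.
std::uint32_t paddingToAlign(std::uint32_t offset, std::uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (alignment - (offset & (alignment - 1))) & (alignment - 1);
}

// Hypothetical policy (an assumption, not the real heuristic):
// - padding placed right after an unconditional jump is never executed,
//   so it only costs code size and is always accepted;
// - otherwise, reject padding that is larger than the loop body itself.
bool acceptPadding(std::uint32_t paddingBytes,
                   std::uint32_t loopSizeBytes,
                   bool hiddenBehindUnconditionalJump)
{
    if (hiddenBehindUnconditionalJump)
    {
        return true;
    }
    return paddingBytes <= loopSizeBytes;
}
```

For example, a 12-byte loop at offset 0x23 needs 29 bytes of padding to reach a 32-byte boundary, which this sketch would reject unless the padding sits behind an unconditional jump.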
That would be a good exercise to do on 100s of benchmarks and see how they are impacted before we merge this change. That's how we fine-tuned the heuristics for loop alignment when we first added the support for it.
Yeah I just wanted to see the diffs on CI (couldn't run them locally) |
Just to clarify - By executing the benchmarks and measuring the impact. |
Ah, still no hits in SPMI. Presumably because
And you are not able to run this locally to get the data? |
Do you mean to collect PGO? Yes, but we need to update it anyway (it's currently based on a November version of the runtime), so I'll just wait until it's propagated
Should fix perf problems like #82442 (comment)
So for methods with sufficient PGO samples (currently 1000 invocations with static PGO) we might want to ignore the size-aware heuristics and apply large paddings (e.g. 30 bytes) for loops. Let's see if this hits any diffs.
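The idea in the description can be sketched roughly as follows. This is an illustration only, not code from the PR: `MethodProfile`, `isKnownHot`, and `maxLoopPadding` are hypothetical names, the two constants are simply the numbers quoted above, and treating exactly 1000 invocations as "hot" is an assumption.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical view of a method's profile data (not a real JIT type).
struct MethodProfile
{
    bool          hasStaticPgo;     // static PGO data is available
    std::uint32_t invocationCount;  // invocations recorded in the profile
};

// Values quoted in the PR description; everything else is assumed.
constexpr std::uint32_t kHotInvocationThreshold = 1000;
constexpr std::uint32_t kLargePaddingBytes      = 30;

// Assumption: a method is "known hot" once it reaches the threshold.
bool isKnownHot(const MethodProfile& profile)
{
    return profile.hasStaticPgo &&
           profile.invocationCount >= kHotInvocationThreshold;
}

// Maximum padding allowed in front of a loop. For known hot methods the
// size-aware (adaptive) limit is bypassed in favor of the large budget;
// otherwise the adaptive limit is returned unchanged.
std::uint32_t maxLoopPadding(const MethodProfile& profile,
                             std::uint32_t adaptiveLimitBytes)
{
    if (isKnownHot(profile))
    {
        return std::max(adaptiveLimitBytes, kLargePaddingBytes);
    }
    return adaptiveLimitBytes;
}
```

With an adaptive limit of 15 bytes, a method with static PGO and 1000 recorded invocations would get a 30-byte budget, while a method with 999 invocations, or without static PGO, would stay at 15.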