Skip to content

Conversation

@EgorBo
Copy link
Member

@EgorBo EgorBo commented Feb 24, 2023

Should fix perf problems like #82442 (comment)

So for methods with sufficient PGO samples (currently it's 1000 invocations with static PGO) we might want to ignore size-aware heuristics and apply large paddings (e.g. 30 bytes) for loops. Let's see if this hits any diffs

@ghost ghost assigned EgorBo Feb 24, 2023
@EgorBo
Copy link
Member Author

EgorBo commented Feb 24, 2023

No hits

@EgorBo
Copy link
Member Author

EgorBo commented Feb 27, 2023

Ah, no hits because SPMI disables loop alignment

@EgorBo EgorBo reopened this Feb 27, 2023
@kunalspathak
Copy link
Contributor

That will be a lot more padding and sometimes more than the loop code itself. It ignores most of the heuristics we have in place. I think the better strategy will be to add more padding at places only if we can hide those paddings behind the jump. Continuing on this, we should also see if we can enable loop alignment for non-inner most loops as well.

@tannergooding
Copy link
Member

That will be a lot more padding and sometimes more than the loop code itself

It is definitely the case that small/hot loops may end up with more padding, but it can also equally be that it makes a significant difference on the throughput of the method.

We should likely balance it to not "waste padding" for things like crossgen, but to also be more forgiving on "padding" when we're compiling a known hot method via rejit.

If we can skip the "large padding" with an unconditional jump, that's even better.

@kunalspathak
Copy link
Contributor

it makes a significant difference on the throughput of the method.

That would be a good exercise to do on 100s of benchmarks and see how they are impacted before we merge this change. That's how we fine-tuned the heuristics for loop alignment when we first added the support for it.

@EgorBo
Copy link
Member Author

EgorBo commented Feb 28, 2023

Yeah I just wanted to see the diffs on CI (couldn't run them locally)

@kunalspathak
Copy link
Contributor

good exercise to do on 100s of benchmarks

Just to clarify - By executing the benchmarks and measuring the impact.

@EgorBo
Copy link
Member Author

EgorBo commented Feb 28, 2023

Ah, still no hits in SPMI. Presumably becuase fgHaveSufficientProfileWeights is too conservative - it expects 1000 calls + static profile. Also, our Static PGO is stale and doesn't include my changes for HexConverter

@EgorBo EgorBo closed this Feb 28, 2023
@kunalspathak
Copy link
Contributor

And you are not able to run this locally to get the data?

@EgorBo
Copy link
Member Author

EgorBo commented Feb 28, 2023

And you are not able to run this locally to get the data?

Do you mean to collect PGO? Yes, but we need to update it anyway (it's currently based on some November version of runtime) so I'll just wait till it's propagated

@ghost ghost locked as resolved and limited conversation to collaborators Mar 31, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants