@kunalspathak
Contributor

@kunalspathak kunalspathak commented Aug 4, 2021

Introduce superpmi-replay public pipeline that will be triggered after JIT changes. Here is how it works:

  1. It will perform a windows-x64 build to cross-compile the following JIT binaries:
    • clrjit_win_x64_x64.dll
    • clrjit_win_arm64_x64.dll
    • clrjit_unix_x64_x64.dll
    • clrjit_unix_arm64_x64.dll
  2. It will perform a windows-x86 build to cross-compile the following JIT binaries:
    • clrjit_win_x86_x86.dll
    • clrjit_unix_arm_x86.dll
  3. It will spawn 6 machines, one for each of the 6 configurations above, and run superpmi replay on each with the following environment variables:
    • "JitStressRegs=0" (default - no flags)
    • "JitStressRegs=1"
    • "JitStressRegs=2"
    • "JitStressRegs=3"
    • "JitStressRegs=4"
    • "JitStressRegs=8"
    • "JitStressRegs=0x10"
    • "JitStressRegs=0x80"
    • "JitStressRegs=0x1000"
  4. In each of the 6 runs, it will download all the collections corresponding to the OS/architecture, replay them under each of the flags, and upload a consolidated superpmi__arch.log as an artifact. Currently the pipeline fails because of a bug in Helix tracked by https://github.com/dotnet/core-eng/issues/13983.
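
The replay matrix described in the steps above can be sketched roughly as follows. This is only an illustration: the helper names (`jit_name`, `replay_matrix`) are hypothetical and are not taken from the pipeline scripts. The JIT binary names follow the `clrjit_<target os>_<target arch>_<host arch>.dll` pattern visible in the lists above.

```python
from itertools import product

# Partitions as listed above: (target OS, target arch, host arch).
PARTITIONS = [
    ("win", "x64", "x64"),
    ("win", "arm64", "x64"),
    ("unix", "x64", "x64"),
    ("unix", "arm64", "x64"),
    ("win", "x86", "x86"),
    ("unix", "arm", "x86"),
]

# JitStressRegs values as listed above ("0" is the default, no stress).
JIT_STRESS_REGS = ["0", "1", "2", "3", "4", "8", "0x10", "0x80", "0x1000"]


def jit_name(target_os, target_arch, host_arch):
    """Build the cross-compiled JIT binary name for one partition (hypothetical helper)."""
    return f"clrjit_{target_os}_{target_arch}_{host_arch}.dll"


def replay_matrix():
    """Pair every JIT binary with every JitStressRegs setting."""
    return [
        (jit_name(*partition), f"JitStressRegs={flag}")
        for partition, flag in product(PARTITIONS, JIT_STRESS_REGS)
    ]


if __name__ == "__main__":
    for binary, setting in replay_matrix():
        print(binary, setting)
```

Each of the 6 machines would handle one partition and iterate over the flag values, so the full matrix is 6 partitions times 9 settings.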

Changes:

  • Add superpmi-replay public pipeline
  • Add a --no_progress option to "superpmi download" so that progress logs are not shown while the mch files download.
  • Add a download-specific-artifacts.yml template.
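
As a rough illustration of how a `--no_progress` switch on a download subcommand can suppress progress output, here is a minimal sketch. It is not the code from superpmi.py: the parser layout, `build_parser`, and `make_progress_hook` are assumptions made for this example, and only the flag name comes from the change list above.

```python
import argparse


def build_parser():
    """Hypothetical parser with a 'download' subcommand that accepts --no_progress."""
    parser = argparse.ArgumentParser(prog="superpmi-sketch")
    subparsers = parser.add_subparsers(dest="mode", required=True)
    download = subparsers.add_parser("download")
    download.add_argument(
        "--no_progress",
        action="store_true",
        help="Do not report download progress (keeps CI logs short).",
    )
    return parser


def make_progress_hook(no_progress, out):
    """Return a urllib-style report hook, or None when progress is disabled.

    'out' is any list-like sink; a real tool would more likely write to stdout.
    """
    if no_progress:
        return None

    def hook(block_count, block_size, total_size):
        if total_size > 0:
            percent = min(100, block_count * block_size * 100 // total_size)
            out.append(f"{percent}%")

    return hook


if __name__ == "__main__":
    args = build_parser().parse_args(["download", "--no_progress"])
    print(make_progress_hook(args.no_progress, []))
```

A hook shaped like this could be passed as the `reporthook` argument of `urllib.request.urlretrieve`, which accepts `None` to mean no reporting.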

Since I have introduced superpmi-replay** files, I will submit a follow-up PR to rename the existing superpmi* files that do the collection to superpmi-collect**.

Also, as follow-up work, we would trigger this pipeline after the superpmi-collect pipeline completes so the collections are fresh and we don't get "missing key" errors.

Fixes: #52392

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Aug 4, 2021
@kunalspathak kunalspathak changed the title from "Spmi replay" to "Spmi replay pipeline" Aug 5, 2021
@kunalspathak kunalspathak marked this pull request as ready for review August 5, 2021 16:27
@kunalspathak
Contributor Author

@dotnet/jit-contrib

@kunalspathak
Contributor Author

Can someone review this? I want to make sure this goes in before we freeze main.

Contributor

Suggested change
- - ${{ if eq(parameters.uploadAsArtifacts, false) }}:
+ - ${{ if not(parameters.uploadAsArtifacts) }}:

@ghost

ghost commented Aug 11, 2021

Hello @kunalspathak!

Because this pull request has the auto-merge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (@msftbot) and give me an instruction to get started! Learn more here.

Contributor

@briansull briansull left a comment

Looks Good to Me

@ghost ghost merged commit 0812322 into dotnet:main Aug 11, 2021
@BruceForstall
Contributor

@kunalspathak Love this work!

Are you planning to move the pipeline from internal to public, so it can be auto-triggered or manually triggered on JIT PRs? Or is there some reason why it must remain on "internal" and run post-merge?

@kunalspathak
Contributor Author

We could definitely make it public, but there is one scenario in which it doesn't work: if the PR changes the JITEE GUID, there is no collection for the new GUID yet because we haven't collected one. Ideally, in such cases the collection pipeline should trigger the replay pipeline. Other than that, this can be moved to public (let me know if you plan on doing it).
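
The GUID scenario can be illustrated with a small sketch. It assumes, purely for illustration, that collections are looked up by the JIT-EE GUID together with the target OS and architecture. The index layout, the placeholder GUID strings, and `collections_for` are all hypothetical.

```python
def collections_for(index, jit_ee_guid, target_os, target_arch):
    """Return the .mch files recorded for this exact GUID/OS/arch, or an empty list."""
    key = (jit_ee_guid.lower(), target_os, target_arch)
    return index.get(key, [])


# Hypothetical index: only the GUID that was current at collection time has entries.
INDEX = {
    ("guid-old", "windows", "x64"): ["libraries.mch", "tests.mch"],
}

if __name__ == "__main__":
    # A PR that keeps the GUID finds collections to replay.
    print(collections_for(INDEX, "guid-old", "windows", "x64"))
    # A PR that changes the GUID finds nothing until a new collection is made.
    print(collections_for(INDEX, "guid-new", "windows", "x64"))
```

Under this assumption, a replay for a PR that bumps the GUID has nothing to run against, which is why triggering replay from the collection pipeline is suggested.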

@BruceForstall
Contributor

  1. The "run" legs of the pipeline take 1.5 or 2 hours. Is that because they do all the work on one Helix machine each? Would it be hard to split the work up to run more in parallel? We'd need to do this if we want to run in a PR.
  2. Did implementing this find any asserts in the JIT? Did you test that if an assert is found, the job is marked as failing?

@kunalspathak
Contributor Author

  • The "run" legs of the pipeline take 1.5 or 2 hours. Is that because they do all the work on one Helix machine each? Would it be hard to split the work up to run more in parallel? We'd need to do this if we want to run in a PR.

Currently, the partitions are hardcoded in superpmi-replay.proj. Splitting further by collection would definitely reduce the run time, though the list in the file below would grow (number of collections × 6 entries). It is definitely doable.

<ItemGroup Condition="'$(Architecture)' == 'x64'">
  <SPMI_Partition Include="win-x64" Platform="windows" Architecture="x64" />
  <SPMI_Partition Include="win-arm64" Platform="windows" Architecture="arm64" />
  <SPMI_Partition Include="unix-x64" Platform="Linux" Architecture="x64" />
  <SPMI_Partition Include="unix-arm64" Platform="Linux" Architecture="arm64" />
</ItemGroup>
<ItemGroup Condition="'$(Architecture)' == 'x86'">
  <SPMI_Partition Include="win-x86" Platform="windows" Architecture="x86" />
  <SPMI_Partition Include="unix-arm" Platform="Linux" Architecture="arm" />
</ItemGroup>
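
To make the "number of collections × 6 entries" growth concrete, here is a hedged sketch that expands the six hardcoded partitions by collection. The `Collection` attribute, the `Include` naming scheme, and the collection names are assumptions for illustration only; the actual .proj file may be split differently.

```python
# Base partitions, mirroring the SPMI_Partition items shown above.
BASE_PARTITIONS = [
    ("win-x64", "windows", "x64"),
    ("win-arm64", "windows", "arm64"),
    ("unix-x64", "Linux", "x64"),
    ("unix-arm64", "Linux", "arm64"),
    ("win-x86", "windows", "x86"),
    ("unix-arm", "Linux", "arm"),
]


def split_by_collection(collections):
    """Emit one SPMI_Partition item per (partition, collection) pair.

    The Collection attribute is hypothetical; it is not part of the file above.
    """
    items = []
    for name, platform, arch in BASE_PARTITIONS:
        for collection in collections:
            items.append(
                f'<SPMI_Partition Include="{name}-{collection}" '
                f'Platform="{platform}" Architecture="{arch}" '
                f'Collection="{collection}" />'
            )
    return items


if __name__ == "__main__":
    # Hypothetical collection names.
    for item in split_by_collection(["libraries", "benchmarks"]):
        print(item)
```

With two collections this yields 12 items instead of 6, each of which could run on its own Helix machine.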

  • Did implementing this find any asserts in the JIT? Did you test that if an assert is found, the job is marked as failing?

I haven't seen any asserts yet. The pipeline failed once and reported it when we didn't have .mch files to run on.

@ghost ghost locked as resolved and limited conversation to collaborators Sep 25, 2021
This pull request was closed.
Development

Successfully merging this pull request may close these issues.

Create SuperPMI replay job

4 participants