@jkoritzinsky
Member

This PR removes the "StageOne/StageTwo" build concept and replaces it with a bootstrapping-build model with the following design:

  • The bootstrap subset builds the following components:
    • Microsoft.NETCore.App ref assemblies
    • Microsoft.NETCore.App runtime assemblies
    • (as part of the above) an updated RID graph with any additional RIDs specified by the build command
    • apphost
    • (if NativeAOT is supported): NativeAOT runtime libraries, managed and native
    • (if NativeAOT is not supported): Singlefilehost

It then lays out the produced files under the artifacts/bootstrap directory. This way, we can use the bootstrapped artifacts and still do a "full" product build, provided the artifacts/bin and artifacts/obj directories are deleted first.

Two new options are introduced for build.sh:

  • --use-bootstrap: Use the artifacts in artifacts/bootstrap as the "live" artifacts for any bootstrapped components
  • --bootstrap: Build the bootstrap subset, delete artifacts/bin and artifacts/obj, build with --use-bootstrap.
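The sequence behind --bootstrap can be sketched as a dry run. This is only an illustration of the three steps listed above: `plan_bootstrap_build` is a hypothetical helper, the `--subset bootstrap` spelling is taken from a later comment in this thread, and the real build.sh may pass additional arguments to each step.

```shell
# Dry-run sketch: echo the steps that --bootstrap is described as performing.
# Nothing is built or deleted here; plan_bootstrap_build is a hypothetical name.
plan_bootstrap_build() {
    local subsets="$1"   # the caller's usual subsets, e.g. "clr+libs"
    # Step 1: build only the bootstrap subset (laid out under artifacts/bootstrap).
    echo "./build.sh --subset bootstrap"
    # Step 2: remove intermediate outputs so the next build is a full product build.
    echo "rm -rf artifacts/bin artifacts/obj"
    # Step 3: build the requested subsets against the bootstrapped artifacts.
    echo "./build.sh ${subsets} --use-bootstrap"
}

plan_bootstrap_build "clr+libs"
```

Printing the commands instead of running them keeps the sketch safe to try outside a dotnet/runtime checkout.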

The following projects are "bootstrapped components":

  • crossgen2_publish
  • ILCompiler_publish

I've validated the following scenarios:

  • A non-portable build compiles crossgen2_publish and ILCompiler_publish with NativeAOT, using the live assets for the OutputRID.
  • Building for FreeBSD produces single-file-host-based crossgen2_publish and ILCompiler_publish with FreeBSD-targeting ELF executables.

This experience is enabled by default for SourceBuild legs that don't force the Mono runtime (as those legs introduce new RIDs).

Windows build script support and support in the VMR to toggle this (do we even want this togglable?) are still TODO.

Built on top of #113765

@dotnet-policy-service
Contributor

Tagging subscribers to this area: @dotnet/runtime-infrastructure
See info in area-owners.md if you want to be subscribed.

@tmds
Member

tmds commented Apr 5, 2025

This PR removes the "StageOne/StageTwo" build

Does source build build these stages separately? I thought they were a CI concept.

a bootstrapping-build

@jkoritzinsky @am11 @ViktorHofer

I can understand HOW this works. I would have also liked to understand WHY it does it this way.

Currently, I only understand the need to make the SDK aware that OutputRID is a valid targetable RID (that supports R2R, ILC, ...) by updating the RID graph.

Built on top of #113765

I have a 1 week break coming up and I have to finish some things before that in the next week. I imagined having enough time to run #113765 through some scenarios. I assume this PR is not as ready, so I'm not going to spend time validating it yet. I'm not going to validate #113765 separately either.

I don't think this should block you. This validation was mostly for my own reassurance.

@am11
Member

am11 commented Apr 5, 2025

WHY

To bootstrap components which require LKG packages during publishing. Those packages include: apphost, singlefilehost, crossgen2 and ilc.

Does source build build these stages separately?

Currently SB/VMR support is broken on community platforms which support full coreclr and require publishing components filipnavara/dotnet-riscv#4

I thought they were a CI concept.

This is the only way to build runtime on community platforms. In .NET 9 and earlier, we were using live-built publishing components during the runtime build. 3f28b1a changed it to use LKG packages. We don't publish apphost, crossgen2 and ilc packages for community platforms, so LKG option does not work.

It would be nice if we also start publishing those packages to nuget feed so non-bootstrap option also works for community platforms; either on the same nuget feed where official platform packages are published or a separate one with, say, -community suffix in its name.

@tmds
Member

tmds commented Apr 5, 2025

which require LKG packages during publishing

What is LKG?

Currently SB/VMR support is broken on community platforms which support full coreclr and require publishing components filipnavara/dotnet-riscv#4

This is a restore failure. It should not be trying to restore these packages, should it? These are meant to be produced by the build.

I can see how a bootstrap build can be used to address a restore issue by first doing a build that produces the packages that are causing the restore problem.

Afaik the runtime build already does some building of "live" artifacts which are then consumed by the build. I don't know why the same mechanism can not be used?

Does the bootstrap build depend on these packages existing for NETCoreSdkRuntimeIdentifier?

This is the only way to build runtime on community platforms. In .NET 9 and earlier, we were using live-built publishing components during the runtime build. 3f28b1a changed it to use LKG packages.

It seems these changes are driven by the change to use LKG packages for .NET 10.
I had missed that completely.

@am11
Member

am11 commented Apr 5, 2025

What is LKG?

Last known good configuration. The fixed/immutable versions of runtime packs tied to the SDK version from global.json (they are restored from nuget feed).

This is a restore failure. It should not be trying to restore these packages, should it?

Some components perform dotnet publish during the build, which looks for runtime packs of target. Since community platforms runtime packs are not published to nuget feed, we can't completely build everything.

These are meant to be produced by the build.

Yes, it is an ordering issue and it can be fixed solely in Subsets.props. We tried three approaches in #105004 and ended up with the TwoStage build. The first approach was fixing the order; it worked, but it turned out to be twisted and hard to maintain.

Afaik the runtime build already does some building of "live" artifacts which are then consumed by the build. I don't know why the same mechanism can not be used?

It still exists, but since the mainstream platforms are now moved to LKG plan, it is mainly used for community platforms.

Does the bootstrap build depend on these packages existing for NETCoreSdkRuntimeIdentifier?

CrossBuild uses the host runner (dotnet.exe, crossgen2.exe, ilc.exe with Rid=NETCoreSdkRuntimeIdentifier) with runtime pack for the target.

In TwoStage mode, the first stage builds everything which doesn't require publishing, which includes building the constituents of runtime packs (without the actual nupkg). The second stage builds the publishable components using the runtime pack constituents built by stage 1; it skips the package restore and instead uses deterministic paths under the artifacts/bin dir to find the runtime pack.

This PR replaces the TwoStage mode with bootstrap, which is a bit different. It will build everything-except-publishable-stuff with --use-bootstrap (just like stage 1 of TwoStage mode), but there is no counterpart for stage 2, i.e. building 'only' the publishable stuff skipped by stage 1. With stage 1 / --use-bootstrap, it builds enough parts of the repo to run most of the src/tests (except host, R2R and AOT). To get the full build with published components, it provides --bootstrap, which first builds with --use-bootstrap, moves artifacts/bin to artifacts/bootstrap, cleans up artifacts/ (modulo artifacts/bootstrap) and rebuilds everything without skipping anything.

It has increased the build time by 56% compared to TwoStage:
https://github.com/am11/CrossRepoCITesting/actions/runs/14276225403

but I think it is cleaner and less error-prone: TwoStage required extra care to make sure we were not rebuilding any msbuild target (which needs eviction of the cache under obj/, otherwise the target gets skipped, resulting in strange, unpredictable outcomes).

@tmds
Member

tmds commented Apr 6, 2025

Those packages include: apphost, singlefilehost, crossgen2 and ilc.

It sounds like these packages (for OutputRID) have become prebuilds of runtime, and the bootstrap build is responsible for building them when they don't exist?

Since community platforms runtime packs are not published to nuget feed, we can't completely build everything.

This includes all (non-Microsoft) non-portable builds?

It has increased the build time by 56% compared to TwoStage:

I find this a large increase.

since the mainstream platforms are now moved to LKG plan

What is the benefit of moving to LKG instead of doing it the ".NET 9 way"?

@jkoritzinsky
Member Author

jkoritzinsky commented Apr 6, 2025

We moved to the LKG way as it provides a significantly better developer experience due to slowdowns with debug builds. Our plan when moving to LKG was always to add an optional bootstrap phase that removes LKG usage without tanking the performance of the dev innerloop by more than this bootstrap build currently does.

@jkotas
Member

jkotas commented Apr 6, 2025

.NET 9 way

The .NET 9 way was a mix of live and LKG builds. We standardized on using LKG builds by default since it makes the dev innerloop more efficient.

@am11
Member

am11 commented Apr 6, 2025

Does this include all non-Microsoft, non-portable builds?

I'm not sure what the top-level VMR flow will be, but there'll likely be a --bootstrap-runtime option. It's only needed if the required runtime pack version isn’t already available. e.g., the first fedora.42-riscv64 build would need it, but later ones wouldn't if a matching version exists.

I find this a large increase.

Yes, but this is more predictable than TwoStage, we don't reuse obj/ dir to avoid all kinds of caching issues.

@tmds
Member

tmds commented Apr 7, 2025

better developer experience

Thank you for the discussion, and helping me understand that this is the driver for these changes!

I'm not sure what the top-level VMR flow will be, but there'll likely be a --bootstrap-runtime option. It's only needed if the required runtime pack version isn’t already available. e.g., the first fedora.42-riscv64 build would need it, but later ones wouldn't if a matching version exists.

We should avoid requiring an option to get a working build.

It sounds like the relative build time increase on the complete vmr build is going to be small, so using it in some cases where it may not be needed shouldn't be too much of a problem. It may even provide extra consistency and make it simpler to understand when bootstrap is used and when not.

Let's try to make the vmr build "smart" enough to know when it should use the bootstrap without user interaction.
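One way such a check could look is sketched below, under loud assumptions: `needs_bootstrap` is a hypothetical helper, not something in the VMR, and it only looks for a Crossgen2 package matching the target RID in a directory of already available packages. A real check would presumably also have to match the required version and cover the other bootstrapped components.

```shell
# Hypothetical helper: succeeds (exit 0, "bootstrap needed") when no Crossgen2
# package for the target RID exists in the given package directory.
needs_bootstrap() {
    local package_dir="$1" target_rid="$2"
    local pkg
    for pkg in "$package_dir"/Microsoft.NETCore.App.Crossgen2."$target_rid".*.nupkg; do
        # An unmatched glob stays literal, so confirm the file really exists.
        [ -f "$pkg" ] && return 1
    done
    return 0
}

# Demo against a throwaway directory (the package file name is made up).
demo_dir="$(mktemp -d)"
needs_bootstrap "$demo_dir" "linux-riscv64" && echo "bootstrap needed"
touch "$demo_dir/Microsoft.NETCore.App.Crossgen2.linux-riscv64.10.0.0.nupkg"
needs_bootstrap "$demo_dir" "linux-riscv64" || echo "bootstrap can be skipped"
rm -rf "$demo_dir"
```

The point of the sketch is only that the decision can be made from what is already on disk, so no extra flag is required from the person running the build.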

Comment on lines 48 to 49
<!-- Source-build will use non-portable RIDs. TO build for these non-portable RID scenarios, we must do a boostrapped build. -->
<InnerBuildArgs Condition="'$(DotNetBuildSourceOnly) == 'true' and '$(DotNetBuildUseMonoRuntime)' != 'true'">$(InnerBuildArgs) --bootstrap</InnerBuildArgs>
Member

Suggested change
- <!-- Source-build will use non-portable RIDs. TO build for these non-portable RID scenarios, we must do a boostrapped build. -->
- <InnerBuildArgs Condition="'$(DotNetBuildSourceOnly) == 'true' and '$(DotNetBuildUseMonoRuntime)' != 'true'">$(InnerBuildArgs) --bootstrap</InnerBuildArgs>
+ <!-- Source-build will use non-portable RIDs. To build for these non-portable RID scenarios, we must do a boostrapped build. -->
+ <InnerBuildArgs Condition="'$(DotNetBuildSourceOnly)' == 'true' and '$(DotNetBuildUseMonoRuntime)' != 'true'">$(InnerBuildArgs) --bootstrap</InnerBuildArgs>

Member

If there is a previously built SDK with ilc/cg2 runtime packs available, should it skip the --bootstrap? I think https://github.com/dotnet/dotnet/blob/main/prep-source-build.sh determines that.

Member Author

Bootstrap is skippable in that case. We are considering using --bootstrap for VMR as well as SB (still needs to be decided), so this condition is subject to change.

Member

@am11 am11 Apr 10, 2025

We are considering using --bootstrap for VMR as well as SB (still needs to be decided), so this condition is subject to change.

We can infer it in SB so distro maintainers don't need to determine when to add the bootstrap argument and when to skip it. But if it's not feasible to infer, then an explicit argument is fine.

prep-source-build.sh has a --no-bootstrap arg which has a different meaning, so it may be better to qualify it as --bootstrap-runtime (if we end up with an explicit option).

@arrowd
Contributor

arrowd commented Apr 10, 2025

Just a note from a downstream packager - the ability to produce a bootstrap and then use it to build the full version of the software package is just great. Right now in FreeBSD Ports I'm using a (most likely) hacky way to produce the bootstrap by passing /p:PortableBuild=true to build.sh that I dug in project files.

So, having high-level switches like --bootstrap and --use-bootstrap makes life much easier for us packagers.

It's used to add OutputRID in the graph if the parent can't be detected. -->
<InnerBuildArgs>$(InnerBuildArgs) /p:AdditionalRuntimeIdentifierParent=$(BaseOS)</InnerBuildArgs>
<!-- Source-build will use non-portable RIDs. To build for these non-portable RID scenarios, we must do a boostrapped build. -->
<InnerBuildArgs Condition="'$(DotNetBuildSourceOnly)' == 'true' and '$(DotNetBuildUseMonoRuntime)' != 'true'">$(InnerBuildArgs) --bootstrap</InnerBuildArgs>
Member

'$(DotNetBuildSourceOnly)' == 'true'

This is good for me. I wonder if TargetRid != NETCoreSdkRuntimeIdentifier might be an alternative?

Member Author

We can't use that by itself because that would cause this to trigger on some (but not all) Unified Build legs.

For Unified Build, we'll want this on either for 100% of the verticals or never.

@tmds
Member

tmds commented Apr 10, 2025

@arrowd can you create an issue in https://github.com/dotnet/source-build and describe your workflow?

I hope for the specific change made in this PR, we can avoid extra flags for distro maintainers and let the build take care of it on its own.

@tmds
Member

tmds commented Apr 24, 2025

@tmds fixed failures from your usage scenario

@jkoritzinsky , thank you! I'm going to do some more builds which I intend to complete this week and then give my approval on this PR. I won't try and test this from a vmr build integration. If issues would arise from vmr side, I'm sure we'll get them fixed.

Member

@tmds tmds left a comment

:shipit:

@am11
Member

am11 commented Apr 26, 2025

Two more tests passed with this branch:

FreeBSD-x64 AOT with smoke tests in VM: https://github.com/am11/CrossRepoCITesting/actions/runs/14678420933
FreeBSD-arm64 AOT with smoke tests in VM+qemu: https://github.com/am11/CrossRepoCITesting/actions/runs/14681685023

Only thing missing for FreeBSD was UseNativeAotForComponents (last suggestion).

jkoritzinsky and others added 2 commits April 29, 2025 11:02
Co-authored-by: Adeel Mujahid <[email protected]>
Co-authored-by: Adeel Mujahid <[email protected]>
@am11
Member

am11 commented Apr 29, 2025

Markdown lint error is unrelated #115159

@am11
Member

am11 commented Apr 30, 2025

Merging main should fix the errors in freebsd-x64 leg. 🤞

@am11
Member

am11 commented May 1, 2025

FreeBSD x64 and arm64 smoke tests passed with PublishAot'd ilc as well. :shipit:

@Thefrank
Contributor

Thefrank commented May 2, 2025

Another FreeBSD voice here. I don't source-build as cross-build is usually faster and FreeBSD ports does not have net10 previews.

This method feels better than the current cross-build method that I use for net10 previews: StageOne, StageTwo, and another part for just building the ILLink NuGet that SDK needs at the end.

Having a top-level --bootstrap makes the process cleaner and clearer. I hope this makes it into the final net10 release!

@jkoritzinsky
Member Author

Looks like the failures are unrelated. I'll get this merged in and work on the PowerShell script implementation as well (and enable for all VMR builds) for a future PR.

@jkoritzinsky
Member Author

/ba-g build analysis is stuck

@jkoritzinsky jkoritzinsky merged commit 5cd7566 into dotnet:main May 2, 2025
152 of 158 checks passed
@jkoritzinsky jkoritzinsky deleted the dotnetbuild-local-props-bootstrap branch May 2, 2025 16:15
@am11
Member

am11 commented May 2, 2025

Kudos @jkoritzinsky! 🎉

This method feels better than the current cross-build method that I use for net10 previews: StageOne, StageTwo

Thanks @Thefrank. I was actually going to ping you, @dotnet/samsung, @LuckyXu-HF and @shushanhf after this was merged. 😅

Workflow for CI:

  • net 9: ./build.sh clr+libs -cross
  • net 10 before this PR: ./build.sh clr+libs -cross -p:StageOneBuild=true && ./build.sh clr+libs -cross -p:StageTwoBuild=true
  • net 10 now: ./build.sh clr+libs -cross --bootstrap

Local workflow:
Since --bootstrap consists of --subset bootstrap followed by build.sh --use-bootstrap, we can use --bootstrap for the first clean build and --use-bootstrap for subsequent builds.
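As a rough sketch of that local workflow, the flag could be picked from whether artifacts/bootstrap already exists. `bootstrap_flag` is a hypothetical helper, and the existence of the directory is only an assumed proxy for "a bootstrap build already completed"; it does not prove the bootstrap is complete or up to date.

```shell
# Hypothetical helper: choose the build.sh flag for a local build.
# Assumption: an existing artifacts/bootstrap directory means a previous
# --bootstrap build already produced the bootstrapped components.
bootstrap_flag() {
    local repo_root="$1"
    if [ -d "$repo_root/artifacts/bootstrap" ]; then
        echo "--use-bootstrap"   # reuse the existing bootstrap
    else
        echo "--bootstrap"       # first clean build: produce the bootstrap
    fi
}

# Print the command that would be run from the current directory.
echo "./build.sh clr+libs -cross $(bootstrap_flag .)"
```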

and another part for just building the ILLink NuGet that SDK needs at the end.

I think that is all covered. After the changes are picked up in VMR https://github.com/dotnet/dotnet, source build experience will also be pleasant, with one last rough edge:

  • -p:SkipUsingCrossgen=true is a small annoyance which I hope we will figure out before the release (it's only needed for dotnet/sdk repo during VMR build).

(the other one was auto-infer freebsd as crossbuild; fixed by #115247)

It only takes one hour to get the full SDK tarball out:

# community platform source-build
_os=linux # or 'freebsd'
_arch=riscv64

_dockerTagSuffix=riscv64
# or 'loongarch64' or 'freebsd-14-amd64' etc. (search in https://github.com/dotnet/versions/blob/main/build-info/docker/image-info.dotnet-dotnet-buildtools-prereqs-docker-main.json)

# -b or --branch can be a VMR tag (once the preview 4 onwards are out https://github.com/dotnet/dotnet/tags)
$ git clone https://github.com/dotnet/dotnet -b main --single-branch --depth 1

# on x64 host machine
$ docker run --platform linux/amd64 --rm -v$(pwd)/dotnet:/dotnet -w /dotnet -e ROOTFS_DIR=/crossrootfs/$_arch \
     mcr.microsoft.com/dotnet-buildtools/prereqs:azurelinux-3.0-net10.0-cross-$_dockerTagSuffix \
     sh -c './prep-source-build.sh &&
        ./build.sh --clean-while-building -sb --os $_os --arch $_arch -p:SkipUsingCrossgen=true'

# or manually (for other host machines like docker on macOS arm64)
$ docker run --rm -v$(pwd)/dotnet:/dotnet -w /dotnet -e ROOTFS_DIR=/crossrootfs/$_arch \
     ubuntu \
     sh -c './prep-source-build.sh && src/arcade/eng/common/native/install-dependencies.sh &&
         ./build.sh --clean-while-building -sb --os $_os --arch $_arch -p:SkipUsingCrossgen=true'

# interesting outputs are:
#   * dotnet/artifacts/assets/Release/dotnet-sdk-*.tar.gz
#   * dotnet/artifacts/packages/Release/Shipping/runtime/Microsoft.NETCore.App.Crossgen2.linux-*.nupkg

(then copy Microsoft.NETCore.App.Crossgen2.linux-*.nupkg to <sdk-install-path>/library-packs to get PublishReadyToRun also working, PublishAot is provided by SDK inbox but crossgen2 is not for some reason?!?)
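The copy step in the parenthetical above could be scripted roughly as follows. This is a sketch: `install_crossgen2_pack` is a hypothetical helper name, and both arguments are placeholders for the packages output directory and the SDK install path on your machine.

```shell
# Hypothetical helper: copy the source-built Crossgen2 package(s) into the
# SDK's library-packs directory and print how many files were copied.
install_crossgen2_pack() {
    local packages_dir="$1" sdk_dir="$2"
    local pkg copied=0
    mkdir -p "$sdk_dir/library-packs"
    for pkg in "$packages_dir"/Microsoft.NETCore.App.Crossgen2.linux-*.nupkg; do
        # Skip the literal pattern when nothing matched.
        [ -f "$pkg" ] || continue
        cp "$pkg" "$sdk_dir/library-packs/"
        copied=$((copied + 1))
    done
    echo "$copied"
}

# Example call (both paths are placeholders):
# install_crossgen2_pack dotnet/artifacts/packages/Release/Shipping/runtime /path/to/sdk
```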

@tmds
Member

tmds commented May 3, 2025

(then copy Microsoft.NETCore.App.Crossgen2.linux-*.nupkg to <sdk-install-path>/library-packs to get PublishReadyToRun also working, PublishAot is provided by SDK inbox but crossgen2 is not for some reason?!?)

Support for self-contained and NativeAot was added as part of dotnet/source-build#1215. Though the issue was created with PublishReadyToRun in mind, .NET later introduced NativeAot and I focused on including that instead.

A source-built PublishReadyToRun can be included with the SDK if someone wants it (and someone implements it ...).

@am11
Member

am11 commented May 3, 2025

A source-built PublishReadyToRun can be included with the SDK if someone wants it (and someone implements it ...).

Bringing PublishAot and PublishReadyToRun experiences to the same packaging plan will make it easier to reason about. Either include or exclude both nupkgs alike from dotnet-sdk-*.tar.gz.

@tmds
Member

tmds commented May 3, 2025

Bringing PublishAot and PublishReadyToRun experiences to the same packaging plan will make it easier to reason about.

That makes sense.

Either include or exclude both nupkgs alike from dotnet-sdk-*.tar.gz.

The ones that are included currently are those that we (per dotnet/source-build#1215) wanted to be able to provide as source-built features to the end-user.

We had to go through some hoops to make this work, but the same patterns should work for PublishReadyToRun.

@am11
Member

am11 commented May 3, 2025

Similar to dotnet/sdk#48986, I think pt. 2 is remaining for CG2:

note that ILCompiler is present in that group, but Crossgen2 is missing.

@github-actions github-actions bot locked and limited conversation to collaborators Jun 15, 2025

Labels

area-Infrastructure source-build Issues relating to dotnet/source-build

Projects

Status: Done

7 participants