[UserEvents] Add end-to-end runtime test #121316

mdh1418 · 2025-11-03T19:40:13Z

With user_events support added in #115265, this PR adds a test for a basic end-to-end user_events scenario.

Alternative testing approaches considered

Existing EventPipe runtime tests

Existing EventPipe tests under src/tests/tracing/eventpipe are incompatible with testing the user_events scenario due to:

Starting EventPipeSessions through DiagnosticClient ❌
DiagnosticClient does not have the support to send the IPC command to start a user_events based EventPipe session, because it requires the user_events_data file descriptor to be sent using SCM_RIGHTS (see https://github.com/dotnet/diagnostics/blob/main/documentation/design-docs/ipc-protocol.md#passing_file_descriptor).
Using an EventPipeEventSource to validate events streamed through EventPipe ❌
User_events based EventPipe sessions do not stream events. Instead, events are written to configured TraceFS tracepoints, and currently only RecordTrace from https://github.com/microsoft/one-collect/ is capable of generating .nettrace traces from tracepoint user_events.

Native EventPipe Unit Tests

There are Mono native EventPipe tests under src/mono/mono/eventpipe/test that are not hooked up to CI. These unit tests are built by linking the shared EventPipe interface library against Mono's EventPipe runtime shims and are run with Mono's test runner. Moving them into the standard runtime test structure requires a larger investment: either migrate EventPipe from runtime shims to an OS PAL source shared by coreclr/nativeaot/mono (see #118874 (comment)), or build an EventPipe shared library specifically for the runtime test using a runtime-agnostic shim.
The existing Mono unit tests don't test IPC commands, and there is no runtime infrastructure to read events from tracepoints, so testing the user_events scenario would take even more work on top of updating the Mono native EventPipe unit tests.

End-to-End Testing Added

A low-cost approach to testing .NET Runtime's user_events functionality leverages RecordTrace from https://github.com/microsoft/one-collect/, which is already capable of starting user_events based EventPipe sessions and generating .nettraces. (Note: dotnet-trace wraps around RecordTrace)
This adds an external dependency, so RecordTrace failures can fail the end-to-end test. However, user_events was added with the intent of depending on RecordTrace for the end-to-end scenario, and there is no other way to functionally test a user_events based EventPipe session.

Approach

Start Tracee app
Start tracing with RecordTrace + dotnet-common profile script
Stop RecordTrace (triggers .nettrace generation) and Tracee app
Validate the .nettrace for particular events from Tracee app

Dependencies:

CI runs the runtime test in an environment that supports user_events
CI runs the runtime test with permissions to access user_events_data.
Microsoft.OneCollect.RecordTrace (transitively resolved through a dotnet diagnostics public feed)
Microsoft.Diagnostics.Tracing.TraceEvent 3.1.24+ (to read NetTrace V6)

Copilot

Pull Request Overview

This PR adds a new test for UserEvents tracing on Linux that validates the runtime's ability to emit trace events through the user_events subsystem. The test uses the Microsoft.OneCollect.RecordTrace tool to capture events from a tracee process and validates that GC events were properly recorded.

Key changes include:

Addition of a new test infrastructure for UserEvents tracing
Upgrade of TraceEvent library from version 3.1.16 to 3.1.28
Implementation of multi-process test orchestration with native signal handling

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

File	Description
src/tests/tracing/eventpipe/userevents/usereventstracee.cs	Implements tracee process that generates GC events for validation
src/tests/tracing/eventpipe/userevents/userevents.csproj	Project configuration including NuGet package references and build targets
src/tests/tracing/eventpipe/userevents/userevents.cs	Main test orchestration: spawns processes, collects traces, validates events
src/tests/tracing/eventpipe/userevents/dotnet-common.script	Configuration script for record-trace tool specifying provider and flags
eng/Versions.props	Updates TraceEvent package version to 3.1.28

src/tests/tracing/eventpipe/userevents/userevents.cs

jkotas · 2025-11-03T19:51:00Z

src/tests/tracing/eventpipe/userevents/userevents.csproj

+    <CLRTestTargetUnsupported Condition="'$(TargetOS)' != 'linux' or ('$(TargetArchitecture)' != 'x64' and '$(TargetArchitecture)' != 'arm64')">true</CLRTestTargetUnsupported>
+    <RequiresProcessIsolation>true</RequiresProcessIsolation>
+    <ReferenceXUnitWrapperGenerator>false</ReferenceXUnitWrapperGenerator>
+    <TargetFrameworkIdentifier>.NETCoreApp</TargetFrameworkIdentifier>


Nit: Is this line needed?

I was mostly following convention of other runtime tests. I'll remove it for this test, but when would it be appropriate to add?

I would expect this to be the default for all projects under src\tests

jkotas · 2025-11-03T19:59:19Z

src/tests/tracing/eventpipe/userevents/userevents.csproj

+    <ReferenceXUnitWrapperGenerator>false</ReferenceXUnitWrapperGenerator>
+    <TargetFrameworkIdentifier>.NETCoreApp</TargetFrameworkIdentifier>
+    <IlasmRoundTripIncompatible>true</IlasmRoundTripIncompatible>
+    <MicrosoftOneCollectRecordTraceVersion>0.1.32221</MicrosoftOneCollectRecordTraceVersion>


Should this be in eng\Versions.props where we keep track of versions of all dependencies?

Since this test acquires Microsoft.OneCollect.RecordTrace through the dotnet-diagnostics-tests feed via RestoreAdditionalProjectSources, it felt more appropriate to scope the version to the test. Nothing else in the repo uses that feed, and nothing else uses Microsoft.OneCollect.RecordTrace, so eng/Versions.props didn't feel right. But I can move it to eng/Versions.props if that's preferred. It just seemed misleading, since I didn't think other projects in the repo could even acquire the package without adding dotnet-diagnostics-tests as a NuGet source.

Many versions in Versions.props are one-offs. For example, GrpcAuthVersion is only used by src\tests\FunctionalTests\Android\Device_Emulator\gRPC\Android.Device_Emulator.gRPC.Test.csproj

Nothing else in the repo uses that feed,

I am not sure whether adding the extra field is going to be compatible with the eng system infrastructure. I do not see any other place that adds feeds this way in the repo.

Changed to using a test local NuGet.Config to add dotnet-diagnostics-tests as a source.

jkotas · 2025-11-03T21:00:06Z

a basic end-to-end user_events scenario

I like this approach.

noahfalk · 2025-11-03T23:20:46Z

src/tests/tracing/eventpipe/userevents/userevents.cs

+            string appBaseDir = AppContext.BaseDirectory;
+            string recordTracePath = Path.Combine(appBaseDir, "record-trace");
+            string scriptFilePath = Path.Combine(appBaseDir, "dotnet-common.script");
+            string traceFilePath = Path.Combine(appBaseDir, trace);


We probably want to create a path in some randomly generated temp file (Path.GetTempFileName()). This helps avoid issues if the test runs multiple times and doesn't correctly clean up or if there is ever a scenario where the directory holding the test binary isn't writable.

Switched to Path.GetTempFileName().
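As an illustration of the temp-file suggestion, here is a minimal sketch. The `TraceFileScratch` class and `WithTempTraceFile` helper are hypothetical names, not code from this PR; only `Path.GetTempFileName()` comes from the discussion.

```csharp
using System;
using System.IO;

public static class TraceFileScratch
{
    // Hypothetical helper: runs `body` with a freshly created temp file path
    // and deletes the file afterwards.
    public static T WithTempTraceFile<T>(Func<string, T> body)
    {
        // Path.GetTempFileName() creates a uniquely named, zero-byte file
        // and returns its full path.
        string traceFilePath = Path.GetTempFileName();
        Console.WriteLine($"Using temporary trace file: {traceFilePath}");
        try
        {
            return body(traceFilePath);
        }
        finally
        {
            // Best-effort cleanup so repeated runs do not accumulate files.
            try { File.Delete(traceFilePath); }
            catch (IOException e) { Console.WriteLine($"Could not delete {traceFilePath}: {e.Message}"); }
        }
    }
}
```

A test that wants to keep the trace for failure diagnosis could skip the delete when validation fails.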

noahfalk · 2025-11-03T23:28:17Z

src/tests/tracing/eventpipe/userevents/userevents.cs

+
+            if (!File.Exists(recordTracePath) || !File.Exists(scriptFilePath))
+            {
+                Console.WriteLine("record-trace or dotnet-common.script not found. Test cannot run.");


It's helpful to print the exact path we didn't find. (In general, it's helpful for test logging to err towards being overly verbose, to make failure diagnosis easier.)

Right, split into separate if blocks to have a more specific error message.
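One way to get a per-path message is sketched below. `TestAssets.AllExist` is a hypothetical helper, not the code in this PR; it only shows checking each required file separately and logging the exact missing path.

```csharp
using System;
using System.IO;

public static class TestAssets
{
    // Hypothetical helper: returns true only if every path exists,
    // and logs each missing path individually.
    public static bool AllExist(params string[] paths)
    {
        bool ok = true;
        foreach (string path in paths)
        {
            if (!File.Exists(path))
            {
                Console.Error.WriteLine($"Required test asset not found: {path}");
                ok = false;
            }
        }
        return ok;
    }
}
```

With the variables from the diff above, the call would look like `TestAssets.AllExist(recordTracePath, scriptFilePath)`.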

noahfalk · 2025-11-03T23:38:21Z

src/tests/tracing/eventpipe/userevents/userevents.cs

+
+            using Process traceeProcess = Process.Start(traceeStartInfo);
+            using Process recordTraceProcess = Process.Start(recordTraceStartInfo);
+            recordTraceProcess.OutputDataReceived += (_, args) => Console.WriteLine($"[record-trace] {args.Data}");


We should redirect the tracee output as well to ensure we aren't losing useful error diagnostics that it might print.

noahfalk · 2025-11-03T23:40:12Z

src/tests/tracing/eventpipe/userevents/userevents.cs

+                return -1;
+            }
+
+            ProcessStartInfo traceeStartInfo = new();


I'd suggest logging the command-line for both processes being launched.

Added the commands and args near Process.Start and also added logs for the PID associated with the tracee and record-trace.
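The logging and output redirection discussed here could be combined in a small wrapper. This is a sketch under assumptions: `LoggedProcess.Start` and the `tag` parameter are hypothetical, and the actual test may structure this differently.

```csharp
using System;
using System.Diagnostics;

public static class LoggedProcess
{
    // Hypothetical helper: logs the command line, starts the process with
    // stdout/stderr redirected, echoes its output with a tag, and logs the PID.
    public static Process Start(string tag, ProcessStartInfo startInfo)
    {
        startInfo.UseShellExecute = false;
        startInfo.RedirectStandardOutput = true;
        startInfo.RedirectStandardError = true;

        Console.WriteLine($"[{tag}] launching: {startInfo.FileName} {startInfo.Arguments}");

        Process process = new() { StartInfo = startInfo };
        // Data is null when the stream closes, so skip those notifications.
        process.OutputDataReceived += (_, args) =>
        {
            if (args.Data is not null) Console.WriteLine($"[{tag}] {args.Data}");
        };
        process.ErrorDataReceived += (_, args) =>
        {
            if (args.Data is not null) Console.Error.WriteLine($"[{tag}] {args.Data}");
        };

        process.Start();
        process.BeginOutputReadLine();
        process.BeginErrorReadLine();
        Console.WriteLine($"[{tag}] started with PID {process.Id}");
        return process;
    }
}
```

Using the same helper for both the tracee and record-trace keeps their diagnostics in one log, each line prefixed with its source.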

noahfalk · 2025-11-03T23:41:17Z

src/tests/tracing/eventpipe/userevents/userevents.cs

+            recordTraceProcess.ErrorDataReceived += (_, args) => Console.Error.WriteLine($"[record-trace] {args.Data}");
+            recordTraceProcess.BeginErrorReadLine();
+
+            if (!traceeProcess.HasExited && !traceeProcess.WaitForExit(15000))


I'd suggest logging that the test is waiting for the tracee to exit, and certainly log if the wait times out and we kill it.

good point, added more diagnostics
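A sketch of the wait-with-diagnostics pattern follows. `ProcessWaiter.WaitOrKill` is a hypothetical helper, and the timeout value is up to the caller (the diff above uses 15000 ms).

```csharp
using System;
using System.Diagnostics;

public static class ProcessWaiter
{
    // Hypothetical helper: returns true if the process exited on its own within
    // the timeout; otherwise logs, kills it, and returns false.
    public static bool WaitOrKill(Process process, string tag, int timeoutMs)
    {
        Console.WriteLine($"[{tag}] waiting up to {timeoutMs} ms for PID {process.Id} to exit");
        if (process.WaitForExit(timeoutMs))
        {
            // The parameterless overload also waits for redirected output
            // handlers to finish, so no trailing log lines are lost.
            process.WaitForExit();
            Console.WriteLine($"[{tag}] exited with code {process.ExitCode}");
            return true;
        }

        Console.Error.WriteLine($"[{tag}] did not exit within {timeoutMs} ms; killing it");
        try
        {
            process.Kill(entireProcessTree: true);
        }
        catch (InvalidOperationException)
        {
            // The process exited between the timeout and the kill attempt.
        }
        process.WaitForExit();
        return false;
    }
}
```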

noahfalk · 2025-11-03T23:46:19Z

src/tests/tracing/eventpipe/userevents/userevents.cs

+        }
+
+        public static int TestEntryPoint()
+        {


We should add checks for:

process is elevated

OS is Linux

user_events are supported

It's likely that at some point this test will be run in the wrong environment, and the logs should make it trivial to diagnose.

Maybe we should not even build the test on non-Linux platforms. The CLRTestTargetUnsupported MSBuild property could be used to exclude a test on specific platforms.

The CLRTestTargetUnsupported is in the csproj, so it should hopefully prevent this test from running on non linux-x64/linux-arm64 platforms. Then again, I think more logic is needed to check for Alpine.

Added checks for geteuid and for whether /sys/kernel/tracing/user_events_data exists.
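The three environment checks might look roughly like this. The class and method names are hypothetical, treating effective UID 0 as "elevated" is a simplifying assumption, and the PR's final access check may differ (a later commit mentions an alternative user_events_data access check).

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

public static class UserEventsEnvironment
{
    // tracefs location of the user_events data file, as named in the discussion.
    public const string UserEventsDataPath = "/sys/kernel/tracing/user_events_data";

    [DllImport("libc", SetLastError = true)]
    private static extern uint geteuid();

    // Hypothetical helper: returns true if the environment looks usable and
    // always sets `reason` to a message suitable for the test log.
    public static bool IsSupported(out string reason)
    {
        if (!OperatingSystem.IsLinux())
        {
            reason = "OS is not Linux";
            return false;
        }

        uint euid = geteuid();
        if (euid != 0)
        {
            // Assumption: "elevated" means running as root.
            reason = $"process is not elevated (effective UID {euid})";
            return false;
        }

        if (!File.Exists(UserEventsDataPath))
        {
            reason = $"user_events not available: {UserEventsDataPath} does not exist";
            return false;
        }

        reason = "ok";
        return true;
    }
}
```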

noahfalk · 2025-11-03T23:47:51Z

src/tests/tracing/eventpipe/userevents/userevents.cs

+            }
+
+            // Until record-trace supports duration, the only way to stop it is to send SIGINT (ctrl+c)
+            kill(recordTraceProcess.Id, SIGINT);


Add more logging that we sent SIGINT, waiting for exit, and if we killed the process after timeout.

Added, thanks!
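For reference, a sketch of sending SIGINT with logging. The `Signals.SendSigInt` wrapper is hypothetical; the `kill` P/Invoke and the `SIGINT` constant mirror the call in the diff above, with the value 2 being the Linux signal number.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

public static class Signals
{
    private const int SIGINT = 2; // Linux signal number for ctrl+c

    [DllImport("libc", SetLastError = true)]
    private static extern int kill(int pid, int sig);

    // Hypothetical helper: sends SIGINT to the process and logs the outcome.
    public static bool SendSigInt(Process process, string tag)
    {
        Console.WriteLine($"[{tag}] sending SIGINT to PID {process.Id}");
        if (kill(process.Id, SIGINT) != 0)
        {
            Console.Error.WriteLine($"[{tag}] kill() failed with errno {Marshal.GetLastPInvokeError()}");
            return false;
        }
        return true;
    }
}
```

After sending the signal, the test would still wait for exit with a timeout and log if it has to kill the process.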

noahfalk · 2025-11-03T23:52:51Z

src/tests/tracing/eventpipe/userevents/userevents.cs

+
+        private static bool ValidateTraceeEvents(string traceFilePath)
+        {
+            string etlxPath = TraceLog.CreateFromEventPipeDataFile(traceFilePath);


You can parse the .nettrace file directly using EventPipeEventSource in TraceEvent. This avoids creating a 2nd file that the test also needs to clean up.

For some reason it didn't work when I last tried it (pilot error). Switched to using EventPipeEventSource. Right now the events are "unknown" with id. Maybe it's because I'm using the Dynamic parser? I'll look into TraceEvent more closely.

noahfalk · 2025-11-03T23:57:52Z

src/tests/tracing/eventpipe/userevents/userevents.cs

+            recordTraceStartInfo.RedirectStandardError = true;
+
+            using Process traceeProcess = Process.Start(traceeStartInfo);
+            using Process recordTraceProcess = Process.Start(recordTraceStartInfo);


To ensure tracer observes the tracee we should start the tracer process first.

Switched. Originally I was wondering if we should pass --pid {traceePid}, but the process isolation should prevent this test from tracing another runtime test and mistaking the events should the tracee process crash. But I guess even in that case... it showed that user_events were collected.

noahfalk · 2025-11-04T00:00:46Z

src/tests/tracing/eventpipe/userevents/usereventstracee.cs

+        public static void Run()
+        {
+            long startTimestamp = Stopwatch.GetTimestamp();
+            long targetTicks = Stopwatch.Frequency * 10; // 10s


How about 1 second instead? Ideally we want tests to run quickly whenever possible.
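A tracee loop with a configurable duration could be sketched as follows. The allocation workload is an assumption for illustration; the actual tracee in this PR may generate its events differently.

```csharp
using System;
using System.Diagnostics;

public static class TraceeWorkload
{
    // Hypothetical workload: allocates small arrays in a loop until `duration`
    // has elapsed and returns the number of allocations made.
    public static long AllocateFor(TimeSpan duration)
    {
        long start = Stopwatch.GetTimestamp();
        long targetTicks = (long)(Stopwatch.Frequency * duration.TotalSeconds);
        long allocations = 0;

        while (Stopwatch.GetTimestamp() - start < targetTicks)
        {
            // KeepAlive prevents the allocation from being optimized away.
            GC.KeepAlive(new byte[1024]);
            allocations++;
        }
        return allocations;
    }
}
```

Calling `TraceeWorkload.AllocateFor(TimeSpan.FromSeconds(1))` gives the 1 second run suggested here.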

lateralusX · 2025-11-04T09:04:23Z

src/tests/tracing/eventpipe/userevents/userevents.cs

+            bool startEventFound = false;
+            bool stopEventFound = false;
+
+            source.AllEvents += (TraceEvent e) =>


Is the plan to add more tests here that check for metadata/fields, rundown events, callstacks, etc.? Or should we add some basic verification to this test, or plan to extend the existing EventPipe tests to also work over UserEvents?

This was mainly to add a basic verification that the end-to-end runtime side works (from accepting the IPC message to writing to the tracepoints). Since user_events is built on EventPipe, my initial thought is that duplicating the existing EventPipe tests for user_events wouldn't add anything. I think we can add more tests later on, but I'm not sure what coverage is good and reasonable. I'm not even sure yet if our CI machines have user_events, or if they run with elevated privileges, so this was mainly to see if we can get a basic E2E test going.

We should definitely not duplicate, but maybe look at extending the existing EventPipe tests to run over user_events with additional validation logic. I agree that is something we could look at later.

So, if this is mainly a smoke test, then maybe we should make sure we at least hit the things we know are handled in the one-collect library. During the work we hit a number of things that needed special attention, like activity IDs, custom metadata, and potential stack traces. Right now, this only tests one runtime start/stop event fired under very unique circumstances. Maybe we should do a short multi-threading scenario as well, to make sure we don't hit any races in the code path unique to user_events?

For a basic E2E test, just enabling GC is sufficient. Locally running this test, occasionally the test failed because it didn't have the expected events. It's not clear why, but with a brief tracee timeout of 1s, collecting dotnet-trace's default keywords and enabling all associated events is overkill for a fast E2E test.

Even after switching to just enabling GCKeyword, the 1s tracee app had a 1% failure rate locally. On the other hand, AllocationSampled had no failures after 1000 runs.

[UserEvents] Add end-to-end runtime test

47231e1

mdh1418 requested review from Copilot, hoyosjs, jkoritzinsky, jkotas, lateralusX, noahfalk and steveisok November 3, 2025 19:40

github-actions bot added the area-Tracing-coreclr label Nov 3, 2025

dotnet-policy-service bot assigned mdh1418 Nov 3, 2025

mdh1418 requested a review from akoeplinger November 3, 2025 19:42

Copilot AI reviewed Nov 3, 2025

jkotas reviewed Nov 3, 2025

Merge remote-tracking branch 'upstream/main' into user_events_functio…

963efe1

…nal_runtime_test

jkotas reviewed Nov 3, 2025

mdh1418 added 4 commits November 3, 2025 20:31

Cleanup test props

e712df6

Add ProcessStartInfo and using keyword

2b246be

Fix process exit detection

a7afeb9

Retain nettrace test asset

c41916a

noahfalk reviewed Nov 4, 2025

lateralusX reviewed Nov 4, 2025

mdh1418 added 2 commits November 7, 2025 02:57

Address feedback

963af3f

Fix RecordTrace package resolving

005cc94

Alternative user_events_data access check

5d09bd2

This was referenced Nov 8, 2025

System.IO.Tests.File_GetSetTimes_SafeFileHandle.WritingShouldUpdateWriteTime_After_SetLastAccessTime failing #97020

Open

[Long Running Test] 'System.Composition.UnitTests.ConcurrencyTests.SharedInstancesAreNotVisibleUntilActivationCompletes' #113686

Open

mdh1418 added 5 commits November 10, 2025 20:11

Fix record-trace startup

146b348

Detect record-trace unexpected exit and flush output

a8d5d97

Fix record-trace output

f9531d2

Switch to AllocationSampled Event

a0c37b6

This was referenced Nov 11, 2025

AppHost test failure: I/O failure when writing extracted files #120653

Open

[wasm] Microsoft.Playwright.TargetClosedException : Target page, context or browser has been closed #121195

Open

mdh1418 force-pushed the user_events_functional_runtime_test branch from 7e35aea to ba6d4d9 Compare November 11, 2025 21:36

Attempt to copy userevents assets for CI

55edad6

mdh1418 force-pushed the user_events_functional_runtime_test branch from ba6d4d9 to 55edad6 Compare November 12, 2025 01:14
