Skip to content

Conversation

@mdh1418
Copy link
Member

@mdh1418 mdh1418 commented Nov 3, 2025

With user_events support added in #115265, this PR looks to test a basic end-to-end user_events scenario.

Alternative testing approaches considered

Existing EventPipe runtime tests

Existing EventPipe tests under src/tests/tracing/eventpipe are incompatible with testing the user_events scenario due to:

  1. Starting EventPipeSessions through DiagnosticClient ❌
    DiagnosticClient does not have the support to send the IPC command to start a user_events based EventPipe session, because it requires the user_events_data file descriptor to be sent using SCM_RIGHTS (see https://github.com/dotnet/diagnostics/blob/main/documentation/design-docs/ipc-protocol.md#passing_file_descriptor).

  2. Using an EventPipeEventSource to validate events streamed through EventPipe ❌
    User_events based EventPipe sessions do not stream events. Instead, events are written to configured TraceFS tracepoints, and currently only RecordTrace from https://github.com/microsoft/one-collect/ is capable of generating .nettrace traces from tracepoint user_events.

Native EventPipe Unit Tests

There are Mono Native EventPipe tests under src/mono/mono/eventpipe/test that are not hooked up to CI. These unit tests are built through linking the shared EventPipe interface library against Mono's EventPipe runtime shims and using Mono's test runner. To update these unit tests into the standard runtime tests structure, a larger investment is needed to either migrate EventPipe from using runtime shims to a OS Pal source shared by coreclr/nativeaot/mono (see #118874 (comment)) or build an EventPipe shared library specifically for the runtime test using a runtime-agnostic shim.
As existing mono unit tests don't currently test IPC commands, coupled with no existing runtime infrastructure to read events from tracepoints, there would be even more work on top of updating mono native eventpipe unit tests to even test the user_events scenario.

End-to-End Testing Added

A low-cost approach to testing .NET Runtime's user_events functionality leverages RecordTrace from https://github.com/microsoft/one-collect/, which is already capable of starting user_events based EventPipe sessions and generating .nettraces. (Note: dotnet-trace wraps around RecordTrace)
Despite adding an external dependency which allows RecordTrace failures to fail the end-to-end test, user_events was initially added with the intent to depend on RecordTrace for the end-to-end scenario, and there are no other ways to functionally test a user_events based eventpipe session.

Approach

  1. Start Tracee app
  2. Start tracing with RecordTrace + dotnet-common profile script
  3. Stop RecordTrace (triggers .nettrace generation) and Tracee app
  4. Validate the .nettrace for particular events from Tracee app

Dependencies:

  • CI runs the runtime test in an environment that supports user_events
  • CI runs the runtime test with permissions to access user_events_data.
  • Microsoft.OneCollect.RecordTrace (transitively resolved through a dotnet diagnostics public feed)
  • Microsoft.Diagnostics.Tracing.TraceEvent 3.1.24+ (to read NetTrace V6)

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds a new test for UserEvents tracing on Linux that validates the runtime's ability to emit trace events through the user_events subsystem. The test uses the Microsoft.OneCollect.RecordTrace tool to capture events from a tracee process and validates that GC events were properly recorded.

Key changes include:

  • Addition of a new test infrastructure for UserEvents tracing
  • Upgrade of TraceEvent library from version 3.1.16 to 3.1.28
  • Implementation of multi-process test orchestration with native signal handling

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/tests/tracing/eventpipe/userevents/usereventstracee.cs Implements tracee process that generates GC events for validation
src/tests/tracing/eventpipe/userevents/userevents.csproj Project configuration including NuGet package references and build targets
src/tests/tracing/eventpipe/userevents/userevents.cs Main test orchestration: spawns processes, collects traces, validates events
src/tests/tracing/eventpipe/userevents/dotnet-common.script Configuration script for record-trace tool specifying provider and flags
eng/Versions.props Updates TraceEvent package version to 3.1.28

<CLRTestTargetUnsupported Condition="'$(TargetOS)' != 'linux' or ('$(TargetArchitecture)' != 'x64' and '$(TargetArchitecture)' != 'arm64')">true</CLRTestTargetUnsupported>
<RequiresProcessIsolation>true</RequiresProcessIsolation>
<ReferenceXUnitWrapperGenerator>false</ReferenceXUnitWrapperGenerator>
<TargetFrameworkIdentifier>.NETCoreApp</TargetFrameworkIdentifier>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Is this line needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was mostly following convention of other runtime tests. I'll remove it for this test, but when would it be appropriate to add?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect this to be the default for all projects under src\tests

<ReferenceXUnitWrapperGenerator>false</ReferenceXUnitWrapperGenerator>
<TargetFrameworkIdentifier>.NETCoreApp</TargetFrameworkIdentifier>
<IlasmRoundTripIncompatible>true</IlasmRoundTripIncompatible>
<MicrosoftOneCollectRecordTraceVersion>0.1.32221</MicrosoftOneCollectRecordTraceVersion>
Copy link
Member

@jkotas jkotas Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be in eng\Versions.props where we keep track of versions of all dependencies?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured that since this test acquires Microsoft.OneCollect.RecordTrace through dotnet-diagnostics-tests feed via RestoreAdditionalProjectSources I felt that it was more appropriate to scope the version only to the test. Nothing else in the repo uses that feed, and nothing else uses Microsoft.OneCollect.RecordTrace, so eng/Versions.props didn't feel right. But I can move it to eng/Versions.props if that's preferred. It just seemed misleading since I ddidnt think other projects in the repo could even acquire the package without adding the dotnet-diagnostics-tests as a nuget source.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many versions in Versions.props are one-offs. For example, GrpcAuthVersion is only used by src\tests\FunctionalTests\Android\Device_Emulator\gRPC\Android.Device_Emulator.gRPC.Test.csproj

Nothing else in the repo uses that feed,

I am not sure whether adding the extra field is going to be compatible with the eng system infrastructure. I do not see any other place that adds feeds this way in the repo.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to using a test local NuGet.Config to add dotnet-diagnostics-tests as a source.

@jkotas
Copy link
Member

jkotas commented Nov 3, 2025

a basic end-to-end user_events scenario

I like this approach.

string appBaseDir = AppContext.BaseDirectory;
string recordTracePath = Path.Combine(appBaseDir, "record-trace");
string scriptFilePath = Path.Combine(appBaseDir, "dotnet-common.script");
string traceFilePath = Path.Combine(appBaseDir, trace);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want to create a path in some randomly generated temp file (Path.GetTempFileName()). This helps avoid issues if the test runs multiple times and doesn't correctly clean up or if there is ever a scenario where the directory holding the test binary isn't writable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to Path.GetTempFileName(),


if (!File.Exists(recordTracePath) || !File.Exists(scriptFilePath))
{
Console.WriteLine("record-trace or dotnet-common.script not found. Test cannot run.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its helpful to print the exact path we didn't find. (In general its helpful for the test logging to err towards being overly verbose to make failure diagnosis easier)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, split into separate if blocks to have a more specific error message.


using Process traceeProcess = Process.Start(traceeStartInfo);
using Process recordTraceProcess = Process.Start(recordTraceStartInfo);
recordTraceProcess.OutputDataReceived += (_, args) => Console.WriteLine($"[record-trace] {args.Data}");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should redirect the tracee output as well to ensure we aren't losing useful error diagnostics that it might print.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

return -1;
}

ProcessStartInfo traceeStartInfo = new();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest logging the command-line for both processes being launched.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the commands and args near Process.Start and also added logs for the PID associated with the tracee and record-trace.

recordTraceProcess.ErrorDataReceived += (_, args) => Console.Error.WriteLine($"[record-trace] {args.Data}");
recordTraceProcess.BeginErrorReadLine();

if (!traceeProcess.HasExited && !traceeProcess.WaitForExit(15000))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest logging that the test is waiting for the tracee to exit, and certainly log if the wait times out and we kill it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, added more diagnostics

}

public static int TestEntryPoint()
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add checks for:

  1. process is elevated
  2. OS is Linux
  3. user_events are supported

Its likely at some point this test will be run in the wrong environment and the logs should make it trivial to diagnose.

Copy link
Member

@lateralusX lateralusX Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should not even build the test on none linux platforms. CLRTestTargetUnsupported msbuild property could be used to exclude a test on specific platforms.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CLRTestTargetUnsupported is in the csproj, so it should hopefully prevent this test from running on non linux-x64/linux-arm64 platforms. Then again, I think more logic is needed to check for Alpine.

Added checks for geteuid and checking if sys/kernel/tracing/user_events_data exists

}

// Until record-trace supports duration, the only way to stop it is to send SIGINT (ctrl+c)
kill(recordTraceProcess.Id, SIGINT);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add more logging that we sent SIGINT, waiting for exit, and if we killed the process after timeout.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, thanks!


private static bool ValidateTraceeEvents(string traceFilePath)
{
string etlxPath = TraceLog.CreateFromEventPipeDataFile(traceFilePath);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can parse the .nettrace file directly using EventPipeEventSource in TraceEvent. This avoids creating a 2nd file that the test also needs to clean up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason when I tried it last, it didn't work (pilot error), switched to using EventPipeEventSource. Right now the events are "unknown" with id. Maybe its cause I'm using the Dynamic parser? I'll look into TraceEvent more closely

recordTraceStartInfo.RedirectStandardError = true;

using Process traceeProcess = Process.Start(traceeStartInfo);
using Process recordTraceProcess = Process.Start(recordTraceStartInfo);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To ensure tracer observes the tracee we should start the tracer process first.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched. Originally I was wondering if we should pass --pid {traceePid}, but the process isolation should prevent this test from tracing another runtime test and mistaking the events should the tracee process crash. But I guess even in that case... it showed that user_events were collected.

public static void Run()
{
long startTimestamp = Stopwatch.GetTimestamp();
long targetTicks = Stopwatch.Frequency * 10; // 10s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about 1 second instead? Ideally we want tests to run quickly whenever possible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

bool startEventFound = false;
bool stopEventFound = false;

source.AllEvents += (TraceEvent e) =>
Copy link
Member

@lateralusX lateralusX Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the plan to add more tests here that checks for metadata/fields, rundown events, callstacks etc or should we add some basic verification to this test or plan to extend existing EventPipe tests to also work over UserEvents?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was mainly to add a basic verification that the end to end runtime side (from accepting the ipc message to writing to the tracepoints worked). Since User_events is built on EventPipe, my initial thought is that duplicating the existing eventpipe tests for user events wouldn't be adding anything. I think we can add more tests later on, but not sure what coverage is good and reasonable. I'm not even sure yet if our CI machines have user_events, or if they run with elevated privileges, so this was mainly to see if we can have a basic E2E test going.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should definitely not duplicate but maybe look on extending the existing event pipe tests to run over user events + additional validate logic but agree that is something we could look at later.

So, if this is mainly a smoke test then maybe we should make sure we at least hit things we know is handled in the one-collect library, during the work we hit a number of things that needed special attention, like activity id's, custom metadata and potential stack traces. Right now, these only tests one runtime start/stop event fired under very unique circumstances. Maybe we should do some short multi-threading scenario as well, making sure we won't hit any races in the code path unique to user events?

For a basic E2E test, just enabling GC is sufficient.

Locally running this test, occasionally the test failed
because it didn't have the expected events. It's not
clear why, but with a brief tracee timeout of 1s,
collecting dotnet-trace's default keywords and enabling
all associated events is overkill for a fast E2E test.
Even after switching to just enabling GCKeyword,
the 1s tracee app had a 1% failure rate locally.
On the other hand, AllocationSampled had no
failures after 1000 runs.
@mdh1418 mdh1418 force-pushed the user_events_functional_runtime_test branch from ba6d4d9 to 55edad6 Compare November 12, 2025 01:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants