[Design PR] Spec for dotnet-counters and dotnet-trace run #1522

sywhang · 2020-09-03T00:16:34Z

Related: #1466

I've been working on implementing dotnet-counters run and dotnet-trace run and got them working locally. Before I proceed any further I wanted to bring up a spec PR so that we can all chime in on what this command should really do. My current proposal is for dotnet-counters and dotnet-trace only, but dotnet-monitor could benefit from the same command and I believe the tool was written with the run command in mind.

My proposal is to have the run command be an analogous experience to dotnet-trace collect and dotnet-counters collect/monitor, except it launches the target app and starts collecting it.

For dotnet-counters, a user may want to use the collect command instead of the monitor command, so a flag was added so that the user can specify it depending on the need.

I am open to suggestions on shaping this command so please feel free to share your thoughts!

cc @josalem @noahfalk @shirhatti

josalem

(Yay, a priori documentation!!) I agree that dotnet-<tool> run ... should mirror dotnet-<tool> collect|monitor .... Can I suggest a slight variation on how to specify the executable?

dotnet-trace run <dotnet-trace args> -- <exec + args>

For example,

dotnet-trace run --buffersize 512 --profile cpu-sampling -- dotnet MyAwesomeApp.dll -myArg1 -myArg2 val

That feels more similar to what similar tools, e.g., perf, do:

root@acafcbc23e4f:/# perf sched record -h

 Usage: perf record [<options>] [<command>]
    or: perf record [<options>] -- <command> [<options>]

 ...

I'm not 100% against the -e|--exec + --args combo, but it feels a little clunky to use the verb run and then have all the other arguments be the same as collect|monitor. It might end up being simpler if we added -e|--exec as an argument on collect|monitor instead.

If we do use -e|--exec could we not have --args as well? At least for inner loop, it's common to run a dotnet process as dotnet /bin/.../my.dll. If we use -e|--exec and --args, that becomes dotnet-trace run -e dotnet --args bin/.../my.dll. I think it might be simpler if we just took the entire invocation as single string, e.g., dotnet-trace run --exec "dotnet bin/.../my.dll --my-arg" [rest of dotnet-trace args].

Thoughts?

josalem · 2020-09-03T00:19:52Z

documentation/design-docs/dotnet-tools.md

+    Syntax:
+
+```
+    dotnet-counters collect [-h||--help]


Suggested change

dotnet-counters collect [-h||--help]

dotnet-counters run [-h||--help]

I find the verb run to be confusing. Without reading the proposal, I'd assumed dotnet counters run was similar to dotnet run where I could point it to a directory containing a project file as opposed to needing to specify an executable

sywhang · 2020-09-03T05:49:52Z

@josalem Thanks for the thoughtful suggestions!!

Yay, a priori documentation

We used to do this more often... I figured it would be nice if we do this at least for feature work that we do in this space :p

It might end up being simpler if we added -e|--exec as an argument on collect|monitor instead.

I actually quite like this idea since we might end up adding more commands to dotnet-trace as well (i.e. report), and it would fit in nicely with that.

I'll wait for more people to put in their thoughts and if people are happy with what you suggested here, I would be happy to move forward with this proposal instead.

noahfalk · 2020-09-03T07:51:23Z

Its nice to be able to discuss on the spec and +1 for most of what John said : )

I think a very useful scenario is that if someone has a command line X (ie MyApp.exe -appArg1) that runs their app, it is very convenient if the command to run with a diagnostic tool can be produced solely by concatenating some prefix:

foo bar other stuff X  (for example dotnet-trace collect -- MyApp.exe -appArg1)

This makes it easy to give instructions, to manually compose the command line, or to automate composing it in a script. Other options are possible but take progressively more effort to explain or use:

foo bar other stuff "X"
foo bar "X" other stuff
foo bar --exec "X_executable" --args "X_arguments" other stuff

Doing the top option typically means allowing the "--" to ensure there is no ambiguity between options for the diagnostic tool and arguments for the app to be run that also happen to start with "-". I'd hope System.CommandLine already has support for it as it is a pattern that I've seen in a number of other tools.

I like the suggestion to re-use the collect/monitor verbs rather than adding a new run verb. Combining this with the above is a little obnoxious because dotnet-counters already uses positional arguments as the counter list. I wouldn't be opposed to changing the syntax so that counters are now a comma separated list with an explicit "--counters" option to free up the positional arguments for the target app command line. We could also be nice and treat the positional arguments as a counter list when a PID option is already specified to avoid breaking anyone who might be relying on the current syntax.

shirhatti · 2020-09-03T19:33:03Z

Thanks @sywhang for writing this up!

Although not explicitly mentioned, I assume the stdout/stderr are still wired to the target process being traced?

It might worth looking at other trace tools for prior art on what they do here. If I remember correctly valgrind outputs to stderr while the target process is still wired to stdout

shirhatti · 2020-09-03T19:38:08Z

What happens to child dotnet processes that are spawned? Are they traced by default? Is it configurable?

josalem · 2020-09-03T21:04:04Z

It might worth looking at other trace tools for prior art on what they do here. If I remember correctly valgrind outputs to stderr while the target process is still wired to stdout

I was expecting something like this, but it's good to explicitly call out the behavior.

What happens to child dotnet processes that are spawned? Are they traced by default? Is it configurable?

TL;DR - My preference would be to have the default be "trace all subprocesses" with a flag that makes it "trace first process only".

This tech is based on the Diagnostic Port work in 5.0 which is configured via an env var. Under the covers, this is just creating a Diagnostic Port and setting an environment variable for the System.Diagnostics.Process that we spawn. If that process spawns additional processes, they would see the Diagnostic Port the tool has configured and attempt to connect. If the tool has created a suspend port, then the spawned processes will all suspend and attempt to connect to the tool as well. At that point, we either just resume the runtime or setup an EventPipe session and resume the runtime. All that is to say that is should be pretty easy for us to either trace everything or just the first process.

Another thing we can do similar to the proposed functionality would be a --wait <name of app to wait for> flag, but that probably deserves its own spec, etc. 😄

sywhang · 2020-09-04T10:12:08Z

Thanks for the feedback everyone! I did take some time today to go through some of the existing tools and modified the proposal a bit.

To summarize, I changed the spec to follow what perf does and use the -- flag to indicate the executable being launched & the arg to be passed. I also removed the "run" command and instead use the -- flag in each of the commands instead (i.e. collect, monitor).

Two main reasons for doing this were:

Ease of use: It is intuitively easier for users who are already familiar with the tool to simply append -- followed by whatever command they want to trace/monitor.
Scriptability: As Noah mentioned this may be another use case that can potentially rise and not having to worry about passing around separate arguments in different order would make the UI nicer for scriptability once that becomes more of a common practice.

Some additional thoughts on other things being brought up here:

stdout/stderr are still wired to the target process being traced?
This is an interesting question I didn't think about too much - Initially I was going to make it not wired to the target process by default but I think it may be more common for the users to want the stdout/stderr of the target process being traced, and not the trace itself. What would you suggest here?

TL;DR - My preference would be to have the default be "trace all subprocesses" with a flag that makes it "trace first process only".

Actually, what I was thinking was the opposite - make the executable being specified on the CLI be the one and only process that gets traced, and not any of its subprocesses.

This tech is based on the Diagnostic Port work in 5.0 which is configured via an env var

Yes it is. And for what it's worth I don't think that's something all users need to know or frankly, even care about. From an end user's perspective they may be confused on why all of the subprocesses are being collected by default when it wasn't explicitly configured to do so - the process they launch are explicitly typed by them so they know what is being traced, but tracing all subprocesses doesn't really fall into that category. I also don't know any other dev tools that you can use to launch a process and does something to that process do the same thing to all of its subprocesses. Say you launch a debugger that spawns child processes - by default the debugger won't attach to those processes.

But it is a good point and I think we can add a flag in case the user wants to trace all those processes.

Another thing we can do similar to the proposed functionality would be a --wait flag, but that probably deserves its own spec, etc. 😄

Yes :p Let's wait on that for a bit.

josalem

Actually, what I was thinking was the opposite - make the executable being specified on the CLI be the one and only process that gets traced, and not any of its subprocesses.

I don't have any data or hypothetical scenarios for arguing in either direction, so I'm okay making that the default so long as we have a flag to trace all subprocesses. Perhaps a --trace-children or --follow-fork flag? One evokes the equivalent from windbg and the other from lldb/gdb.

I don't think that's something all users need to know or frankly, even care about

Agreed! 😃

Other than the few nits on the spec itself, I think this is going in a good direction.

The last couple things I think we should try to answer, either in this spec or as part of the implementation:

How do we handle file/session rollover for long-lived processes? Should we?
Can we stop tracing before the end of the process?
- E.g., if I'm doing fusion log collection, but I don't care about anything after startup, do I need to wait till process end to get my log?

josalem · 2020-09-04T18:38:07Z

documentation/design-docs/dotnet-tools.md

+
+    2. Launch my-server-app.exe and start collecting metrics into a CSV file from its startup with the runtime and Kestrel counters and update interval of once per minute.
+```
+    dotnet-counters collect System.Runtime Microsoft-AspNetCore-Server-Kestrel --format csv -- C:\path\to\my-server-app.exe --refresh-interval 60


I would expect that everything after -- will be used with the subprocess. Did you mean to have --refresh-interval 60 before --?

josalem · 2020-09-04T18:38:33Z

documentation/design-docs/dotnet-tools.md

+                            [--format <csv|json>]
+                            [--refreshInterval <sec>]
+                            counter_list
+                            [ -- <command-to-execute>]


Suggested change

[ -- <command-to-execute>]

[ -- <command-to-execute> [command options]]

josalem · 2020-09-04T18:39:23Z

documentation/design-docs/dotnet-tools.md

+    -h, --help
+        Show command line help
+
+    --exec,--executable


Do we still need --exec if we use the -- <command> [command args] model?

josalem · 2020-09-04T18:39:50Z

documentation/design-docs/dotnet-tools.md

+                     [--profile <profile_name>]
+                     [--providers <list-of-comma-separated-providers>]
+                     [--format <trace-file-format>]
+                     -- <command-to-execute>


Suggested change

-- <command-to-execute>

[-- <command-to-execute> [command args]]

sywhang · 2020-09-11T01:36:57Z

I've been thinking about what to do with the UI. The main question around this is what we do with the UI (stdout/stdin/stderr) for target application being launched since this also affects the lifetime of the target app (i.e. stop tracing before the app exits, etc.).

Some options I have thought of:

The target app's stdout/stdin/stderr are redirected and ignored.
Interleave the output window. We can get a little "fancy" with the output window and display stdout from the target app in a separate "window" within the CLI window. For instance, something that looks like the following:

=========== counters ============
[System.Runtime]
    CPU Usage (%)              3
    Working Set (MB)           42 
=========== app output ===========
hello from application
hi

This still leaves the question of what to do about stdin and the lifetime of the target app. (Whether stdin should be used for the counter, or the app, or both => stop tracing/counter session means stop both tracing/counter AND the app).

The target app that gets launched takes over the stdout/stdin/stderr. To get the UI, use dotnet-counters/dotnet-monitor "attach" command on a separate console (we can play around with the name here) to attach to the existing session and see the tools UI. The user can stop the session on that UI separately.
This is similar to option 3, but instead of having the app start spewing out the session immediately, we can make the "run" command be a simple launcher that starts the app and hangs it at startup. The user can use a separate command to "start" the diagnostic session and get the UI directly.

None of these options seem "complete" enough to cover all use cases, and I can think of use cases that would make each one of these options "better" than others. I was hoping to get thoughts on these options, or any other thoughts people might have.

josalem · 2020-09-12T00:22:11Z

Right off the bat, my head goes to how most debuggers on Linux work:

If you run the process via the debugger (lldb /my-bin -- --my-args), then the debugger owns stdin, but shares std{out|wrn|err}. Output from the target app can garble the debugger output and the debugger makes no attempt to correct or prevent that.
You can also run the debugger in attach mode (lldb -w -n myapp) in one console and run your app in another. This solves all issues surrounding std* since the target app has its own terminal, and the debugger does as well.

If we follow this model, we effectively have two use cases:

I don't care about app output or my app has no output
I do care about app output or my app is interactive

We've already settled on using a dotnet-tool collect ... [-- <my exe> [args]] syntax. That covers use case 1. What if we bring in the --wait flag here and use that to cover use case 2?

console A

$ dotnet-trace collect ... --wait myapp
Waiting for a .NET process named "myapp".
Configure the app environment to connect by setting DOTNET_DiagnosticPorts=<randomly generated port>
...
Connected!
<regular UI for trace>

console B

$ (export DOTNET_DiagnosticPorts=<randomly generated port>; ./myapp)
<myapp UI/output>

This would give us a simple UX for simple apps (1), but also gives us a much more flexible UX at the cost of simplicity (2).

Another thought I had was to reduce the frequency of updates to stdout for our tooling and to skip line rewriting. That's more difficult with dotnet-counters since the stdout output is the purpose of the monitor verb so reducing the frequency reduces the value. It would probably work well for trace though.

One way we can handle the "trace ends before app" issues is to make users press ctrl-c twice to end the target app. The first press triggers the trace stopping logic and the second gets forwarded to the app.

Applying the last two observations to use cases 1 and 2 from above gets us the following:

case 1: dotnet-tool collect ... [-- <exe> [args]] =>
- interleaved stdout
- forward stdin
- 2x ctrl-c to end inner app
- ENTER doesn't stop trace
case 2: --wait with two consoles =>
- no change to UI for tools
- no worries about target app's UI

Because we are adding this functionality on to collect and monitor, I think the --wait flag should be trivial to add. We could solve use case 1 with this first spec and follow up with the --wait flag spec to cover use case 2.

noahfalk · 2020-09-12T01:42:49Z

I very much like @josalems suggestion to split the scenario into those two cases that tradeoff simplicity for access to interactive console. A few spots I was uncertain of:

Another thought I had was to reduce the frequency of updates to stdout

Was this suggestion an entirely different alternative or a suggestion of behavior to add to case 1 where we are asserting the user doesn't care about the output? If it is the latter then it wasn't clear what our rationale for changing the output would be because we are already assuming the user doesn't care about their app output.

One way we can handle the "trace ends before app" issues is to make users press ctrl-c twice

What do you think are the odds that a user won't care about their app's output but does care about having it keep running after the trace is complete? My guess is that this is a fairly uncommon set of requirements and based on that I'd propose serving this scenario with case 2 so that we make case 1 simpler. In case 1 the app would end as soon as the tracing does, at the first ctrl-C.

josalem · 2020-09-12T01:56:36Z

Was this suggestion an entirely different alternative or a suggestion of behavior to add to case 1 where we are asserting the user doesn't care about the output? If it is the latter then it wasn't clear what our rationale for changing the output would be because we are already assuming the user doesn't care about their app output.

It was meant to be another suggestion on top of case 1, but your comment has made me reconsider and I don’t think it’s worth including.

I’m okay, okay with your suggestions about the crew-c handling as well Noah. I’d rather case 1 be as simple as possible and we leave case 2 for anyone with more advanced requirements. If we get feedback that case 1 needs more options or different behavior, then we can course correct at that point 😄

noahfalk

Arg, I wrote this feedback weeks ago and forgot to hit submit : (

noahfalk · 2020-09-04T10:43:27Z

documentation/design-docs/dotnet-tools.md

+
+    1. Launch my-server-app.exe and start monitoring it from its startup with default list of counters and interval.
+```
+    dotnet-counters monitor -- C:\path\to\my-server-app.exe


Not a big deal either way, but will it also work if I write?

dotnet-counters monitor C:\path\to\my-server-app.exe

Typically the "--" means everything after that point should be treated as an argument, not an option, regardless if it starts with a -. However in this case there are no tokens that start with "-" after the "--" so it isn't ambiguous. As far as I know most tools that utilize the "--" syntax don't require it in the non-ambiguous case.

noahfalk · 2020-09-04T10:46:49Z

documentation/design-docs/dotnet-tools.md


 ```

+MONITOR / COLLECT on Startup (on .NET 5 or later)


The error case is also pretty interesting. What happens if I run this command on an executable that doesn't wind up starting a runtime version >= 5?

noahfalk · 2020-09-04T10:49:36Z

documentation/design-docs/dotnet-tools.md

+
+    2. Launch my-server-app.exe and start collecting metrics into a CSV file from its startup with the runtime and Kestrel counters and update interval of once per minute.
+```
+    dotnet-counters collect System.Runtime Microsoft-AspNetCore-Server-Kestrel --format csv -- C:\path\to\my-server-app.exe --refresh-interval 60


Based on my understanding of how "--" is usually interpretted in other apps (and probably how System.CommandLine would interpret it?) I would expect what this actually should do is attempt to run the process:

System.Runtime Microsoft-AspNetCore-Server-Kestrel C:\path\to\my-server-app.exe --refresh-interval 60

I don't think that is what you meant to have it do. My understanding of the parse rules are something like this pseudo-code:

bool parsingOptions = true; foreach(string token in commandLine.Split(whitespace)) { if(token == "--" && parsingOptions) { parsingOptions = false; continue; } if(parsingOptions && token.StartsWith("-")) { // this is an option, skip it an potentially the next token too if the option consumes a value (not shown) continue; } arguments.Add(token); }

System.Runtime and Microsoft-AspNetCore-Server-Kestrel are arguments because they don't start with a "-". "--format csv" is an option because it does start with a dash and it consumes the following token. "--" disables all further options. "C:\path\to\my-server-app.exe" "--refresh-interval" and "60" are all arguments because regardless what they start with option parsing is disabled. Then the command-line is inferred to be all the arguments joined together with single spaces.

This is why counters using using a counter_list as its current argument list is problematic, it collides with interpreting arguments as being parts of the command-line. My suggestion is to change the syntax to something like this:

dotnet-counters collect --counters System.Runtime,Microsoft-AspNetCore-Server-Kestrel --format csv --refresh-interval 60 -- C:\path\to\my-server-app.exe

Back-compat with old syntax can still work by observing if the -p {PID} option is present and interpreting any arguments as elements in the counter list in that case only.

noahfalk · 2020-09-04T11:12:19Z

documentation/design-docs/dotnet-tools.md

+
+```
+    dotnet-counters collect [-h||--help]
+                            [--exec|--executable <path>]


Suggested change

[--exec|--executable <path>]

noahfalk · 2020-09-04T11:12:58Z

documentation/design-docs/dotnet-tools.md

+                            [--mode <monitor|collect>]
+                            [--format <csv|json>]
+                            [--refreshInterval <sec>]
+                            counter_list


Suggested change

counter_list

--counters counter_list

noahfalk · 2020-09-04T11:13:53Z

documentation/design-docs/dotnet-tools.md

+    dotnet-counters collect [-h||--help]
+                            [--exec|--executable <path>]
+                            [-o|--output <name>]
+                            [--mode <monitor|collect>]


Suggested change

[--mode <monitor|collect>]

noahfalk · 2022-07-18T22:11:31Z

I am closing old PRs that have had no progress in a long time. You are welcome to resume the work any time and open a new PR.

sywhang added 2 commits September 2, 2020 17:02

spec for dotnet-counters and dotnet-trace run

e58639f

counters mode

cb27452

sywhang added enhancement New feature or request diagnostic global tooling labels Sep 3, 2020

josalem reviewed Sep 3, 2020

View reviewed changes

update spec

c02484b

josalem reviewed Sep 5, 2020

View reviewed changes

noahfalk reviewed Oct 9, 2020

View reviewed changes

noahfalk mentioned this pull request Oct 9, 2020

Enable dotnet-trace and dotnet-counters collection from startup #1635

Merged

noahfalk mentioned this pull request Jan 31, 2021

[Feature Request] Make dotnet-trace trace from startup more useable for applications that have output, or are interactive #1946

Closed

Base automatically changed from master to main March 12, 2021 05:48

hoyosjs requested a review from a team as a code owner March 12, 2021 05:48

noahfalk closed this Jul 18, 2022

github-actions bot locked and limited conversation to collaborators Jan 17, 2024

	dotnet-counters collect [-h\|\|--help]
	dotnet-counters run [-h\|\|--help]

	[ -- <command-to-execute>]
	[ -- <command-to-execute> [command options]]

	-- <command-to-execute>
	[-- <command-to-execute> [command args]]

[Design PR] Spec for dotnet-counters and dotnet-trace run #1522

[Design PR] Spec for dotnet-counters and dotnet-trace run #1522

Uh oh!

Conversation

sywhang commented Sep 3, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

josalem left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sywhang commented Sep 3, 2020

Uh oh!

noahfalk commented Sep 3, 2020

Uh oh!

shirhatti commented Sep 3, 2020

Uh oh!

shirhatti commented Sep 3, 2020

Uh oh!

josalem commented Sep 3, 2020

Uh oh!

sywhang commented Sep 4, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

josalem left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sywhang commented Sep 11, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

josalem commented Sep 12, 2020

Uh oh!

noahfalk commented Sep 12, 2020

Uh oh!

josalem commented Sep 12, 2020

Uh oh!

noahfalk left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

noahfalk commented Jul 18, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

sywhang commented Sep 3, 2020 •

edited

Loading

sywhang commented Sep 4, 2020 •

edited

Loading

sywhang commented Sep 11, 2020 •

edited

Loading