Skip to content

Conversation

@sywhang
Copy link
Contributor

@sywhang sywhang commented Sep 3, 2020

Related: #1466

I've been working on implementing dotnet-counters run and dotnet-trace run and got them working locally. Before I proceed any further I wanted to bring up a spec PR so that we can all chime in on what this command should really do. My current proposal is for dotnet-counters and dotnet-trace only, but dotnet-monitor could benefit from the same command and I believe the tool was written with the run command in mind.

My proposal is to have the run command be an analogous experience to dotnet-trace collect and dotnet-counters collect/monitor, except it launches the target app and starts collecting it.

For dotnet-counters, a user may want to use the collect command instead of the monitor command, so a flag was added so that the user can specify it depending on the need.

I am open to suggestions on shaping this command so please feel free to share your thoughts!

cc @josalem @noahfalk @shirhatti

Copy link
Contributor

@josalem josalem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Yay, a priori documentation!!) I agree that dotnet-<tool> run ... should mirror dotnet-<tool> collect|monitor .... Can I suggest a slight variation on how to specify the executable?

dotnet-trace run <dotnet-trace args> -- <exec + args>

For example,

dotnet-trace run --buffersize 512 --profile cpu-sampling -- dotnet MyAwesomeApp.dll -myArg1 -myArg2 val

That feels more similar to what similar tools, e.g., perf, do:

root@acafcbc23e4f:/# perf sched record -h

 Usage: perf record [<options>] [<command>]
    or: perf record [<options>] -- <command> [<options>]

 ...

I'm not 100% against the -e|--exec + --args combo, but it feels a little clunky to use the verb run and then have all the other arguments be the same as collect|monitor. It might end up being simpler if we added -e|--exec as an argument on collect|monitor instead.

If we do use -e|--exec could we not have --args as well? At least for inner loop, it's common to run a dotnet process as dotnet /bin/.../my.dll. If we use -e|--exec and --args, that becomes dotnet-trace run -e dotnet --args bin/.../my.dll. I think it might be simpler if we just took the entire invocation as single string, e.g., dotnet-trace run --exec "dotnet bin/.../my.dll --my-arg" [rest of dotnet-trace args].

Thoughts?

Syntax:

```
dotnet-counters collect [-h||--help]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dotnet-counters collect [-h||--help]
dotnet-counters run [-h||--help]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the verb run to be confusing. Without reading the proposal, I'd assumed dotnet counters run was similar to dotnet run where I could point it to a directory containing a project file as opposed to needing to specify an executable

@sywhang
Copy link
Contributor Author

sywhang commented Sep 3, 2020

@josalem Thanks for the thoughtful suggestions!!

Yay, a priori documentation

We used to do this more often... I figured it would be nice if we do this at least for feature work that we do in this space :p

It might end up being simpler if we added -e|--exec as an argument on collect|monitor instead.

I actually quite like this idea since we might end up adding more commands to dotnet-trace as well (i.e. report), and it would fit in nicely with that.

I'll wait for more people to put in their thoughts and if people are happy with what you suggested here, I would be happy to move forward with this proposal instead.

@noahfalk
Copy link
Member

noahfalk commented Sep 3, 2020

Its nice to be able to discuss on the spec and +1 for most of what John said : )

I think a very useful scenario is that if someone has a command line X (ie MyApp.exe -appArg1) that runs their app, it is very convenient if the command to run with a diagnostic tool can be produced solely by concatenating some prefix:

foo bar other stuff X  (for example dotnet-trace collect -- MyApp.exe -appArg1)

This makes it easy to give instructions, to manually compose the command line, or to automate composing it in a script. Other options are possible but take progressively more effort to explain or use:

foo bar other stuff "X"
foo bar "X" other stuff
foo bar --exec "X_executable" --args "X_arguments" other stuff

Doing the top option typically means allowing the "--" to ensure there is no ambiguity between options for the diagnostic tool and arguments for the app to be run that also happen to start with "-". I'd hope System.CommandLine already has support for it as it is a pattern that I've seen in a number of other tools.

I like the suggestion to re-use the collect/monitor verbs rather than adding a new run verb. Combining this with the above is a little obnoxious because dotnet-counters already uses positional arguments as the counter list. I wouldn't be opposed to changing the syntax so that counters are now a comma separated list with an explicit "--counters" option to free up the positional arguments for the target app command line. We could also be nice and treat the positional arguments as a counter list when a PID option is already specified to avoid breaking anyone who might be relying on the current syntax.

@shirhatti
Copy link
Contributor

Thanks @sywhang for writing this up!

Although not explicitly mentioned, I assume the stdout/stderr are still wired to the target process being traced?

It might worth looking at other trace tools for prior art on what they do here. If I remember correctly valgrind outputs to stderr while the target process is still wired to stdout

@shirhatti
Copy link
Contributor

What happens to child dotnet processes that are spawned? Are they traced by default? Is it configurable?

@josalem
Copy link
Contributor

josalem commented Sep 3, 2020

It might worth looking at other trace tools for prior art on what they do here. If I remember correctly valgrind outputs to stderr while the target process is still wired to stdout

I was expecting something like this, but it's good to explicitly call out the behavior.

What happens to child dotnet processes that are spawned? Are they traced by default? Is it configurable?

TL;DR - My preference would be to have the default be "trace all subprocesses" with a flag that makes it "trace first process only".

This tech is based on the Diagnostic Port work in 5.0 which is configured via an env var. Under the covers, this is just creating a Diagnostic Port and setting an environment variable for the System.Diagnostics.Process that we spawn. If that process spawns additional processes, they would see the Diagnostic Port the tool has configured and attempt to connect. If the tool has created a suspend port, then the spawned processes will all suspend and attempt to connect to the tool as well. At that point, we either just resume the runtime or setup an EventPipe session and resume the runtime. All that is to say that is should be pretty easy for us to either trace everything or just the first process.

Another thing we can do similar to the proposed functionality would be a --wait <name of app to wait for> flag, but that probably deserves its own spec, etc. 😄

@sywhang
Copy link
Contributor Author

sywhang commented Sep 4, 2020

Thanks for the feedback everyone! I did take some time today to go through some of the existing tools and modified the proposal a bit.

To summarize, I changed the spec to follow what perf does and use the -- flag to indicate the executable being launched & the arg to be passed. I also removed the "run" command and instead use the -- flag in each of the commands instead (i.e. collect, monitor).

Two main reasons for doing this were:

  1. Ease of use: It is intuitively easier for users who are already familiar with the tool to simply append -- followed by whatever command they want to trace/monitor.

  2. Scriptability: As Noah mentioned this may be another use case that can potentially rise and not having to worry about passing around separate arguments in different order would make the UI nicer for scriptability once that becomes more of a common practice.

Some additional thoughts on other things being brought up here:

stdout/stderr are still wired to the target process being traced?
This is an interesting question I didn't think about too much - Initially I was going to make it not wired to the target process by default but I think it may be more common for the users to want the stdout/stderr of the target process being traced, and not the trace itself. What would you suggest here?

TL;DR - My preference would be to have the default be "trace all subprocesses" with a flag that makes it "trace first process only".

Actually, what I was thinking was the opposite - make the executable being specified on the CLI be the one and only process that gets traced, and not any of its subprocesses.

This tech is based on the Diagnostic Port work in 5.0 which is configured via an env var

Yes it is. And for what it's worth I don't think that's something all users need to know or frankly, even care about. From an end user's perspective they may be confused on why all of the subprocesses are being collected by default when it wasn't explicitly configured to do so - the process they launch are explicitly typed by them so they know what is being traced, but tracing all subprocesses doesn't really fall into that category. I also don't know any other dev tools that you can use to launch a process and does something to that process do the same thing to all of its subprocesses. Say you launch a debugger that spawns child processes - by default the debugger won't attach to those processes.

But it is a good point and I think we can add a flag in case the user wants to trace all those processes.

Another thing we can do similar to the proposed functionality would be a --wait flag, but that probably deserves its own spec, etc. 😄

Yes :p Let's wait on that for a bit.

Copy link
Contributor

@josalem josalem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, what I was thinking was the opposite - make the executable being specified on the CLI be the one and only process that gets traced, and not any of its subprocesses.

I don't have any data or hypothetical scenarios for arguing in either direction, so I'm okay making that the default so long as we have a flag to trace all subprocesses. Perhaps a --trace-children or --follow-fork flag? One evokes the equivalent from windbg and the other from lldb/gdb.

I don't think that's something all users need to know or frankly, even care about

Agreed! 😃

Other than the few nits on the spec itself, I think this is going in a good direction.

The last couple things I think we should try to answer, either in this spec or as part of the implementation:

  • How do we handle file/session rollover for long-lived processes? Should we?
  • Can we stop tracing before the end of the process?
    • E.g., if I'm doing fusion log collection, but I don't care about anything after startup, do I need to wait till process end to get my log?


2. Launch my-server-app.exe and start collecting metrics into a CSV file from its startup with the runtime and Kestrel counters and update interval of once per minute.
```
dotnet-counters collect System.Runtime Microsoft-AspNetCore-Server-Kestrel --format csv -- C:\path\to\my-server-app.exe --refresh-interval 60
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect that everything after -- will be used with the subprocess. Did you mean to have --refresh-interval 60 before --?

[--format <csv|json>]
[--refreshInterval <sec>]
counter_list
[ -- <command-to-execute>]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[ -- <command-to-execute>]
[ -- <command-to-execute> [command options]]

-h, --help
Show command line help
--exec,--executable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need --exec if we use the -- <command> [command args] model?

[--profile <profile_name>]
[--providers <list-of-comma-separated-providers>]
[--format <trace-file-format>]
-- <command-to-execute>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
-- <command-to-execute>
[-- <command-to-execute> [command args]]

@sywhang
Copy link
Contributor Author

sywhang commented Sep 11, 2020

I've been thinking about what to do with the UI. The main question around this is what we do with the UI (stdout/stdin/stderr) for target application being launched since this also affects the lifetime of the target app (i.e. stop tracing before the app exits, etc.).

Some options I have thought of:

  1. The target app's stdout/stdin/stderr are redirected and ignored.

  2. Interleave the output window. We can get a little "fancy" with the output window and display stdout from the target app in a separate "window" within the CLI window. For instance, something that looks like the following:

=========== counters ============
[System.Runtime]
    CPU Usage (%)              3
    Working Set (MB)           42 
=========== app output ===========
hello from application
hi

This still leaves the question of what to do about stdin and the lifetime of the target app. (Whether stdin should be used for the counter, or the app, or both => stop tracing/counter session means stop both tracing/counter AND the app).

  1. The target app that gets launched takes over the stdout/stdin/stderr. To get the UI, use dotnet-counters/dotnet-monitor "attach" command on a separate console (we can play around with the name here) to attach to the existing session and see the tools UI. The user can stop the session on that UI separately.

  2. This is similar to option 3, but instead of having the app start spewing out the session immediately, we can make the "run" command be a simple launcher that starts the app and hangs it at startup. The user can use a separate command to "start" the diagnostic session and get the UI directly.

None of these options seem "complete" enough to cover all use cases, and I can think of use cases that would make each one of these options "better" than others. I was hoping to get thoughts on these options, or any other thoughts people might have.

@josalem
Copy link
Contributor

josalem commented Sep 12, 2020

Right off the bat, my head goes to how most debuggers on Linux work:

  • If you run the process via the debugger (lldb /my-bin -- --my-args), then the debugger owns stdin, but shares std{out|wrn|err}. Output from the target app can garble the debugger output and the debugger makes no attempt to correct or prevent that.
  • You can also run the debugger in attach mode (lldb -w -n myapp) in one console and run your app in another. This solves all issues surrounding std* since the target app has its own terminal, and the debugger does as well.

If we follow this model, we effectively have two use cases:

  1. I don't care about app output or my app has no output
  2. I do care about app output or my app is interactive

We've already settled on using a dotnet-tool collect ... [-- <my exe> [args]] syntax. That covers use case 1. What if we bring in the --wait flag here and use that to cover use case 2?

console A

$ dotnet-trace collect ... --wait myapp
Waiting for a .NET process named "myapp".
Configure the app environment to connect by setting DOTNET_DiagnosticPorts=<randomly generated port>
...
Connected!
<regular UI for trace>

console B

$ (export DOTNET_DiagnosticPorts=<randomly generated port>; ./myapp)
<myapp UI/output>

This would give us a simple UX for simple apps (1), but also gives us a much more flexible UX at the cost of simplicity (2).


Another thought I had was to reduce the frequency of updates to stdout for our tooling and to skip line rewriting. That's more difficult with dotnet-counters since the stdout output is the purpose of the monitor verb so reducing the frequency reduces the value. It would probably work well for trace though.


One way we can handle the "trace ends before app" issues is to make users press ctrl-c twice to end the target app. The first press triggers the trace stopping logic and the second gets forwarded to the app.


Applying the last two observations to use cases 1 and 2 from above gets us the following:

  • case 1: dotnet-tool collect ... [-- <exe> [args]] =>
    • interleaved stdout
    • forward stdin
    • 2x ctrl-c to end inner app
    • ENTER doesn't stop trace
  • case 2: --wait with two consoles =>
    • no change to UI for tools
    • no worries about target app's UI

Because we are adding this functionality on to collect and monitor, I think the --wait flag should be trivial to add. We could solve use case 1 with this first spec and follow up with the --wait flag spec to cover use case 2.

@noahfalk
Copy link
Member

I very much like @josalems suggestion to split the scenario into those two cases that tradeoff simplicity for access to interactive console. A few spots I was uncertain of:

Another thought I had was to reduce the frequency of updates to stdout

Was this suggestion an entirely different alternative or a suggestion of behavior to add to case 1 where we are asserting the user doesn't care about the output? If it is the latter then it wasn't clear what our rationale for changing the output would be because we are already assuming the user doesn't care about their app output.

One way we can handle the "trace ends before app" issues is to make users press ctrl-c twice

What do you think are the odds that a user won't care about their app's output but does care about having it keep running after the trace is complete? My guess is that this is a fairly uncommon set of requirements and based on that I'd propose serving this scenario with case 2 so that we make case 1 simpler. In case 1 the app would end as soon as the tracing does, at the first ctrl-C.

@josalem
Copy link
Contributor

josalem commented Sep 12, 2020

Was this suggestion an entirely different alternative or a suggestion of behavior to add to case 1 where we are asserting the user doesn't care about the output? If it is the latter then it wasn't clear what our rationale for changing the output would be because we are already assuming the user doesn't care about their app output.

It was meant to be another suggestion on top of case 1, but your comment has made me reconsider and I don’t think it’s worth including.

I’m okay, okay with your suggestions about the crew-c handling as well Noah. I’d rather case 1 be as simple as possible and we leave case 2 for anyone with more advanced requirements. If we get feedback that case 1 needs more options or different behavior, then we can course correct at that point 😄

Copy link
Member

@noahfalk noahfalk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arg, I wrote this feedback weeks ago and forgot to hit submit : (


1. Launch my-server-app.exe and start monitoring it from its startup with default list of counters and interval.
```
dotnet-counters monitor -- C:\path\to\my-server-app.exe
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal either way, but will it also work if I write?

dotnet-counters monitor C:\path\to\my-server-app.exe

Typically the "--" means everything after that point should be treated as an argument, not an option, regardless if it starts with a -. However in this case there are no tokens that start with "-" after the "--" so it isn't ambiguous. As far as I know most tools that utilize the "--" syntax don't require it in the non-ambiguous case.

```

MONITOR / COLLECT on Startup (on .NET 5 or later)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error case is also pretty interesting. What happens if I run this command on an executable that doesn't wind up starting a runtime version >= 5?


2. Launch my-server-app.exe and start collecting metrics into a CSV file from its startup with the runtime and Kestrel counters and update interval of once per minute.
```
dotnet-counters collect System.Runtime Microsoft-AspNetCore-Server-Kestrel --format csv -- C:\path\to\my-server-app.exe --refresh-interval 60
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on my understanding of how "--" is usually interpretted in other apps (and probably how System.CommandLine would interpret it?) I would expect what this actually should do is attempt to run the process:

System.Runtime Microsoft-AspNetCore-Server-Kestrel C:\path\to\my-server-app.exe --refresh-interval 60

I don't think that is what you meant to have it do. My understanding of the parse rules are something like this pseudo-code:

bool parsingOptions = true;
foreach(string token in commandLine.Split(whitespace))
{
    if(token == "--" && parsingOptions)
    {
        parsingOptions = false;
        continue;
    }

    if(parsingOptions && token.StartsWith("-"))
    {
        // this is an option, skip it an potentially the next token too if the option consumes a value (not shown)
        continue;
    }

    arguments.Add(token);
}

System.Runtime and Microsoft-AspNetCore-Server-Kestrel are arguments because they don't start with a "-". "--format csv" is an option because it does start with a dash and it consumes the following token. "--" disables all further options. "C:\path\to\my-server-app.exe" "--refresh-interval" and "60" are all arguments because regardless what they start with option parsing is disabled. Then the command-line is inferred to be all the arguments joined together with single spaces.

This is why counters using using a counter_list as its current argument list is problematic, it collides with interpreting arguments as being parts of the command-line. My suggestion is to change the syntax to something like this:

dotnet-counters collect --counters System.Runtime,Microsoft-AspNetCore-Server-Kestrel --format csv --refresh-interval 60 -- C:\path\to\my-server-app.exe 

Back-compat with old syntax can still work by observing if the -p {PID} option is present and interpreting any arguments as elements in the counter list in that case only.


```
dotnet-counters collect [-h||--help]
[--exec|--executable <path>]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[--exec|--executable <path>]

[--mode <monitor|collect>]
[--format <csv|json>]
[--refreshInterval <sec>]
counter_list
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
counter_list
--counters counter_list

dotnet-counters collect [-h||--help]
[--exec|--executable <path>]
[-o|--output <name>]
[--mode <monitor|collect>]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[--mode <monitor|collect>]

@noahfalk
Copy link
Member

I am closing old PRs that have had no progress in a long time. You are welcome to resume the work any time and open a new PR.

@noahfalk noahfalk closed this Jul 18, 2022
@github-actions github-actions bot locked and limited conversation to collaborators Jan 17, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants