Skip to content

Conversation

@Forgind
Copy link
Contributor

@Forgind Forgind commented Mar 23, 2022

Context

We currently log all environment variables, which is useful in giving you a sense of where things come from sometimes but doesn't really tell you if those environment variables are used or what other environment variables we might have expected. This shifts to logging only those environment variables we actually used. This also partially mitigates potential privacy issues, as random environment variables with secrets aren't logged unless they happen to correspond to something MSBuild actually uses during the build.

Changes Made

Keep track of environment variable usages for looking at properties. Log those in the binlog instead of all environment variables.

More specifically, we'd previously kept track of environment-derived properties under more generic types such as ProjectPropertyNotXmlBacked. I added an extension specifically for environment-derived properties. That derived class keeps track of relatively few new things: logging context and whether it's been logged already. In some cases, that logging context is up-to-date; in others, it's invalidated before we could use it, in which case we replace it with the correct logging context as brought forward from wherever we actually had one.

With the logging context and information about the property, once we know we need to log it, we log an EnvironmentVariableRead build event. With this change, that event is now properly serialized to the main node, where it is outputted to the binlog and text logs.

The binlog viewer was also modified to better accommodate the EnvironmentVariableRead events.

Finally, the binlog writer was modified such that, unless you specify that you should log all environment variables, it would not output the environment for BuildStarted. I made a similar change for the text logger. I also filtered which properties made it to the event args to weed out environment-derived properties.

Testing

Verified that environment variables are not logged in the binlog by default. Verified that they are logged if they are used. (Also checked the text logs for both.) Modified some unit tests and added one.

Notes

It seems that all environment-derived properties used in execution go through this method. Ideally, that would make this PR tiny: log at that point rather that an BuildStarted. However, there is no single logging context possible at that point, which meant that was brought in from the various ways properties could be used.

Forgind added 2 commits March 22, 2022 16:06
This is both a privacy issue and a waste from a performance perspective.
@marcpopMSFT
Copy link
Member

@Forgind @KirillOsenkov We should chat offline about this change. I had thought that Kirill wanted to know which environment variables were used in the build so we could hide the rest from the binlogs. This PR appears to log whenever an environment property is used which is somewhat different. It also does it by checking for an existing environment variable every time a property is accessed so I'd expect there to be perf implications.

Since every environment variable is converted to a property, shouldn't we just record the initial list at that time and so we won't have to call GetEnvironmentVariable later? Rainer also mentioned that there was code that sort of did this already but had perf issues so it might be better to track down that code and see if the perf issues can be resolved.

@rainersigwald
Copy link
Member

The prior art in this area is around #3432, which moved forward with #5038 but isn't finished. Would it be possible to pick that up instead?

@Forgind
Copy link
Contributor Author

Forgind commented Mar 23, 2022

This keeps track of which environment variables were used as properties in the build and (by default) hides the rest from binlogs. It isn't actually logging anything when an environment property is used, just putting that in a dictionary that we log once with BuildFinishedEventArgs(2). I do access and environment variable every time I add it to the dictionary, but that part is just an implementation detail; I can get the (possibly but probably not stale) value from a dictionary full of environment variables instead.

We don't have an initial list of properties that will be read as environment variables. All we have is which end up being read as environment variables, as you see here. The previous environment variable logging didn't care about which were actually used and just logged everything.

I am aware of that issue and PR. The PR was ultimately reverted because of performance issues. It's a big change that does more than this PR attempts. I'd also have to look into which RPS tests it failed, try to recreate them, analyze what went wrong from a performance perspective, and try to fix it. It's fully possible in my mind that at the end of that, it may be impossible to truly fix that PR without losing functionality. This is a much more scoped PR that accomplishes the most important part of maneely's PR without taking weeks. So I'd really rather not pick up #5038 and just try to optimize it.

@Forgind
Copy link
Contributor Author

Forgind commented Mar 23, 2022

Add environment variables to text log (replace missing ones)

@marcpopMSFT
Copy link
Member

My main concern is that we know that this has been tried before and had problematic performance impact. There are also a few different ways of solving this. So I think it's worth having design discussion on what the goal is.

Open questions in my mind:

  • Should this be opt in to limit perf impact?
  • Can we record the initial list of env variables so we aren't calling Environment.Get... every time.
  • Should we include a bool on the property object on whether it was used or not and merge with the environment list at the end once?
  • Is there value in sharing with customers what properties aren't used that aren't environment variables (to clean up project files perhaps though there is risk that a property would get used with slightly different build params)?

@KirillOsenkov
Copy link
Member

The previous approach logged a few event args, such as EnvironmentVariableReadEventArgs, etc. It was expensive because of logging all these extra args messages.

This approach is much more targeted and from what I saw the perf impact should be negligible (we should measure however).

We should be careful about adding fields on a class that is potentially instantiated millions of times.

Tracking used properties in general I think should be separate from this PR. Here we just want the minimum surgical fix to track and log only the environment variables that were actually used. Everything else I think should be out of scope.

private static object LookupProperty(IPropertyProvider<T> properties, string propertyName, int startIndex, int endIndex, IElementLocation elementLocation, UsedUninitializedProperties usedUninitializedProperties)
{
T property = properties.GetProperty(propertyName, startIndex, endIndex);
if (!string.IsNullOrEmpty(property?.EvaluatedValue))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe use pattern-matching here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I am confused, this seems to just make a list of all properties ever read? We only need the properties that were resolved to an environment variable. I would imagine that we track it somewhere close to where we take a snapshot of the environment at the start of the build?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I think we should move this logic down to line 1519, in the else block (when we have an escaped value).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically here we should check whether the property we got came from the environment (see

_properties.Set(property);
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and yes, move this new if block into the else block, where we've verified that the property is not null. here is too early.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_data.SetProperty(environmentProperty.Name, ((IProperty)environmentProperty).EvaluatedValueEscaped, isGlobalProperty: false, mayBeReserved: false, isEnvironmentVariable: true);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by pattern-matching I mean:

if (property?.EvaluatedValue is { Length: > 0 } evaluatedValue) { ... }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the added value of pattern matching here of string.IsNullOrEmpty?

Copy link
Member

@KirillOsenkov KirillOsenkov Mar 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worried that if the EvaluatedValue getter does anything expensive, or is not idempotent, we'll be calling it twice. It might be benign or not. I prefer to only call properties once and reuse, just to be safe in general. Pattern matching lets you capture and cache the value.

@KirillOsenkov
Copy link
Member

We should also add a unit-test that builds this project and ensures that BuildStarted has no environment variables, and BuildFinished has a single environment variable OS=Windows_NT (only assert the name, not the value as it may change on other OSs)

<Project>

  <PropertyGroup>
    <A>$(OS)</A>
  </PropertyGroup>

  <Target Name="Build">
    <Message Text="$(A)" />
  </Target>
  
</Project>

@KirillOsenkov
Copy link
Member

I know it looks awful threading logging context everywhere, but I was thinking: maybe ultimately it's the right thing to do? We want to be able to log from all evaluation locations, so in the future when we want to add more logging messages perhaps it should come in handy?

Remember that until recently we weren't able to log anything from evaluation at all, so perhaps consider this trailblazing for future generations.

@KirillOsenkov
Copy link
Member

Hmm, in my testing it didn't log the OS variable here:

<Project>

  <PropertyGroup Condition="$(OS) == 'Windows_NT'">
    <A>$(UserProfile)</A>
  </PropertyGroup>

  <Target Name="Build">
    <Message Text="$(A)" />
  </Target>

</Project>

@KirillOsenkov
Copy link
Member

I'm afraid we're going to need tests for all these usage scenarios (e.g. Condition on property groups).

I'm also still worried about the EvaluatedValue property - how come it's never accessed? Would be nice to place a breakpoint and see if this is ever hit during either a real-world build or a full test run. The worry is that there's a way to bypass the logging of environment variables if we access through that property - I want to know if it's at all possible.

[DebuggerBrowsable(DebuggerBrowsableState.Never)]
string IProperty.EvaluatedValueEscaped => _escapedValue;

string IProperty.EvaluatedValueEscaped
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put breakpoints on other ways to access value to see if/when they're hit.

Run all tests with a Debugger.Launch as well as try building msbuild/roslyn/C++ project.

Once we understand this better, write it down.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Augment unit test with Env Vars to be much more comprehensive

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See if other PR works with unit test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check conditions on:
[ ] - task execution
[ ] - itemgroup in target
[ ] - itemgroup not in target
[ ] - propertygroup in target
[ ] - import statement

@Forgind
Copy link
Contributor Author

Forgind commented May 16, 2022

Interesting...I just tested it, and it logged the EnvVar variable in

Hmm, in my testing it didn't log the OS variable here:

<Project>

  <PropertyGroup Condition="$(OS) == 'Windows_NT'">
    <A>$(UserProfile)</A>
  </PropertyGroup>

  <Target Name="Build">
    <Message Text="$(A)" />
  </Target>

</Project>

Fixed this case. I'd foolishly thought a condition on a target might be the same as a condition on a propertygroup. I'll check a couple other places.

}
else
{
if (property is ProjectPropertyInstance.EnvironmentDerivedProjectPropertyInstance environmentDerivedProperty)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to also check for EnvironmentDerivedProjectProperty here? or is it OK?

Copy link
Contributor Author

@Forgind Forgind Jun 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I missed this. I believe it's ok. I looked through uses of EnvironmentDerivedProjectProperties, and I believe they're only relevant at evaluation time, at which point our logging context hasn't yet been invalidated, so they get logged then.

@KirillOsenkov
Copy link
Member

I re-did the perf measurements and was a bit surprised (pleasantly). Could totally be a fluke, but this PR seems slightly faster than baseline and also produces smaller binlogs.

  Baseline: This PR:
  8.828 9.079
  9.204 8.448
  9.875 9.471
  9.114 8.066
  10.119 7.928
  8.6 7.809
  8.566 8.238
  8.781 8.799
  8.615 8.14
Average: 9.078 8.442

Binlog size goes down slightly from 5.137 MB -> 5.058 MB.

I'm guessing this is because we don't log the properties that came from the environment so those huge property lists are now way smaller.

image

Copy link
Member

@KirillOsenkov KirillOsenkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested it some more and it all looks good to me.

I think I'm reasonably happy with this and as far as I'm concerned it can go in and we can get it battle tested. If we find more cases we can fix in follow-up PRs.

We could also file a bug to track adding some tests later on. I think I'd want to start getting some real world mileage on this.

/// <summary>
/// Log all environment variables whether or not they are used in a build in the binary log.
/// </summary>
public readonly bool LogAllEnvironmentVariables = !string.IsNullOrEmpty(Environment.GetEnvironmentVariable("MSBUILDLOGALLENVIRONMENTVARIABLES"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also behind a changewave?

.ShouldBeEmpty();
.ShouldHaveSingleItem()
.EnvironmentVariableName
.ShouldBe("DEFINED_ENVIRONMENT_VARIABLE2");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update comment on line 4521 (and similar) please.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just saw one comment to update—were you thinking I should add more, or was that all?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's the only comment about empty environment, it's all we need. I think I thought there were others in other test cases?


string _initialLoadFilesWriteable = Environment.GetEnvironmentVariable("MSBUILDLOADALLFILESASWRITEABLE");
try
using (TestEnvironment env = TestEnvironment.Create())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: for new code I strongly prefer using statements rather than blocks, where possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mind explaining why? I don't have a strong preference, but blocks make it clearer to me where scope starts and ends. The only advantage I know of to using statements is that they don't require extra indenting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reduced indentation. If the using doesn't line up nicely with the containing block, I prefer it, but if it does (as here where it's the whole method), I like the statement which has less ceremony.

LoggingEventType.BuildMessageEvent => ReadBuildMessageEventFromStream(translator, message, helpKeyword, senderName),
LoggingEventType.BuildWarningEvent => ReadBuildWarningEventFromStream(translator, message, helpKeyword, senderName),
LoggingEventType.EnvironmentVariableReadEvent => ReadEnvironmentVariableReadEventFromStream(translator, message, helpKeyword, senderName),
_ => null,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's important to preserve the error here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. This is the only way this method can return null, and I added a VerifyThrow right after ReadEventFromStream to log the error. That said, KirillOsenkov was also surprised by this. Those are two points in favor of reverting that part, at least. My argument in favor of keeping it as-is is that return switch is beautiful and clear, and the effect is identical. Preference?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the semantics are identical then I'm fine with leaving as is

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return switch is beautiful and clear

You don't have to avoid it here; you can throw inside a switch:

Suggested change
_ => null,
_ => throw ...,

But yeah if the semantics are identical you can leave this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throw... would be different semantically; we were using ErrorUtilities.Throw... before, which doesn't throw if (!s_throwExceptions). I don't think that's too important, but I thought it best to avoid that change anyway.

@Forgind
Copy link
Contributor Author

Forgind commented Jun 1, 2022

@KirillOsenkov, what project did you do the perf testing on? I'm curious if there might be a difference between a big project and a smaller one, or perhaps between frameworks/OSs/etc.

@KirillOsenkov
Copy link
Member

I only did a ~270KLOC project on Windows with ~40 projects, fully incremental build. Didn't do much more than that.

@Forgind
Copy link
Contributor Author

Forgind commented Jun 2, 2022

I did some testing for a clean build of a small console app. With this change, it was 5ms slower on average (so easily within standard deviation), and the binlogs were about 2kb smaller.

@Forgind Forgind added the merge-when-branch-open PRs that are approved, except that there is a problem that means we are not merging stuff right now. label Jun 13, 2022
@rainersigwald rainersigwald merged commit c2ec5c9 into dotnet:main Jun 15, 2022
@KirillOsenkov
Copy link
Member

Awesome!!! Congrats everyone! Huge thanks Nathan!

Do we know which release this should ship in?

@rainersigwald
Copy link
Member

Unless something goes wrong, this should be in 17.3 preview 3.

@Forgind Forgind mentioned this pull request Jun 22, 2022
rainersigwald pushed a commit that referenced this pull request Jun 27, 2022
MSBuild has a command-line switch that tells it to output all environment variables. C++ used that fairly regularly. This PR automatically opts out of the environment variable logging change in #7484 for the duration of the one build.

### Changes Made
Made the Trait settable via environment variable or with the file logger parameter. Since the parameter needs to be applied first, this means the Trait has to be set after the start of the build, so it is no longer read-only. In tests, Traits.Instance is reset every time a variable is read, so setting it at the start of the test does not work. For that reason, I had to make it a static variable on Traits instead of Traits.Instance.

Of importance, the Trait is reset (to the environment variable's value) at logger shutdown for any BaseConsoleLogger. There is theoretically a race condition here, since the Trait is set at logger initialization (if the parameter is set) and used for the BuildStarted event. If there are two builds executing concurrently in the entrypoint node, and loggers are initialized for both, then one finishes before the other has gotten to BuildStarted, the latter will not log the environment. On the other hand, that doesn't affect whether your build succeeds, and the resolution is to just run the build again, and it will likely succeed.

### Testing
It passed unit tests. I also created an experimental insertion, and it passed the C++ test.
rainersigwald added a commit that referenced this pull request Jun 30, 2022
Forgind added a commit that referenced this pull request Jun 30, 2022
Summary
Reverts environment-variable tracking from #7484.

We put in a change to, instead of logging all environment variables at the start of any diagnostic build (including in binlogs), only log those environment variables we actually use as part of properties. This was on by default. It hit a few speed bumps, hence multiple commits to revert here. The latest was that if Project System Tools (used to collect binlogs) is installed, builds fail half the time for any project.

Customer Impact
This reverts the change. Customers go back to seeing all environment variables at the start of a build, but they do not see builds failing if they have project system tools installed.

Regression?
Yes

Testing
We verified that this resolves the reported issue manually.

Risk
Low
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merge-when-branch-open PRs that are approved, except that there is a problem that means we are not merging stuff right now.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants