Log environment-derived properties #7484

Forgind · 2022-03-23T15:57:30Z

Context

We currently log all environment variables, which is useful in giving you a sense of where things come from sometimes but doesn't really tell you if those environment variables are used or what other environment variables we might have expected. This shifts to logging only those environment variables we actually used. This also partially mitigates potential privacy issues, as random environment variables with secrets aren't logged unless they happen to correspond to something MSBuild actually uses during the build.

Changes Made

Keep track of environment variable usages for looking at properties. Log those in the binlog instead of all environment variables.

More specifically, we'd previously kept track of environment-derived properties under more generic types such as ProjectPropertyNotXmlBacked. I added an extension specifically for environment-derived properties. That derived class keeps track of relatively few new things: logging context and whether it's been logged already. In some cases, that logging context is up-to-date; in others, it's invalidated before we could use it, in which case we replace it with the correct logging context as brought forward from wherever we actually had one.

With the logging context and information about the property, once we know we need to log it, we log an EnvironmentVariableRead build event. With this change, that event is now properly serialized to the main node, where it is outputted to the binlog and text logs.

The binlog viewer was also modified to better accommodate the EnvironmentVariableRead events.

Finally, the binlog writer was modified such that, unless you specify that you should log all environment variables, it would not output the environment for BuildStarted. I made a similar change for the text logger. I also filtered which properties made it to the event args to weed out environment-derived properties.

Testing

Verified that environment variables are not logged in the binlog by default. Verified that they are logged if they are used. (Also checked the text logs for both.) Modified some unit tests and added one.

Notes

It seems that all environment-derived properties used in execution go through this method. Ideally, that would make this PR tiny: log at that point rather that an BuildStarted. However, there is no single logging context possible at that point, which meant that was brought in from the various ways properties could be used.

This is both a privacy issue and a waste from a performance perspective.

marcpopMSFT · 2022-03-23T18:58:36Z

@Forgind @KirillOsenkov We should chat offline about this change. I had thought that Kirill wanted to know which environment variables were used in the build so we could hide the rest from the binlogs. This PR appears to log whenever an environment property is used which is somewhat different. It also does it by checking for an existing environment variable every time a property is accessed so I'd expect there to be perf implications.

Since every environment variable is converted to a property, shouldn't we just record the initial list at that time and so we won't have to call GetEnvironmentVariable later? Rainer also mentioned that there was code that sort of did this already but had perf issues so it might be better to track down that code and see if the perf issues can be resolved.

rainersigwald · 2022-03-23T19:24:04Z

The prior art in this area is around #3432, which moved forward with #5038 but isn't finished. Would it be possible to pick that up instead?

Forgind · 2022-03-23T19:32:28Z

This keeps track of which environment variables were used as properties in the build and (by default) hides the rest from binlogs. It isn't actually logging anything when an environment property is used, just putting that in a dictionary that we log once with BuildFinishedEventArgs(2). I do access and environment variable every time I add it to the dictionary, but that part is just an implementation detail; I can get the (possibly but probably not stale) value from a dictionary full of environment variables instead.

We don't have an initial list of properties that will be read as environment variables. All we have is which end up being read as environment variables, as you see here. The previous environment variable logging didn't care about which were actually used and just logged everything.

I am aware of that issue and PR. The PR was ultimately reverted because of performance issues. It's a big change that does more than this PR attempts. I'd also have to look into which RPS tests it failed, try to recreate them, analyze what went wrong from a performance perspective, and try to fix it. It's fully possible in my mind that at the end of that, it may be impossible to truly fix that PR without losing functionality. This is a much more scoped PR that accomplishes the most important part of maneely's PR without taking weeks. So I'd really rather not pick up #5038 and just try to optimize it.

Forgind · 2022-03-23T20:38:39Z

Add environment variables to text log (replace missing ones)

marcpopMSFT · 2022-03-24T18:15:57Z

My main concern is that we know that this has been tried before and had problematic performance impact. There are also a few different ways of solving this. So I think it's worth having design discussion on what the goal is.

Open questions in my mind:

Should this be opt in to limit perf impact?
Can we record the initial list of env variables so we aren't calling Environment.Get... every time.
Should we include a bool on the property object on whether it was used or not and merge with the environment list at the end once?
Is there value in sharing with customers what properties aren't used that aren't environment variables (to clean up project files perhaps though there is risk that a property would get used with slightly different build params)?

KirillOsenkov · 2022-03-24T18:24:10Z

The previous approach logged a few event args, such as EnvironmentVariableReadEventArgs, etc. It was expensive because of logging all these extra args messages.

This approach is much more targeted and from what I saw the perf impact should be negligible (we should measure however).

We should be careful about adding fields on a class that is potentially instantiated millions of times.

Tracking used properties in general I think should be separate from this PR. Here we just want the minimum surgical fix to track and log only the environment variables that were actually used. Everything else I think should be out of scope.

KirillOsenkov · 2022-03-28T05:32:40Z

src/Build/Evaluation/Expander.cs

            private static object LookupProperty(IPropertyProvider<T> properties, string propertyName, int startIndex, int endIndex, IElementLocation elementLocation, UsedUninitializedProperties usedUninitializedProperties)
            {
                T property = properties.GetProperty(propertyName, startIndex, endIndex);
+                if (!string.IsNullOrEmpty(property?.EvaluatedValue))


maybe use pattern-matching here

Hmm, I am confused, this seems to just make a list of all properties ever read? We only need the properties that were resolved to an environment variable. I would imagine that we track it somewhere close to where we take a snapshot of the environment at the start of the build?

Also I think we should move this logic down to line 1519, in the else block (when we have an escaped value).

Basically here we should check whether the property we got came from the environment (see

msbuild/src/Build/Instance/ProjectInstance.cs

Line 1482 in 058a026

_properties.Set(property);

)

and yes, move this new if block into the else block, where we've verified that the property is not null. here is too early.

msbuild/src/Build/Evaluation/Evaluator.cs

Line 1200 in 058a026

_data.SetProperty(environmentProperty.Name, ((IProperty)environmentProperty).EvaluatedValueEscaped, isGlobalProperty: false, mayBeReserved: false, isEnvironmentVariable: true);

by pattern-matching I mean:

if (property?.EvaluatedValue is { Length: > 0 } evaluatedValue) { ... }

What's the added value of pattern matching here of string.IsNullOrEmpty?

worried that if the EvaluatedValue getter does anything expensive, or is not idempotent, we'll be calling it twice. It might be benign or not. I prefer to only call properties once and reuse, just to be safe in general. Pattern matching lets you capture and cache the value.

src/Build/Logging/BinaryLogger/BinaryLogger.cs

src/Build/Logging/BinaryLogger/BuildEventArgsReader.cs

src/Framework/BuildFinishedEventArgs.cs

src/Shared/EnvironmentUtilities.cs

KirillOsenkov · 2022-03-28T18:40:32Z

We should also add a unit-test that builds this project and ensures that BuildStarted has no environment variables, and BuildFinished has a single environment variable OS=Windows_NT (only assert the name, not the value as it may change on other OSs)

<Project>

  <PropertyGroup>
    <A>$(OS)</A>
  </PropertyGroup>

  <Target Name="Build">
    <Message Text="$(A)" />
  </Target>
  
</Project>

src/Build/Evaluation/Expander.cs

…-var-properties

KirillOsenkov · 2022-05-14T00:15:09Z

I know it looks awful threading logging context everywhere, but I was thinking: maybe ultimately it's the right thing to do? We want to be able to log from all evaluation locations, so in the future when we want to add more logging messages perhaps it should come in handy?

Remember that until recently we weren't able to log anything from evaluation at all, so perhaps consider this trailblazing for future generations.

KirillOsenkov · 2022-05-16T01:02:21Z

Hmm, in my testing it didn't log the OS variable here:

<Project>

  <PropertyGroup Condition="$(OS) == 'Windows_NT'">
    <A>$(UserProfile)</A>
  </PropertyGroup>

  <Target Name="Build">
    <Message Text="$(A)" />
  </Target>

</Project>

KirillOsenkov · 2022-05-16T01:05:07Z

I'm afraid we're going to need tests for all these usage scenarios (e.g. Condition on property groups).

I'm also still worried about the EvaluatedValue property - how come it's never accessed? Would be nice to place a breakpoint and see if this is ever hit during either a real-world build or a full test run. The worry is that there's a way to bypass the logging of environment variables if we access through that property - I want to know if it's at all possible.

Forgind · 2022-05-16T15:20:57Z

src/Build/Instance/ProjectPropertyInstance.cs

        [DebuggerBrowsable(DebuggerBrowsableState.Never)]
-        string IProperty.EvaluatedValueEscaped => _escapedValue;
-
+        string IProperty.EvaluatedValueEscaped


Put breakpoints on other ways to access value to see if/when they're hit.

Run all tests with a Debugger.Launch as well as try building msbuild/roslyn/C++ project.

Once we understand this better, write it down.

Augment unit test with Env Vars to be much more comprehensive

See if other PR works with unit test

Check conditions on:
[ ] - task execution
[ ] - itemgroup in target
[ ] - itemgroup not in target
[ ] - propertygroup in target
[ ] - import statement

Forgind · 2022-05-16T18:16:50Z

Interesting...I just tested it, and it logged the EnvVar variable in

Hmm, in my testing it didn't log the OS variable here:

<Project>

  <PropertyGroup Condition="$(OS) == 'Windows_NT'">
    <A>$(UserProfile)</A>
  </PropertyGroup>

  <Target Name="Build">
    <Message Text="$(A)" />
  </Target>

</Project>

Fixed this case. I'd foolishly thought a condition on a target might be the same as a condition on a propertygroup. I'll check a couple other places.

KirillOsenkov · 2022-05-20T16:46:55Z

src/Build/Evaluation/Expander.cs

                }
                else
                {
+                    if (property is ProjectPropertyInstance.EnvironmentDerivedProjectPropertyInstance environmentDerivedProperty)


do we need to also check for EnvironmentDerivedProjectProperty here? or is it OK?

Sorry I missed this. I believe it's ok. I looked through uses of EnvironmentDerivedProjectProperties, and I believe they're only relevant at evaluation time, at which point our logging context hasn't yet been invalidated, so they get logged then.

KirillOsenkov · 2022-05-20T17:14:04Z

I re-did the perf measurements and was a bit surprised (pleasantly). Could totally be a fluke, but this PR seems slightly faster than baseline and also produces smaller binlogs.

	Baseline:	This PR:
	8.828	9.079
	9.204	8.448
	9.875	9.471
	9.114	8.066
	10.119	7.928
	8.6	7.809
	8.566	8.238
	8.781	8.799
	8.615	8.14
Average:	9.078	8.442

Binlog size goes down slightly from 5.137 MB -> 5.058 MB.

I'm guessing this is because we don't log the properties that came from the environment so those huge property lists are now way smaller.

KirillOsenkov

I tested it some more and it all looks good to me.

I think I'm reasonably happy with this and as far as I'm concerned it can go in and we can get it battle tested. If we find more cases we can fix in follow-up PRs.

We could also file a bug to track adding some tests later on. I think I'd want to start getting some real world mileage on this.

src/Shared/UnitTests/EngineTestEnvironment.cs

src/Build/Logging/BinaryLogger/BinaryLogger.cs

rainersigwald · 2022-04-12T13:53:29Z

src/Framework/Traits.cs

+        /// <summary>
+        /// Log all environment variables whether or not they are used in a build in the binary log.
+        /// </summary>
+        public readonly bool LogAllEnvironmentVariables = !string.IsNullOrEmpty(Environment.GetEnvironmentVariable("MSBUILDLOGALLENVIRONMENTVARIABLES"));


Also behind a changewave?

src/Build/Logging/BinaryLogger/BuildEventArgsWriter.cs

src/MSBuildTaskHost/MSBuildTaskHost.csproj

rainersigwald · 2022-04-12T13:56:25Z

src/Build.UnitTests/Evaluation/Evaluator_Tests.cs

-                        .ShouldBeEmpty();
+                        .ShouldHaveSingleItem()
+                        .EnvironmentVariableName
+                        .ShouldBe("DEFINED_ENVIRONMENT_VARIABLE2");


Update comment on line 4521 (and similar) please.

I just saw one comment to update—were you thinking I should add more, or was that all?

If that's the only comment about empty environment, it's all we need. I think I thought there were others in other test cases?

rainersigwald · 2022-04-12T13:57:24Z

src/Build.UnitTests/Utilities_Tests.cs

-
-            string _initialLoadFilesWriteable = Environment.GetEnvironmentVariable("MSBUILDLOADALLFILESASWRITEABLE");
-            try
+            using (TestEnvironment env = TestEnvironment.Create())


nit: for new code I strongly prefer using statements rather than blocks, where possible.

Mind explaining why? I don't have a strong preference, but blocks make it clearer to me where scope starts and ends. The only advantage I know of to using statements is that they don't require extra indenting.

Reduced indentation. If the using doesn't line up nicely with the containing block, I prefer it, but if it does (as here where it's the whole method), I like the statement which has less ceremony.

src/Build/Evaluation/IProperty.cs

src/Build/Instance/ProjectPropertyInstance.cs

rainersigwald · 2022-04-12T14:10:55Z

src/Shared/LogMessagePacketBase.cs

+                LoggingEventType.BuildMessageEvent => ReadBuildMessageEventFromStream(translator, message, helpKeyword, senderName),
+                LoggingEventType.BuildWarningEvent => ReadBuildWarningEventFromStream(translator, message, helpKeyword, senderName),
+                LoggingEventType.EnvironmentVariableReadEvent => ReadEnvironmentVariableReadEventFromStream(translator, message, helpKeyword, senderName),
+                _ => null,


I think it's important to preserve the error here.

I agree. This is the only way this method can return null, and I added a VerifyThrow right after ReadEventFromStream to log the error. That said, KirillOsenkov was also surprised by this. Those are two points in favor of reverting that part, at least. My argument in favor of keeping it as-is is that return switch is beautiful and clear, and the effect is identical. Preference?

If the semantics are identical then I'm fine with leaving as is

return switch is beautiful and clear

You don't have to avoid it here; you can throw inside a switch:

Suggested change

_ => null,

_ => throw ...,

But yeah if the semantics are identical you can leave this.

throw... would be different semantically; we were using ErrorUtilities.Throw... before, which doesn't throw if (!s_throwExceptions). I don't think that's too important, but I thought it best to avoid that change anyway.

Forgind · 2022-06-01T21:42:35Z

@KirillOsenkov, what project did you do the perf testing on? I'm curious if there might be a difference between a big project and a smaller one, or perhaps between frameworks/OSs/etc.

KirillOsenkov · 2022-06-02T00:27:01Z

I only did a ~270KLOC project on Windows with ~40 projects, fully incremental build. Didn't do much more than that.

Forgind · 2022-06-02T23:35:37Z

I did some testing for a clean build of a small console app. With this change, it was 5ms slower on average (so easily within standard deviation), and the binlogs were about 2kb smaller.

KirillOsenkov · 2022-06-15T19:59:01Z

Awesome!!! Congrats everyone! Huge thanks Nathan!

Do we know which release this should ship in?

rainersigwald · 2022-06-15T20:00:07Z

Unless something goes wrong, this should be in 17.3 preview 3.

MSBuild has a command-line switch that tells it to output all environment variables. C++ used that fairly regularly. This PR automatically opts out of the environment variable logging change in #7484 for the duration of the one build. ### Changes Made Made the Trait settable via environment variable or with the file logger parameter. Since the parameter needs to be applied first, this means the Trait has to be set after the start of the build, so it is no longer read-only. In tests, Traits.Instance is reset every time a variable is read, so setting it at the start of the test does not work. For that reason, I had to make it a static variable on Traits instead of Traits.Instance. Of importance, the Trait is reset (to the environment variable's value) at logger shutdown for any BaseConsoleLogger. There is theoretically a race condition here, since the Trait is set at logger initialization (if the parameter is set) and used for the BuildStarted event. If there are two builds executing concurrently in the entrypoint node, and loggers are initialized for both, then one finishes before the other has gotten to BuildStarted, the latter will not log the environment. On the other hand, that doesn't affect whether your build succeeds, and the resolution is to just run the build again, and it will likely succeed. ### Testing It passed unit tests. I also created an experimental insertion, and it passed the C++ test.

This reverts commit c2ec5c9.

Summary Reverts environment-variable tracking from #7484. We put in a change to, instead of logging all environment variables at the start of any diagnostic build (including in binlogs), only log those environment variables we actually use as part of properties. This was on by default. It hit a few speed bumps, hence multiple commits to revert here. The latest was that if Project System Tools (used to collect binlogs) is installed, builds fail half the time for any project. Customer Impact This reverts the change. Customers go back to seeing all environment variables at the start of a build, but they do not see builds failing if they have project system tools installed. Regression? Yes Testing We verified that this resolves the reported issue manually. Risk Low

Forgind added 2 commits March 22, 2022 16:06

Log environment variables read as properties

c7e5c0a

Only log random environment variables if asked

418b210

This is both a privacy issue and a waste from a performance perspective.

Forgind added a commit to Forgind/MSBuildStructuredLog that referenced this pull request Mar 23, 2022

Use changes from dotnet/msbuild#7484

6c73bcf

Forgind mentioned this pull request Mar 23, 2022

Log environment-derived properties per dotnet/msbuild#7484 KirillOsenkov/MSBuildStructuredLog#571

Merged

KirillOsenkov self-requested a review March 23, 2022 16:56

Don't write a full (empty) dictionary

fb96241

Forgind added 3 commits March 23, 2022 14:01

Only log nonempty env vars

cf9c722

Fix NRE

d7c03b2

Make tests pass

5b58243

KirillOsenkov reviewed Mar 28, 2022

View reviewed changes

src/Build/Logging/BinaryLogger/BinaryLogger.cs Outdated Show resolved Hide resolved

KirillOsenkov reviewed Mar 28, 2022

View reviewed changes

src/Build/Logging/BinaryLogger/BuildEventArgsReader.cs Outdated Show resolved Hide resolved

KirillOsenkov reviewed Mar 28, 2022

View reviewed changes

src/Framework/BuildFinishedEventArgs.cs Outdated Show resolved Hide resolved

KirillOsenkov reviewed Mar 28, 2022

View reviewed changes

src/Shared/EnvironmentUtilities.cs Outdated Show resolved Hide resolved

Forgind added 2 commits March 28, 2022 15:47

Only find environment properties

fdee113

PR comments

be584ed

KirillOsenkov reviewed Mar 29, 2022

View reviewed changes

src/Build/Evaluation/Expander.cs Outdated Show resolved Hide resolved

Forgind added 6 commits March 30, 2022 16:36

Log on first property read

16332a7

Account for prior logging and EvaluationFinished

2a749b8

Tweak tests

150a442

Random cleanup

dfe55d7

Make it work across nodes

bb51b1f

Merge branch 'main' of https://github.com/dotnet/msbuild into log-env…

133f760

…-var-properties

One more case

01ee110

Forgind commented May 16, 2022

View reviewed changes

Fix another case

ad92323

Forgind added 2 commits May 17, 2022 13:48

Fix condition on task execution

0316ba7

Fixed other cases mentioned in PR

44aa9b6

KirillOsenkov reviewed May 20, 2022

View reviewed changes

KirillOsenkov approved these changes May 20, 2022

View reviewed changes

rainersigwald reviewed May 23, 2022

View reviewed changes

Forgind added 2 commits May 27, 2022 10:54

PR comments

f7ccf3d

Using statement

22525b6

Forgind added the merge-when-branch-open PRs that are approved, except that there is a problem that means we are not merging stuff right now. label Jun 13, 2022

rainersigwald approved these changes Jun 13, 2022

View reviewed changes

rainersigwald merged commit c2ec5c9 into dotnet:main Jun 15, 2022

Forgind mentioned this pull request Jun 22, 2022

Maybe fix cpp #7726

Merged

Forgind mentioned this pull request Jun 27, 2022

Reduce allocations in routing build events #7751

Merged

rainersigwald mentioned this pull request Jun 30, 2022

Revert environment variable tracking #7766

Merged

rainersigwald added a commit that referenced this pull request Jun 30, 2022

Revert "Log environment-derived properties (#7484)"

d93a24d

This reverts commit c2ec5c9.

filipnavara mentioned this pull request Jul 23, 2023

[.net8] Investigate some binlogs that break MSBuild.StructuredLogger and our tests dotnet/macios#18568

Closed

Log environment-derived properties #7484

Log environment-derived properties #7484

Uh oh!

Conversation

Forgind commented Mar 23, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Context

Changes Made

Testing

Notes

Uh oh!

marcpopMSFT commented Mar 23, 2022

Uh oh!

rainersigwald commented Mar 23, 2022

Uh oh!

Forgind commented Mar 23, 2022

Uh oh!

Forgind commented Mar 23, 2022

Uh oh!

marcpopMSFT commented Mar 24, 2022

Uh oh!

KirillOsenkov commented Mar 24, 2022

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

KirillOsenkov Mar 29, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

KirillOsenkov commented Mar 28, 2022

Uh oh!

Uh oh!

KirillOsenkov commented May 14, 2022

Uh oh!

KirillOsenkov commented May 16, 2022

Uh oh!

KirillOsenkov commented May 16, 2022

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Forgind commented May 16, 2022

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Forgind Jun 1, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

KirillOsenkov commented May 20, 2022

Uh oh!

KirillOsenkov left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Forgind commented Mar 23, 2022 •

edited

Loading

KirillOsenkov Mar 29, 2022 •

edited

Loading

Forgind Jun 1, 2022 •

edited

Loading