-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Log environment-derived properties #7484
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This is both a privacy issue and a waste from a performance perspective.
|
@Forgind @KirillOsenkov We should chat offline about this change. I had thought that Kirill wanted to know which environment variables were used in the build so we could hide the rest from the binlogs. This PR appears to log whenever an environment property is used which is somewhat different. It also does it by checking for an existing environment variable every time a property is accessed so I'd expect there to be perf implications. Since every environment variable is converted to a property, shouldn't we just record the initial list at that time and so we won't have to call GetEnvironmentVariable later? Rainer also mentioned that there was code that sort of did this already but had perf issues so it might be better to track down that code and see if the perf issues can be resolved. |
|
This keeps track of which environment variables were used as properties in the build and (by default) hides the rest from binlogs. It isn't actually logging anything when an environment property is used, just putting that in a dictionary that we log once with BuildFinishedEventArgs(2). I do access and environment variable every time I add it to the dictionary, but that part is just an implementation detail; I can get the (possibly but probably not stale) value from a dictionary full of environment variables instead. We don't have an initial list of properties that will be read as environment variables. All we have is which end up being read as environment variables, as you see here. The previous environment variable logging didn't care about which were actually used and just logged everything. I am aware of that issue and PR. The PR was ultimately reverted because of performance issues. It's a big change that does more than this PR attempts. I'd also have to look into which RPS tests it failed, try to recreate them, analyze what went wrong from a performance perspective, and try to fix it. It's fully possible in my mind that at the end of that, it may be impossible to truly fix that PR without losing functionality. This is a much more scoped PR that accomplishes the most important part of maneely's PR without taking weeks. So I'd really rather not pick up #5038 and just try to optimize it. |
|
Add environment variables to text log (replace missing ones) |
|
My main concern is that we know that this has been tried before and had problematic performance impact. There are also a few different ways of solving this. So I think it's worth having design discussion on what the goal is. Open questions in my mind:
|
|
The previous approach logged a few event args, such as EnvironmentVariableReadEventArgs, etc. It was expensive because of logging all these extra args messages. This approach is much more targeted and from what I saw the perf impact should be negligible (we should measure however). We should be careful about adding fields on a class that is potentially instantiated millions of times. Tracking used properties in general I think should be separate from this PR. Here we just want the minimum surgical fix to track and log only the environment variables that were actually used. Everything else I think should be out of scope. |
src/Build/Evaluation/Expander.cs
Outdated
| private static object LookupProperty(IPropertyProvider<T> properties, string propertyName, int startIndex, int endIndex, IElementLocation elementLocation, UsedUninitializedProperties usedUninitializedProperties) | ||
| { | ||
| T property = properties.GetProperty(propertyName, startIndex, endIndex); | ||
| if (!string.IsNullOrEmpty(property?.EvaluatedValue)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe use pattern-matching here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I am confused, this seems to just make a list of all properties ever read? We only need the properties that were resolved to an environment variable. I would imagine that we track it somewhere close to where we take a snapshot of the environment at the start of the build?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I think we should move this logic down to line 1519, in the else block (when we have an escaped value).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically here we should check whether the property we got came from the environment (see
msbuild/src/Build/Instance/ProjectInstance.cs
Line 1482 in 058a026
| _properties.Set(property); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and yes, move this new if block into the else block, where we've verified that the property is not null. here is too early.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
msbuild/src/Build/Evaluation/Evaluator.cs
Line 1200 in 058a026
| _data.SetProperty(environmentProperty.Name, ((IProperty)environmentProperty).EvaluatedValueEscaped, isGlobalProperty: false, mayBeReserved: false, isEnvironmentVariable: true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by pattern-matching I mean:
if (property?.EvaluatedValue is { Length: > 0 } evaluatedValue) { ... }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the added value of pattern matching here of string.IsNullOrEmpty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
worried that if the EvaluatedValue getter does anything expensive, or is not idempotent, we'll be calling it twice. It might be benign or not. I prefer to only call properties once and reuse, just to be safe in general. Pattern matching lets you capture and cache the value.
|
We should also add a unit-test that builds this project and ensures that BuildStarted has no environment variables, and BuildFinished has a single environment variable OS=Windows_NT (only assert the name, not the value as it may change on other OSs) |
|
I know it looks awful threading logging context everywhere, but I was thinking: maybe ultimately it's the right thing to do? We want to be able to log from all evaluation locations, so in the future when we want to add more logging messages perhaps it should come in handy? Remember that until recently we weren't able to log anything from evaluation at all, so perhaps consider this trailblazing for future generations. |
|
Hmm, in my testing it didn't log the |
|
I'm afraid we're going to need tests for all these usage scenarios (e.g. Condition on property groups). I'm also still worried about the |
| [DebuggerBrowsable(DebuggerBrowsableState.Never)] | ||
| string IProperty.EvaluatedValueEscaped => _escapedValue; | ||
|
|
||
| string IProperty.EvaluatedValueEscaped |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put breakpoints on other ways to access value to see if/when they're hit.
Run all tests with a Debugger.Launch as well as try building msbuild/roslyn/C++ project.
Once we understand this better, write it down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Augment unit test with Env Vars to be much more comprehensive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See if other PR works with unit test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check conditions on:
[ ] - task execution
[ ] - itemgroup in target
[ ] - itemgroup not in target
[ ] - propertygroup in target
[ ] - import statement
|
Interesting...I just tested it, and it logged the EnvVar variable in
Fixed this case. I'd foolishly thought a condition on a target might be the same as a condition on a propertygroup. I'll check a couple other places. |
| } | ||
| else | ||
| { | ||
| if (property is ProjectPropertyInstance.EnvironmentDerivedProjectPropertyInstance environmentDerivedProperty) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to also check for EnvironmentDerivedProjectProperty here? or is it OK?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I missed this. I believe it's ok. I looked through uses of EnvironmentDerivedProjectProperties, and I believe they're only relevant at evaluation time, at which point our logging context hasn't yet been invalidated, so they get logged then.
|
I re-did the perf measurements and was a bit surprised (pleasantly). Could totally be a fluke, but this PR seems slightly faster than baseline and also produces smaller binlogs.
Binlog size goes down slightly from 5.137 MB -> 5.058 MB. I'm guessing this is because we don't log the properties that came from the environment so those huge property lists are now way smaller. |
KirillOsenkov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested it some more and it all looks good to me.
I think I'm reasonably happy with this and as far as I'm concerned it can go in and we can get it battle tested. If we find more cases we can fix in follow-up PRs.
We could also file a bug to track adding some tests later on. I think I'd want to start getting some real world mileage on this.
src/Framework/Traits.cs
Outdated
| /// <summary> | ||
| /// Log all environment variables whether or not they are used in a build in the binary log. | ||
| /// </summary> | ||
| public readonly bool LogAllEnvironmentVariables = !string.IsNullOrEmpty(Environment.GetEnvironmentVariable("MSBUILDLOGALLENVIRONMENTVARIABLES")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also behind a changewave?
| .ShouldBeEmpty(); | ||
| .ShouldHaveSingleItem() | ||
| .EnvironmentVariableName | ||
| .ShouldBe("DEFINED_ENVIRONMENT_VARIABLE2"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update comment on line 4521 (and similar) please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just saw one comment to update—were you thinking I should add more, or was that all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that's the only comment about empty environment, it's all we need. I think I thought there were others in other test cases?
|
|
||
| string _initialLoadFilesWriteable = Environment.GetEnvironmentVariable("MSBUILDLOADALLFILESASWRITEABLE"); | ||
| try | ||
| using (TestEnvironment env = TestEnvironment.Create()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: for new code I strongly prefer using statements rather than blocks, where possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mind explaining why? I don't have a strong preference, but blocks make it clearer to me where scope starts and ends. The only advantage I know of to using statements is that they don't require extra indenting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reduced indentation. If the using doesn't line up nicely with the containing block, I prefer it, but if it does (as here where it's the whole method), I like the statement which has less ceremony.
| LoggingEventType.BuildMessageEvent => ReadBuildMessageEventFromStream(translator, message, helpKeyword, senderName), | ||
| LoggingEventType.BuildWarningEvent => ReadBuildWarningEventFromStream(translator, message, helpKeyword, senderName), | ||
| LoggingEventType.EnvironmentVariableReadEvent => ReadEnvironmentVariableReadEventFromStream(translator, message, helpKeyword, senderName), | ||
| _ => null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's important to preserve the error here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. This is the only way this method can return null, and I added a VerifyThrow right after ReadEventFromStream to log the error. That said, KirillOsenkov was also surprised by this. Those are two points in favor of reverting that part, at least. My argument in favor of keeping it as-is is that return switch is beautiful and clear, and the effect is identical. Preference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the semantics are identical then I'm fine with leaving as is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return switch is beautiful and clear
You don't have to avoid it here; you can throw inside a switch:
| _ => null, | |
| _ => throw ..., |
But yeah if the semantics are identical you can leave this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
throw... would be different semantically; we were using ErrorUtilities.Throw... before, which doesn't throw if (!s_throwExceptions). I don't think that's too important, but I thought it best to avoid that change anyway.
|
@KirillOsenkov, what project did you do the perf testing on? I'm curious if there might be a difference between a big project and a smaller one, or perhaps between frameworks/OSs/etc. |
|
I only did a ~270KLOC project on Windows with ~40 projects, fully incremental build. Didn't do much more than that. |
|
I did some testing for a clean build of a small console app. With this change, it was 5ms slower on average (so easily within standard deviation), and the binlogs were about 2kb smaller. |
|
Awesome!!! Congrats everyone! Huge thanks Nathan! Do we know which release this should ship in? |
|
Unless something goes wrong, this should be in 17.3 preview 3. |
MSBuild has a command-line switch that tells it to output all environment variables. C++ used that fairly regularly. This PR automatically opts out of the environment variable logging change in #7484 for the duration of the one build. ### Changes Made Made the Trait settable via environment variable or with the file logger parameter. Since the parameter needs to be applied first, this means the Trait has to be set after the start of the build, so it is no longer read-only. In tests, Traits.Instance is reset every time a variable is read, so setting it at the start of the test does not work. For that reason, I had to make it a static variable on Traits instead of Traits.Instance. Of importance, the Trait is reset (to the environment variable's value) at logger shutdown for any BaseConsoleLogger. There is theoretically a race condition here, since the Trait is set at logger initialization (if the parameter is set) and used for the BuildStarted event. If there are two builds executing concurrently in the entrypoint node, and loggers are initialized for both, then one finishes before the other has gotten to BuildStarted, the latter will not log the environment. On the other hand, that doesn't affect whether your build succeeds, and the resolution is to just run the build again, and it will likely succeed. ### Testing It passed unit tests. I also created an experimental insertion, and it passed the C++ test.
This reverts commit c2ec5c9.
Summary Reverts environment-variable tracking from #7484. We put in a change to, instead of logging all environment variables at the start of any diagnostic build (including in binlogs), only log those environment variables we actually use as part of properties. This was on by default. It hit a few speed bumps, hence multiple commits to revert here. The latest was that if Project System Tools (used to collect binlogs) is installed, builds fail half the time for any project. Customer Impact This reverts the change. Customers go back to seeing all environment variables at the start of a build, but they do not see builds failing if they have project system tools installed. Regression? Yes Testing We verified that this resolves the reported issue manually. Risk Low

Context
We currently log all environment variables, which is useful in giving you a sense of where things come from sometimes but doesn't really tell you if those environment variables are used or what other environment variables we might have expected. This shifts to logging only those environment variables we actually used. This also partially mitigates potential privacy issues, as random environment variables with secrets aren't logged unless they happen to correspond to something MSBuild actually uses during the build.
Changes Made
Keep track of environment variable usages for looking at properties. Log those in the binlog instead of all environment variables.
More specifically, we'd previously kept track of environment-derived properties under more generic types such as ProjectPropertyNotXmlBacked. I added an extension specifically for environment-derived properties. That derived class keeps track of relatively few new things: logging context and whether it's been logged already. In some cases, that logging context is up-to-date; in others, it's invalidated before we could use it, in which case we replace it with the correct logging context as brought forward from wherever we actually had one.
With the logging context and information about the property, once we know we need to log it, we log an EnvironmentVariableRead build event. With this change, that event is now properly serialized to the main node, where it is outputted to the binlog and text logs.
The binlog viewer was also modified to better accommodate the EnvironmentVariableRead events.
Finally, the binlog writer was modified such that, unless you specify that you should log all environment variables, it would not output the environment for BuildStarted. I made a similar change for the text logger. I also filtered which properties made it to the event args to weed out environment-derived properties.
Testing
Verified that environment variables are not logged in the binlog by default. Verified that they are logged if they are used. (Also checked the text logs for both.) Modified some unit tests and added one.
Notes
It seems that all environment-derived properties used in execution go through this method. Ideally, that would make this PR tiny: log at that point rather that an BuildStarted. However, there is no single logging context possible at that point, which meant that was brought in from the various ways properties could be used.