-
Couldn't load subscription status.
- Fork 840
add FlakyAttribute to mark flaky tests #1222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| public const string macOS1012 = "OSX.1012." + HelixSuffix + ";"; | ||
| public const string Windows10 = "Windows.10.Amd64.ClientRS4.VS2017.Open;"; // Doesn't have the default suffix! | ||
|
|
||
| private const string HelixSuffix = "Amd64.Open"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately we did/will have Arm64 queues too, they just were exploding so they aren't ready for frontline action yet. Maybe we need a Flaky Queue too :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need a flaky queue, with this logic we can have our test scripts run both flaky and non-flaky tests, but only fail the build on non-flaky failures.
We can definitely update this as we have new queue explosions :). I can put Amd64 in the field names for now to ensure they don't break when we add Arm64 though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah what I mean is its possible we will want to run on new queues (i.e. arm testing) where the queues themselves have a high failure rate, today we just removed the queue entirely, but it would be nice to be able to run all tests on a flaky queue (just in the don't fail the build bucket). We would have to mark every test with [Flaky(OnHelixQueues="imsoflaky")] I think without the concept. Not something we need now, but something to keep in mind for the future
| [Flaky("http://example.com", OnAzDO = false, OnHelixQueues = HelixQueues.macOS1012Amd64 + HelixQueues.Fedora28Amd64)] | ||
| public void FlakyInSpecificHelixQueue() | ||
| { | ||
| var queueName = Environment.GetEnvironmentVariable("HELIX"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HaoK I noticed dotnet/aspnetcore#8231 is adding this env var, can you double-check that my logic here seems sound? The idea is to actually make this test properly fail on the specified Helix queues so we can leave it in as validation that flaky tests are indeed treated properly, so I want this test to actually be correctly detecting the queue it's running in :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be time to have a HelixUtils to consolidate all the helix stuff in one central file, or you could use SkipOnHelixAttribute.GetTargetHelixQueue() when that PR is merged.
Logic looks fine. One other pivot we should probably consider, is adding the concept of Platform as well, so there's an easy way to mark Flaky(OnlyPlatforms = "OSX;Linux") so we don't have to enumerate every queue
So maybe we need to :
public class HelixQueueDefinition {
public string Id;
public string Platform
}
public static class HelixPlatforms {
public string Windows = "Windows";
public string OSX = "OSX";
public string Linux = "Linux";
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could just do that by building combo-consts like so:
// Common Groupings
public const string Ubuntu = Ubuntu1604Amd64 + Ubuntu1810Amd64;
public const string Centos = Centos7Amd64;
public const string Debian = Debian8Amd64 + Debian9Amd64;
public const string Redhat = Redhat7Amd64;
public const string Fedora = Fedora27Amd64 + Fedora28Amd64;
public const string Windows = Windows10Amd64;
public const string macOS = macOS1012Amd64;
public const string Linux = Ubuntu + Centos + Debian + Redhat + Fedora;We could also add other pivots to the flaky traits. I already changed it a bit so that the trait we insert is Flaky:Helix:Queue:[queuename], we could have other facets in that sequence and as long as the script running the tests injects all the necessary filters, it should work.
| public string OnHelixQueues | ||
| { | ||
| get => string.Join(";", _queues); | ||
| set => _queues = new List<string>(value.Split(';')); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need to pass in StringSplitOptions.RemoveEmptyEntries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably, though it's not broken without it I believe. It would just generate the trait Flaky:Helix:=true which we never care about :).
| [Flaky("http://example.com")] | ||
| public void AlwaysFlaky() | ||
| { | ||
| throw new Exception("Flakey!"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flaky
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, also I don't want this test to fail for dev builds so I'll have it detect either AzDO or Helix and only fail there.
| /// <summary> | ||
| /// Gets or sets a boolean indicating if this test is flaky on Azure DevOps Pipelines. Defaults to <see langword="true" />. | ||
| /// </summary> | ||
| public bool OnAzDO { get; set; } = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need AzDO build definition name (OS)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps
|
This needs to be merged with dotnet/aspnetcore#8231 |
Yes, but it's slightly tricky since |
I wonder if SkipOnHelixAttribute even needs an ability to specify queues. Is there a case where we want to skip non-flaky tests on a single helix queue? |
|
Yeah skipping tests on specific platform/queues seems like it would always be useful, but maybe we already have that with OSSkip. It might just be a short/medium term thing where some tests don't work on some helix queues that aren't the same set as OSSkip |
| } | ||
|
|
||
| [Fact] | ||
| [Flaky("http://example.com", OnHelixQueues = HelixQueues.None)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use these on a few existing tests we know are flaky? Start putting the badge of shame up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not yet. We'll need build goop to actually exclude them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you could use it when this is in, but it won't actually do anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make it more fun, it looks like Extensions and AspNetCore will need slightly different build goop, but it looks doable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The good news is that helix isn't used on extensions... so you won't be getting many flaky helix tests on there hopefully :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code seems fine, though I don't know much about xunit traits and discoverers.
|
|
||
| <!-- TODO: Replace with below when https://github.com/dotnet/arcade/issues/2182 is in --> | ||
| <!-- <TestToRun Include="@(NonFlakyRuns);@(FlakyRuns)" /> --> | ||
| <TestToRun Include="@(NonFlakyRuns)" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that we do not run flaky tests at all right now because of dotnet/arcade#2182
|
Our AzDO jobs are not named as consistently as our Helix queues. Perhaps we should do that before going too far down the path of selective AzDO flakiness? For example, this is what the helper static class looks like for AzDO (equiv of public static class AzP
{
// Pipeline names end in ';' because it makes it easier to concat these into a list using a constant expression:
// AzurePipelines.macOS + AzurePipelines.Windows
public const string All = "all;";
public const string None = "none;";
public static class Extensions
{
public const string CodeCheck = "Code check Job;";
public const string WindowsDebug = "Windows debug;";
public const string WindowsRelease = "Windows release;";
public const string Windows = WindowsDebug + WindowsRelease;
public const string UbuntuDebug = "Ubuntu 16.04 debug;";
public const string UbuntuRelease = "Ubuntu 16.04 release;";
public const string Ubuntu = UbuntuDebug + UbuntuRelease;
public const string macOSDebug = "OSX debug";
public const string macOSRelease = "OSX release";
public const string macOS = macOSDebug + macOSRelease;
}
public static class AspNetCore
{
public const string CodeCheck = "Code check Job;";
public const string Windows = "Test: Windows Server 2016 x64 Job;";
public const string macOS = "Test: macOS 10.13 Job;";
public const string Ubuntu = "Test: Ubuntu 16.04 x64 Job;";
}
}The name we get from AzDO appears to be |
Azure Pipelines instead of Azure DevOps
| // It seems like sometimes the MSBuild goop that adds the test framework can end up in a bad state and not actually add it | ||
| // Not sure yet why that happens but the exception isn't clear so I'm adding this error so we can detect it better. | ||
| // -anurse | ||
| throw new InvalidOperationException("LoggedTest base class was used but nothing initialized it! The test framework may not be enabled. Try cleaning your 'obj' directory."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hit this while testing... not sure why, but want to make it a little easier to catch and possibly diagnose if it happens again. It doesn't seem to happen at all in AzP because we use a clean environment. The ".cache" file for my assembly attributes was broken somehow which meant the MSBuild goop never actually generated new Assembly Attributes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right, I think we are missing an issue tracking reenabling this on helix, If you can isolate/fix the root cause, we could take this out https://github.com/aspnet/AspNetCore/blob/master/eng/targets/Helix.props#L18
How is Helix more important then AzDO here? Just because we don't have that many queues on AzDO?
You can base AzDo skip on the fact that we are running there + |
|
At least for AspNetCore the current plan is to try and move all testing to helix, so eventually we won't have any tests running on azdo |
Let's do that. I'll update. |
|
Ok, I think this is ready to go. I'd like another review sign-off (since all my changes) from someone before merging. The attribute works a little differently now, instead of separate properties, there's just one The build script specifies all the traits it wants to select/exclude, for example: <_FlakyRunAdditionalArgs>-trait "Flaky:All=true" -trait "Flaky:AzP:All=true" -trait "Flaky:AzP:OS:$(AGENT_OS)=true"</_FlakyRunAdditionalArgs>
<_NonFlakyRunAdditionalArgs>-notrait "Flaky:All=true" -notrait "Flaky:AzP:All=true" -notrait "Flaky:AzP:OS:$(AGENT_OS)=true"</_NonFlakyRunAdditionalArgs>If no "filters" are provided, the test is considered flaky everywhere: [Flaky("http://example.com")]
public void AlwaysFlaky() {...}If any "filters" are provided, all relevant filters must be provided: [Flaky("http://example.com", HelixQueues.All)]
public void FlakyInHelixOnly() {...}
[Flaky("http://example.com", HelixQueues.All, AzurePipelines.Windows)]
public void FlakyInHelixAndAzPWindowsOnly() {...} |
|
Whoops, didn't finish updating the test 😺 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
part of aspnet/AspNetCoredotnet/extensions#8237\n\nCommit migrated from dotnet/extensions@42e9a7d
part of aspnet/AspNetCoredotnet/extensions#8237\n\nCommit migrated from dotnet/extensions@42e9a7d
part of aspnet/AspNetCoredotnet/Extensions#8237 Commit migrated from dotnet/extensions@42e9a7d
part of aspnet/AspNetCoredotnet/Extensions#8237 Commit migrated from dotnet/extensions@42e9a7d
part of aspnet/AspNetCoredotnet/Extensions#8237 Commit migrated from dotnet/extensions@42e9a7d
part of aspnet/AspNetCoredotnet/Extensions#8237 Commit migrated from dotnet/extensions@42e9a7d
part of aspnet/AspNetCoredotnet/Extensions#8237 Commit migrated from dotnet/extensions@42e9a7d
part of dotnet/aspnetcore#8237
The attribute does nothing but apply traits right now, we still need to change our build to respect those traits. As a result, this PR will fail right now since it will run the "Flaky on AzDO" test :).
Despite being a draft, the
FlakyAttributeand friends code is ready for review.Examples on how to run/skip
dotnet test --filter "Flaky:AzDO!=true"dotnet test --filter "Flaky:Helix:all!=true&Flaky:Helix:X!=truedotnet test --filter "Flaky:AzDO=true"TODO:
After this PR: