Skip to content

Conversation

@analogrelay
Copy link

@analogrelay analogrelay commented Mar 6, 2019

part of dotnet/aspnetcore#8237

The attribute does nothing but apply traits right now, we still need to change our build to respect those traits. As a result, this PR will fail right now since it will run the "Flaky on AzDO" test :).

Despite being a draft, the FlakyAttribute and friends code is ready for review.

Examples on how to run/skip

  • Run all non-flaky tests on AzDO: dotnet test --filter "Flaky:AzDO!=true"
  • Run all non-flaky tests on Helix Queue "X": dotnet test --filter "Flaky:Helix:all!=true&Flaky:Helix:X!=true
  • Run all flaky tests on AzDO: dotnet test --filter "Flaky:AzDO=true"

TODO:

  • Add build goop to make this all work in Extensions (which uses Arcade)
  • Add support for selecting AzDO jobs to skip on

After this PR:

  • Add build goop for aspnet/AspNetCore

public const string macOS1012 = "OSX.1012." + HelixSuffix + ";";
public const string Windows10 = "Windows.10.Amd64.ClientRS4.VS2017.Open;"; // Doesn't have the default suffix!

private const string HelixSuffix = "Amd64.Open";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately we did/will have Arm64 queues too, they just were exploding so they aren't ready for frontline action yet. Maybe we need a Flaky Queue too :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need a flaky queue, with this logic we can have our test scripts run both flaky and non-flaky tests, but only fail the build on non-flaky failures.

We can definitely update this as we have new queue explosions :). I can put Amd64 in the field names for now to ensure they don't break when we add Arm64 though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah what I mean is its possible we will want to run on new queues (i.e. arm testing) where the queues themselves have a high failure rate, today we just removed the queue entirely, but it would be nice to be able to run all tests on a flaky queue (just in the don't fail the build bucket). We would have to mark every test with [Flaky(OnHelixQueues="imsoflaky")] I think without the concept. Not something we need now, but something to keep in mind for the future

[Flaky("http://example.com", OnAzDO = false, OnHelixQueues = HelixQueues.macOS1012Amd64 + HelixQueues.Fedora28Amd64)]
public void FlakyInSpecificHelixQueue()
{
var queueName = Environment.GetEnvironmentVariable("HELIX");
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HaoK I noticed dotnet/aspnetcore#8231 is adding this env var, can you double-check that my logic here seems sound? The idea is to actually make this test properly fail on the specified Helix queues so we can leave it in as validation that flaky tests are indeed treated properly, so I want this test to actually be correctly detecting the queue it's running in :)

Copy link
Member

@HaoK HaoK Mar 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be time to have a HelixUtils to consolidate all the helix stuff in one central file, or you could use SkipOnHelixAttribute.GetTargetHelixQueue() when that PR is merged.

Logic looks fine. One other pivot we should probably consider, is adding the concept of Platform as well, so there's an easy way to mark Flaky(OnlyPlatforms = "OSX;Linux") so we don't have to enumerate every queue

So maybe we need to :

   public class HelixQueueDefinition {
       public string Id;
       public string Platform
   }
   public static class HelixPlatforms {
       public string Windows = "Windows";
       public string OSX = "OSX";
       public string Linux = "Linux";
   }

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could just do that by building combo-consts like so:

        // Common Groupings
        public const string Ubuntu = Ubuntu1604Amd64 + Ubuntu1810Amd64;
        public const string Centos = Centos7Amd64;
        public const string Debian = Debian8Amd64 + Debian9Amd64;
        public const string Redhat = Redhat7Amd64;
        public const string Fedora = Fedora27Amd64 + Fedora28Amd64;

        public const string Windows = Windows10Amd64;
        public const string macOS = macOS1012Amd64;
        public const string Linux = Ubuntu + Centos + Debian + Redhat + Fedora;

We could also add other pivots to the flaky traits. I already changed it a bit so that the trait we insert is Flaky:Helix:Queue:[queuename], we could have other facets in that sequence and as long as the script running the tests injects all the necessary filters, it should work.

public string OnHelixQueues
{
get => string.Join(";", _queues);
set => _queues = new List<string>(value.Split(';'));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to pass in StringSplitOptions.RemoveEmptyEntries?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, though it's not broken without it I believe. It would just generate the trait Flaky:Helix:=true which we never care about :).

[Flaky("http://example.com")]
public void AlwaysFlaky()
{
throw new Exception("Flakey!");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flaky

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, also I don't want this test to fail for dev builds so I'll have it detect either AzDO or Helix and only fail there.

/// <summary>
/// Gets or sets a boolean indicating if this test is flaky on Azure DevOps Pipelines. Defaults to <see langword="true" />.
/// </summary>
public bool OnAzDO { get; set; } = true;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need AzDO build definition name (OS)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps

@pakrym
Copy link

pakrym commented Mar 6, 2019

This needs to be merged with dotnet/aspnetcore#8231

@analogrelay
Copy link
Author

This needs to be merged with aspnet/AspNetCore#8231

Yes, but it's slightly tricky since SkipOnHelixAttribute is a shared-source file in aspnet/AspNetCore (not sure why it wasn't just added to this project since it depends on ITestCondition from it)

@pakrym
Copy link

pakrym commented Mar 6, 2019

Yes, but it's slightly tricky since SkipOnHelixAttribute is a shared-source file in aspnet/AspNetCore (not sure why it wasn't just added to this project since it depends on ITestCondition from it)

I wonder if SkipOnHelixAttribute even needs an ability to specify queues. Is there a case where we want to skip non-flaky tests on a single helix queue?

@HaoK
Copy link
Member

HaoK commented Mar 6, 2019

Yeah skipping tests on specific platform/queues seems like it would always be useful, but maybe we already have that with OSSkip. It might just be a short/medium term thing where some tests don't work on some helix queues that aren't the same set as OSSkip

}

[Fact]
[Flaky("http://example.com", OnHelixQueues = HelixQueues.None)]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use these on a few existing tests we know are flaky? Start putting the badge of shame up.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not yet. We'll need build goop to actually exclude them

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you could use it when this is in, but it won't actually do anything.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make it more fun, it looks like Extensions and AspNetCore will need slightly different build goop, but it looks doable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The good news is that helix isn't used on extensions... so you won't be getting many flaky helix tests on there hopefully :)

Copy link

@ryanbrandenburg ryanbrandenburg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code seems fine, though I don't know much about xunit traits and discoverers.


<!-- TODO: Replace with below when https://github.com/dotnet/arcade/issues/2182 is in -->
<!-- <TestToRun Include="@(NonFlakyRuns);@(FlakyRuns)" /> -->
<TestToRun Include="@(NonFlakyRuns)" />
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we do not run flaky tests at all right now because of dotnet/arcade#2182

@analogrelay
Copy link
Author

Our AzDO jobs are not named as consistently as our Helix queues. Perhaps we should do that before going too far down the path of selective AzDO flakiness? For example, this is what the helper static class looks like for AzDO (equiv of HelixQueues):

    public static class AzP
    {
        // Pipeline names end in ';' because it makes it easier to concat these into a list using a constant expression:
        //  AzurePipelines.macOS + AzurePipelines.Windows

        public const string All = "all;";
        public const string None = "none;";

        public static class Extensions
        {
            public const string CodeCheck = "Code check Job;";
            public const string WindowsDebug = "Windows debug;";
            public const string WindowsRelease = "Windows release;";
            public const string Windows = WindowsDebug + WindowsRelease;
            public const string UbuntuDebug = "Ubuntu 16.04 debug;";
            public const string UbuntuRelease = "Ubuntu 16.04 release;";
            public const string Ubuntu = UbuntuDebug + UbuntuRelease;
            public const string macOSDebug = "OSX debug";
            public const string macOSRelease = "OSX release";
            public const string macOS = macOSDebug + macOSRelease;
        }

        public static class AspNetCore
        {
            public const string CodeCheck = "Code check Job;";
            public const string Windows = "Test: Windows Server 2016 x64 Job;";
            public const string macOS = "Test: macOS 10.13 Job;";
            public const string Ubuntu = "Test: Ubuntu 16.04 x64 Job;";
        }
    }

The name we get from AzDO appears to be Agent.JobName which is the display name (ugly). We could change our YAML to define a specific environment variable with a consistent set of names across both Extensions and AspNetCore (and EF?) to help out here. My thinking is that we leave AzDO as a hammer for now (on for all or off for all) and file an issue to track improvement here. Thoughts?

// It seems like sometimes the MSBuild goop that adds the test framework can end up in a bad state and not actually add it
// Not sure yet why that happens but the exception isn't clear so I'm adding this error so we can detect it better.
// -anurse
throw new InvalidOperationException("LoggedTest base class was used but nothing initialized it! The test framework may not be enabled. Try cleaning your 'obj' directory.");
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hit this while testing... not sure why, but want to make it a little easier to catch and possibly diagnose if it happens again. It doesn't seem to happen at all in AzP because we use a clean environment. The ".cache" file for my assembly attributes was broken somehow which meant the MSBuild goop never actually generated new Assembly Attributes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, I think we are missing an issue tracking reenabling this on helix, If you can isolate/fix the root cause, we could take this out https://github.com/aspnet/AspNetCore/blob/master/eng/targets/Helix.props#L18

@pakrym
Copy link

pakrym commented Mar 6, 2019

My thinking is that we leave AzDO as a hammer for now (on for all or off for all) and file an issue to track improvement here. Thoughts?

How is Helix more important then AzDO here? Just because we don't have that many queues on AzDO?

Our AzDO jobs are not named as consistently as our Helix queues

You can base AzDo skip on the fact that we are running there + Agent.OS for OS.

@HaoK
Copy link
Member

HaoK commented Mar 6, 2019

At least for AspNetCore the current plan is to try and move all testing to helix, so eventually we won't have any tests running on azdo

@analogrelay
Copy link
Author

You can base AzDo skip on the fact that we are running there + Agent.OS for OS.

Let's do that. I'll update.

@analogrelay analogrelay marked this pull request as ready for review March 6, 2019 22:02
@analogrelay
Copy link
Author

analogrelay commented Mar 6, 2019

Ok, I think this is ready to go. I'd like another review sign-off (since all my changes) from someone before merging. The attribute works a little differently now, instead of separate properties, there's just one params constructor argument that takes a list of "filters". Filters are strings like AzP:All, AzP:OS:Darwin or Helix:Queue:Abc which come from two helper static classes. These strings are simply prepended with Flaky: and added as xUnit traits with a value of true.

The build script specifies all the traits it wants to select/exclude, for example:

      <_FlakyRunAdditionalArgs>-trait "Flaky:All=true" -trait "Flaky:AzP:All=true" -trait "Flaky:AzP:OS:$(AGENT_OS)=true"</_FlakyRunAdditionalArgs>
      <_NonFlakyRunAdditionalArgs>-notrait "Flaky:All=true" -notrait "Flaky:AzP:All=true" -notrait "Flaky:AzP:OS:$(AGENT_OS)=true"</_NonFlakyRunAdditionalArgs>

If no "filters" are provided, the test is considered flaky everywhere:

        [Flaky("http://example.com")]
        public void AlwaysFlaky() {...}

If any "filters" are provided, all relevant filters must be provided:

        [Flaky("http://example.com", HelixQueues.All)]
        public void FlakyInHelixOnly() {...}

        [Flaky("http://example.com", HelixQueues.All, AzurePipelines.Windows)]
        public void FlakyInHelixAndAzPWindowsOnly() {...}

@analogrelay
Copy link
Author

Whoops, didn't finish updating the test 😺

Copy link

@pakrym pakrym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Member

@HaoK HaoK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

@analogrelay analogrelay merged commit 42e9a7d into master Mar 6, 2019
@natemcmaster natemcmaster deleted the anurse/aspnetcore-8237-flaky-attribute branch May 3, 2019 06:34
JunTaoLuo pushed a commit to dotnet/aspnetcore that referenced this pull request Feb 12, 2020
part of aspnet/AspNetCoredotnet/extensions#8237\n\nCommit migrated from dotnet/extensions@42e9a7d
JunTaoLuo pushed a commit to dotnet/aspnetcore that referenced this pull request Feb 15, 2020
part of aspnet/AspNetCoredotnet/extensions#8237\n\nCommit migrated from dotnet/extensions@42e9a7d
maryamariyan pushed a commit to maryamariyan/runtime that referenced this pull request Feb 28, 2020
part of aspnet/AspNetCoredotnet/Extensions#8237


Commit migrated from dotnet/extensions@42e9a7d
maryamariyan pushed a commit to maryamariyan/runtime that referenced this pull request Mar 2, 2020
part of aspnet/AspNetCoredotnet/Extensions#8237


Commit migrated from dotnet/extensions@42e9a7d
maryamariyan pushed a commit to maryamariyan/runtime that referenced this pull request Mar 11, 2020
part of aspnet/AspNetCoredotnet/Extensions#8237


Commit migrated from dotnet/extensions@42e9a7d
maryamariyan pushed a commit to maryamariyan/runtime that referenced this pull request Mar 11, 2020
part of aspnet/AspNetCoredotnet/Extensions#8237


Commit migrated from dotnet/extensions@42e9a7d
maryamariyan pushed a commit to maryamariyan/runtime that referenced this pull request Mar 27, 2020
part of aspnet/AspNetCoredotnet/Extensions#8237


Commit migrated from dotnet/extensions@42e9a7d
@ghost ghost locked as resolved and limited conversation to collaborators May 29, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants