TEP-0090: Matrix - Failure Strategies #724
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -42,6 +42,7 @@ see-also: | |
| - [Alternatives](#alternatives) | ||
| - [Fan Out](#fan-out) | ||
| - [Concurrency Control](#concurrency-control) | ||
| - [Failure Strategies](#failure-strategies) | ||
| - [Design](#design) | ||
| - [Parameters](#parameters) | ||
| - [Substituting String Parameters in the Tasks](#substituting-string-parameters-in-the-tasks) | ||
|
|
@@ -206,15 +207,13 @@ specified in a `matrix`. | |
|
|
||
| The following are out of scope for this TEP: | ||
|
|
||
| 1. Terminating early when one of the `TaskRuns` or `Runs` created in parallel fails. As is currently, running `TaskRuns` | ||
| and `Runs` have to complete execution before termination. | ||
| 2. Configuring the `TaskRuns` or `Runs` created in a given `matrix` to execute sequentially. This remains an option | ||
| 1. Configuring the `TaskRuns` or `Runs` created in a given `matrix` to execute sequentially. This remains an option | ||
| that we can explore later. | ||
| 3. Excluding generating a `TaskRun` or `Run` for a specific combination in the `matrix`. This remains an option we can | ||
| 2. Excluding generating a `TaskRun` or `Run` for a specific combination in the `matrix`. This remains an option we can | ||
| explore later if needed. | ||
| 4. Including generating a `TaskRun` or `Run` for a specific combination in the `matrix`. This can be handled by adding | ||
| 3. Including generating a `TaskRun` or `Run` for a specific combination in the `matrix`. This can be handled by adding | ||
| the items that produce that combination into the `matrix`. This remains an option we can explore later if needed. | ||
| 5. Supporting producing `Results` from fanned out `PipelineTasks`. We plan to address this after [TEP-0075][tep-0075] | ||
| 4. Supporting producing `Results` from fanned out `PipelineTasks`. We plan to address this after [TEP-0075][tep-0075] | ||
| and [TEP-0076][tep-0076] have landed. | ||
|
|
||
| ### Requirements | ||
|
|
@@ -718,6 +717,18 @@ If needed, we can also explore providing more granular controls for maximum numb | |
| or `Runs` from `Matrices` - either at `PipelineRun`, `Pipeline` or `PipelineTask` levels - later. | ||
| This is an option we can pursue after gathering user feedback - it's out of scope for this TEP. | ||
|
|
||
| ### Failure Strategies | ||
|
|
||
| In failure scenarios, the `TaskRuns` or `Runs` created from a `Matrix` will fail fast. That is, when | ||
| any `TaskRun` or `Run` from a given fanned-out `PipelineTask` fails or is cancelled then the other | ||
| `TaskRuns` or `Runs` from the same `PipelineTask` will be cancelled. | ||
|
|
||
| This approach makes it easier to control the execution of the many `TaskRuns` or `Runs` that can be | ||
| created from a `Matrix` (up to 256 for now). Moreover, failing fast is the default behavior of `Matrix` | ||
| in other Continuous Delivery systems - see [GitHub Actions - Handling Failures in Matrix][ghm-failfast]. | ||
|
|
||
| If needed, we can explore supporting other failure strategies later. | ||
|
This is the right call - in Jenkins Pipeline, we inherited certain assumptions around failure behavior for parallel executions (both in a matrix and not) which resulted in us having to support making fail-fast configurable, and I don't think it was worth doing that in the end. When we're starting from scratch, like here, we have the opportunity to say "this is how things work" and adjust if sufficient demand comes up in the future, rather than prematurely optimizing. So, yeah. 👍
||
|
|
||
| ## Design | ||
|
|
||
| In this section, we go into the details of the `Matrix` in relation to: | ||
|
|
@@ -1647,4 +1658,5 @@ However, this approach has the following disadvantages: | |
| [git-clone]: https://github.com/tektoncd/catalog/tree/main/task/git-clone/0.5 | ||
| [when]: https://github.com/tektoncd/pipeline/blob/6cb0f4ccfce095495ca2f0aa20e5db8a791a1afe/docs/pipelines.md#guard-task-execution-using-when-expressions | ||
| [retries]: https://github.com/tektoncd/pipeline/blob/6cb0f4ccfce095495ca2f0aa20e5db8a791a1afe/docs/pipelines.md#using-the-retries-parameter | ||
| [timeouts]: https://github.com/tektoncd/pipeline/blob/6cb0f4ccfce095495ca2f0aa20e5db8a791a1afe/docs/pipelines.md#configuring-the-failure-timeout | ||
| [timeouts]: https://github.com/tektoncd/pipeline/blob/6cb0f4ccfce095495ca2f0aa20e5db8a791a1afe/docs/pipelines.md#configuring-the-failure-timeout | ||
| [ghm-failfast]: https://docs.github.com/en/actions/using-jobs/using-a-matrix-for-your-jobs#handling-failures | ||
thanks @jerop, looking for more clarification, how are we failing already running `taskRuns` or `runs` from the same `pipelineTask`? Sending termination signal?

Can we implement something like `stopping` mode by default, similar to the approach we take when a task in a pipeline fails?

the reason I am asking is I am assuming `stopping` is easier to implement 🤣 And does not result in partial execution ...

@pritidesai yes, similarly to how we cancel `TaskRuns` for example - it may be easier to implement `stopping` mode but wondering if that's the behavior we want for `Matrix` separate from the implementation - maybe partial execution could be the reason not to do this?

`stopping` will be consistent with how we address `failure` of a non-matrix `pipelineTask`.

Just an example, if building an image of one application fails, I do not want that failure to stop building images of 20 other applications.

makes sense - added this to the API WG agenda on Monday so we can decide on the best way forward
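
To make the two options in this thread easier to compare, here is a minimal sketch of how they might differ for a single fanned-out `PipelineTask`. It is a toy simulation, not Tekton's reconciler: the `Run` class, the `simulate` function, the tick-based clock, the `max_parallel` limit and the status labels (`Cancelled`, `NotStarted`) are all hypothetical names chosen for illustration. It assumes that `fail-fast` cancels sibling runs as soon as one fails, and that `stopping` lets already-running siblings finish while not starting any new ones.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Run:
    """One hypothetical run fanned out from a matrix."""
    name: str
    duration: int   # ticks the run needs to finish
    succeeds: bool  # outcome if the run is allowed to finish


def simulate(runs: List[Run], strategy: str, max_parallel: int) -> Dict[str, str]:
    """Return the final status of each run under the given failure strategy."""
    if strategy not in ("fail-fast", "stopping"):
        raise ValueError(f"unknown strategy: {strategy}")
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")

    pending = list(runs)
    running: Dict[str, Tuple[Run, int]] = {}  # name -> (run, finish tick)
    status: Dict[str, str] = {}
    failed = False
    tick = 0

    while pending or running:
        # Start new runs while slots are free and no failure has been seen.
        while pending and not failed and len(running) < max_parallel:
            run = pending.pop(0)
            running[run.name] = (run, tick + run.duration)

        # After a failure, neither strategy starts the remaining runs.
        if failed and pending:
            for run in pending:
                status[run.name] = "NotStarted"
            pending = []
        if not running:
            break

        # Advance the clock to the next completion and record outcomes.
        tick = min(finish for _, finish in running.values())
        finished = [run for run, finish in running.values() if finish == tick]
        for run in finished:
            del running[run.name]
            status[run.name] = "Succeeded" if run.succeeds else "Failed"
            if not run.succeeds:
                failed = True

        # Only fail-fast cancels siblings that are still executing.
        if failed and strategy == "fail-fast":
            for name in running:
                status[name] = "Cancelled"
            running.clear()

    return status


# Hypothetical matrix: four image builds, at most three in parallel.
runs = [
    Run("build-app-a", duration=3, succeeds=True),
    Run("build-app-b", duration=1, succeeds=False),
    Run("build-app-c", duration=2, succeeds=True),
    Run("build-app-d", duration=1, succeeds=True),
]

for strategy in ("fail-fast", "stopping"):
    print(strategy, simulate(runs, strategy, max_parallel=3))
```

In this sketch, `fail-fast` cancels `build-app-a` and `build-app-c` as soon as `build-app-b` fails, while `stopping` lets both finish. Neither strategy starts `build-app-d`, so the "one failed build should not block the other 20" case above would only be fully covered if all runs were already in flight when the failure occurred.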