Skip to content

Conversation

@tenzen-y
Copy link
Member

@tenzen-y tenzen-y commented Jun 6, 2024

  • One-line PR description: Graduate the JobSuccessPolicy to Beta.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 6, 2024
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/apps Categorizes an issue or PR as relevant to SIG Apps. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 6, 2024
@soltysh
Copy link
Contributor

soltysh commented Jun 7, 2024

Lastly, make sure to transition this PR from WIP to a final PR 😉

This means that the terminating policies are respected rather than the successPolicies,
if the Job doesn't have the `FailureTarget` or `SuccessCriteriaMet` conditions yet.

### Future Work
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we move it out of the scope I would suggest to remove the "Optional Second Alpha" section from Graduation Criteria.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right.

@tenzen-y
Copy link
Member Author

tenzen-y commented Jun 7, 2024

Lastly, make sure to transition this PR from WIP to a final PR 😉

Thank you for reviewing this PR! I definitely finalize this PR soon!

@tenzen-y tenzen-y mentioned this pull request Jun 10, 2024
29 tasks
@tenzen-y tenzen-y force-pushed the beta-graduation-3998 branch 2 times, most recently from 43687a4 to 2d92c7f Compare June 10, 2024 23:18
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 11, 2024
@tenzen-y tenzen-y changed the title WIP: KEP3998: Graduate Job SuccessPolicy to Beta KEP3998: Graduate Job SuccessPolicy to Beta Jun 11, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 11, 2024
@tenzen-y tenzen-y changed the title KEP3998: Graduate Job SuccessPolicy to Beta KEP-3998: Graduate Job SuccessPolicy to Beta Jun 11, 2024
@tenzen-y
Copy link
Member Author

@soltysh @mimowo @alculquicondor @atiratree @wojtek-t I finalized this JobSuccessPolicy graduation KEP.
PTAL. Thank you!

@wojtek-t wojtek-t self-assigned this Jun 11, 2024
Copy link
Contributor

@mimowo mimowo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just minor nits

In the alpha stage, the upgrade->downgrade->upgrade testing was added in the integration tests
[here](https://github.com/kubernetes/kubernetes/blob/6346b9d1327c4b8be2398d9715bdae5475e27569/test/integration/job/job_test.go#L794).

In terms of a manual test for the upgrade and rollback, we can use th v1.30.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
In terms of a manual test for the upgrade and rollback, we can use th v1.30.
In terms of a manual test for the upgrade and rollback, we can use the v1.30.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


3. Simulate upgrade by enabling the feature for api server and control-plane.

Then, very that the pod with index=2 is terminated and the Job has `SuccessCriteriaMet` and `Complete` conditions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Then, very that the pod with index=2 is terminated and the Job has `SuccessCriteriaMet` and `Complete` conditions.
Then, verify that the pod with index=2 is terminated and the Job has `SuccessCriteriaMet` and `Complete` conditions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

@wojtek-t wojtek-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just two minor nits from PRR perspective, other than that LGTM


- E2E tests passed as designed in [TestPlan](#test-plan).
- Added a new reasons to the existing `job_finished_total` metric in [Monitoring Requirements](#monitoring-requirements).
- Introduced a new `job_succeeded_total` metric in [Monitoring Requirements](#monitoring-requirements).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any tests for feature enablement/disablement?

nit: You'r mentioning you did an integration test - can you please link it?
[I see it linked below so it's not a blocking issue, but would help for the future]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! Thank you for the suggestion!

which indicates the following situations:

- `JobSuccessPolicy` indicates a job is declared as `SuccessCriteriaMet` because the job meets `spec.succesPolicy`.
- `Completions` indicates a job is declared as `SuccessCriteriaMet` because the job meets `spec.completions`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will enabling / using this feature result in any new API calls?

Please remove the "No" from there :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I was supposed to remove the "No" in the alpha stage :)

Thank you!

Copy link
Contributor

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several nits, mostly from Michal, so I'll let him do the final pass.

In general this is good from sig-apps pov
/approve

- when the job meets the successPolicy and the number of failed pods per indexes exceeds `backoffLimitPerIndex` in all indexes.

To avoid the above conflicts, terminating policies are evaluated the first before successPolicies.
This means that the terminating policies are respected rather than the successPolicies,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: the ## Release Signoff Checklist should be updated such that all required bits are checked as needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the section. If I am missing something there, please let me know.
Thanks!

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 11, 2024
@soltysh
Copy link
Contributor

soltysh commented Jun 11, 2024

/hold
for Wojtek's PRR approval

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 11, 2024
@tenzen-y tenzen-y force-pushed the beta-graduation-3998 branch from 7242051 to 141ea75 Compare June 11, 2024 12:24
@tenzen-y
Copy link
Member Author

@mimowo @wojtek-t Thank you for reviewing this KEP! I addressed all comments. PTAL Thanks!

@wojtek-t
Copy link
Member

/approve PRR

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: soltysh, tenzen-y, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mimowo
Copy link
Contributor

mimowo commented Jun 11, 2024

Thank you!
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 11, 2024
@tenzen-y
Copy link
Member Author

Thanks all!
/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 11, 2024
@k8s-ci-robot k8s-ci-robot merged commit 85fabef into kubernetes:master Jun 11, 2024
@k8s-ci-robot k8s-ci-robot added this to the v1.31 milestone Jun 11, 2024
@tenzen-y tenzen-y deleted the beta-graduation-3998 branch June 11, 2024 12:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/apps Categorizes an issue or PR as relevant to SIG Apps. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

5 participants