Stagger ppc64le jobs using cron syntax #35692
base: master
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sats-23

The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Hi @sats-23. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/ok-to-test
One potential downside to this approach is that you'll end up maintaining these entries statically, and I'm not aware of a better automated solution for handling this.
True: while we'll need to add a cron entry for each newly introduced job, this approach gives us the best chance of keeping our cluster resources well utilized and significantly reduces the risk of scheduling timeouts.
A better approach would be to force Kubernetes to allocate a dedicated node for every Prow pod. That should be straightforward to do with anti-affinity rules, or by having each job request all of the node's resources, as we do with the GKE build cluster.
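For illustration, a minimal sketch of the kind of pod anti-affinity rule being suggested here; the job name, cluster name, and image are hypothetical and not taken from this PR, while `created-by-prow` is the label Prow applies to the pods it creates:

```yaml
periodics:
  - name: periodic-example-ppc64le-job        # hypothetical job name
    cluster: ppc64le-build-cluster            # hypothetical cluster name
    interval: 6h
    spec:
      affinity:
        podAntiAffinity:
          # Refuse to schedule this pod on a node that already runs
          # another Prow-created pod, so each job gets its own node.
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  created-by-prow: "true"
              topologyKey: kubernetes.io/hostname
      containers:
        - image: example.io/build-image:latest   # hypothetical image
          command: ["make", "test"]
```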
Thanks for the suggestion @upodroid, that's a valid point. In this case, all worker nodes have the same configuration and resource capacity, so using node-level anti-affinity or dedicating entire nodes to individual jobs wouldn't give us much additional isolation.
Problem:
Currently, apart from the unit tests and master integration tests, which run at a 1h frequency, all other jobs run at a 3h, 6h, or 8h interval, and they are often triggered at the same time.
These periodics, combined with developer-triggered presubmits, can often lead to a resource crunch on the cluster.
Solution:
- Stagger the job timings using cron syntax (a sketch is shown after this list).
- The interval of each job has been kept the same.
- The approximate finish times of the jobs were considered, so that lighter workloads can be grouped into closer intervals.
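To make the change concrete, here is a sketch of how an interval-based Prow periodic can be switched to a staggered cron schedule; the job names, cluster name, image, and exact cron times below are hypothetical and are not the actual entries from this PR:

```yaml
periodics:
  # Before: both jobs use interval-based scheduling and tend to fire together.
  # - name: periodic-example-unit-ppc64le
  #   interval: 6h
  # - name: periodic-example-integration-ppc64le
  #   interval: 6h

  # After: same 6h cadence, but the start times are offset so the jobs
  # no longer compete for the cluster at the same moment.
  - name: periodic-example-unit-ppc64le           # hypothetical job name
    cron: "0 0,6,12,18 * * *"                     # every 6h, on the hour
    cluster: ppc64le-build-cluster                # hypothetical cluster name
    spec:
      containers:
        - image: example.io/build-image:latest    # hypothetical image
          command: ["make", "unit-test"]
  - name: periodic-example-integration-ppc64le    # hypothetical job name
    cron: "30 2,8,14,20 * * *"                    # every 6h, offset by 2h30m
    cluster: ppc64le-build-cluster
    spec:
      containers:
        - image: example.io/build-image:latest
          command: ["make", "integration-test"]
```

The key point is that each job keeps its original cadence (every 6h here), while the hour and minute offsets spread the start times apart across the cluster.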