Topology aware scheduler plugin #119
Conversation
Welcome @AlexeyPerevalov!
Hi @AlexeyPerevalov. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Co-authored-by: Swati Sehgal <[email protected]>
Signed-off-by: Swati Sehgal <[email protected]>
Signed-off-by: Alexey Perevalov <[email protected]>

Force-pushed ad23c8b to dbacf97
> ## Feature enablement and rollback
> * **How can this feature be enabled / disabled in a live cluster?**
>   - This plugin doesn't require a special feature gate, but it expects the TopologyManager and CPUManager feature gates to be enabled on the worker node.
If I'm not wrong, the plugin is not loaded unless a dedicated KubeSchedulerConfiguration is defined.
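For illustration, enabling an out-of-tree plugin like this is typically done through a KubeSchedulerConfiguration profile. The sketch below shows roughly what that could look like; the plugin name `NodeResourceTopologyMatch`, the profile name, and the kubeconfig path are assumptions for the example, not something this KEP fixes.

```yaml
# Sketch only: plugin name, profile name and paths are illustrative assumptions.
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
leaderElection:
  leaderElect: false
clientConnection:
  kubeconfig: /etc/kubernetes/scheduler.conf
profiles:
  - schedulerName: topology-aware-scheduler
    plugins:
      filter:
        enabled:
          - name: NodeResourceTopologyMatch
      score:
        enabled:
          - name: NodeResourceTopologyMatch
```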
@AlexeyPerevalov We haven't come to a conclusion on where to hold the API, have we? If we get a consensus on kubernetes/kubernetes#96275, then we can implement it in the in-tree scheduler.
We discussed the possibility of including it as a built-in API in the sig-architecture meeting, where opinions were divided. The staging approach also raised concerns. So we will collect usage feedback for the NodeResourceTopology plugin and for the API.
Discussion upstream is still ongoing and we are pursuing acceptance of an in-tree scheduler plugin (and will continue to do so). While we work on showcasing the advantages of topology-aware scheduling in a more measurable way, in order to make a strong case for in-tree acceptance of this feature, it was suggested that we consider an out-of-tree approach for better feature velocity. Hence the enhancement proposal here.
@AlexeyPerevalov @swatisehgal Thanks for the clarification. This proposal looks good to me (actually we have already reviewed this KEP upstream). It's just that I'd like to hold the merge in this repo until we reach a conclusion on where to place the APIs. Let me know your thoughts.
@Huang-Wei Thanks for reviewing the proposal. In the CRD API section of this KEP we are proposing that the API be placed in kubernetes-sigs/scheduler-plugins under the pkg/apis/noderesourcetopology-api directory. What do you think about that? Getting this scheduler plugin merged as an out-of-tree scheduler plugin would be a great starting point for us, as it would allow relevant stakeholders to play around with it, letting us gather feedback and mature both the plugin and the API. It would also help drive conversations upstream (with sig-node, sig-scheduling and sig-architecture) to figure out how we as a community want to shape and introduce this feature natively in Kubernetes. Let us know your thoughts.
@swatisehgal Does that mean the work of pushing the API to staging and the scheduler plugin in-tree is being postponed, or are you still pursuing it? The other aspect is the API review. Similar to the Topology Manager feature, sig-node should own and review the topology API; sig-scheduling doesn't have the expertise in this area. So if we store the CRD API in scheduler-plugins, can we involve sig-node in the review to ensure the API design is appropriate?
@Huang-Wei We need to do some work to showcase the value proposition of topology-aware scheduling in order to make a strong case for the API and the scheduler plugin in-tree. We think this will take some time, but when we are ready we will resume the discussion with SIG-Architecture and SIG-Node. So, to answer your question, we are still pursuing the in-tree path in the background but postponing that discussion for now.
We will take care of this and bring it to SIG-Node. Thanks for letting us know the expectations.
- Capturing API updates and RBAC rules (for accessing the NodeResourceTopology CRD)
- Capturing the import path of the CRD API as https://github.com/k8stopologyawareschedwg/noderesourcetopology-api rather than placing the API definition in the scheduler-plugins repo itself.

Signed-off-by: Swati Sehgal <[email protected]>

Updates to the KEP based on the latest changes
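As an illustration of that commit, consuming the CRD types from the external module rather than from a local pkg/apis directory could look roughly like the Go snippet below; the package path inside the module and the helper function are assumptions, not code from this PR.

```go
package noderesourcetopology

import (
    // Module path taken from the commit message above; the package layout
    // inside the module (pkg/apis/topology/v1alpha1) is an assumption.
    topologyv1alpha1 "github.com/k8stopologyawareschedwg/noderesourcetopology-api/pkg/apis/topology/v1alpha1"
)

// hasTopologyInfo is a placeholder helper showing the externally defined
// NodeResourceTopology type being consumed by plugin code.
func hasTopologyInfo(nrt *topologyv1alpha1.NodeResourceTopology) bool {
    // ObjectMeta is embedded in every CRD object, so Name is always available.
    return nrt != nil && nrt.Name != ""
}
```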
/ok-to-test
@AlexeyPerevalov: Cannot trigger testing until a trusted user reviews the PR and leaves an `/ok-to-test` message. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
```go
type ResourceInfo struct {
    Name        string             `json:"name"`
    Allocatable intstr.IntOrString `json:"allocatable"`
```
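For context, a minimal sketch of how the quoted ResourceInfo type could be embedded in the CRD is shown below. Only the three quoted lines come from the PR; the Capacity field and the Zone/NodeResourceTopology wrappers are assumptions added for illustration.

```go
package v1alpha1

import (
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/util/intstr"
)

// ResourceInfo reports one resource as seen from a single NUMA zone.
// Name and Allocatable come from the quoted snippet; Capacity is an assumed
// companion field.
type ResourceInfo struct {
    Name        string             `json:"name"`
    Allocatable intstr.IntOrString `json:"allocatable"`
    Capacity    intstr.IntOrString `json:"capacity"`
}

// Zone is an assumed per-NUMA-node grouping of ResourceInfo entries.
type Zone struct {
    Name      string         `json:"name"`
    Type      string         `json:"type"`
    Resources []ResourceInfo `json:"resources,omitempty"`
}

// NodeResourceTopology is an assumed sketch of the CRD object that a node
// agent would publish and the scheduler plugin would read.
type NodeResourceTopology struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    TopologyPolicies []string `json:"topologyPolicies,omitempty"`
    Zones            []Zone   `json:"zones,omitempty"`
}
```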
How can we maintain consistency between `NodeResourceTopology.Allocatable` and the reserved CPUs? Suppose we schedule a pod onto a node:
- In the Filter and Score extensions, we use `NodeResourceTopology.Allocatable` to filter and select the host.
- In the Reserve extension, we reserve several CPUs for the current pod (kubelet doesn't know about the reserved CPUs).
- In the PostBind extension, we should release the CPUs, since kubelet will update the CRD to reflect the CPU usage, e.g. reduce the allocatable field.

Since kubelet and the scheduler do not talk to each other, is it possible that there is a small time window between releasing the CPUs in the PostBind extension and kubelet updating the CRD? Could a wrong scheduling decision be made in that gap?
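To make the first bullet above concrete, here is a framework-agnostic sketch of the per-NUMA filtering check; the helper names and the single-zone fit criterion are illustrative, not the plugin's actual implementation.

```go
package main

import (
    "fmt"

    v1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/api/resource"
)

// zoneAllocatable is an illustrative stand-in for the per-NUMA-zone
// allocatable resources published via NodeResourceTopology.
type zoneAllocatable map[v1.ResourceName]resource.Quantity

// fitsSingleZone returns true if every requested resource fits inside at
// least one NUMA zone. The Filter step can only rely on the CRD's Allocatable
// values; it cannot see CPUs the kubelet has not yet accounted for.
func fitsSingleZone(requests v1.ResourceList, zones []zoneAllocatable) bool {
    for _, zone := range zones {
        fits := true
        for name, req := range requests {
            avail, ok := zone[name]
            if !ok || avail.Cmp(req) < 0 {
                fits = false
                break
            }
        }
        if fits {
            return true
        }
    }
    return false
}

func main() {
    requests := v1.ResourceList{
        v1.ResourceCPU:    resource.MustParse("4"),
        v1.ResourceMemory: resource.MustParse("8Gi"),
    }
    zones := []zoneAllocatable{
        {v1.ResourceCPU: resource.MustParse("2"), v1.ResourceMemory: resource.MustParse("16Gi")},
        {v1.ResourceCPU: resource.MustParse("6"), v1.ResourceMemory: resource.MustParse("16Gi")},
    }
    fmt.Println("fits:", fitsSingleZone(requests, zones)) // true: the second zone has room
}
```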
Yes, this KEP didn't cover the issue you mentioned, but it should. We can't follow the approach of the built-in noderesources/fit.go plugin, where the plugin relies on NodeInfo.Allocatable, which is updated once a pod is nominated, and that reduces the gap. We can't follow it because we can't predict which NUMA node will be chosen on the worker node.
I think in the future we could add some "hint", e.g. an annotation at the PostBind stage, to inform kubelet which NUMA node was selected, and use it to reduce the amount of available resources on the worker node, as is done for the Fit plugin.
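A minimal sketch of what such a PostBind hint could look like, assuming the plugin has a client-go clientset; the annotation key and the idea of a kubelet-side consumer are hypothetical and not defined anywhere in this KEP.

```go
package noderesourcetopology

import (
    "context"
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/types"
    "k8s.io/client-go/kubernetes"
)

// numaHintAnnotation is a hypothetical annotation key used only to
// illustrate the "hint at PostBind" idea discussed above.
const numaHintAnnotation = "topology.node.k8s.io/selected-numa"

// annotateSelectedNUMA patches the bound pod with the NUMA node the scheduler
// assumed during filtering, so a node-side agent could reconcile the CRD
// against decisions that are still in flight.
func annotateSelectedNUMA(ctx context.Context, cs kubernetes.Interface, ns, podName string, numaID int) error {
    patch := []byte(fmt.Sprintf(`{"metadata":{"annotations":{"%s":"%d"}}}`, numaHintAnnotation, numaID))
    _, err := cs.CoreV1().Pods(ns).Patch(ctx, podName, types.MergePatchType, patch, metav1.PatchOptions{})
    return err
}
```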
Is it possible that we select a socket or sockets during scheduling and the kubelet allocates the pod within that? I think after we introduce NUMA into scheduling, the socket may become the minimum scheduling unit.
Socket is a different term; several NUMA nodes can be on one socket. Here we're talking about NUMA nodes. And yes, as I said, if we manage to add such an annotation, it's possible for kubelet to make its decision based on the scheduler's evaluation.
Annotations would work, but they sound like a patch to me. Can we introduce a common approach to control how kubelet sets cgroups, including cpuset, cfs, etc.?
I think external resource management, like https://github.com/nokia/CPU-pooler or https://github.com/intel/cri-resource-manager/, could address the cgroup management issue.
/kind feature
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: AlexeyPerevalov, Huang-Wei. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
/retest

2 similar comments

/retest

/retest
Initially it was proposed as a built-in plugin: kubernetes/enhancements#1858
Co-authored-by: Swati Sehgal [email protected]
Signed-off-by: Swati Sehgal [email protected]
Signed-off-by: Alexey Perevalov [email protected]