Commit 2d90789

Add GetAllocatableResource to PodResource API
In order to simplify the KEP and make it more understandable, and to comply with the new process, we extract the unit of work still ongoing in this KEP from kubernetes#1884.

Work in this area was done during the 1.20 and 1.21 cycles in kubernetes/kubernetes#95734.

Rationale, discussion and documentation for all the changes, including the one proposed in this KEP, have been described in https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2043-pod-resource-concrete-assigments and are reported here where relevant.

Signed-off-by: Francesco Romani <[email protected]>
1 parent 996261d commit 2d90789

3 files changed, +312 -0 lines changed
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
kep-number: 2403
stable:
  approver: "@johnbelamaric"
Lines changed: 264 additions & 0 deletions
@@ -0,0 +1,264 @@
1+
title: Extend kubelet pod resource assignment endpoint to return allocatable resources
2+
3+
## Table of Contents
4+
5+
<!-- toc -->
6+
- [Release Signoff Checklist](#release-signoff-checklist)
7+
- [Summary](#summary)
8+
- [Motivation](#motivation)
9+
- [Goals](#goals)
10+
- [Proposal](#proposal)
11+
- [User Stories](#user-stories)
12+
- [Topology aware scheduling](#topology-aware-scheduling)
13+
- [Risks and Mitigations](#risks-and-mitigations)
14+
- [Design Details](#design-details)
15+
- [Proposed API](#proposed-api)
16+
- [Test Plan](#test-plan)
17+
- [Graduation Criteria](#graduation-criteria)
18+
- [Alpha](#alpha)
19+
- [Alpha to Beta Graduation](#alpha-to-beta-graduation)
20+
- [Beta to G.A Graduation](#beta-to-ga-graduation)
21+
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
22+
- [Version Skew Strategy](#version-skew-strategy)
23+
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
24+
- [Feature enablement and rollback](#feature-enablement-and-rollback)
25+
- [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
26+
- [Monitoring requirements](#monitoring-requirements)
27+
- [Dependencies](#dependencies)
28+
- [Scalability](#scalability)
29+
- [Troubleshooting](#troubleshooting)
30+
- [Implementation History](#implementation-history)
31+
- [Alternatives](#alternatives)
32+
- [Add v1alpha1 Kubelet GRPC service, at <code>/var/lib/kubelet/pod-resources/kubelet.sock</code>, which returns a list of <a href="https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/apis/cri/runtime/v1alpha2/api.proto#L734">CreateContainerRequest</a>s used to create containers.](#add-v1alpha1-kubelet-grpc-service-at--which-returns-a-list-of-createcontainerrequests-used-to-create-containers)
33+
- [Add a field to Pod Status.](#add-a-field-to-pod-status)
34+
- [Use the Kubelet Device Manager Checkpoint file](#use-the-kubelet-device-manager-checkpoint-file)
35+
- [Add a field to the Pod Spec:](#add-a-field-to-the-pod-spec)
36+
<!-- /toc -->
37+
38+
## Release Signoff Checklist
39+
40+
Items marked with (R) are required *prior to targeting to a milestone / release*.
41+
42+
- [X] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements](https://github.com/kubernetes/enhancements/issues/2403)
43+
- [X] (R) KEP approvers have approved the KEP status as `implementable`
44+
- [X] (R) Design details are appropriately documented
45+
- [X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
46+
- [X] (R) Graduation criteria is in place
47+
- [X] (R) Production readiness review completed
48+
- [X] Production readiness review approved
49+
- [X] "Implementation History" section is up-to-date for milestone
50+
- ~~ [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] ~~
51+
- [X] Supporting documentation e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
52+
53+
[kubernetes.io]: https://kubernetes.io/
54+
[kubernetes/enhancements]: https://git.k8s.io/enhancements
55+
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
56+
[kubernetes/website]: https://git.k8s.io/website
57+
58+
## Summary
59+
60+
This document presents an addition to the kubelet pod resources endpoint (pod resources API) which allows third party consumers to learn about the
61+
compute device allocation and thus, alongside the existing pod resources API endpoint, to properly evaluate the node capacity.
62+
63+
## Motivation
64+
65+
### Goals
66+
67+
* Enable node monitoring agents to know the allocatable compute resources on a node, so that they can properly calculate the node compute resource utilization.
68+
69+
## Proposal
70+
71+
### User Stories
72+
73+
#### Node Feature Discovery
74+
75+
Enable Node Feature Discovery to [expose hardware topology information](https://github.com/kubernetes-sigs/node-feature-discovery/issues/333).
76+
77+
#### Topology aware scheduling
78+
79+
This interface can be used to track down allocated resources, together with information about the NUMA topology of the worker node, in a general way.
80+
This interface can be used to determine the available resources on the worker node. The kubelet is the best source of this information because it manages the concrete resource assignment. The information can then be used in NUMA-aware scheduling.
81+
Combining the information reported by the `List` API, which pertains to the current allocation, with the information reported by the `GetAllocatableResources` API, a monitoring agent can reliably report the compute device
82+
utilization and availability.
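
As a rough illustration of the combination described above, the sketch below subtracts what `List` reports as assigned from what `GetAllocatableResources` reports as allocatable. It uses plain Python dataclasses as stand-ins for the decoded gRPC messages; the helper name `free_resources`, the resource name `vendor.example/gpu` and the sample values are assumptions made for this example and are not part of the API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


# Simplified stand-ins for the messages of the pod resources API.
@dataclass
class ContainerDevices:
    resource_name: str
    device_ids: List[str] = field(default_factory=list)


@dataclass
class ContainerResources:
    name: str
    devices: List[ContainerDevices] = field(default_factory=list)
    cpu_ids: List[int] = field(default_factory=list)


@dataclass
class PodResources:
    name: str
    namespace: str
    containers: List[ContainerResources] = field(default_factory=list)


def free_resources(
    allocatable_cpu_ids: List[int],
    allocatable_devices: List[ContainerDevices],
    pod_resources: List[PodResources],
) -> Tuple[List[int], Dict[str, List[str]]]:
    """Hypothetical helper: allocatable resources minus assigned resources."""
    used_cpus: Set[int] = set()
    used_devices: Dict[str, Set[str]] = {}
    for pod in pod_resources:
        for container in pod.containers:
            used_cpus.update(container.cpu_ids)
            for dev in container.devices:
                used_devices.setdefault(dev.resource_name, set()).update(dev.device_ids)

    free_cpus = sorted(set(allocatable_cpu_ids) - used_cpus)
    free_devices: Dict[str, Set[str]] = {}
    for dev in allocatable_devices:
        remaining = set(dev.device_ids) - used_devices.get(dev.resource_name, set())
        free_devices.setdefault(dev.resource_name, set()).update(remaining)
    return free_cpus, {name: sorted(ids) for name, ids in free_devices.items()}


if __name__ == "__main__":
    # Made-up data: one pod pinned to CPUs 2 and 3 and holding one of two GPUs.
    pods = [
        PodResources("app", "default", [
            ContainerResources("main", [ContainerDevices("vendor.example/gpu", ["gpu-0"])], [2, 3]),
        ]),
    ]
    allocatable = [ContainerDevices("vendor.example/gpu", ["gpu-0", "gpu-1"])]
    cpus, devices = free_resources([0, 1, 2, 3], allocatable, pods)
    print(cpus, devices)  # prints: [0, 1] {'vendor.example/gpu': ['gpu-1']}
```

A real agent would obtain both inputs from the kubelet socket; only the set arithmetic is shown here.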
83+
84+
85+
### Risks and Mitigations
86+
87+
This API is read-only, which removes a large class of risks. The aspects that we consider below are as follows:
88+
- What are the risks associated with the API service itself?
89+
- What are the risks associated with the data itself?
90+
91+
| Risk | Impact | Mitigation |
92+
| --------------------------------------------------------- | ------------- | ---------- |
93+
| Too many requests risk impacting the kubelet performance  | High          | Implement rate limiting and/or passive caching, follow best practices for gRPC resource management. |
94+
| Improper access to the data                               | Low           | Server is listening on a root-owned Unix socket. Access can be limited with proper pod security policies. |
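
To make the "passive caching" mitigation more concrete, here is a minimal sketch of a time-based cache that serves a stored response until it expires, so that repeated requests do not each trigger a fresh computation. It is an illustration of the general technique only and not the kubelet implementation; the class name `TTLCache` and its parameters are hypothetical.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Caches the result of an expensive call for ttl_seconds (illustrative only)."""

    def __init__(
        self,
        fetch: Callable[[], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the behaviour can be tested
        self._value: Any = None
        self._expires_at: Optional[float] = None  # None means nothing cached yet

    def get(self) -> Any:
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # Cache miss or stale entry: call the backend once and remember the result.
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value
```

The same wrapper could equally sit on the client side, around the call a monitoring agent makes to the endpoint, which reduces the request rate seen by the kubelet.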
95+
96+
97+
## Design Details
98+
99+
### Proposed API
100+
101+
We propose to extend the existing pod resources gRPC service of the Kubelet, listening on a unix socket at `/var/lib/kubelet/pod-resources/kubelet.sock`.
102+
103+
The gRPC service will expose an additional endpoint:
104+
- `GetAllocatableResources`, which returns a single `AllocatableResourcesResponse`, enabling monitoring applications to query for the allocatable set of resources available on the node.
105+
106+
The extended interface is shown in proto below:
107+
```protobuf
// PodResources is a service provided by the kubelet that provides information about the
// node resources consumed by pods and containers on the node
service PodResources {
    rpc List(ListPodResourcesRequest) returns (ListPodResourcesResponse) {}
    rpc GetAllocatableResources(AllocatableResourcesRequest) returns (AllocatableResourcesResponse) {}
}

message AllocatableResourcesRequest {}

// AllocatableResourcesResponse contains information about all the devices known by the kubelet
message AllocatableResourcesResponse {
    repeated ContainerDevices devices = 1;
    repeated int64 cpu_ids = 2;
}

// ListPodResourcesRequest is the request made to the PodResources service
message ListPodResourcesRequest {}

// ListPodResourcesResponse is the response returned by List function
message ListPodResourcesResponse {
    repeated PodResources pod_resources = 1;
}

// PodResources contains information about the node resources assigned to a pod
message PodResources {
    string name = 1;
    string namespace = 2;
    repeated ContainerResources containers = 3;
}

// ContainerResources contains information about the resources assigned to a container
message ContainerResources {
    string name = 1;
    repeated ContainerDevices devices = 2;
    repeated int64 cpu_ids = 3;
}

// TopologyInfo describes the hardware topology of the resource
message TopologyInfo {
    repeated NUMANode nodes = 1;
}

// NUMANode is the representation of a NUMA node
message NUMANode {
    int64 ID = 1;
}

// ContainerDevices contains information about the devices assigned to a container
message ContainerDevices {
    string resource_name = 1;
    repeated string device_ids = 2;
    TopologyInfo topology = 3;
}
```
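
To show how a consumer might use the `topology` field of the messages above, the following sketch groups the devices of an `AllocatableResourcesResponse` by NUMA node. The dataclasses are simplified stand-ins for the generated gRPC types, and the helper `devices_by_numa_node` as well as the resource names are hypothetical. Devices that carry no topology information are collected under the key `None`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


# Simplified stand-ins mirroring the proto messages.
@dataclass
class NUMANode:
    ID: int


@dataclass
class TopologyInfo:
    nodes: List[NUMANode] = field(default_factory=list)


@dataclass
class ContainerDevices:
    resource_name: str
    device_ids: List[str] = field(default_factory=list)
    topology: Optional[TopologyInfo] = None


def devices_by_numa_node(
    devices: List[ContainerDevices],
) -> Dict[Optional[int], Dict[str, List[str]]]:
    """Return {numa_node_id: {resource_name: [device_id, ...]}}."""
    grouped: Dict[Optional[int], Dict[str, List[str]]] = {}
    for dev in devices:
        node_ids = [node.ID for node in dev.topology.nodes] if dev.topology else []
        # A device without topology information is filed under None.
        for node_id in node_ids or [None]:
            per_resource = grouped.setdefault(node_id, {})
            per_resource.setdefault(dev.resource_name, []).extend(dev.device_ids)
    return grouped
```

A device that reports more than one NUMA node appears under each of them, which is a choice made for this sketch rather than something the API prescribes.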
162+
163+
### Test Plan
164+
165+
The implementation PR adds a suite of E2E tests which cover both the existing `List` endpoint of the podresources API and
166+
the newly proposed `GetAllocatableResources` API.
167+
168+
### Graduation Criteria
169+
170+
#### Alpha
171+
- [X] Implement the new service API.
172+
- [X] Ensure proper e2e node tests are in place.
173+
174+
#### Alpha to Beta Graduation
175+
- [X] The new API is consumed by other public software components (e.g. NFD).
176+
- [X] No major bugs reported in the previous cycle.
177+
178+
#### Beta to G.A Graduation
179+
- [X] Allowing time for feedback (1 year).
180+
- [X] Risks have been addressed.
181+
182+
### Upgrade / Downgrade Strategy
183+
184+
With gRPC the version is part of the service name.
185+
The kubelet should always serve both old and new versions.
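
The versioning scheme described above can be pictured with a small sketch. A gRPC request path has the form `/<package>.<service>/<method>`, so services that differ only in a versioned package name occupy distinct paths and can be served side by side. The package names `v1alpha1` and `v1`, the assignment of methods to versions, and the helper names below are assumptions made for this illustration.

```python
from typing import Dict, List, Tuple


def full_method_name(package: str, service: str, method: str) -> str:
    """Build the gRPC request path: /<package>.<service>/<method>."""
    return f"/{package}.{service}/{method}"


def build_routes(
    service: str, methods_by_version: Dict[str, List[str]]
) -> Dict[str, Tuple[str, str]]:
    """Hypothetical routing table: every served version registers its own paths."""
    routes: Dict[str, Tuple[str, str]] = {}
    for version, methods in methods_by_version.items():
        for method in methods:
            routes[full_method_name(version, service, method)] = (version, method)
    return routes


if __name__ == "__main__":
    routes = build_routes(
        "PodResources",
        {"v1alpha1": ["List"], "v1": ["List", "GetAllocatableResources"]},
    )
    for path in sorted(routes):
        print(path)
```

Because an older client only ever dials its own versioned paths, adding a method to a newer version does not change what that client sees.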
186+
187+
For a cluster admin, upgrading to the newest API version means upgrading Kubernetes to a newer version as well as upgrading the monitoring component.
188+
189+
For a vendor, changes in the API should always be backwards compatible.
190+
191+
### Version Skew Strategy
192+
193+
Kubelet will always be backwards compatible, so going forward existing plugins are not expected to break.
194+
195+
## Production Readiness Review Questionnaire
196+
### Feature enablement and rollback
197+
198+
* **How can this feature be enabled / disabled in a live cluster?**
199+
- [X] Feature gate (also fill in values in `kep.yaml`).
200+
- Feature gate name: `KubeletPodResourcesGetAllocatable`.
201+
- Components depending on the feature gate: N/A.
202+
203+
* **Does enabling the feature change any default behavior?** No
204+
* **Can the feature be disabled once it has been enabled (i.e. can we rollback the enablement)?** Yes, through feature gates.
205+
* **What happens if we reenable the feature if it was previously rolled back?** The service recovers state from kubelet.
206+
* **Are there any tests for feature enablement/disablement?** No, however no data is created or deleted.
207+
208+
### Rollout, Upgrade and Rollback Planning
209+
210+
* **How can a rollout fail? Can it impact already running workloads?** Kubelet would fail to start. Errors would be caught in the CI.
211+
* **What specific metrics should inform a rollback?** Not Applicable, metrics wouldn't be available.
212+
* **Were upgrade and rollback tested? Was upgrade->downgrade->upgrade path tested?** Not Applicable.
213+
* **Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?** No.
214+
215+
### Monitoring requirements
216+
* **How can an operator determine if the feature is in use by workloads?**
217+
- Look at the `pod_resources_endpoint_requests_total` metric exposed by the kubelet.
218+
- Look at hostPath mounts of privileged containers.
219+
* **What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?**
220+
- [X] Metrics
221+
- Metric name: `pod_resources_endpoint_requests_total`
222+
- Components exposing the metric: kubelet
223+
224+
* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?** N/A or refer to Kubelet SLIs.
225+
* **Are there any missing metrics that would be useful to have to improve observability of this feature?** No.
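
As a sketch of how an operator-side script might read the `pod_resources_endpoint_requests_total` metric mentioned above, the helper below sums all samples of one metric from text in the Prometheus exposition format. The parser is deliberately simplified, the function name `sum_metric` is hypothetical, and the label `server_api_version` in the usage note is an assumption made for the example.

```python
def sum_metric(exposition_text: str, metric_name: str) -> float:
    """Sum every sample of metric_name across all of its label sets."""
    total = 0.0
    for raw_line in exposition_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            # Labelled sample: name{labels} value [timestamp]
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1:]
        else:
            # Unlabelled sample: name value [timestamp]
            name, _, rest = line.partition(" ")
        if name != metric_name:
            continue
        total += float(rest.split()[0])
    return total
```

For instance, given two made-up samples `pod_resources_endpoint_requests_total{server_api_version="v1"} 7` and `pod_resources_endpoint_requests_total{server_api_version="v1alpha1"} 3`, the helper returns `10.0`. A value that keeps growing between scrapes indicates that some client is calling the endpoint.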
226+
227+
228+
### Dependencies
229+
230+
* **Does this feature depend on any specific services running in the cluster?** Not applicable.
231+
232+
### Scalability
233+
234+
* **Will enabling / using this feature result in any new API calls?** No.
235+
* **Will enabling / using this feature result in introducing new API types?** No.
236+
* **Will enabling / using this feature result in any new calls to cloud provider?** No.
237+
* **Will enabling / using this feature result in increasing size or count of the existing API objects?** No.
238+
* **Will enabling / using this feature result in increasing time taken by any operations covered by [existing SLIs/SLOs][]?** No. The feature is outside of any existing paths in the kubelet.
239+
* **Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?** DDOSing the API can lead to resource exhaustion. It is planned to be addressed as part of G.A.
240+
The feature only collects data when a request comes in; the data is then garbage collected. The amount of data collected is proportional to the number of pods on the node.
241+
242+
### Troubleshooting
243+
244+
* **How does this feature react if the API server and/or etcd is unavailable?** No effect.
245+
* **What are other known failure modes?** No known failure modes
246+
* **What steps should be taken if SLOs are not being met to determine the problem?** N/A
247+
248+
[supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
249+
[existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos
250+
251+
## Implementation History
252+
253+
- 2021-02-02: KEP extracted from [previous iteration](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2043-pod-resource-concrete-assigments)
254+
- 2021-02-04: KEP polished, added feature gate, clarified the graduation criteria.
255+
256+
## Alternatives
257+
258+
### Add a new endpoint
259+
* Pros:
260+
* No changes to existing APIs
261+
* Cons:
262+
* Requires the client to consume two APIs
263+
* This work fits nicely within the boundaries and purpose of the podresources API
264+
* The changes proposed in this KEP are very low-risk and backward compatible
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
title: Extend kubelet pod resource assignment endpoint to return allocatable resources
kep-number: 2403
authors:
  - "@fromanirh"
  - "@alexeyperevalov"
owning-sig: sig-node
participating-sigs: []
status: implementable
creation-date: "2021-02-02"
reviewers:
  - "@derekwaynecarr"
  - "@renaudwastaken"
approvers:
  - "@sig-node-leads"
prr-approvers: []
see-also:
  - "keps/sig-node/606-compute-device-assignment/"
  - "keps/sig-node/2043-pod-resource-concrete-assigments/"
replaces: []

# The target maturity stage in the current dev cycle for this KEP.
stage: alpha

# The most recent milestone for which work toward delivery of this KEP has been
# done. This can be the current (upcoming) milestone, if it is being actively
# worked on.
latest-milestone: "v1.21"

# The milestone at which this feature was, or is targeted to be, at each stage.
milestone:
  alpha: "v1.21"
  beta: "v1.22"
  stable: "v1.23"

# The following PRR answers are required at alpha release
# List the feature gate name and the components for which it must be enabled
feature-gates:
  - name: "KubeletPodResourcesGetAllocatable"
    components:
      - kubelet
disable-supported: false

# The following PRR answers are required at beta release
metrics:
  - pod_resources_endpoint_requests_total
