KEP-2837: PodLevelResources changes for 1.33 alpha stage #5145
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -17,6 +17,7 @@ | |||||
| - [Components/Features changes](#componentsfeatures-changes) | ||||||
| - [Cgroup Structure Remains unchanged](#cgroup-structure-remains-unchanged) | ||||||
| - [PodSpec API changes](#podspec-api-changes) | ||||||
| - [PodStatus API changes](#podstatus-api-changes) | ||||||
| - [PodSpec Validation Rules](#podspec-validation-rules) | ||||||
| - [Proposed Validation & Defaulting Rules](#proposed-validation--defaulting-rules) | ||||||
| - [Comprehensive Tabular View](#comprehensive-tabular-view) | ||||||
|
|
@@ -32,17 +33,20 @@ | |||||
| - [Admission Controller](#admission-controller) | ||||||
| - [Eviction Manager](#eviction-manager) | ||||||
| - [Pod Overhead](#pod-overhead) | ||||||
| - [Hugepages](#hugepages) | ||||||
| - [Memory Manager](#memory-manager) | ||||||
| - [In-Place Pod Resize](#in-place-pod-resize) | ||||||
| - [API changes](#api-changes) | ||||||
| - [Resize Restart Policy](#resize-restart-policy) | ||||||
| - [Implementation Details](#implementation-details) | ||||||
| - [[Scoped for Beta] CPU Manager](#scoped-for-beta-cpu-manager) | ||||||
| - [[Scoped for Beta] Topology Manager](#scoped-for-beta-topology-manager) | ||||||
| - [[Scoped for Beta] User Experience Survey](#scoped-for-beta-user-experience-survey) | ||||||
| - [[Scoped for Beta] Surfacing Pod Resource Requirements](#scoped-for-beta-surfacing-pod-resource-requirements) | ||||||
| - [The Challenge of Determining Effective Pod Resource Requirements](#the-challenge-of-determining-effective-pod-resource-requirements) | ||||||
| - [Goals of surfacing Pod Resource Requirements](#goals-of-surfacing-pod-resource-requirements) | ||||||
| - [Implementation Details](#implementation-details) | ||||||
| - [Implementation Details](#implementation-details-1) | ||||||
|
||||||
| - [Notes for implementation](#notes-for-implementation) | ||||||
| - [[Scoped for Beta] HugeTLB cgroup](#scoped-for-beta-hugetlb-cgroup) | ||||||
| - [[Scoped for Beta] Topology Manager](#scoped-for-beta-topology-manager) | ||||||
| - [[Scoped for Beta] Memory Manager](#scoped-for-beta-memory-manager) | ||||||
| - [[Scoped for Beta] CPU Manager](#scoped-for-beta-cpu-manager) | ||||||
| - [[Scoped for Beta] In-Place Pod Resize](#scoped-for-beta-in-place-pod-resize) | ||||||
| - [[Scoped for Beta] VPA](#scoped-for-beta-vpa) | ||||||
| - [[Scoped for Beta] Cluster Autoscaler](#scoped-for-beta-cluster-autoscaler) | ||||||
| - [[Scoped for Beta] Support for Windows](#scoped-for-beta-support-for-windows) | ||||||
|
|
@@ -383,7 +387,7 @@ consumption of the pod. | |||||
|
|
||||||
| #### PodSpec API changes | ||||||
|
|
||||||
| New field in `PodSpec` | ||||||
| New field in `PodSpec`: | ||||||
|
|
||||||
| ``` | ||||||
| type PodSpec struct { | ||||||
|
|
@@ -396,6 +400,40 @@ type PodSpec struct { | |||||
| } | ||||||
| ``` | ||||||
|
|
||||||
| #### PodStatus API changes | ||||||
|
|
||||||
| Extend `PodStatus` to include a pod-level analog of the container status resource | ||||||
| fields. Pod-level resource information in `PodStatus` is essential for pod-level [In-Place Pod Update](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/1287-in-place-update-pod-resources/README.md#api-changes), | ||||||
| as it provides a way to track, report, and use the actual resource allocation for the | ||||||
| pod, both before and after a resize operation. | ||||||
|
|
||||||
| ``` | ||||||
| type PodStatus struct { | ||||||
| ... | ||||||
| // Resources represents the compute resource requests and limits that have been | ||||||
| // applied at the pod level. If pod-level resources are not explicitly specified, | ||||||
|
> Review thread:
> * Reviewer: Are only the resources supported in pod-level `spec.resources` (cpu, memory, and now hugepages) aggregated here, or are other custom resources specified in the containers aggregated too? (As an aside, I think the pod-level resource validation errors if container-level resources are specified which are not included in pod-level resources at all: https://github.com/kubernetes/kubernetes/blob/ee22760391bae28954a69dff499d1cead9a9fcf0/pkg/apis/core/validation/validation.go#L4340-L4356.) What happens if pod-level `spec.resources` sets a pod-level cpu limit but not a memory limit, and the individual containers all set memory limits? Does this include the pod-level cpu limit and the aggregated container memory limits?
> * Reviewer: Good question. For resources that get configured on the pod-level cgroup, this should report the actual values applied there. For everything else, I'm not sure. Do pod-level extended resources make sense today?
> * Reviewer: DRA: I think pod-level GPUs could make sense, and pod-level network interfaces are the only real way to do network.
> * Reviewer: For those extended resources, this is still an open question. Luckily we can address it in later releases; it is not a blocker for 1.33.
> * Reviewer: Ack. I think this should be stated in the non-goals if it is not already there.
> * Author: It is stated in the non-goals section that only CPU, memory, and hugepages are supported for now: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2837-pod-level-resource-spec/README.md#non-goals. Also, as Jordan pointed out, there is a bug in the validation logic: if container-level resources are set for an unsupported resource type, validation errors out because the aggregated container requests are then greater than the pod requests (pod-level resources won't be set for unsupported resources): https://github.com/kubernetes/kubernetes/blob/ee22760391bae28954a69dff499d1cead9a9fcf0/pkg/apis/core/validation/validation.go#L4340-L4356. I will fix the bug. Thanks @liggitt for finding it.
> * Author: kubernetes/kubernetes#130131 is the fix PR.
||||||
| // then these will be the aggregate resources computed from containers. If limits are | ||||||
| // not defined for all containers (and pod-level limits are also not set), those | ||||||
| // containers remain unrestricted, and no aggregate pod-level limits will be applied. | ||||||
| // Pod-level limit aggregation is only performed, and is meaningful only, when all | ||||||
| // containers have defined limits. | ||||||
| // +featureGate=InPlacePodVerticalScaling | ||||||
|
||||||
| // +featureGate=PodLevelResources | ||||||
| // +optional | ||||||
| Resources *ResourceRequirements | ||||||
|
> Review thread:
> * Author: We used the same type, i.e. `ResourceRequirements`, for `Resources` in `PodSpec` as well.
> * Reviewer: Other than duplication, what would be the disadvantage of de-duplicating the types? I really dislike when we have fields in the API that can't be used.
> * Author: Do we want to add a new type, `ResourceConstraints` or `ResourceRequestsLimits`?
||||||
|
|
||||||
| // AllocatedResources is the total requests allocated for this pod by the node. | ||||||
| // Kubelet sets this to the accepted requests when a pod (or resize) is admitted. | ||||||
| // If pod-level requests are not set, this will be the total requests aggregated | ||||||
| // across containers in the pod. | ||||||
| // +featureGate=InPlacePodVerticalScaling | ||||||
|
||||||
| // +featureGate=PodLevelResources | ||||||
| // +optional | ||||||
| AllocatedResources ResourceList | ||||||
|
||||||
| } | ||||||
| ``` | ||||||
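The fallback behavior described in the `Resources` field comments (aggregate container limits only when every container sets one) can be sketched as follows. This is an illustrative simplification, not kubelet code; `ResourceList` here is a hypothetical stand-in for the `v1.ResourceList` API type, with abstract integer quantities.

```go
package main

import "fmt"

// ResourceList maps a resource name (e.g. "cpu", "memory") to a quantity in
// abstract units; a simplified stand-in for v1.ResourceList.
type ResourceList map[string]int64

// aggregateLimits sketches the fallback described above: container limits are
// summed into a pod-level aggregate only for resources where every container
// defines a limit; otherwise the resource is left unrestricted and no
// aggregate is reported.
func aggregateLimits(containers []ResourceList) ResourceList {
	agg := ResourceList{}
	defined := map[string]int{}
	for _, c := range containers {
		for name, qty := range c {
			agg[name] += qty
			defined[name]++
		}
	}
	for name := range agg {
		if defined[name] != len(containers) {
			// Some container has no limit for this resource, so there is
			// no meaningful pod-level aggregate.
			delete(agg, name)
		}
	}
	return agg
}

func main() {
	containers := []ResourceList{
		{"cpu": 100, "memory": 200},
		{"cpu": 50}, // no memory limit: memory stays unrestricted
	}
	fmt.Println(aggregateLimits(containers)) // map[cpu:150]
}
```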
| #### PodSpec Validation Rules | ||||||
|
|
||||||
| ##### Proposed Validation & Defaulting Rules | ||||||
|
|
@@ -1172,6 +1210,183 @@ back to aggregating container requests. | |||||
| size of the pod's cgroup. This means the pod cgroup's resource limits will be | ||||||
| set to accommodate both pod-level requests and pod overhead. | ||||||
|
|
||||||
| #### Hugepages | ||||||
|
|
||||||
| With the proposed changes, support for hugepages (resources with the `hugepages-*` prefix) will be extended to the pod-level resources specification, alongside CPU and memory. The hugetlb cgroup for the | ||||||
| pod will then directly reflect the pod-level hugepage limits, if specified, rather than using an aggregated value from container limits. When scheduling, the scheduler will | ||||||
| consider hugepage requests at the pod level to find nodes with enough available | ||||||
| resources. | ||||||
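The scheduling-time computation described above can be sketched as follows; this is an illustrative simplification, not the actual scheduler code, and quantities are abstract integers:

```go
package main

import "fmt"

// effectiveRequest sketches the scheduling behavior described above: when a
// pod-level request is set for a resource (e.g. hugepages-2Mi), the scheduler
// uses it directly; otherwise it falls back to the aggregate of the
// container-level requests.
func effectiveRequest(podLevel *int64, containerRequests []int64) int64 {
	if podLevel != nil {
		return *podLevel
	}
	var sum int64
	for _, r := range containerRequests {
		sum += r
	}
	return sum
}

func main() {
	podHugepages := int64(4)
	// Pod-level request takes precedence over the container aggregate.
	fmt.Println(effectiveRequest(&podHugepages, []int64{2, 1})) // 4
	// Without a pod-level request, container requests are aggregated.
	fmt.Println(effectiveRequest(nil, []int64{2, 1})) // 3
}
```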
|
|
||||||
| Containers will still need to mount an `emptyDir` volume backed by the hugepage filesystem (typically `/dev/hugepages`) to use huge pages. This is the standard mechanism for containers to consume huge pages, and it will not change. | ||||||
||||||
|
|
||||||
| #### Memory Manager | ||||||
|
|
||||||
| With the introduction of pod-level resource specifications, the Kubernetes Memory | ||||||
| Manager will evolve to track and enforce resource limits at both the pod and | ||||||
| container levels. It will need to aggregate memory usage across all containers | ||||||
| within a pod to calculate the pod's total memory consumption. The Memory Manager | ||||||
| will then enforce the pod-level limit as the hard cap for the entire pod's memory | ||||||
| usage, preventing it from exceeding the allocated amount. While still | ||||||
| maintaining container-level limit enforcement, the Memory Manager will need to | ||||||
| coordinate with the Kubelet and eviction manager to make decisions about pod | ||||||
| eviction or individual container termination when the pod-level limit is | ||||||
| breached. | ||||||
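The enforcement check described above can be sketched as follows; `podMemoryExceedsLimit` is a hypothetical helper for illustration, not the actual Memory Manager code:

```go
package main

import "fmt"

// podMemoryExceedsLimit sketches the pod-level check described above: the
// pod's total memory consumption is the sum of its containers' usage, and it
// is compared against the pod-level limit acting as the hard cap.
func podMemoryExceedsLimit(containerUsage []int64, podLimit int64) bool {
	var total int64
	for _, u := range containerUsage {
		total += u
	}
	return total > podLimit
}

func main() {
	// Two containers using 120 and 100 units against a pod-level limit of 200:
	// the aggregate (220) breaches the cap, which would trigger coordination
	// with the eviction manager.
	fmt.Println(podMemoryExceedsLimit([]int64{120, 100}, 200)) // true
}
```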
|
|
||||||
| #### In-Place Pod Resize | ||||||
|
|
||||||
| ##### API changes | ||||||
|
|
||||||
| IPPR for pod-level resources requires extending `PodStatus` to include pod-level | ||||||
| resource fields, as detailed in the [PodStatus API changes](#podstatus-api-changes) | ||||||
| section. | ||||||
|
|
||||||
| ##### Resize Restart Policy | ||||||
|
|
||||||
| Pod-level resize policy is not supported in the alpha stage of the pod-level | ||||||
| resources feature. While a pod-level resize policy might be beneficial for VM-based runtimes | ||||||
| like Kata Containers (potentially allowing the hypervisor to restart the entire VM | ||||||
| on resize), this is a topic for future consideration. We plan to engage with the | ||||||
| Kata community to discuss this further and will re-evaluate the need for a pod-level | ||||||
| policy in subsequent development stages. | ||||||
|
|
||||||
| The absence of a pod-level resize policy means that container restarts are | ||||||
| exclusively managed by their individual `resizePolicy` configs. The example below of | ||||||
| a pod with pod-level resources demonstrates several key aspects of this behavior, | ||||||
| showing how containers without explicit limits (which inherit pod-level limits) interact | ||||||
| with resize policy, and how containers with specified resources remain unaffected by | ||||||
| pod-level resizes. | ||||||
|
|
||||||
| ```yaml | ||||||
| apiVersion: v1 | ||||||
| kind: Pod | ||||||
| metadata: | ||||||
| name: pod-level-resources | ||||||
| spec: | ||||||
| resources: | ||||||
| requests: | ||||||
| cpu: 100m | ||||||
| memory: 100Mi | ||||||
| limits: | ||||||
| cpu: 200m | ||||||
| memory: 200Mi | ||||||
| containers: | ||||||
| - name: c1 | ||||||
| image: registry.k8s.io/pause:latest | ||||||
|
||||||
| resizePolicy: | ||||||
| - resourceName: "cpu" | ||||||
| restartPolicy: "NotRequired" | ||||||
| - resourceName: "memory" | ||||||
| restartPolicy: "RestartRequired" | ||||||
| - name: c2 | ||||||
| image: registry.k8s.io/pause:latest | ||||||
| resources: | ||||||
| requests: | ||||||
| cpu: 50m | ||||||
| memory: 50Mi | ||||||
| limits: | ||||||
| cpu: 100m | ||||||
| memory: 100Mi | ||||||
| resizePolicy: | ||||||
| - resourceName: "cpu" | ||||||
| restartPolicy: "NotRequired" | ||||||
| - resourceName: "memory" | ||||||
| restartPolicy: "RestartRequired" | ||||||
| ``` | ||||||
|
|
||||||
| In this example: | ||||||
| * CPU resizes: Neither container requires a restart for CPU resizes, and therefore CPU resizes at neither the container nor pod level will trigger any restarts. | ||||||
| * Container c1 (inherited memory limit): c1 does not define any container-level | ||||||
| resources, so the effective memory limit of the container is determined by the | ||||||
| pod-level limit. When the pod's limit is resized, c1's effective memory limit | ||||||
| changes. Because c1's memory resizePolicy is RestartRequired, a resize of the | ||||||
| pod-level memory limit will trigger a restart of container c1. | ||||||
| * Container c2 (specified memory limit): c2 does define container-level resources, | ||||||
| so the effective memory limit of c2 is the container level limit. Therefore, a | ||||||
| resize of the pod-level memory limit doesn't change the effective container limit, | ||||||
| so c2 is not restarted when the pod-level memory limit is resized. | ||||||
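The restart decision illustrated by this example can be sketched as follows; this is illustrative only, with `restartOnPodLimitResize` a hypothetical helper and `"RestartRequired"` following the policy value used in the example above:

```go
package main

import "fmt"

// restartOnPodLimitResize sketches the behavior in the example above: resizing
// a pod-level limit restarts a container only if the container inherits that
// limit (it sets no container-level limit of its own) and its resize policy
// for that resource requires a restart.
func restartOnPodLimitResize(hasContainerLimit bool, resizePolicy string) bool {
	inheritsPodLimit := !hasContainerLimit
	return inheritsPodLimit && resizePolicy == "RestartRequired"
}

func main() {
	// c1: no container-level limits, memory policy RestartRequired -> restart.
	fmt.Println(restartOnPodLimitResize(false, "RestartRequired")) // true
	// c2: has its own memory limit, so a pod-level resize does not affect it.
	fmt.Println(restartOnPodLimitResize(true, "RestartRequired")) // false
}
```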
|
|
||||||
| ##### Implementation Details | ||||||
|
|
||||||
| ###### Allocating Pod-level Resources | ||||||
| Allocation of pod-level resources will work the same as container-level resources. The allocated resources checkpoint will be extended to include pod-level resources, and the pod object will be updated with the allocated resources in the pod sync loop. | ||||||
|
|
||||||
| ###### Actuating Pod-level Resource Resize | ||||||
|
> Reviewer note: For the record, I think we should probably be periodically asserting the "correct" size for pod resources, just as I think we should for container resources. No action needed here, but when we solve one, solve both.
||||||
| The mechanism for actuating pod-level resize remains largely unchanged from the | ||||||
| existing container-level resize process. When pod-level resource configurations are | ||||||
| applied, the system handles the resize in a similar manner as it does for | ||||||
| container-level resources. This includes extending the existing logic to incorporate | ||||||
| directly configured pod-level resource settings. | ||||||
|
|
||||||
| The same ordering rules for pod and container resource resizing will be applied for each | ||||||
| resource as needed: | ||||||
| 1. Increase pod-level cgroup (if needed) | ||||||
| 2. Decrease container resources | ||||||
| 3. Decrease pod-level cgroup (if needed) | ||||||
| 4. Increase container resources | ||||||
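For a single resource, the ordering rules above could be sketched as follows; this is an illustrative sketch, not the kubelet implementation, using abstract quantities for the current and desired pod cgroup and container values:

```go
package main

import "fmt"

// resizeSteps sketches the ordering rules above for a single resource: grow
// the pod-level cgroup before container resources, and shrink it only after
// container resources have been reduced, so the pod cgroup always bounds its
// containers.
func resizeSteps(curPod, newPod, curCtr, newCtr int64) []string {
	var steps []string
	if newPod > curPod {
		steps = append(steps, "increase pod cgroup")
	}
	if newCtr < curCtr {
		steps = append(steps, "decrease container resources")
	}
	if newPod < curPod {
		steps = append(steps, "decrease pod cgroup")
	}
	if newCtr > curCtr {
		steps = append(steps, "increase container resources")
	}
	return steps
}

func main() {
	// Growing: the pod cgroup is enlarged before the containers.
	fmt.Println(resizeSteps(200, 300, 100, 150))
	// [increase pod cgroup increase container resources]

	// Shrinking: containers are reduced before the pod cgroup.
	fmt.Println(resizeSteps(300, 200, 150, 100))
	// [decrease container resources decrease pod cgroup]
}
```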
|
|
||||||
| ###### Tracking Actual Pod-level Resources | ||||||
| To accurately track actual pod-level resources during in-place pod resizing, several | ||||||
|
> Review thread:
> * Reviewer: @tallclair Given the discussion on how NRI plugins or systemd can mutate the resources (e.g. rounding), what happens when: … Are we smart enough to increase the pod?
> * Reviewer: Hmm, we don't today, but we could. We are reading the actual values from the runtime, so we could compute the pod-level cgroups based on the sum of those instead of the allocated resources (or whichever is larger). We could even compute the diff with what we asked for, and add that to the pod.
> * Reviewer: Hmm... but if NRI plugins change the value to be completely different, that would just conflict with how kubelet manages the cgroups. We can simply grab the values from the … and assume those are the ones we want.
> * Reviewer: Agreed with Yuju above. I expect that with the new resource-management efforts we are doing and plan to do, users will eventually largely limit their NRI usage.
> * Reviewer: We can continue discussing this, but this isn't a blocker here.
||||||
| changes are required that are analogous to the changes made for container-level | ||||||
| in-place resizing: | ||||||
|
|
||||||
| 1. Configuration reading: Pod-level resource config is currently read as part of the | ||||||
| resize flow, but will also need to be read during pod creation. Critically, the | ||||||
| configuration must be read again after the resize operation to capture the | ||||||
| updated resource values. Currently, the configuration is only read before a | ||||||
| resize. | ||||||
|
||||||
|
|
||||||
| 2. Pod Status Update: Because the pod status is updated before the resize takes | ||||||
|
> Reviewer note: Aside: we should probably re-evaluate this outside the context of this KEP. Now that there are things reflected in the status that don't also trigger a resync, we're going to need to resync the pod just to write another field to the status. I'm not sure offhand what the consequences of moving the status update to the end of PodSync would be.
||||||
| effect, the status will not immediately reflect the new resource values. If a | ||||||
| container within the pod is also being resized, the container resize operation | ||||||
| will trigger a pod synchronization (pod-sync), which will refresh the pod's | ||||||
| status. However, if only pod-level resources are being resized, a pod-sync must | ||||||
| be explicitly triggered to update the pod status with the new resource | ||||||
| allocation. | ||||||
|
|
||||||
| 3. [Scoped for Beta] Caching: Actual pod resource data may be cached to minimize API server load. This cache, if implemented, must be invalidated after each successful pod resize to ensure that subsequent reads retrieve the latest information. The need for and implementation of this caching mechanism will be evaluated in the beta phase. Performance benchmarking will be conducted to determine if caching is required and, if so, what caching strategy is most appropriate. | ||||||
|
||||||
|
|
||||||
| **Note on future enhancements for ephemeral containers with pod-level resources and | ||||||
| IPPR:** | ||||||
| Previously, assigning resources to ephemeral | ||||||
| containers wasn't allowed because pod resource allocations were immutable. With | ||||||
| the introduction of in-place pod resizing, users could gain more flexibility: | ||||||
|
|
||||||
| * Adjust pod-level resources to accommodate the needs of ephemeral containers. This | ||||||
| allows for a more dynamic allocation of resources within the pod. | ||||||
| * Specify resource requests and limits directly for ephemeral containers. Kubernetes will | ||||||
| then automatically resize the pod to ensure sufficient resources are available | ||||||
| for both regular and ephemeral containers. | ||||||
|
|
||||||
| Currently, setting `resources` for ephemeral containers is disallowed, as pod | ||||||
| resource allocations were immutable before the In-Place Pod Resize feature. With | ||||||
| in-place resize of pod-level resource allocations, users should be able to | ||||||
| either modify the pod-level resources to accommodate ephemeral containers, or | ||||||
| supply container-level resources for ephemeral containers and have Kubernetes | ||||||
| resize the pod to accommodate them. | ||||||
|
|
||||||
| #### [Scoped for Beta] CPU Manager | ||||||
||||||
|
|
||||||
| With the introduction of pod-level resource specifications, the CPU manager in | ||||||
| Kubernetes will adapt to manage CPU requests and limits at the pod level rather | ||||||
| than solely at the container level. This change means that the CPU manager will | ||||||
| allocate and enforce CPU resources based on the total requirements of the entire | ||||||
| pod, allowing for more flexible and efficient CPU utilization across all | ||||||
| containers within a pod. The CPU manager will need to ensure that the aggregate | ||||||
| CPU usage of all containers in a pod does not exceed the pod-level limits. | ||||||
|
|
||||||
| The CPU Manager policies are container-level configurations that control the | ||||||
| fine-grained allocation of CPU resources to containers. While CPU manager | ||||||
| policies will operate within the constraints of pod-level resource limits, they | ||||||
| do not directly apply at the pod level. | ||||||
|
|
||||||
| #### [Scoped for Beta] Topology Manager | ||||||
|
|
||||||
| Note: This section includes only high level overview; Design details will be added in Beta stage. | ||||||
|
|
||||||
| * The pod level scope for topology alignment will consider pod level requests and limits instead of container level aggregates. | ||||||
||||||
| * The hint providers will consider pod level requests and limits instead of | ||||||
| container level aggregates. | ||||||
|
||||||
|
|
||||||
| #### [Scoped for Beta] User Experience Survey | ||||||
|
|
||||||
| Before promoting the feature to Beta, we plan to conduct a UX survey to | ||||||
|
|
@@ -1291,85 +1506,6 @@ KEPs. The first change doesn’t present any user visible change, and if | |||||
| implemented, will in a small way reduce the effort for both of those KEPs by | ||||||
| providing a single place to update the pod resource calculation. | ||||||
|
|
||||||
| #### [Scoped for Beta] HugeTLB cgroup | ||||||
|
|
||||||
| Note: This section includes only high level overview; Design details will be added in Beta stage. | ||||||
|
|
||||||
| To support pod-level resource specifications for hugepages, Kubernetes will need to adjust how it handles hugetlb cgroups. Unlike memory, where an unset limit | ||||||
| means unlimited, an unset hugetlb limit is the same as setting it to 0. | ||||||
|
|
||||||
| With the proposed changes, hugepages-2Mi and hugepages-1Gi will be added to the pod-level resources section, alongside CPU and memory. The hugetlb cgroup for the | ||||||
| pod will then directly reflect the pod-level hugepage limits, rather than using an aggregated value from container limits. When scheduling, the scheduler will | ||||||
| consider hugepage requests at the pod level to find nodes with enough available resources. | ||||||
|
|
||||||
|
|
||||||
| #### [Scoped for Beta] Topology Manager | ||||||
|
|
||||||
| Note: This section includes only high level overview; Design details will be added in Beta stage. | ||||||
|
|
||||||
|
|
||||||
| * (Tentative) Only pod level scope for topology alignment will be supported if pod level requests and limits are specified without container-level requests and limits. | ||||||
| * The pod level scope for topology alignment will consider pod level requests and limits instead of container level aggregates. | ||||||
| * The hint providers will consider pod level requests and limits instead of container level aggregates. | ||||||
|
|
||||||
|
|
||||||
| #### [Scoped for Beta] Memory Manager | ||||||
|
|
||||||
| Note: This section includes only high level overview; Design details will be | ||||||
| added in Beta stage. | ||||||
|
|
||||||
| With the introduction of pod-level resource specifications, the Kubernetes Memory | ||||||
| Manager will evolve to track and enforce resource limits at both the pod and | ||||||
| container levels. It will need to aggregate memory usage across all containers | ||||||
| within a pod to calculate the pod's total memory consumption. The Memory Manager | ||||||
| will then enforce the pod-level limit as the hard cap for the entire pod's memory | ||||||
| usage, preventing it from exceeding the allocated amount. While still | ||||||
| maintaining container-level limit enforcement, the Memory Manager will need to | ||||||
| coordinate with the Kubelet and eviction manager to make decisions about pod | ||||||
| eviction or individual container termination when the pod-level limit is | ||||||
| breached. | ||||||
|
|
||||||
|
|
||||||
| #### [Scoped for Beta] CPU Manager | ||||||
|
|
||||||
| Note: This section includes only high level overview; Design details will be | ||||||
| added in Beta stage. | ||||||
|
|
||||||
| With the introduction of pod-level resource specifications, the CPU manager in | ||||||
| Kubernetes will adapt to manage CPU requests and limits at the pod level rather | ||||||
| than solely at the container level. This change means that the CPU manager will | ||||||
| allocate and enforce CPU resources based on the total requirements of the entire | ||||||
| pod, allowing for more flexible and efficient CPU utilization across all | ||||||
| containers within a pod. The CPU manager will need to ensure that the aggregate | ||||||
| CPU usage of all containers in a pod does not exceed the pod-level limits. | ||||||
|
|
||||||
| #### [Scoped for Beta] In-Place Pod Resize | ||||||
|
|
||||||
| In-Place Pod resizing of resources is not supported in alpha stage of Pod-level | ||||||
| resources feature. **Users should avoid using in-place pod resizing if they are | ||||||
| utilizing pod-level resources.** | ||||||
|
|
||||||
| In version 1.33, the In-Place Pod resize functionality will be controlled by a | ||||||
| separate feature gate and introduced as an independent alpha feature. This is | ||||||
| necessary as it involves new fields in the PodStatus at the pod level. | ||||||
|
|
||||||
| Note for design & implementation: Previously, assigning resources to ephemeral | ||||||
| containers wasn't allowed because pod resource allocations were immutable. With | ||||||
| the introduction of in-place pod resizing, users will gain more flexibility: | ||||||
|
|
||||||
| * Adjust pod-level resources to accommodate the needs of ephemeral containers. This | ||||||
| allows for a more dynamic allocation of resources within the pod. | ||||||
| * Specify resource requests and limits directly for ephemeral containers. Kubernetes will | ||||||
| then automatically resize the pod to ensure sufficient resources are available | ||||||
| for both regular and ephemeral containers. | ||||||
|
|
||||||
| Currently, setting `resources` for ephemeral containers is disallowed as pod | ||||||
| resource allocations were immutable before In-Place Pod Resizing feature. With | ||||||
| in-place pod resize for pod-level resource allocation, users should be able to | ||||||
| either modify the pod-level resources to accommodate ephemeral containers or | ||||||
| supply resources at container-level for ephemeral containers and kubernetes will | ||||||
| resize the pod to accommodate the ephemeral containers. | ||||||
|
|
||||||
| #### [Scoped for Beta] VPA | ||||||
|
|
||||||
| TBD. Do not review for the alpha stage. | ||||||
|
|
||||||