-
Notifications
You must be signed in to change notification settings - Fork 41.6k
[PodLevelResources] Pod Level Hugepage Resources #130577
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PodLevelResources] Pod Level Hugepage Resources #130577
Conversation
|
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
Hi @KevinTMtz. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
Ping me for final approval when kubelet reviews are done. |
13edfba to
7bc5c4c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does
| lcr.HugepageLimits = GetHugepageLimitsFromResources(container.Resources) |
Currently we are not going to set the cgroup values for the containers that do not specify hugepage limits, because of the starving problem caused by not having hugepages isolation between the containers that specify limits and the ones that do not, when setting the HugeTLB cgroups to max (or max minus the aggregated limits) of the containers that did not specify.
Initially, I added the commit Use pod level hugepage limits for cgroup when unset in container, however because of this situation, I reverted the change |
This reverts commit f0a1473 and kuberuntime_container_linux_test.go change.
7bc5c4c to
7b38bff
Compare
|
@KevinTMtz Is it allowed for aggregated huge pages limits to be greater than pod-level huge pages limits? Did we test this case? |
It is not possible, since hugepages can not be overcommitted. If the user tries to do something like this: This error will be obtained: |
|
/lgtm |
|
LGTM label has been added. Git tree hash: dd85775c100da0e1d90446ced8156d5fc33c1ac6
|
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: KevinTMtz, tallclair, thockin The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
| // the resource is supported. | ||
| func IsSupportedPodLevelResource(name v1.ResourceName) bool { | ||
| return supportedPodLevelResources.Has(name) | ||
| return supportedPodLevelResources.Has(name) || strings.HasPrefix(string(name), v1.ResourceHugePagesPrefix) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a couple comments here:
- this is effectively relaxing validation, which means we have to make sure it isn't active by default in the first release it is present in (1.33 will choke on pods this allows in with pod-level hugepages requests/limits)
- this allowed things into the API which weren't actually implemented or tested yet (implementation is in [PodLevelResources] Propagate Pod level hugepage cgroup to containers #131089) ... we generally want to make sure implementation merges with API
both of those issues need to be addressed before pod-level resources enables by default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, I just realized this merged in 1.33, not 1.34. whew
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kevin also clarified that this PR did implement support. The implementation is implicit - pod level huge pages were already set on the pod cgroup using the Pod{Requests,Limits} helper which already factors in pod-level resources, so once we allow huge-pages to be set at the pod-level, the implementation is already there.
What type of PR is this?
What this PR does / why we need it:
This PR implements Pod Level Hugepage Resources that require following changes:
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: