@KevinTMtz
Contributor

@KevinTMtz KevinTMtz commented Mar 4, 2025

What type of PR is this?

What this PR does / why we need it:

This PR implements pod-level hugepage resources, which requires the following changes:

  1. Add hugepages to pod level supported resources
  2. Default pod level hugepage limits
  3. Containers with hugepage volume mounts with unset hugepage limits
  4. Use pod level hugepage limits for cgroup when unset in container
  5. Unit tests for pod level hugepage resources
  6. End to end tests for pod level hugepage resources

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

- Changed the Pod API to support `hugepage resources` at `spec` level for pod-level resources.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [Other doc]: https://docs.google.com/document/d/1JaqE2eRmFAPlRayv8vsAWE4SmQCVXQLr9rFPhEaPlvQ/edit?usp=sharing

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 4, 2025
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot
Contributor

Hi @KevinTMtz. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 4, 2025
@k8s-ci-robot k8s-ci-robot added area/kubectl area/kubelet area/test kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/cli Categorizes an issue or PR as relevant to SIG CLI. labels Mar 4, 2025
@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 4, 2025
@k8s-ci-robot k8s-ci-robot added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Mar 4, 2025
@github-project-automation github-project-automation bot moved this to Needs Triage in SIG CLI Mar 4, 2025
@k8s-ci-robot k8s-ci-robot added the sig/storage Categorizes an issue or PR as relevant to SIG Storage. label Mar 4, 2025
@k8s-ci-robot k8s-ci-robot added the sig/testing Categorizes an issue or PR as relevant to SIG Testing. label Mar 4, 2025
@thockin
Member

thockin commented Mar 17, 2025

Ping me for final approval when kubelet reviews are done.

@KevinTMtz KevinTMtz force-pushed the pod-level-hugepages branch from 13edfba to 7bc5c4c Compare March 18, 2025 17:42
Member

@tallclair tallclair left a comment

Does `lcr.HugepageLimits = GetHugepageLimitsFromResources(container.Resources)` need to be updated to fall back to pod-level resources?

@KevinTMtz
Contributor Author

> Does `lcr.HugepageLimits = GetHugepageLimitsFromResources(container.Resources)` need to be updated to fall back to pod-level resources?

Currently we do not set cgroup values for containers that do not specify hugepage limits. Setting the HugeTLB cgroup of those containers to max (or to max minus the aggregated limits of the containers that do specify them) would leave no hugepage isolation between the containers that specify limits and the ones that do not, so one group could starve the other.
If we do not set the hugepage cgroup for containers with unset requests/limits, the behavior depends on whether containerd HugeTLB is enabled:

  • Containerd HugeTLB enabled

    • Containers with a hugepage request/limit are not starved and their limits are respected, because only they can use hugepages and the available hugepages equal the aggregate of their requests/limits.
    • Containers without a hugepage request/limit have no access to hugepages.
  • Containerd HugeTLB disabled

    • Containers behave the same whether or not they set a hugepage request/limit: every container can access all of the pod's hugepages, so any container can starve the others.

Initially I added the commit "Use pod level hugepage limits for cgroup when unset in container", but I reverted it because of this situation.
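The two cases described above can be modelled with a small sketch. This is only an illustration of the reasoning in this comment, not kubelet or containerd code: the `container` type, the `accessibleHugepages` function, and the `hugeTLBEnabled` flag are all hypothetical names introduced here.

```go
package main

import "fmt"

// container is a hypothetical, simplified stand-in for a pod's container.
// hugepageLimit is the container-level hugepage limit in bytes (0 means unset).
type container struct {
	name          string
	hugepageLimit int64
}

// accessibleHugepages models the behavior described in the comment: it returns,
// per container, how many hugepage bytes that container could consume when the
// kubelet leaves the hugepage cgroup unset for containers without limits.
// hugeTLBEnabled is an assumed flag for "containerd HugeTLB enabled".
func accessibleHugepages(podLimit int64, containers []container, hugeTLBEnabled bool) map[string]int64 {
	out := make(map[string]int64, len(containers))
	for _, c := range containers {
		if hugeTLBEnabled {
			// With isolation, a container is capped at its own limit;
			// a container with no limit gets no hugepages at all.
			out[c.name] = c.hugepageLimit
		} else {
			// Without isolation, every container can reach the whole pod
			// allocation, so any one of them can starve the others.
			out[c.name] = podLimit
		}
	}
	return out
}

func main() {
	containers := []container{{"with-limit", 256}, {"without-limit", 0}}
	fmt.Println(accessibleHugepages(512, containers, true))
	fmt.Println(accessibleHugepages(512, containers, false))
}
```

The sketch makes the asymmetry visible: isolation protects containers that declare limits at the cost of denying hugepages to those that do not.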

@tallclair

@KevinTMtz KevinTMtz force-pushed the pod-level-hugepages branch from 7bc5c4c to 7b38bff Compare March 20, 2025 17:55
@ndixita
Copy link
Contributor

ndixita commented Mar 20, 2025

@KevinTMtz Is it allowed for aggregated huge pages limits to be greater than pod-level huge pages limits? Did we test this case?

@KevinTMtz
Contributor Author

> @KevinTMtz Is it allowed for aggregated huge pages limits to be greater than pod-level huge pages limits? Did we test this case?

It is not possible, since hugepages cannot be overcommitted.

If the user tries to do something like this:

apiVersion: v1
kind: Pod
metadata:
  name: pod-level-hugepages
spec:
  resources:
    limits:
      memory: 256Mi
      hugepages-2Mi: 512Mi
  containers:
  - name: hptest-1
    image: "hptest:0.1"
    resources:
      limits:
        cpu: 1
        hugepages-2Mi: 514Mi
    volumeMounts:
    - mountPath: /hugepages-2Mi
      name: hugepage-2mi
  volumes:
  - name: hugepage-2mi
    emptyDir:
      medium: HugePages-2Mi

The following error is returned:

The Pod "pod-more-container" is invalid: 
* spec.resources.requests[hugepages-2Mi]: Invalid value: "512Mi": must be greater than or equal to aggregate container requests of 514Mi
* spec.resources.containers[0][hugepages-2Mi].limits: Invalid value: "514Mi": must be less than or equal to pod limits of 512Mi
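The rule behind these two messages can be sketched as follows. This is a minimal illustration, not the Kubernetes validation code: `validateHugepageLimits` and `mi` are hypothetical names, the message wording is simplified, and the sketch treats container requests as equal to limits, which Kubernetes requires for hugepages.

```go
package main

import "fmt"

// mi is one mebibyte in bytes.
const mi = int64(1) << 20

// validateHugepageLimits is a hypothetical helper. It checks that no single
// container limit exceeds the pod-level limit and that the aggregate of the
// container limits does not exceed it either, since hugepages cannot be
// overcommitted. All values are in bytes.
func validateHugepageLimits(podLimit int64, containerLimits []int64) []string {
	var errs []string
	var aggregate int64
	for i, limit := range containerLimits {
		aggregate += limit
		if limit > podLimit {
			errs = append(errs, fmt.Sprintf(
				"containers[%d]: limit %dMi must be less than or equal to pod limit of %dMi",
				i, limit/mi, podLimit/mi))
		}
	}
	if aggregate > podLimit {
		errs = append(errs, fmt.Sprintf(
			"pod limit %dMi must be greater than or equal to aggregate container limits of %dMi",
			podLimit/mi, aggregate/mi))
	}
	return errs
}

func main() {
	// Same numbers as the manifest above: pod-level 512Mi, one container at 514Mi.
	for _, e := range validateHugepageLimits(512*mi, []int64{514 * mi}) {
		fmt.Println(e)
	}
}
```

With the manifest's values both checks fail, which matches the two errors reported for the pod.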

@ndixita

@tallclair
Member

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 20, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: dd85775c100da0e1d90446ced8156d5fc33c1ac6

@thockin
Member

thockin commented Mar 20, 2025

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: KevinTMtz, tallclair, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 20, 2025
@k8s-ci-robot k8s-ci-robot merged commit 838f3c0 into kubernetes:master Mar 20, 2025
18 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.33 milestone Mar 20, 2025
@github-project-automation github-project-automation bot moved this from In Progress to Done in SIG CLI Mar 20, 2025
@github-project-automation github-project-automation bot moved this from In Progress to Done in SIG Apps Mar 20, 2025
 // the resource is supported.
 func IsSupportedPodLevelResource(name v1.ResourceName) bool {
-	return supportedPodLevelResources.Has(name)
+	return supportedPodLevelResources.Has(name) || strings.HasPrefix(string(name), v1.ResourceHugePagesPrefix)
Member

a couple comments here:

  1. this is effectively relaxing validation, which means we have to make sure it isn't active by default in the first release it is present in (1.33 will choke on pods this allows in with pod-level hugepages requests/limits)
  2. this allowed things into the API which weren't actually implemented or tested yet (implementation is in [PodLevelResources] Propagate Pod level hugepage cgroup to containers #131089) ... we generally want to make sure implementation merges with API

both of those issues need to be addressed before pod-level resources is enabled by default

Member

oh, I just realized this merged in 1.33, not 1.34. whew

Member

Kevin also clarified that this PR did implement support. The implementation is implicit - pod level huge pages were already set on the pod cgroup using the Pod{Requests,Limits} helper which already factors in pod-level resources, so once we allow huge-pages to be set at the pod-level, the implementation is already there.
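For readers without the surrounding file, the relaxed check discussed in this thread can be reproduced as a self-contained sketch. It uses local stand-ins rather than the real `k8s.io/api` types: `resourceName` and `hugePagesPrefix` mirror `v1.ResourceName` and `v1.ResourceHugePagesPrefix`, and the contents of the supported set (`cpu`, `memory`) are an assumption, since the set's definition is not shown in this conversation.

```go
package main

import (
	"fmt"
	"strings"
)

// resourceName is a local stand-in for v1.ResourceName.
type resourceName string

// hugePagesPrefix mirrors v1.ResourceHugePagesPrefix.
const hugePagesPrefix = "hugepages-"

// supportedPodLevelResources is assumed to hold cpu and memory; the real set
// is defined elsewhere in the Kubernetes source and is not shown in this PR view.
var supportedPodLevelResources = map[resourceName]struct{}{
	"cpu":    {},
	"memory": {},
}

// isSupportedPodLevelResource follows the changed line in the diff: a resource
// is accepted at pod level if it is in the fixed set or if its name carries the
// hugepages prefix, whatever the page size.
func isSupportedPodLevelResource(name resourceName) bool {
	if _, ok := supportedPodLevelResources[name]; ok {
		return true
	}
	return strings.HasPrefix(string(name), hugePagesPrefix)
}

func main() {
	for _, n := range []resourceName{"cpu", "hugepages-2Mi", "hugepages-1Gi", "ephemeral-storage"} {
		fmt.Printf("%s supported at pod level: %v\n", n, isSupportedPodLevelResource(n))
	}
}
```

Matching on the prefix rather than listing page sizes is what makes this a relaxation of validation: any `hugepages-<size>` name passes this check.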

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubectl area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/cli Categorizes an issue or PR as relevant to SIG CLI. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

9 participants