Skip to content

Conversation

@KevinTMtz
Copy link
Contributor

@KevinTMtz KevinTMtz commented Mar 27, 2025

What type of PR is this?

What this PR does / why we need it:

Follow up of [PodLevelResources] Pod Level Hugepage Resources.

This PR propagates Pod level hugepage cgroup to containers with the following changes:

  1. Pod level hugepage cgroup when unset in container
  2. Unit test propagate pod level hugepages to containers

Additionally adds:

  1. Validation logic for pod level hugepages
  2. Unit test pod level hugepage default and validation logic
  3. E2E tests for container hugepage resources immutability

Which issue(s) this PR fixes:

Fixes #132543

Special notes for your reviewer:

Does this PR introduce a user-facing change?

- Changes underlying logic to propagate Pod level hugepage cgroup to containers when they do not specify hugepage resources.
- Adds validation to enforce the hugepage aggregated container limits to be smaller or equal to pod-level limits. This was already enforced with the defaulted requests from the specified limits, however it did not make it clear about both hugepage requests and limits.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [Other doc]: https://docs.google.com/document/d/1JaqE2eRmFAPlRayv8vsAWE4SmQCVXQLr9rFPhEaPlvQ/edit?usp=sharing

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 27, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @KevinTMtz. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 27, 2025
@k8s-ci-robot k8s-ci-robot added area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 27, 2025
@KevinTMtz
Copy link
Contributor Author

/assign @ndixita

@bart0sh bart0sh moved this from Triage to Work in progress in SIG Node: code and documentation PRs Apr 7, 2025
@ndixita ndixita moved this to Needs Triage in SIG Node: Pod Level Resources Jun 23, 2025
@KevinTMtz KevinTMtz force-pushed the pod-level-hugepage-cgroups branch from b48a3ed to 3ea8028 Compare June 24, 2025 21:31
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Jun 24, 2025
@KevinTMtz KevinTMtz force-pushed the pod-level-hugepage-cgroups branch from 3ea8028 to ec198a6 Compare June 25, 2025 17:50
@k8s-ci-robot k8s-ci-robot added the sig/apps Categorizes an issue or PR as relevant to SIG Apps. label Jun 25, 2025
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 24, 2025
}

// We do not default pod-level hugepage limits if there is a hugepage request.
if _, exists := pod.Spec.Resources.Requests[key]; exists {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this looks good, make sure we have the following test scenarios covered:

  1. PodSpec in a Deployment with container hugepages limit and no request

    • I think this doesn't get defaulted (e.g. in a Deployment), and passes validation by skipping the limit && request equality check on podspec
    • the eventual pod would default container hugepages request, and still pass validation
    • make sure pod-level resources also avoids defaulting hugepages, and validation is happy with this scenario both for podspec and pod
  2. Pod with a pod-level hugepages request and no limit errors as expected

  3. Pod with a pod-level hugepages request and limit works as expected

@liggitt liggitt moved this from In progress to API review completed, 1.34 in API Reviews Jul 24, 2025
@liggitt
Copy link
Member

liggitt commented Jul 24, 2025

Defaulting change looks good, this is on a field that was alpha-gated in 1.33. The change makes it clearer that someone explicitly setting pod-level hugepages requests must set the corresponding pod-level limit themselves.

Marked as API approved, will tag once the test coverage is verified and the clarification at #131089 (comment) is updated

The hugepage aggregated container limits cannot be greater than pod-level limits.

This was already enforced with the defaulted requests from the specfied
limits, however it did not make it clear about both hugepage requests and limits.
Pod level hugepage resources are not propagated to the containers, only pod level cgroup values are propagated to the containers when they do not specify hugepage resources.
@KevinTMtz KevinTMtz force-pushed the pod-level-hugepage-cgroups branch from 5e7693f to 1bc995c Compare July 24, 2025 21:29
@KevinTMtz
Copy link
Contributor Author

Ok, this looks good, make sure we have the following test scenarios covered:

  1. PodSpec in a Deployment with container hugepages limit and no request
    • I think this doesn't get defaulted (e.g. in a Deployment), and passes validation by skipping the limit && request equality check on podspec
    • the eventual pod would default container hugepages request, and still pass validation
    • make sure pod-level resources also avoids defaulting hugepages, and validation is happy with this scenario both for podspec and pod

Currently there are no E2E tests that focus the PodSpec, current E2E tests focus on pods. Would it be possible to add those in a follow up PR? Deployments undergo the same resource validation as pod, so we would actually be testing the same cases, however only the ones that are valid without the defaulting.

In regards to unit test, the newly added pkg/api/testing/validate_pod_level_defaults_test.go covers those cases.

  1. Pod with a pod-level hugepages request and no limit errors as expected

Unit and E2E tests added in #130577 together with the new tests added in this PR cover this functionality.

  1. Pod with a pod-level hugepages request and limit works as expected

Unit and E2E tests added in #130577 together with the new tests added in this PR cover this functionality.

@liggitt
Copy link
Member

liggitt commented Jul 24, 2025

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 24, 2025
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 6922a6382899bc3073cb388b5738aeef21eb5eec

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: KevinTMtz, liggitt, tallclair

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 24, 2025
@KevinTMtz
Copy link
Contributor Author

/retest

2 similar comments
@KevinTMtz
Copy link
Contributor Author

/retest

@KevinTMtz
Copy link
Contributor Author

/retest

@pacoxu
Copy link
Member

pacoxu commented Jul 25, 2025

/test pull-kubernetes-e2e-gce

@k8s-ci-robot
Copy link
Contributor

@KevinTMtz: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-e2e-gce 1bc995c link unknown /test pull-kubernetes-e2e-gce
pull-kubernetes-integration 1bc995c link unknown /test pull-kubernetes-integration

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@KevinTMtz
Copy link
Contributor Author

/retest

@k8s-ci-robot k8s-ci-robot merged commit 3fd1251 into kubernetes:master Jul 25, 2025
12 of 14 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.34 milestone Jul 25, 2025
@github-project-automation github-project-automation bot moved this from Archive-it to Done in SIG Node CI/Test Board Jul 25, 2025
@github-project-automation github-project-automation bot moved this from Needs Review to Done in SIG Node: Pod Level Resources Jul 25, 2025
@github-project-automation github-project-automation bot moved this from Needs Reviewer to Done in SIG Node: code and documentation PRs Jul 25, 2025
@github-project-automation github-project-automation bot moved this from Needs Triage to Done in SIG Apps Jul 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api-review Categorizes an issue or PR as actively needing an API review. approved Indicates a PR has been approved by an approver from all required OWNERS files. area/code-generation area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Projects

Status: API review completed, 1.34
Archived in project

Development

Successfully merging this pull request may close these issues.

Support for Pod-Level Hugepage Resources

9 participants