Conversation

@MenD32
Contributor

@MenD32 MenD32 commented Apr 27, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

This is part of Dynamic Resource Allocation (DRA) support in Cluster Autoscaler. ResourceClaims support the AdminAccess field, which allows cluster administrators to access devices that are already in use. This changes the CA's business logic by introducing the idea that "some ResourceClaims don't reserve their allocated devices".
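For illustration, a minimal sketch of what that means for device accounting, assuming the resource.k8s.io/v1beta1 types; countReservingAllocations is a hypothetical helper, not code from this PR:

package example

import (
    resourceapi "k8s.io/api/resource/v1beta1"
)

// countReservingAllocations counts only the allocation results that actually
// reserve their device; results flagged with AdminAccess are skipped because
// they don't block the device from being allocated to other claims.
func countReservingAllocations(claim *resourceapi.ResourceClaim) int {
    if claim.Status.Allocation == nil {
        return 0
    }
    reserving := 0
    for _, result := range claim.Status.Allocation.Devices.Results {
        if result.AdminAccess != nil && *result.AdminAccess {
            continue
        }
        reserving++
    }
    return reserving
}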

Which issue(s) this PR fixes:

Fixes #7685

Special notes for your reviewer:

Does this PR introduce a user-facing change?

ResourceClaims with AdminAccess will now be ignored when calculating node utilization for scaledown

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/blob/a55eefc6051d6684d8cc7521e1f4de6319625e23/keps/sig-auth/5018-dra-adminaccess/README.md

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 27, 2025
@k8s-ci-robot
Contributor

Hi @MenD32. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot requested review from vadasambar and x13n April 27, 2025 18:45
@k8s-ci-robot k8s-ci-robot added area/cluster-autoscaler size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 27, 2025
result := map[string]map[string][]string{}
for _, claim := range claims {
alloc := claim.Status.Allocation
claimCopy := ClaimWithoutAdminAccessRequests(claim)
Contributor

It might be cleaner to do this in CalculateDynamicResourceUtilization? We'd have to wrap an enumerator around ClaimWithoutAdminAccessRequests, but at least we could prune the claims data at the source so that any additional downstream usages of it in the future get that pruning for free. (I don't think that CA would ever want to operate upon claim devices that are tagged for AdminAccess.)

wdyt?
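A rough sketch of that wrapping, reusing the ClaimWithoutAdminAccessRequests helper from this PR; the enumerator's name and exact signature here are illustrative only, and the resourceapi import is assumed from the surrounding file:

// claimsWithoutAdminAccessRequests prunes admin-access requests from every
// claim up front, so CalculateDynamicResourceUtilization and any future
// downstream consumers only ever see reserving allocations.
func claimsWithoutAdminAccessRequests(claims []*resourceapi.ResourceClaim) []*resourceapi.ResourceClaim {
    pruned := make([]*resourceapi.ResourceClaim, 0, len(claims))
    for _, claim := range claims {
        pruned = append(pruned, ClaimWithoutAdminAccessRequests(claim))
    }
    return pruned
}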

Contributor Author

sounds good to me

Contributor Author

Do you think other calls to groupAllocatedDevices would expect adminAccess resource requests to be removed?

wantErr: cmpopts.AnyError,
},
} {
if tc.testName != "DRA slices and claims present, DRA enabled -> DRA util returned despite being lower than CPU" {
Contributor Author

my bad, that was for debugging the test, I need to remove this

@MenD32 MenD32 requested a review from jackfrancis April 29, 2025 10:07
@MenD32 MenD32 force-pushed the feat/dra-admin-access branch from 94b849e to bece2b0 on April 29, 2025 10:23
@jackfrancis
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 29, 2025
@jackfrancis
Contributor

/lgtm

/assign @towca

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 29, 2025
@MenD32
Contributor Author

MenD32 commented May 26, 2025

@towca, could you please take a look at this PR when you get a chance?

// remove AdminAccessRequests from the claim before calculating utilization
claims[i] = ClaimWithoutAdminAccessRequests(claim)
}
allocatedDevices, err := groupAllocatedDevices(claims)
Collaborator

Why not just modify groupAllocatedDevices to skip Devices with AdminAccess=true? Seems much simpler and doesn't require any copying.
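Roughly, the suggestion amounts to something like this; a sketch only, where the driver -> pool -> device keying is inferred from the snippet above and may not match the real implementation, the error return is kept to match the call site shown later, and the resourceapi import is assumed from the surrounding file:

func groupAllocatedDevices(claims []*resourceapi.ResourceClaim) (map[string]map[string][]string, error) {
    result := map[string]map[string][]string{}
    for _, claim := range claims {
        alloc := claim.Status.Allocation
        if alloc == nil {
            continue
        }
        for _, device := range alloc.Devices.Results {
            // Admin-access results don't reserve the device, so don't count them.
            if device.AdminAccess != nil && *device.AdminAccess {
                continue
            }
            pools := result[device.Driver]
            if pools == nil {
                pools = map[string][]string{}
                result[device.Driver] = pools
            }
            pools[device.Pool] = append(pools[device.Pool], device.Device)
        }
    }
    return result, nil
}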

Contributor Author
@MenD32 MenD32 Jun 7, 2025

This would make the code cleaner... do you think this could add unintended side effects for other callers of the function?

Collaborator

Most callsites would probably want to skip the admin-access devices because of the same reason as util calculations, but we can always add a parameter to the function.

wantHighestUtilization: 0.2,
wantHighestUtilizationName: apiv1.ResourceName(fmt.Sprintf("%s/%s", fooDriver, "pool1")),
},
{
Collaborator

Could you add a test case with both kinds of claims together?

Contributor Author

yep

Devices: resourceapi.DeviceAllocationResult{
Results: []resourceapi.DeviceRequestAllocationResult{
{Request: fmt.Sprintf("request-%d", podDevIndex), Driver: driverName, Pool: poolName, Device: devName},
{Request: devReqName, Driver: driverName, Pool: poolName, Device: devName},
Collaborator

The result has an AdminAccess field as well.
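For example, the fixture could mark a result as admin-access like this (illustrative, assuming the resource.k8s.io/v1beta1 types and the k8s.io/utils/ptr helpers; the variable names come from the snippet above):

adminAccessResult := resourceapi.DeviceRequestAllocationResult{
    Request:     devReqName,
    Driver:      driverName,
    Pool:        poolName,
    Device:      devName,
    AdminAccess: ptr.To(true), // the result-level flag being pointed out above
}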

Contributor Author

nice catch, totally missed that, I'll fix

claimCopy.UID = uuid.NewUUID()
claimCopy.Name = fmt.Sprintf("%s-%s", claim.Name, nameSuffix)
claimCopy.OwnerReferences = []metav1.OwnerReference{PodClaimOwnerReference(newOwner)}
claimCopy = ClaimWithoutAdminAccessRequests(claimCopy)
Collaborator

Hmm, it doesn't feel right modifying the claims to remove the admin-access allocation results. The claims are checked and possibly allocated by the DRA scheduler plugin during Filter. They just don't "block" the allocated devices from being allocated for other claims. If we remove the results we essentially have an "invalid" allocated claim where not all requests have an allocation. Not sure if the DRA scheduler plugin Filters would pass for such a Pod.

IMO we should duplicate the admin-access allocation results that are not Node-local without sanitization. The non-Node-local devices are then "double-booked", but this is fine because admin-access doesn't actually book them. We should still sanitize the Node-local results to avoid pointing to devices that definitely aren't available on the new Node. This should leave the claim in a relatively valid state - Node-local allocations are correctly pointing to the devices from the new Node, non-Node-local allocations point to the same devices as the initial claim did. The only assumption we're making is that if a non-Node-local device is available on oldNode, it will be available on newNode as well.

It seems that we can just slightly modify this function to achieve this:

  • If a result Pool isn't in oldNodePoolNames but it has admin-access set, add it to sanitizedAllocations as-is instead of returning an error (a rough sketch follows this list).
  • I wonder if we can just remove the nodeSelector check; it's pretty redundant with checking against oldNodePoolNames. Otherwise we'd have to move the check after sanitizing result and do something like "don't error if there were any non-Node-local, admin-access results during sanitization".
  • I'd also definitely add new test cases to unit tests for this function.
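A very rough sketch of the first bullet, inside the (hypothetical) sanitization loop; oldNodePoolNames and sanitizedAllocations follow the naming in the comment above, everything else is illustrative:

for _, result := range claim.Status.Allocation.Devices.Results {
    if _, local := oldNodePoolNames[result.Pool]; !local {
        if result.AdminAccess != nil && *result.AdminAccess {
            // Non-Node-local admin-access result: keep it as-is instead of
            // erroring out; admin access doesn't actually book the device,
            // so the resulting "double-booking" is harmless.
            sanitizedAllocations = append(sanitizedAllocations, result)
            continue
        }
        return nil, fmt.Errorf("allocation references pool %q, which is not local to the old Node", result.Pool)
    }
    // ... existing sanitization of Node-local results continues here ...
}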

Collaborator

@MenD32 ping on this comment. Or maybe you responded and GitHub is just hiding it from me 😅?

Contributor Author
@MenD32 MenD32 Jul 5, 2025

Hi, I forgot to ping you when I addressed that in a commit... sorry about that

@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Jun 7, 2025
@MenD32 MenD32 force-pushed the feat/dra-admin-access branch from 4fc845c to 38083af on June 7, 2025 20:54
@MenD32
Contributor Author

MenD32 commented Jun 7, 2025

Just finished addressing the review comments, so this PR is ready for review again, @jackfrancis @towca

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 3, 2025
@MenD32
Contributor Author

MenD32 commented Oct 3, 2025

/remove-lifecycle-stale

@jackfrancis
Contributor

/label tide/merge-method-squash

@k8s-ci-robot k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Oct 3, 2025
@jackfrancis
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 3, 2025
for _, deviceRequestAllocationResult := range claimCopy.Status.Allocation.Devices.Results {
// Device requests with AdminAccess don't reserve their allocated resources, and are ignored when scheduling.
devReq := getDeviceResultRequest(claim, &deviceRequestAllocationResult)
if devReq != nil && devReq.AdminAccess != nil && *devReq.AdminAccess {
Contributor

We already have the "k8s.io/utils/ptr" package in our go.mod, so we can change these pointer usages to leverage that package. E.g.:

if ptr.Deref(devReq.AdminAccess, false) {

Ref:
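For example, a standalone illustration of the helper (not code from the PR):

package main

import (
    "fmt"

    "k8s.io/utils/ptr"
)

func main() {
    var adminAccess *bool                      // nil, as on a claim that never sets the field
    fmt.Println(ptr.Deref(adminAccess, false)) // prints "false": Deref returns the default for a nil pointer
    adminAccess = ptr.To(true)
    fmt.Println(ptr.Deref(adminAccess, false)) // prints "true"
}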

@MenD32 MenD32 requested a review from jackfrancis October 3, 2025 16:56
}
deviceRequestAllocationResults = append(deviceRequestAllocationResults, deviceRequestAllocationResult)
}
if claimCopy.Status.Allocation != nil {
Contributor

L122 already guarantees that claimCopy won't be nil at this point in the execution flow, so we can simply assign deviceRequestAllocationResults to claimCopy.Status.Allocation.Devices.Results without checking for nil here.
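In other words, the simplification would be just (illustrative):

// claimCopy.Status.Allocation is already known to be non-nil here, so the
// filtered results can be assigned back directly, without the extra nil check.
claimCopy.Status.Allocation.Devices.Results = deviceRequestAllocationResults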


// getDeviceResultRequest returns the DeviceRequest for the provided DeviceRequestAllocationResult in the provided ResourceClaim. If no result is found, nil is returned.
func getDeviceResultRequest(claim *resourceapi.ResourceClaim, deviceRequestAllocationResult *resourceapi.DeviceRequestAllocationResult) *resourceapi.DeviceRequest {
if claim.Status.Allocation == nil {
Contributor

Do we need this defensive check here since we're only calling this from ClaimWithoutAdminAccessRequests (which as implemented would never pass a nil Allocation)?

Contributor Author

I think it's fair to add this check to this function so it could be insulated from how it might be called in the future, not just in ClaimWithoutAdminAccessRequests.

Contributor

Will there ever be a non-nil claim.Spec.Devices.Requests when there is a nil claim.Status.Allocation?

Contributor Author

I think so, there could be a race condition in the DRA plugin where it first adds the device requests and then updates the claim status.

Contributor

I think the future value of this protection is unclear, and this function would be clearer and more maintainable if we enumerated through claim.Spec.Devices.Requests. Having an orthogonal check against claim.Status.Allocation inside the function itself, which seems not integrally related to the if claim.Spec.Devices.Requests[i].Name == deviceRequestAllocationResult.Request { evaluation, is a bit perplexing for naive maintainers (like me!)

@nojnhuh @towca to sanity check my thinking here
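For reference, the leaner shape being suggested might look roughly like this (a sketch, not the PR's final code; the resourceapi import is assumed from the surrounding file):

// getDeviceResultRequest returns the DeviceRequest in the claim's spec whose
// name matches the allocation result's Request, or nil if none matches.
func getDeviceResultRequest(claim *resourceapi.ResourceClaim, result *resourceapi.DeviceRequestAllocationResult) *resourceapi.DeviceRequest {
    for i := range claim.Spec.Devices.Requests {
        if claim.Spec.Devices.Requests[i].Name == result.Request {
            return &claim.Spec.Devices.Requests[i]
        }
    }
    return nil
}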

Contributor Author

I guess I can remove it, or maybe move the check to before the function call, does that sound good?

Contributor

move the check to before the function call

that makes the most sense to me

Contributor Author

alright, should be a quick fix

Contributor Author

just noticed we already check the claim's deep copy, so it can just be removed entirely

}

for _, deviceAlloc := range alloc.Devices.Results {
if deviceAlloc.AdminAccess != nil && *deviceAlloc.AdminAccess {
Contributor

can shorten to if ptr.Deref(deviceAlloc.AdminAccess, false) { (have to import "k8s.io/utils/ptr" into this file as well)

@MenD32
Contributor Author

MenD32 commented Oct 3, 2025

I wasn't aware of ptr.Deref, this is so useful! especially when working with complex Kubernetes structs

@MenD32 MenD32 requested a review from jackfrancis October 3, 2025 21:25
@jackfrancis
Contributor

/hold

/lgtm
/approve

@towca might have some more thoughts on this

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Oct 6, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jackfrancis, MenD32

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 6, 2025
@MenD32
Contributor Author

MenD32 commented Nov 3, 2025

> /hold
>
> /lgtm /approve
>
> @towca might have some more thoughts on this

@towca did you get a chance to look at this PR?


Labels

  • approved Indicates a PR has been approved by an approver from all required OWNERS files.
  • area/cluster-autoscaler
  • cncf-cla: yes Indicates the PR's author has signed the CNCF CLA.
  • do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command.
  • kind/feature Categorizes issue or PR as related to a new feature.
  • lgtm "Looks good to me", indicates that a PR is ready to be merged.
  • ok-to-test Indicates a non-member PR verified by an org member that is safe to test.
  • size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
  • tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.


Development

Successfully merging this pull request may close these issues.

CA DRA: support DRA AdminAccess

5 participants