
Conversation

@sairameshv
Member

@sairameshv sairameshv commented Jul 1, 2024

Developed these code changes on top of #114847 and #94899
/cc @pacoxu @mikebrow

Based on the latest proposal for the ensure-secret-pulled-images feature, two new fields have been defined in the kubelet configuration.
Reference: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2535-ensure-secret-pulled-images#proposal

What type of PR is this?

/kind feature

What this PR does / why we need it:

Extends the existing kubelet configuration with two new fields according to the KEP https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2535-ensure-secret-pulled-images#proposal
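
For illustration, a rough sketch of what the two new fields could look like; the names and types here are assumptions based on the KEP discussion and the snippets later in this PR (PullImageSecretRecheck, PullImageSecretRecheckPeriod), not the final API:

```go
// Sketch only: possible shape of the two new kubelet configuration fields.
// Names, types, and defaults are assumptions, not the final API.
package config

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

type KubeletConfiguration struct {
	// ... existing fields ...

	// PullImageSecretRecheck enables verifying that a pod presents valid
	// credentials for an image that was previously pulled with auth, even if
	// the image is already present on the node.
	PullImageSecretRecheck *bool `json:"pullImageSecretRecheck,omitempty"`

	// PullImageSecretRecheckPeriod bounds how long a successful authenticated
	// pull is trusted before credentials are re-verified; a zero value was
	// discussed in this PR as meaning "never expire the cache".
	PullImageSecretRecheckPeriod *metav1.Duration `json:"pullImageSecretRecheckPeriod,omitempty"`
}
```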

Fixes kubernetes/enhancements#2535

Special notes for your reviewer:

As of today, if a pod pulls an image from a private repository with proper credentials, the image is pulled onto the node successfully. When another pod on the same node then tries to pull the same image with incorrect credentials, its container still starts without any issue, which shouldn't be the case. The feature described in the KEP above resolves this issue.

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2535-ensure-secret-pulled-images/README.md

NONE

@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 1, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Jul 1, 2024
@k8s-ci-robot k8s-ci-robot requested review from ncdc and sttts July 1, 2024 08:08
@k8s-ci-robot k8s-ci-robot added area/code-generation area/kubelet kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jul 1, 2024
@sairameshv
Member Author

/cc @haircommander

@aojea
Member

aojea commented Jul 1, 2024

let's add APIs, functionality, and tests together please, or we may end up in inconsistent states

@sairameshv
Member Author

let's add APIs, functionality, and tests together please, or we may end up in inconsistent states

Ack, Sure @aojea !

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jul 9, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sairameshv
Once this PR has been reviewed and has the lgtm label, please assign liggitt for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sairameshv sairameshv changed the title Add extra fields in the Kubelet Configuration to support KEP-2535 Implement KEP-2535 - Ensure secret pulled images feature Jul 9, 2024
@sairameshv
Member Author

/test all

}

// if the image is in the ensured secret pulled image list, it is pulled by secret
pulledBySecret = m.ensureSecretPulledImages[imageRef] != nil
Contributor

Okay, one remaining structural concern: how can the kubelet differentiate between an image that was present at startup and an image that was previously pulled without a secret? Currently, if we're checking just this map, we'd assume that all images already present on the node (before the kubelet initialized the cache) were pulled without auth, which may not be true. We similarly have to be careful about deleting an entry from the map, as we don't want to accidentally report an image as one that requires no auth.

I am thinking we need to store an entry if we successfully pull without auth, and use some signifier for such a situation in the ensureSecretPulledImages map (maybe an empty string in the Auths map)?

WDYT @sairameshv
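
A minimal sketch of the signifier idea above, assuming a cache keyed by imageRef; the type and field names loosely follow the snippets in this PR (ensureSecretPulledImages, Auths) but are otherwise illustrative:

```go
package images

import "time"

// imagePullRecord sketches one entry of the ensureSecretPulledImages cache.
type imagePullRecord struct {
	// Auths maps a hash of the credentials used for a successful pull to the
	// time of that pull (or last recheck). An entry under the empty-string key
	// would record "successfully pulled without any credentials".
	Auths map[string]time.Time
}

// recordPull notes that imageRef was pulled with the given credential hash;
// callers would pass "" for an anonymous (no-auth) pull.
func recordPull(cache map[string]*imagePullRecord, imageRef, credHash string) {
	rec := cache[imageRef]
	if rec == nil {
		rec = &imagePullRecord{Auths: map[string]time.Time{}}
		cache[imageRef] = rec
	}
	rec.Auths[credHash] = time.Now()
}
```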

Member

Okay, one remaining structural concern: how can the kubelet differentiate between an image that was present at startup and an image that was previously pulled without a secret?

This was exactly the issue with the attempt in 1.30, and what made the previous attempt slip. Can I get a summary of how the approach has changed to address that issue (which in my mind is the main problem to be solved with this feature).

distinguishing between these is important and we have to be clear about how we are doing it without regressing existing users who preload images to be used without auth to be robust against public registry unavailability:

  • present at kubelet startup, not pulled with auth (preloaded, expected to be usable by all pods)
  • present at kubelet startup, previously pulled with auth

Contributor

distinguishing between these is important and we have to be clear about how we are doing it without regressing existing users who preload images to be used without auth to be robust against public registry unavailability:
present at kubelet startup, not pulled with auth (preloaded, expected to be usable by all pods)

I actually think we should not support this case. A user can turn off this feature if they're sensitive to registry unavailability enough to prepopulate the node. I think we should always treat a cache miss the same: repull the image. I think in the future, we can investigate ways to expose image management as a higher level api (kubectl image pull...) or something to allow users to prepopulate a node before an upgrade without needing to run pods to have the kubelet be aware of the image. but for now, I don't think we should support this case with this feature. WDYT?

Contributor

One difference between this implementation and the last is that we reintroduced the kubelet saving the cache, so we can actually solve the

present at kubelet startup, previously pulled with auth

case. We also use PullImageSecretRecheckPeriod == 0 to mean "never delete the cache", where it was previously used to set PullImageSecretRecheck=false. This means that if an image was ever pulled by the kubelet, we'll continue to track it and be able to protect against registry flakes on re-pulls with the same auth.
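
A small sketch of the PullImageSecretRecheckPeriod == 0 semantics described here; the helper name and signature are illustrative, not the PR's actual code:

```go
package images

import "time"

// authRecordExpired sketches the recheck-period semantics: a zero period means
// cached auth records never expire; otherwise a record expires once the period
// has elapsed since the last successful authenticated pull.
func authRecordExpired(lastPulled time.Time, recheckPeriod time.Duration, now time.Time) bool {
	if recheckPeriod == 0 {
		return false // "never delete the cache"
	}
	return now.Sub(lastPulled) > recheckPeriod
}
```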

Member Author

I am thinking we need to store an entry if we successfully pull without auth, and use some signifier for such a situation in the ensureSecretPulledImages map (maybe an empty string in the Auths map)?

Currently, we enforce a check to store the metadata of images only if they have an auth component, here

present at kubelet startup, previously pulled with auth

I think we have a solution for this in this implementation, here. We get the image reference of an existing image again, check the auth component, and store it in the cache while performing the EnsureImageExists check during container start.
So we don't have a problem if a new container is about to start using an existing image that is pre-populated on the node. Does that make sense @haircommander?

Contributor

I think we could get away with what you're describing, but it makes the writing/reading of the cache from disk more important. If the read fails or the file is corrupted, we'd treat the present images as "prepulled". I guess my point is that the only way to ensure we always re-auth when we need to is to have the internal representation of the cache be the source of truth, and distrust any image not present there.

Contributor

if you're not worried about that case, then I am not feeling super stubborn about it. But I think it simplifies the logic if we hold that invariant true.

Member

I guess my point is the only way to ensure we always re-auth when we need to is have the internal representation of the cache be the source of truth, and distrust any image not present there.

How we transition existing users onto this feature seamlessly is one of the biggest considerations I have.

I don't see how distrusting pre-existing images on first kubelet startup with this new code could ever be compatible with current behavior, unless we also require explicit configuration of which pre-existing images the admin intends to be available without auth. That seems really fiddly / difficult to make the admin specify, and I hadn't seen any discussion of that, so I was assuming nothing like that was planned.

Member

@pacoxu pacoxu Jul 25, 2024

  1. Right now, users cannot enable this with Kubernetes v1.30 or earlier.
  2. For instance, in v1.31 we add the FG and enable the cache by default, but don't use it. The FG is alpha and disabled by default, but the cache saving is enabled.
  • If a user wants to use it, the user should know about the behavior change. Without the cache, the behavior is not consistent.
  3. In later releases (if we support upgrading a node from 1.31 to 1.34), we may enable the FG by default in 1.34 or later. Then users upgrade from a version which already has the cache.

Just an idea: not sure if it can fix the issue.

Anyway, if a pod is started before the cluster is upgraded to a version with the cache, this is still a tough issue.

Member

A user can turn off this feature if they're sensitive to registry unavailability enough to prepopulate the node

We should not make users choose between availability and security.

Once the feature is stable, what is the use case for turning it off? PullImageSecretRecheckPeriod could have some reasonable default like 1 day to cover the majority of use cases.

If we were building this stuff from scratch today, I don't think we would ever let pods pull images that require auth without making sure that they had access to the associated image pull secret. Just because the image is present on the node doesn't mean that a pod is allowed to use it.

@cici37
Contributor

cici37 commented Jul 18, 2024

/remove-sig api-machinery

@k8s-ci-robot k8s-ci-robot removed the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Jul 18, 2024
}
}()
}
byteData, err := utiljson.Marshal(m.ensureSecretPulledImages)
Member

This is effectively an API the kubelet has to maintain compatibility with now. How do we ensure we stay compatible with this format, and version it if needed?

Member Author

I think defining a proper structure for the image state manager's cache might be a good idea; I'll work on it.
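
One possible shape for such a structure, sketched only to illustrate the versioning concern raised above; the envelope fields are assumptions, not what the PR ultimately defines:

```go
package images

import "time"

// imageStateCacheFile sketches a versioned envelope for the on-disk cache so
// that the format can evolve without breaking older kubelets.
type imageStateCacheFile struct {
	// Version lets the kubelet detect and migrate older cache formats.
	Version string `json:"version"`
	// Images maps an image reference to the credential hashes that successfully
	// pulled it, along with the time of the last successful pull or recheck.
	Images map[string]map[string]time.Time `json:"images"`
}
```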

}
err = utiljson.Unmarshal(byteData, &m.ensureSecretPulledImages)
if err != nil {
klog.ErrorS(err, "Failed to unmarshal /var/lib/kubelet/image_state_manager file data")
Member

return an error rather than logging and let the caller decide what to do with the error

Member

if we error here, what do we do? hard fail the kubelet? delete the bad file? if we delete the record of which images the kubelet used auth to pull, what happens to how we treat all existing images? as available without authentication or as needing authentication?

see previous issues where bad checkpoints from other features broke kubelet bringup

Member Author

In the worst-case scenario, we end up pulling the existing images again, capturing their metadata again, and repopulating the file.
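
A sketch of that fallback behaviour (treat a missing or corrupt cache file as empty and let existing images be re-pulled); the path handling and function name are illustrative:

```go
package images

import (
	"encoding/json"
	"os"
	"time"
)

// loadImageStateCache reads the cache file, falling back to an empty cache if
// the file is missing or cannot be parsed; in that worst case the kubelet just
// re-pulls existing images and repopulates the file.
func loadImageStateCache(path string) (map[string]map[string]time.Time, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		if os.IsNotExist(err) {
			return map[string]map[string]time.Time{}, nil // first run: start empty
		}
		return nil, err
	}
	cache := map[string]map[string]time.Time{}
	if err := json.Unmarshal(data, &cache); err != nil {
		return map[string]map[string]time.Time{}, nil // corrupt file: start empty
	}
	return cache, nil
}
```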

}
m.fsLock.Lock()
defer m.fsLock.Unlock()
err = os.WriteFile(imageStateManagerPath, byteData, 0644)
Member

writing directly to the file is not safe if the kubelet crashes or the write fails halfway, see #125648 as an example of writing safely
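
A sketch of the safer write pattern being suggested (write to a temporary file in the same directory, sync, then rename over the target); the function name is illustrative:

```go
package images

import (
	"os"
	"path/filepath"
)

// writeFileAtomic writes data to a temporary file in the target's directory,
// syncs it, and then atomically renames it over the target, so a crash
// mid-write cannot leave a truncated cache file behind.
func writeFileAtomic(path string, data []byte, perm os.FileMode) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), filepath.Base(path)+".tmp-")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // best effort; fails harmlessly after a successful rename
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	if err := os.Chmod(tmp.Name(), perm); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}
```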

Member Author

Sure, I'll work on it.

if m.pullImageSecretRecheck {
m.lock.Lock()
defer m.lock.Unlock()
// Get the ImageRef after the Image Pull
Member

Don't we have to record the fact that the kubelet used auth to pull the image before pulling? if the kubelet crashes here, won't it have already pulled the image and not recorded the fact that auth was required?

It seems like there's three steps here:

  1. record kubelet using auth to pull a not-yet-existing image (and write to disk)
  2. try to pull the image
  3. record the success of the pull and the hash of the credentials used to pull (and write to disk)

Then if 2 succeeds and we crash before 3 happens, the kubelet does a safe thing on restart (knows to treat the existing image as needing auth verified)
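
A sketch of that ordering, with hypothetical method names standing in for whatever persistence the image manager ends up using:

```go
package images

import "context"

// pullRecorder captures the persistence steps described above; the method
// names are hypothetical, not the image manager's actual API.
type pullRecorder interface {
	RecordPullIntent(image, credHash string) error                         // step 1: persist before pulling
	PullImage(ctx context.Context, image, credHash string) (string, error) // step 2: do the pull
	RecordPullSuccess(imageRef, credHash string) error                     // step 3: persist the result
}

// pullWithAuthRecorded orders the steps so that a crash between steps 2 and 3
// still leaves the image marked as "auth must be re-verified" on restart.
func pullWithAuthRecorded(ctx context.Context, r pullRecorder, image, credHash string) (string, error) {
	if err := r.RecordPullIntent(image, credHash); err != nil {
		return "", err
	}
	imageRef, err := r.PullImage(ctx, image, credHash)
	if err != nil {
		return "", err
	}
	return imageRef, r.RecordPullSuccess(imageRef, credHash)
}
```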

Member Author

Actually, we are not pulling the image here; instead, we perform the Get operation, which returns the image reference if the image is present locally on the machine.
This info is required just to decide whether we have to perform a Pull again or not. So we don't need to add a mechanism to record any pull-related events, right?
Am I missing something here?

Member

Good point @liggitt, keeping a list of in-progress pull attempts is necessary to manage a disconnect between 2 and 3.

return imagePullResult.imageRef, "", nil
}

func (m *imageManager) storeEnsureSecretPulledImagesData() {
Member

it seems like there's two pieces of info being persisted:

  1. the fact that kubelet pulled a given image using auth - this should persist as long as the image exists on the node (when we GC the image, we can forget we pulled with auth at that point, I guess).
  2. the set of auth hashes used to successfully pull a given image - this can expire / timeout as determined by the pullImageSecretRecheckPeriod

1 should be <= the number of images on the node, and should persist, not timeout

2 will have to write every time we pull an authenticated image with a newly seen set of credentials, and can timeout. With many images / many pulls / many creds, putting all of these in a single file might rewrite a big file frequently. Did we consider splitting this to a file per image?
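
A sketch separating those two pieces of state per image, which would also make a file-per-image layout straightforward; the field names are assumptions:

```go
package images

import "time"

// perImageRecord sketches the two pieces of state described above.
type perImageRecord struct {
	// PulledWithAuth records that the kubelet ever needed credentials to pull
	// this image; it lives as long as the image does and never times out.
	PulledWithAuth bool `json:"pulledWithAuth"`
	// Auths maps credential hashes to the last successful authenticated pull;
	// entries here may expire per pullImageSecretRecheckPeriod.
	Auths map[string]time.Time `json:"auths"`
}
```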

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jul 22, 2024
mikebrow and others added 4 commits July 24, 2024 00:44
…on in

Signed-off-by: Mike Brown <[email protected]>
Signed-off-by: Paco Xu <[email protected]>
Signed-off-by: Sai Ramesh Vanka <[email protected]>
… is enabled, do recheck logic

- if recheck is enabled, recheck no-auth images (pullCredentialsHash == "") as well
Based on the proposal of the ensure secret pulled image feature,
two fields in the kubelet configuration have to be defined
Reference: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2535-ensure-secret-pulled-images#proposal

Signed-off-by: Sai Ramesh Vanka <[email protected]>
@fedebongio
Contributor

/remove-sig api-machinery

@k8s-ci-robot k8s-ci-robot removed the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Jul 23, 2024
@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jul 23, 2024
// pulled the image successfully. Otherwise, pod B could use pod A's images
// without auth. So in this case, pull the image again even if it is present but not ensured
if pullImageSecretRecheck && !ensuredBySecret {
return true // noting here that old behaviour returns false in this case indicating the image should not be pulled
Member

This is the behavior change part.
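
A simplified sketch of the decision this hunk changes (image pull policy handling omitted); ensuredBySecret and pullImageSecretRecheck come from the visible snippet, the rest is assumed:

```go
package images

// shouldPullImage sketches the changed decision: with the recheck enabled, an
// image that is present but whose auth has not been ensured for this pod is
// pulled again, which forces the pod's credentials to be exercised.
func shouldPullImage(imagePresent, ensuredBySecret, pullImageSecretRecheck bool) bool {
	if !imagePresent {
		return true // not on the node: always pull
	}
	if pullImageSecretRecheck && !ensuredBySecret {
		return true // new behavior: re-pull to verify this pod's credentials
	}
	return false // old behavior: a present image was used without re-pulling
}
```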


@k8s-ci-robot
Contributor

@sairameshv: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-node-e2e-containerd 9398aa6 link true /test pull-kubernetes-node-e2e-containerd
pull-kubernetes-integration 9398aa6 link true /test pull-kubernetes-integration

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@cici37
Contributor

cici37 commented Jul 25, 2024

/remove-sig api-machinery

@k8s-ci-robot k8s-ci-robot removed the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Jul 25, 2024
klog.ErrorS(err, "Failed to hash auth", "auth", auth)
continue
}
m.lock.RLock()
Member

you haven't released the write lock yet

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 5, 2024
@k8s-ci-robot
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@liggitt
Member

liggitt commented Oct 10, 2024

moving to API review backlog for re-implementation based on 1.32 design updates in kubernetes/enhancements#4789

@liggitt
Member

liggitt commented Oct 18, 2024

/close

in favor of 1.32 implementation of design updates in kubernetes/enhancements#4789 - #128152

@k8s-ci-robot
Contributor

@liggitt: Closed this PR.

In response to this:

/close

in favor of 1.32 implementation of design updates in kubernetes/enhancements#4789 - #128152

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.


Labels

api-review Categorizes an issue or PR as actively needing an API review. area/code-generation area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Ensure secret pulled images