Implement KEP-2535 - Ensure secret pulled images feature #125817
Conversation
Skipping CI for Draft Pull Request.

This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the `triage/accepted` label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/cc @haircommander

Let's add the APIs, functionality, and tests together please, or we may end up in inconsistent states.

Ack, sure @aojea!
Force-pushed 408cc2b to fd218d4.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: sairameshv. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing /approve in a comment.

/test all
// if the image is in the ensured secret pulled image list, it is pulled by secret
pulledBySecret = m.ensureSecretPulledImages[imageRef] != nil
Okay, one remaining structural concern: how can the kubelet differentiate between an image that was present at startup and an image that was previously pulled without a secret? Currently, if we're checking just this map, we'd assume that all images present on the node before the kubelet initialized the cache were pulled without auth, which may not be true. We similarly have to be careful about deleting an entry from the map, as we don't want to accidentally report an image as one that needs no auth.
I am thinking we need to store an entry if we successfully pull without auth, and use some signifier for such a situation in the ensureSecretPulledImages map (maybe an empty string in the Auths map)?
WDYT @sairameshv
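A minimal sketch of what that signifier could look like, assuming the cache keeps a per-image map of credential hashes; the type, field, and function names here are hypothetical, not the PR's actual ones:

```go
package images

import "time"

// ensuredImageInfo is a hypothetical cache entry shape, for illustration only.
type ensuredImageInfo struct {
	// Auths maps the hash of the credentials used for a successful pull to the
	// time of that pull. An entry under the empty-string key would mark
	// "successfully pulled without auth".
	Auths map[string]time.Time
}

const noAuthKey = "" // signifier for an anonymous pull

// recordPull records a successful pull, anonymous or authenticated.
func recordPull(cache map[string]*ensuredImageInfo, imageRef, authHash string) {
	info := cache[imageRef]
	if info == nil {
		info = &ensuredImageInfo{Auths: map[string]time.Time{}}
		cache[imageRef] = info
	}
	info.Auths[authHash] = time.Now() // authHash == noAuthKey for anonymous pulls
}
```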
Okay, one remaining structural concern: how can the kubelet differentiate between an image that was present at startup and an image that was previously pulled without a secret?
This was exactly the issue with the attempt in 1.30, and what made the previous attempt slip. Can I get a summary of how the approach has changed to address that issue (which in my mind is the main problem to be solved with this feature)?
Distinguishing between these is important, and we have to be clear about how we are doing it without regressing existing users who preload images intended to be usable without auth so they are robust against public registry unavailability:
- present at kubelet startup, not pulled with auth (preloaded, expected to be usable by all pods)
- present at kubelet startup, previously pulled with auth
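One way to make that distinction explicit is to track the provenance of each locally present image; this is a sketch for discussion only, not a type defined anywhere in the PR:

```go
package images

// imageProvenance is a hypothetical classification of locally present images.
type imageProvenance int

const (
	// preloadedNoAuth: present at kubelet startup, never pulled with auth;
	// expected to be usable by all pods.
	preloadedNoAuth imageProvenance = iota
	// pulledWithAuth: present at kubelet startup, previously pulled with auth;
	// reuse must be gated on the pod presenting matching credentials.
	pulledWithAuth
)
```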
distinguishing between these is important, and we have to be clear about how we are doing it without regressing existing users who preload images intended to be usable without auth so they are robust against public registry unavailability:
present at kubelet startup, not pulled with auth (preloaded, expected to be usable by all pods)
I actually think we should not support this case. A user can turn off this feature if they're sensitive enough to registry unavailability to prepopulate the node. I think we should always treat a cache miss the same way: repull the image. In the future, we can investigate ways to expose image management as a higher-level API (kubectl image pull ...) or something similar, to allow users to prepopulate a node before an upgrade without needing to run pods just to make the kubelet aware of the image. But for now, I don't think we should support this case with this feature. WDYT?
One difference between this implementation and the last is that we reintroduced the kubelet saving the cache, so we can actually solve the
present at kubelet startup, previously pulled with auth
case. We also use PullImageSecretRecheckPeriod == 0 to mean "never delete the cache", where it was previously used to set PullImageSecretRecheck=false. This means that if an image was ever pulled by the kubelet, we'll continue to track it and be able to protect against registry flakes on repulls with the same auth.
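A rough sketch of how a recheck period of zero could be interpreted as "never expire" when deciding whether a cached record is stale; the function and parameter names are illustrative, not the PR's:

```go
package images

import "time"

// shouldExpire reports whether a cached pull record is stale. With a recheck
// period of 0, records never expire, matching the "never delete the cache"
// interpretation discussed above.
func shouldExpire(recordedAt time.Time, recheckPeriod time.Duration) bool {
	if recheckPeriod == 0 {
		return false // recheck disabled: keep the record forever
	}
	return time.Since(recordedAt) > recheckPeriod
}
```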
I am thinking we need to store an entry if we successfully pull without auth, and use some signifier for such a situation in the ensureSecretPulledImages map (maybe an empty string in the Auths map)?
Currently, we enforce a check to store the metadata of the images only if they have an auth component, here.
present at kubelet startup, previously pulled with auth
I think we have a solution for this with the implementation here. We get the image reference of an existing image again, check the auth component, and store it into the cache while performing the EnsureImageExists check during container start.
So we don't have a problem if a new container is about to start using an existing image that is pre-populated on the node. Does that make sense, @haircommander?
I think we could get away with what you're describing, but it makes the writing/reading of the cache from disk more important. If the read fails or the file is corrupted, we'd treat the present images as "prepulled". I guess my point is that the only way to ensure we always re-auth when we need to is to have the internal representation of the cache be the source of truth, and to distrust any image not present there.
If you're not worried about that case, then I'm not feeling super stubborn about it. But I think it simplifies the logic if we hold that invariant.
I guess my point is the only way to ensure we always re-auth when we need to is have the internal representation of the cache be the source of truth, and distrust any image not present there.
How we transition existing users onto this feature seamlessly is one of the biggest considerations I have.
I don't see how distrusting pre-existing images on first kubelet startup with this new code could ever be compatible with current behavior, unless we also require explicit configuration of which pre-existing images the admin intends to be available without auth. That seems really fiddly / difficult to make the admin specify, and I hadn't seen any discussion of that, so I was assuming nothing like that was planned.
- Now, users cannot enable this with Kubernetes v1.30 or earlier.
- For instance, in v1.31 we add the FG and enable the cache by default, but don't use it. The FG is alpha and disabled by default, but cache saving is enabled.
- If a user wants to use it, they should know about the behavior change. Without the cache, the behavior is not consistent.
- In later releases (if we support upgrading a node from 1.31 to 1.34), we may enable the FG by default in 1.34 or later. Then users upgrade from a version that already has the cache.
Just an idea; not sure if it can fix the issue.
Anyway, if a pod is started before the cluster is upgraded to a version with the cache, this is still a tough issue.
A user can turn off this feature if they're sensitive to registry unavailability enough to prepopulate the node
We should not make users choose between availability and security.
Once the feature is stable, what is the use case for turning it off? PullImageSecretRecheckPeriod could have some reasonable default like 1 day to cover the majority of use cases.
If we were building this stuff from scratch today, I don't think we would ever let pods pull images that require auth without making sure that they had access to the associated image pull secret. Just because the image is present on the node doesn't mean that a pod is allowed to use it.
/remove-sig api-machinery
byteData, err := utiljson.Marshal(m.ensureSecretPulledImages)
This is effectively an API the kubelet now has to maintain compatibility with. How do we ensure we stay compatible with this format, and version it if needed?
I think defining a proper structure for the image state manager's cache might be a good idea; I'll work on it.
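For example, the on-disk format could carry an explicit schema version so it can evolve later. This is only a sketch: the wrapper type, field names, and file layout are assumptions, not what the PR ships:

```go
package images

import (
	"encoding/json"
	"time"
)

// imageStateEntry mirrors the hypothetical per-image cache entry.
type imageStateEntry struct {
	Auths map[string]time.Time `json:"auths"`
}

// imageStateFile is a hypothetical versioned wrapper around the cache so the
// on-disk format can evolve without guessing what an old kubelet wrote.
type imageStateFile struct {
	Version int                        `json:"version"` // readers reject or migrate unknown versions
	Images  map[string]imageStateEntry `json:"images"`
}

const currentImageStateVersion = 1

func marshalImageState(images map[string]imageStateEntry) ([]byte, error) {
	return json.Marshal(imageStateFile{Version: currentImageStateVersion, Images: images})
}
```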
pkg/kubelet/images/image_manager.go (outdated)
err = utiljson.Unmarshal(byteData, &m.ensureSecretPulledImages)
if err != nil {
	klog.ErrorS(err, "Failed to unmarshal /var/lib/kubelet/image_state_manager file data")
Return an error rather than logging it, and let the caller decide what to do with the error.
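For instance, the read path could look roughly like this, propagating the failure instead of swallowing it; the function name and signature are placeholders, not the PR's code:

```go
package images

import (
	"encoding/json"
	"fmt"
	"os"
)

// readImageState loads and decodes the cache file, returning any failure to
// the caller instead of only logging it, so the caller can decide how to
// recover (ignore, rebuild the cache, fail startup, ...).
func readImageState(path string, into any) error {
	data, err := os.ReadFile(path)
	if err != nil {
		return fmt.Errorf("reading image state file %q: %w", path, err)
	}
	if err := json.Unmarshal(data, into); err != nil {
		return fmt.Errorf("unmarshalling image state file %q: %w", path, err)
	}
	return nil
}
```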
If we error here, what do we do? Hard-fail the kubelet? Delete the bad file? If we delete the record of which images the kubelet used auth to pull, how do we treat all existing images: as available without authentication, or as needing authentication?
See previous issues where bad checkpoints from other features broke kubelet bringup.
In the worst case, we end up pulling the existing images again, capturing the metadata again, and repopulating the file.
pkg/kubelet/images/image_manager.go (outdated)
m.fsLock.Lock()
defer m.fsLock.Unlock()
err = os.WriteFile(imageStateManagerPath, byteData, 0644)
Writing directly to the file is not safe if the kubelet crashes or the write fails halfway; see #125648 for an example of writing safely.
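The usual pattern is to write to a temporary file in the same directory and atomically rename it over the target, so a crash mid-write never leaves a truncated cache behind. This is a generic sketch, not the helper the linked PR uses:

```go
package images

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeFileAtomic writes data to a temp file in the target directory, syncs
// it, and renames it into place. Rename within one filesystem is atomic, so
// readers see either the old contents or the new contents, never a torn write.
func writeFileAtomic(path string, data []byte, perm os.FileMode) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), filepath.Base(path)+".tmp-")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // no-op if the rename succeeded

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Chmod(perm); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	if err := os.Rename(tmp.Name(), path); err != nil {
		return fmt.Errorf("renaming temp file into place: %w", err)
	}
	return nil
}
```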
Sure, I'll work on it.
pkg/kubelet/images/image_manager.go (outdated)
if m.pullImageSecretRecheck {
	m.lock.Lock()
	defer m.lock.Unlock()
	// Get the ImageRef after the Image Pull
Don't we have to record the fact that the kubelet used auth to pull the image before pulling? If the kubelet crashes here, won't it have already pulled the image without recording the fact that auth was required?
It seems like there are three steps here:
1. record that the kubelet is using auth to pull a not-yet-existing image (and write to disk)
2. try to pull the image
3. record the success of the pull and the hash of the credentials used to pull (and write to disk)
Then if 2 succeeds and we crash before 3 happens, the kubelet does a safe thing on restart (it knows to treat the existing image as needing auth verified).
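A minimal sketch of that write-ahead ordering, assuming a persist helper and a pull function are supplied by the caller; all names here are hypothetical:

```go
package images

import "fmt"

// pullWithIntent records the intent to pull with auth before pulling, so a
// crash between the pull and the final record still leaves the image marked
// as "auth required" on restart. persist and pull stand in for the real disk
// write and CRI pull.
func pullWithIntent(image, authHash string,
	persist func(image, authHash string, completed bool) error,
	pull func(image string) (imageRef string, err error),
) (string, error) {
	// Step 1: record the in-progress authenticated pull and flush to disk.
	if err := persist(image, authHash, false); err != nil {
		return "", fmt.Errorf("recording pull intent: %w", err)
	}
	// Step 2: perform the pull.
	imageRef, err := pull(image)
	if err != nil {
		return "", err
	}
	// Step 3: record the successful pull and the credential hash used.
	if err := persist(imageRef, authHash, true); err != nil {
		return "", fmt.Errorf("recording pull result: %w", err)
	}
	return imageRef, nil
}
```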
Actually, we are not pulling the image here; instead we perform the Get operation, which returns the image reference if the image is present locally on the machine.
This info is only required to decide whether we have to perform a Pull again or not. So we don't need to add a mechanism to record any events related to the pull, right?
Am I missing something here?
Good point @liggitt, keeping a list of in-progress pull attempts is necessary to manage a disconnect between steps 2 and 3.
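That in-progress list could be as simple as a set keyed by image and persisted alongside the cache; again, just a sketch with made-up names:

```go
package images

import "sync"

// inProgressPulls tracks authenticated pulls that have started but not yet
// been recorded as complete, so a restart can treat those images as
// "auth required" until verified.
type inProgressPulls struct {
	mu    sync.Mutex
	pulls map[string]string // image -> credential hash being used
}

func (p *inProgressPulls) start(image, authHash string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.pulls == nil {
		p.pulls = map[string]string{}
	}
	p.pulls[image] = authHash
}

func (p *inProgressPulls) finish(image string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	delete(p.pulls, image)
}
```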
pkg/kubelet/images/image_manager.go (outdated)
	return imagePullResult.imageRef, "", nil
}

func (m *imageManager) storeEnsureSecretPulledImagesData() {
It seems like there are two pieces of info being persisted:
1. the fact that the kubelet pulled a given image using auth; this should persist as long as the image exists on the node (when we GC the image, we can forget we pulled with auth at that point, I guess)
2. the set of auth hashes used to successfully pull a given image; this can expire / time out as determined by pullImageSecretRecheckPeriod
1 should be <= the number of images on the node, and should persist, not time out.
2 will have to be written every time we pull an authenticated image with a newly seen set of credentials, and can time out. With many images / many pulls / many creds, putting all of these in a single file might rewrite a big file frequently. Did we consider splitting this into a file per image?
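A per-image layout could hash the image reference into a filename so only the touched image's record is rewritten. Sketch only; the base directory and encoding are assumptions:

```go
package images

import (
	"crypto/sha256"
	"encoding/hex"
	"path/filepath"
)

// perImageStatePath maps an image reference to its own small state file, so
// recording a new credential hash rewrites one tiny file instead of one big
// one. baseDir is a placeholder, not the kubelet's real path.
func perImageStatePath(baseDir, imageRef string) string {
	sum := sha256.Sum256([]byte(imageRef)) // image refs contain '/' and ':', so hash them for a safe filename
	return filepath.Join(baseDir, hex.EncodeToString(sum[:])+".json")
}
```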
Force-pushed ba335de to 934cc9f.
…on in Signed-off-by: Mike Brown <[email protected]> Signed-off-by: Paco Xu <[email protected]> Signed-off-by: Sai Ramesh Vanka <[email protected]>
…r IfNotPresent Image
… is enabled, do recheck logic - if recheck is enabled, recheck no-auth images (pullCredentialsHash == "") as well
Based on the proposal of the ensure secret pulled image feature, two fields in the kubelet configuration have to be defined. Reference: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2535-ensure-secret-pulled-images#proposal Signed-off-by: Sai Ramesh Vanka <[email protected]>
/remove-sig api-machinery
Force-pushed 934cc9f to 7e3327c.
// pulled the image successfully. Otherwise, pod B could use pod A's images
// without auth. So in this case, pull the image again even if it is present but not ensured
if pullImageSecretRecheck && !ensuredBySecret {
	return true // noting here that old behaviour returns false in this case, indicating the image should not be pulled
This is the behavior change part.
Force-pushed 7e3327c to 9398aa6.
@sairameshv: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests.
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
/remove-sig api-machinery
	klog.ErrorS(err, "Failed to hash auth", "auth", auth)
	continue
}
m.lock.RLock()
You haven't released the write lock yet.
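For context, taking RLock on a sync.RWMutex while still holding its write lock self-deadlocks; the read lock can only be taken after the write lock is released. A generic illustration of the ordering, not the PR's code:

```go
package images

import "sync"

type lockedCache struct {
	lock  sync.RWMutex
	items map[string]string
}

func newLockedCache() *lockedCache {
	return &lockedCache{items: map[string]string{}}
}

// update shows the ordering the review comment points at: release the write
// lock before taking the read lock, since RLock on a mutex you already hold
// for writing blocks forever.
func (c *lockedCache) update(key, value string) string {
	c.lock.Lock()
	c.items[key] = value
	c.lock.Unlock() // must release the write lock first...

	c.lock.RLock() // ...before acquiring the read lock
	defer c.lock.RUnlock()
	return c.items[key]
}
```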
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Moving to the API review backlog for re-implementation based on 1.32 design updates in kubernetes/enhancements#4789.
/close in favor of the 1.32 implementation of design updates in kubernetes/enhancements#4789 - #128152
@liggitt: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Developed these code changes on top of #114847 and #94899
/cc @pacoxu @mikebrow
Based on the latest proposal of the ensure secret pulled images feature, two fields have been added to the kubelet configuration.
Reference: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2535-ensure-secret-pulled-images#proposal
What type of PR is this?
/kind feature
What this PR does / why we need it:
Extends the existing kubelet configuration with two new fields according to the KEP https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2535-ensure-secret-pulled-images#proposal
Fixes kubernetes/enhancements#2535
Special notes for your reviewer:
As of today, if a pod pulls an image from a private repository with proper credentials, the image gets pulled onto the node successfully. When another pod on the same node tries to pull the same image with wrong credentials, its container still starts without an issue, which shouldn't be the case. The feature described in the KEP helps resolve this issue.
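For reference, the two knobs discussed throughout the review are a recheck switch and a recheck period. A rough sketch of how they might appear on a configuration struct; the field names mirror identifiers seen in the diff (pullImageSecretRecheck, PullImageSecretRecheckPeriod), but the actual KubeletConfiguration API may differ:

```go
package images

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// ensureSecretPulledImagesConfig sketches the two kubelet configuration
// fields this PR discusses; it is not the real API type.
type ensureSecretPulledImagesConfig struct {
	// PullImageSecretRecheck enables re-verifying credentials for images that
	// are already present on the node before letting a pod use them.
	PullImageSecretRecheck bool
	// PullImageSecretRecheckPeriod controls how long a successful credential
	// check is remembered; 0 was discussed as "never expire".
	PullImageSecretRecheckPeriod metav1.Duration
}
```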
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2535-ensure-secret-pulled-images/README.md