-
Notifications
You must be signed in to change notification settings - Fork 1.6k
KEP-3926: handling undecryptable resources #3927
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
stlaz
commented
Mar 27, 2023
- One-line PR description: Improve the identification of resources that won't decrypt. Make it possible to remove undecryptable resources by using the API.
- Issue link: Handling undecryptable resources #3926
- Other comments: Enable deleting API objects even when storage-level decryption is not working properly kubernetes#86489
|
kubernetes/kubernetes#116943 shows a proof of concept for the list options proposed in this KEP. |
| // doing. | ||
| // WARNING: Vendors will most likely consider using this option to be breaking the | ||
| // support of their product. | ||
| UnconditionalDeleteWithClusterBreakingPotential bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's unclear what this API lets you skip... I would not expect to be allowed to skip admission or finalizers if the existing persisted object can be decoded successfully
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 - I thikn it should only allow deleting objects (with all the consequences) that we cannot decode. Nothing else should be allowed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to get our lingo synchronized - I would consider decoding and transforming two separate cases (you'll see why in code ref below).
What about objects we cannot retrieve from storage for other reasons?
The generic storage Get implementation fails in one additional case, that is if key cannot be constructed - https://github.com/kubernetes/kubernetes/blob/3cf9f66e90d560ac080687610933c712bcf37b39/staging/src/k8s.io/apiserver/pkg/registry/generic/registry/store.go#L756-L769
The etcd3 store implementation fails in 6 other cases:
https://github.com/kubernetes/kubernetes/blob/3cf9f66e90d560ac080687610933c712bcf37b39/staging/src/k8s.io/apiserver/pkg/storage/etcd3/store.go#L132-L166
Most of the cases indeed seem like errors we may not want to ignore. Should we focus solely on the transformation error, then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we focus solely on the transformation error, then?
I don't think so... transformation and decoding failures have almost identical implications, symptoms, and considerations for deleting anyway. Solving both problems together and consistently makes sense to me
|
@liggitt @wojtek-t @dgrisonnet The addressable comments were addressed, please take a look when you get a chance |
| Currently, removing a resource that causes such failures is not possible. | ||
| A cluster administrator must access etcd directly and remove the malformed data manually. | ||
|
|
||
| This KEP proposes a way to identify resources that fail to decrypt, and introduces |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the mechanics of this seem to belong more to api-machinery (who owns the list and delete functionality that would be modified as part of this) ... there are multiple reasons a resource could fail to decode from storage, decryption is only one of them (e.g. kubernetes/kubernetes#69579)
| // doing. | ||
| // WARNING: Vendors will most likely consider using this option to be breaking the | ||
| // support of their product. | ||
| UnconditionalDeleteWithClusterBreakingPotential bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we focus solely on the transformation error, then?
I don't think so... transformation and decoding failures have almost identical implications, symptoms, and considerations for deleting anyway. Solving both problems together and consistently makes sense to me
f76d8a5 to
23ff32b
Compare
23ff32b to
5d7de11
Compare
c294537 to
0c0a5f6
Compare
|
I squashed the previous changes and addressed all the most recent comments in the "address comments" commit. |
soltysh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#prr-shadow
lgtm - thx :)
|
/label tide-merge-method-squash |
|
@deads2k: The label(s) In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: deads2k, stlaz The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
0c0a5f6 to
c32d3dc
Compare
|
(I squashed the changes to a single commit in the latest force push) |
|
/label tide/merge-method-squash |
|
|
||
| The unconditional deletion admission: | ||
| 1. checks if a "delete" request contains the `IgnoreStoreReadErrorWithClusterBreakingPotential` option | ||
| 2. if it does, it checks the RBAC of the request's user for the `delete-ignore-read-errors` verb of the given resource |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we align these two words, the option and the verb?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tkashem this is still open in the implementation.
| // doing. | ||
| // WARNING: Vendors will most likely consider using this option to be breaking the | ||
| // support of their product. | ||
| IgnoreStoreReadErrorWithClusterBreakingPotential bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could this be withoutReadingFromStorage, i.e. a mode of deletion that has nothing to do with errors but is just highly unconditional deletion? (disclaimer: have only read the kep briefly, maybe I missed it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at today's DeleteOptions, this could even be called unconditional or force.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xref old thread around this topic #3927 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
worth to summarize as a non-goal:
- give clients control over skipping other steps of a delete request flow than decoding errors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
worth to summarize as a non-goal:
- give clients control over skipping other steps of a delete request flow than decoding errors
Agree.
|
/lgtm |