-
Notifications
You must be signed in to change notification settings - Fork 1.6k
KEP-5489: Enable defining a default StorageClass for each specific volume access mode #5492
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
savirg
commented
Aug 22, 2025
- One-line PR description: Enable defining a default StorageClass for each specific volume access mode
- Issue link: Enable defining a default StorageClass for each specific volume access mode #5489
- Other comments:
|
Welcome @savirg! |
|
Hi @savirg. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
427bfa3 to
d326612
Compare
|
/ok-to-test |
|
/assign |
keps/sig-storage/5489-default-storage-class-per-access-mode/README.md
Outdated
Show resolved
Hide resolved
keps/sig-storage/5489-default-storage-class-per-access-mode/kep.yaml
Outdated
Show resolved
Hide resolved
keps/sig-storage/5489-default-storage-class-per-access-mode/kep.yaml
Outdated
Show resolved
Hide resolved
d326612 to
938eecc
Compare
|
/retest |
| If e2e tests are not necessary or useful, explain why. | ||
| --> | ||
|
|
||
| For the **Alpha** release, a focused suite of end-to-end (e2e) tests will be added to `test/e2e/storage`. These tests are designed to validate the complete user journey, from `PersistentVolumeClaim` (PVC) creation to successful `PersistentVolume` (PV) provisioning and binding, ensuring the feature works correctly in a fully functional cluster. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The challenge with e2e tests is different environments won't have the supported storage providers to test the full provisioning path. And changing the default storageclasses may not be allowed in some environments. I think for e2e tests, we need to leave it to the cluster provider to install appropriate storageclasses, and in the tests, if the default storageclasses are set, then we trigger the tests.
|
|
||
| #### Kubernetes Mutating Admission Policies (CEL) | ||
|
|
||
| This approach would leverage the built-in `MutatingAdmissionPolicy` feature, which uses the Common Expression Language (CEL) to define policies that can modify objects during admission. An administrator would create a `MutatingAdmissionPolicy` resource that inspects the `accessModes` of an incoming PVC and mutates the `storageClassName` field if it is empty. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach is actually a bit more complex. In order to choose which storageClassName to set, we typically have to iterate over all StorageClasses to find the ones with the right annotations and also consider all the priority rules that you outlined earlier. CEL cannot do this kind of logic, so we would need to implement a controller that evaluates the PVC and the StorageClasses, and then associate a ConfigMap to the PVC with the final result.
a0c6e99 to
dad4b72
Compare
| 2. **Sorting and Selection:** The list of candidates is sorted using two criteria: | ||
| * **Primary Sort Key:** The hard-coded access mode priority order (`ReadWriteMany` > `ReadOnlyMany` > `ReadWriteOnce` > `ReadWriteOncePod`). | ||
| * **Secondary Sort Key:** The `metadata.creationTimestamp` of the `StorageClass` (newer first). | ||
| 3. **Return Value or Fallback:** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if:
- Multiple StorageClasses are set as the default for the same access mode?
- If no default is set for a given access mode, does it fail back to the existing default storage class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- If multiple StorageClass objects are annotated as the default for the same access mode, the system will select the one with the most recent metadata.creationTimestamp as the default for that mode.
- If no default is set for a given access mode, the system falls back to the existing global default StorageClass (the one annotated with storageclass.kubernetes.io/is-default-class: "true"). If neither a mode-specific nor a global default exists, PVC provisioning will fail.
|
Users now need to understand not only which StorageClasses exist, but also which are defaults for which access modes. This adds cognitive load and increases the chance of using the "wrong" storage unintentionally. Users may inadvertently consume more expensive storage if the default changes, impacting cost controls. Cluster admins must ensure that defaults are kept up to date and that only one default exists per access mode at any time. Changing defaults could cause a race where, briefly, no default exists, or more than one default exists. Documentation and tooling must be updated to surface this mapping clearly(update k8s public documentation etc.) |
dad4b72 to
3a0c037
Compare
|
/assign @deads2k |
|
Is this only for the volume access mode, any plan to include the same with combination with volume Mode as well? fileSystem+RWX select one storageclass and blockMode+RWX another storageclass? |
|
Hmm that is a feature request. But
Keeping them separate will reduce the complexity in implementation and less confusion for users about which StorageClass is used. Also the logic for how to select a StorageClass based on a complex set of custom rules is a perfect example of a workflow. Different organizations have vastly different needs, how do you suggest us to address the rules or even orders for all the providers? A better approach to address this use case is probably using a custom mutating controller can modify the object in the request. |
@sunnylovestiramisu I understand that this is a new feature request, and I was exploring whether it's possible to target this scenario. However, I also recognize the complexity it could introduce. Currently, the annotation in this PR is kept fairly generic, so extending it further might be challenging unless we consider making the annotation key more specific to the accessMode. As mentioned in the earlier use case, a similar one could involve using RWX volumes for things like registry/logging (NFS), or for virtual machines (block storage). In these scenarios, both volumeMode and accessMode become important factors to consider together. |
keps/sig-storage/5489-default-storage-class-per-access-mode/README.md
Outdated
Show resolved
Hide resolved
|
|
||
| **This design will not impact the current quota calculation.** The mechanism for resource quota enforcement on PVCs remains unchanged; only the defaulting logic for `storageClassName` is affected. | ||
|
|
||
| The proposed annotation is `storageclass.kubernetes.io/is-default-class-for-mode`. A cluster administrator can apply this annotation to multiple `StorageClass` objects, each with a different access mode value, such as `"ReadWriteOnce"`, `"ReadWriteMany"`, `"ReadOnlyMany"`, or `"ReadWriteOncePod"`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mentioned this on slack but annotations are usually discouraged as supported apis.
Why not make this an actual API field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add an alternatives session of:
- Standard Field - e.g. adding a field like defaultForAccessMode: ReadWriteOnce to a StorageClass
- Priority + Field Selector - instead of a simple field, you assign priorities to resources and use field selectors to pick the “highest-priority” matching resource. e.g. StorageClasses could have a priority and the controller can pick the highest priority
- Priority + Field Selector + Namespace Selector - same as 2 but it also considers namespaces
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, added these in the alternatives section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed it at the sig-storage meeting today, please see this note.
Using a standard field will increase the complexity if we account for backward compatibility(need to support it in the existing default storage class as well, and we need to support annotation approach forever anyway). And at the community meeting, everyone is okay with the annotation approach.
For future plans like if we will do storage API v2, we can revisit the standard field approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like the main reason NOT to use an MAP is ordering WRT the built-in admission, is that right? Is there another reason?
Could we add AccessModes []core.PersistentVolumeAccessMode to StorageClass and then choose a default based on the existing annotation? After writing this out, I guess the problem is if you have two block-ish classes which both support RO, RWO, and RWOP,and each wants to be the default for a different set of modes, that's ambiguous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MAP gives the cluster owner ALL the power to do arbitrary default assignment, as long as the built-in admission control doesn't set a value before MAP gets there (yet another problem with built-in things vs. extensions). It seems like the almost-perfect solution. As long as you don't elect one of the StorageClass objects as the default, the built-in is a no-op, so you can set it based on any criteria you want, right?
What am I missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the migration story for most service providers that actually have default storage class set on the cluster? Also how can MAP list all the available storage classes so that we can pick the right one(instead of hard code with a known name)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the migration story for most service providers that actually have default storage class set
Take it off? It's just an annotation.
how can MAP list all the available storage classes so that we can pick the right one(instead of hard code with a known name)?
Maybe I don't follow - how is this different that hard-coding one with a known name and sticking an annotation on it? I thinking of an expression like:
has(spec.accessModes) &&
spec.accessModes[0] == "ReadWriteOnce" ? "blockish-class" :
spec.accessModes[0] == "ReadWriteMany" ? "nfsish-class" :
""
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if customer wants to change the default storage class based on access modes? Because if GKE as a service is specifying this in an MAP, the customer will need to delete that default storage class in the GKE cluster and recreate one with the same name?
Currently at least for the annotation, they can change the annotation to a new storage class they prefer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if GKE as a service is specifying this in an MAP, the customer will need to delete that default storage class in the GKE cluster and recreate one with the same name?
I'm not following what you mean. If the cluster provider (GKE) provides the MAP, either they have to let the user take ownership of it (including deleting it!) or they have to assert that users can't do that (e.g. by recreating or re-setting it). Is it worthwhile to jump on a slack or something?
|
PRR shadow: I didn't see anything outstanding on PRR for alpha. I still have a question on #5492 (comment) but I don't think that impacts PRR. /assign @deads2k |
We discussed this briefly and the current thought is that blockMode is an advanced use case that we wouldn't expect most stateful workloads to use, and the workloads that would use it would have very specific storage requirements and would likely need their own dedicated storageclasses. |
3a0c037 to
618ff4e
Compare
|
|
||
| **This design will not impact the current quota calculation.** The mechanism for resource quota enforcement on PVCs remains unchanged; only the defaulting logic for `storageClassName` is affected. | ||
|
|
||
| The proposed annotation is `storageclass.kubernetes.io/is-default-class-for-mode`. A cluster administrator can apply this annotation to multiple `StorageClass` objects, each with a different access mode value, such as `"ReadWriteOnce"`, `"ReadWriteMany"`, `"ReadOnlyMany"`, or `"ReadWriteOncePod"`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it make sense to call storageclass.kubernetes.io/is-default-class-for-mode as storageclass.kubernetes.io/is-default-class-for-access-mode so that we can extend for volume mode something like storageclass.kubernetes.io/is-default-class-for-volume-mode in future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we should change the naming to is-default-class-for-access-mode to be more specific.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, updated the annotation to is-default-class-for-access-mode.
618ff4e to
c9d1cd8
Compare
|
|
||
| --- | ||
|
|
||
| ### Rationale for Annotation-Based Approach |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest, @erictune ran into this with his first gang scheduling proposal.
Annotations are hard to deprecate and we don't provide any protection against mutating annotations.
I'm not as knowledgeable in storage classes but most features that originall propose an annotation or label eventually switch to a proper API as we can better guarantee defaulting and mutating logic around an API field.
cc @thockin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently the cluster default StorageClass is marked by an annotation already:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
annotations:
storageclass.kubernetes.io/is-default-class: "true"
...This proposal seems to use the same method to add more options. With the current annotation already well understood by cluster administrators, the new annotation looks simple enough to get adopted.
@savirg, maybe you can mention more clearly that the design is based on the existing storageclass.kubernetes.io/is-default-class annotation?
To me this looks really useful. Our Ceph-CSI users sometimes pick the wrong type of storage because there is little guidance. With this, users need to care less and will likely make fewer mistakes. (Commonly users pick CephFS, an NFS-like type, for workloads that are better suited on RBD, a local-network-disk type. Typically RWX vs RWO.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ll leave this to api-review and sigs.
Thanks for adding the alternatives section!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this during the sig-storage meeting on Oct 2, and agreed that the annotation approach is sufficient given that it is an extension of the existing default StorageClass feature which is annotation-based. Changing it to a field would require us to also add support to the existing feature and add additional complexity of backwards compatibility and handling conflicts if both field and annotation are specified, as well as the cognitive overhead for existing users as well.
We agreed that if we do a v2 of the API we can move to a field at that time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @nixpanic! I've updated the doc to make it clearer that the design is based on the existing storageclass.kubernetes.io/is-default-class annotation.
c9d1cd8 to
8dc6829
Compare
| * **Requires an API Change:** Adding a new field to a stable API requires a full review, versioning, and careful backward compatibility considerations. This process is slow and has a high bar for acceptance. | ||
| * **Slower Iteration:** Changes to core APIs take longer to design, approve, and roll out. Any future improvements or fixes would also be slow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same care is supposed to happen for annotations.
cc @liggitt
| * **Cons:** | ||
| * **Requires an API Change:** Adding a new field to a stable API requires a full review, versioning, and careful backward compatibility considerations. This process is slow and has a high bar for acceptance. | ||
| * **Slower Iteration:** Changes to core APIs take longer to design, approve, and roll out. Any future improvements or fixes would also be slow. | ||
| * **Upgrade/Downgrade Complexity:** Introducing new fields can complicate cluster upgrades and downgrades, especially if components run different versions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
annotations introduce a unique problem of field-squatting, where the annotations have values, but are not honored and then suddenly on an upgrade become active.
|
PRR lgtm, but I'm not really clear on why we'd add more annotation instead of adding fields for what we mean. |
|
WRT adding fields -- the selection of a default isn't really the purvey of the object itself, it's a policy decision imposed upon it. Adding fields would only be reasonable (IMO) if they are really describing the storage and not the policy. |
The thread above with MAP looks nicer than annotations. |
|
the PRR is good, so I'll mark that aspect, but MAP sounds like a nicer plan that went beta already. |
|
PRR approve /approve |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: deads2k, savirg The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
I know there was some work to prove out MAP - can we get an update here? |
|
We are going to explore MAP solution for now. So, won't be merging this enhancement doc. |