Skip to content

Conversation

@savirg
Copy link

@savirg savirg commented Aug 22, 2025

  • One-line PR description: Enable defining a default StorageClass for each specific volume access mode
  • Other comments:

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 22, 2025
@k8s-ci-robot k8s-ci-robot requested a review from saad-ali August 22, 2025 05:28
@k8s-ci-robot k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label Aug 22, 2025
@k8s-ci-robot k8s-ci-robot requested a review from xing-yang August 22, 2025 05:28
@k8s-ci-robot k8s-ci-robot added the sig/storage Categorizes an issue or PR as relevant to SIG Storage. label Aug 22, 2025
@k8s-ci-robot
Copy link
Contributor

Welcome @savirg!

It looks like this is your first PR to kubernetes/enhancements 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/enhancements has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 22, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @savirg. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Aug 22, 2025
@savirg savirg force-pushed the default-sc-for-mode branch 2 times, most recently from 427bfa3 to d326612 Compare August 22, 2025 05:42
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Aug 22, 2025
@msau42
Copy link
Member

msau42 commented Aug 22, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 22, 2025
@msau42
Copy link
Member

msau42 commented Aug 22, 2025

/assign

@savirg savirg force-pushed the default-sc-for-mode branch from d326612 to 938eecc Compare August 25, 2025 07:38
@msau42
Copy link
Member

msau42 commented Sep 9, 2025

/retest

If e2e tests are not necessary or useful, explain why.
-->

For the **Alpha** release, a focused suite of end-to-end (e2e) tests will be added to `test/e2e/storage`. These tests are designed to validate the complete user journey, from `PersistentVolumeClaim` (PVC) creation to successful `PersistentVolume` (PV) provisioning and binding, ensuring the feature works correctly in a fully functional cluster.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The challenge with e2e tests is different environments won't have the supported storage providers to test the full provisioning path. And changing the default storageclasses may not be allowed in some environments. I think for e2e tests, we need to leave it to the cluster provider to install appropriate storageclasses, and in the tests, if the default storageclasses are set, then we trigger the tests.


#### Kubernetes Mutating Admission Policies (CEL)

This approach would leverage the built-in `MutatingAdmissionPolicy` feature, which uses the Common Expression Language (CEL) to define policies that can modify objects during admission. An administrator would create a `MutatingAdmissionPolicy` resource that inspects the `accessModes` of an incoming PVC and mutates the `storageClassName` field if it is empty.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach is actually a bit more complex. In order to choose which storageClassName to set, we typically have to iterate over all StorageClasses to find the ones with the right annotations and also consider all the priority rules that you outlined earlier. CEL cannot do this kind of logic, so we would need to implement a controller that evaluates the PVC and the StorageClasses, and then associate a ConfigMap to the PVC with the final result.

@savirg savirg force-pushed the default-sc-for-mode branch 5 times, most recently from a0c6e99 to dad4b72 Compare September 17, 2025 16:15
2. **Sorting and Selection:** The list of candidates is sorted using two criteria:
* **Primary Sort Key:** The hard-coded access mode priority order (`ReadWriteMany` > `ReadOnlyMany` > `ReadWriteOnce` > `ReadWriteOncePod`).
* **Secondary Sort Key:** The `metadata.creationTimestamp` of the `StorageClass` (newer first).
3. **Return Value or Fallback:**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if:

  1. Multiple StorageClasses are set as the default for the same access mode?
  2. If no default is set for a given access mode, does it fail back to the existing default storage class?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. If multiple StorageClass objects are annotated as the default for the same access mode, the system will select the one with the most recent metadata.creationTimestamp as the default for that mode.
  2. If no default is set for a given access mode, the system falls back to the existing global default StorageClass (the one annotated with storageclass.kubernetes.io/is-default-class: "true"). If neither a mode-specific nor a global default exists, PVC provisioning will fail.

@sunnylovestiramisu
Copy link
Contributor

sunnylovestiramisu commented Sep 22, 2025

Users now need to understand not only which StorageClasses exist, but also which are defaults for which access modes. This adds cognitive load and increases the chance of using the "wrong" storage unintentionally. Users may inadvertently consume more expensive storage if the default changes, impacting cost controls.

Cluster admins must ensure that defaults are kept up to date and that only one default exists per access mode at any time. Changing defaults could cause a race where, briefly, no default exists, or more than one default exists.

Documentation and tooling must be updated to surface this mapping clearly(update k8s public documentation etc.)

@savirg savirg force-pushed the default-sc-for-mode branch from dad4b72 to 3a0c037 Compare September 23, 2025 06:26
@savirg
Copy link
Author

savirg commented Sep 26, 2025

/assign @deads2k

@Madhu-1
Copy link

Madhu-1 commented Sep 26, 2025

Is this only for the volume access mode, any plan to include the same with combination with volume Mode as well? fileSystem+RWX select one storageclass and blockMode+RWX another storageclass?

@sunnylovestiramisu
Copy link
Contributor

Hmm that is a feature request. But

  • volumeMode defines how a volume is consumed by a pod, either as a pre-formatted filesystem or as a raw block device.
  • accessMode (e.g., ReadWriteOnce, ReadOnlyMany, ReadWriteMany) dictates the access policy, defining how many pods can access a volume and from how many nodes.

Keeping them separate will reduce the complexity in implementation and less confusion for users about which StorageClass is used. Also the logic for how to select a StorageClass based on a complex set of custom rules is a perfect example of a workflow. Different organizations have vastly different needs, how do you suggest us to address the rules or even orders for all the providers?

A better approach to address this use case is probably using a custom mutating controller can modify the object in the request.

@Madhu-1
Copy link

Madhu-1 commented Sep 29, 2025

Hmm that is a feature request. But

  • volumeMode defines how a volume is consumed by a pod, either as a pre-formatted filesystem or as a raw block device.
  • accessMode (e.g., ReadWriteOnce, ReadOnlyMany, ReadWriteMany) dictates the access policy, defining how many pods can access a volume and from how many nodes.

Keeping them separate will reduce the complexity in implementation and less confusion for users about which StorageClass is used. Also the logic for how to select a StorageClass based on a complex set of custom rules is a perfect example of a workflow. Different organizations have vastly different needs, how do you suggest us to address the rules or even orders for all the providers?

A better approach to address this use case is probably using a custom mutating controller can modify the object in the request.

@sunnylovestiramisu I understand that this is a new feature request, and I was exploring whether it's possible to target this scenario. However, I also recognize the complexity it could introduce. Currently, the annotation in this PR is kept fairly generic, so extending it further might be challenging unless we consider making the annotation key more specific to the accessMode.

As mentioned in the earlier use case, a similar one could involve using RWX volumes for things like registry/logging (NFS), or for virtual machines (block storage). In these scenarios, both volumeMode and accessMode become important factors to consider together.


**This design will not impact the current quota calculation.** The mechanism for resource quota enforcement on PVCs remains unchanged; only the defaulting logic for `storageClassName` is affected.

The proposed annotation is `storageclass.kubernetes.io/is-default-class-for-mode`. A cluster administrator can apply this annotation to multiple `StorageClass` objects, each with a different access mode value, such as `"ReadWriteOnce"`, `"ReadWriteMany"`, `"ReadOnlyMany"`, or `"ReadWriteOncePod"`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mentioned this on slack but annotations are usually discouraged as supported apis.

Why not make this an actual API field?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add an alternatives session of:

  1. Standard Field - e.g. adding a field like defaultForAccessMode: ReadWriteOnce to a StorageClass
  2. Priority + Field Selector - instead of a simple field, you assign priorities to resources and use field selectors to pick the “highest-priority” matching resource. e.g. StorageClasses could have a priority and the controller can pick the highest priority
  3. Priority + Field Selector + Namespace Selector - same as 2 but it also considers namespaces

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, added these in the alternatives section.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed it at the sig-storage meeting today, please see this note.

Using a standard field will increase the complexity if we account for backward compatibility(need to support it in the existing default storage class as well, and we need to support annotation approach forever anyway). And at the community meeting, everyone is okay with the annotation approach.

For future plans like if we will do storage API v2, we can revisit the standard field approach.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the main reason NOT to use an MAP is ordering WRT the built-in admission, is that right? Is there another reason?

Could we add AccessModes []core.PersistentVolumeAccessMode to StorageClass and then choose a default based on the existing annotation? After writing this out, I guess the problem is if you have two block-ish classes which both support RO, RWO, and RWOP,and each wants to be the default for a different set of modes, that's ambiguous.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MAP gives the cluster owner ALL the power to do arbitrary default assignment, as long as the built-in admission control doesn't set a value before MAP gets there (yet another problem with built-in things vs. extensions). It seems like the almost-perfect solution. As long as you don't elect one of the StorageClass objects as the default, the built-in is a no-op, so you can set it based on any criteria you want, right?

What am I missing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the migration story for most service providers that actually have default storage class set on the cluster? Also how can MAP list all the available storage classes so that we can pick the right one(instead of hard code with a known name)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the migration story for most service providers that actually have default storage class set

Take it off? It's just an annotation.

how can MAP list all the available storage classes so that we can pick the right one(instead of hard code with a known name)?

Maybe I don't follow - how is this different that hard-coding one with a known name and sticking an annotation on it? I thinking of an expression like:

has(spec.accessModes) && 
    spec.accessModes[0] == "ReadWriteOnce" ? "blockish-class" :
    spec.accessModes[0] == "ReadWriteMany" ? "nfsish-class" :
    ""

Copy link
Contributor

@sunnylovestiramisu sunnylovestiramisu Oct 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if customer wants to change the default storage class based on access modes? Because if GKE as a service is specifying this in an MAP, the customer will need to delete that default storage class in the GKE cluster and recreate one with the same name?

Currently at least for the annotation, they can change the annotation to a new storage class they prefer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if GKE as a service is specifying this in an MAP, the customer will need to delete that default storage class in the GKE cluster and recreate one with the same name?

I'm not following what you mean. If the cluster provider (GKE) provides the MAP, either they have to let the user take ownership of it (including deleting it!) or they have to assert that users can't do that (e.g. by recreating or re-setting it). Is it worthwhile to jump on a slack or something?

@kannon92
Copy link
Contributor

PRR shadow:

I didn't see anything outstanding on PRR for alpha.

I still have a question on #5492 (comment) but I don't think that impacts PRR.

/assign @deads2k

@msau42
Copy link
Member

msau42 commented Sep 30, 2025

Is this only for the volume access mode, any plan to include the same with combination with volume Mode as well? fileSystem+RWX select one storageclass and blockMode+RWX another storageclass?

We discussed this briefly and the current thought is that blockMode is an advanced use case that we wouldn't expect most stateful workloads to use, and the workloads that would use it would have very specific storage requirements and would likely need their own dedicated storageclasses.

@savirg savirg force-pushed the default-sc-for-mode branch from 3a0c037 to 618ff4e Compare October 2, 2025 08:03

**This design will not impact the current quota calculation.** The mechanism for resource quota enforcement on PVCs remains unchanged; only the defaulting logic for `storageClassName` is affected.

The proposed annotation is `storageclass.kubernetes.io/is-default-class-for-mode`. A cluster administrator can apply this annotation to multiple `StorageClass` objects, each with a different access mode value, such as `"ReadWriteOnce"`, `"ReadWriteMany"`, `"ReadOnlyMany"`, or `"ReadWriteOncePod"`.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense to call storageclass.kubernetes.io/is-default-class-for-mode as storageclass.kubernetes.io/is-default-class-for-access-mode so that we can extend for volume mode something like storageclass.kubernetes.io/is-default-class-for-volume-mode in future?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we should change the naming to is-default-class-for-access-mode to be more specific.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, updated the annotation to is-default-class-for-access-mode.

@savirg savirg force-pushed the default-sc-for-mode branch from 618ff4e to c9d1cd8 Compare October 2, 2025 08:12

---

### Rationale for Annotation-Based Approach
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, @erictune ran into this with his first gang scheduling proposal.

Annotations are hard to deprecate and we don't provide any protection against mutating annotations.

I'm not as knowledgeable in storage classes but most features that originall propose an annotation or label eventually switch to a proper API as we can better guarantee defaulting and mutating logic around an API field.

cc @thockin

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently the cluster default StorageClass is marked by an annotation already:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
...

This proposal seems to use the same method to add more options. With the current annotation already well understood by cluster administrators, the new annotation looks simple enough to get adopted.

@savirg, maybe you can mention more clearly that the design is based on the existing storageclass.kubernetes.io/is-default-class annotation?

To me this looks really useful. Our Ceph-CSI users sometimes pick the wrong type of storage because there is little guidance. With this, users need to care less and will likely make fewer mistakes. (Commonly users pick CephFS, an NFS-like type, for workloads that are better suited on RBD, a local-network-disk type. Typically RWX vs RWO.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ll leave this to api-review and sigs.

Thanks for adding the alternatives section!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this during the sig-storage meeting on Oct 2, and agreed that the annotation approach is sufficient given that it is an extension of the existing default StorageClass feature which is annotation-based. Changing it to a field would require us to also add support to the existing feature and add additional complexity of backwards compatibility and handling conflicts if both field and annotation are specified, as well as the cognitive overhead for existing users as well.

We agreed that if we do a v2 of the API we can move to a field at that time.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @nixpanic! I've updated the doc to make it clearer that the design is based on the existing storageclass.kubernetes.io/is-default-class annotation.

@savirg savirg force-pushed the default-sc-for-mode branch from c9d1cd8 to 8dc6829 Compare October 6, 2025 06:22
Comment on lines +833 to +834
* **Requires an API Change:** Adding a new field to a stable API requires a full review, versioning, and careful backward compatibility considerations. This process is slow and has a high bar for acceptance.
* **Slower Iteration:** Changes to core APIs take longer to design, approve, and roll out. Any future improvements or fixes would also be slow.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same care is supposed to happen for annotations.

cc @liggitt

* **Cons:**
* **Requires an API Change:** Adding a new field to a stable API requires a full review, versioning, and careful backward compatibility considerations. This process is slow and has a high bar for acceptance.
* **Slower Iteration:** Changes to core APIs take longer to design, approve, and roll out. Any future improvements or fixes would also be slow.
* **Upgrade/Downgrade Complexity:** Introducing new fields can complicate cluster upgrades and downgrades, especially if components run different versions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

annotations introduce a unique problem of field-squatting, where the annotations have values, but are not honored and then suddenly on an upgrade become active.

@deads2k
Copy link
Contributor

deads2k commented Oct 6, 2025

PRR lgtm, but I'm not really clear on why we'd add more annotation instead of adding fields for what we mean.

@thockin
Copy link
Member

thockin commented Oct 6, 2025

WRT adding fields -- the selection of a default isn't really the purvey of the object itself, it's a policy decision imposed upon it. Adding fields would only be reasonable (IMO) if they are really describing the storage and not the policy.

@deads2k
Copy link
Contributor

deads2k commented Oct 7, 2025

WRT adding fields -- the selection of a default isn't really the purvey of the object itself, it's a policy decision imposed upon it. Adding fields would only be reasonable (IMO) if they are really describing the storage and not the policy.

The thread above with MAP looks nicer than annotations.

@deads2k
Copy link
Contributor

deads2k commented Oct 10, 2025

the PRR is good, so I'll mark that aspect, but MAP sounds like a nicer plan that went beta already.

@deads2k
Copy link
Contributor

deads2k commented Oct 10, 2025

PRR approve

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: deads2k, savirg
Once this PR has been reviewed and has the lgtm label, please ask for approval from msau42. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@thockin
Copy link
Member

thockin commented Oct 15, 2025

I know there was some work to prove out MAP - can we get an update here?

@savirg
Copy link
Author

savirg commented Oct 16, 2025

We are going to explore MAP solution for now. So, won't be merging this enhancement doc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory ok-to-test Indicates a non-member PR verified by an org member that is safe to test. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants