@@ -94,17 +94,24 @@ tags, and then generate with `hack/update-toc.sh`.
9494- [Design Details](#design-details)
9595 - [Kubernetes Changes, Access Mode](#kubernetes-changes-access-mode)
9696 - [Scheduler Enforcement](#scheduler-enforcement)
97+ - [Alpha](#alpha)
98+ - [Beta](#beta)
9799 - [Mount Enforcement](#mount-enforcement)
98100 - [CSI Specification Changes, Volume Capabilities](#csi-specification-changes-volume-capabilities)
101+ - [Supporting In-Tree Drivers](#supporting-in-tree-drivers)
99102 - [Test Plan](#test-plan)
103+ - [Prerequisite testing updates](#prerequisite-testing-updates)
104+ - [Unit tests](#unit-tests)
105+ - [Integration tests](#integration-tests)
106+ - [e2e tests](#e2e-tests)
100107 - [Validation of PersistentVolumeSpec Object](#validation-of-persistentvolumespec-object)
101108 - [Mounting and Mapping with ReadWriteOncePod](#mounting-and-mapping-with-readwriteoncepod)
102109 - [Mounting and Mapping with ReadWriteOnce](#mounting-and-mapping-with-readwriteonce)
103110 - [Mapping Kubernetes Access Modes to CSI Volume Capability Access Modes](#mapping-kubernetes-access-modes-to-csi-volume-capability-access-modes)
104111 - [End to End Tests](#end-to-end-tests)
105112 - [Graduation Criteria](#graduation-criteria)
106- - [Alpha](#alpha)
107- - [Beta](#beta)
113+ - [Alpha](#alpha-1)
114+ - [Beta](#beta-1)
108115 - [GA](#ga)
109116 - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
110117 - [Version Skew Strategy](#version-skew-strategy)
@@ -389,6 +396,8 @@ This access mode will be enforced in two places:
389396
390397#### Scheduler Enforcement
391398
399+ ##### Alpha
400+
392401First is at the time a pod is scheduled. When scheduling a pod, if another pod
393402is found using the same PVC and the PVC uses ReadWriteOncePod, then scheduling
394403will fail and the pod will be considered UnschedulableAndUnresolvable.
@@ -407,6 +416,24 @@ marked UnschedulableAndUnresolvable.
407416[volume restrictions plugin]: https://github.com/kubernetes/kubernetes/blob/v1.21.0/pkg/scheduler/framework/plugins/volumerestrictions/volume_restrictions.go#L29
408417[node info cache]: https://github.com/kubernetes/kubernetes/blob/v1.21.0/pkg/scheduler/framework/types.go#L357
409418
419+ ##### Beta
420+
421+ Beta adds scheduler support for pod preemption.
422+
423+ When a pod (A) is scheduled, if another pod (B) is found using the same PVC, the
424+ PVC uses ReadWriteOncePod, and pod (A) has higher priority than pod (B), then
425+ return Unschedulable (which will cause pod (B) to be preempted). If pod (A) has
426+ lower or equal priority compared with pod (B), return
427+ UnschedulableAndUnresolvable.
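
The priority rule above can be sketched in Go. This is a minimal illustration under stated assumptions, not the scheduler plugin's code: `Status`, `Pod`, `sharesPVC`, and `Decide` are hypothetical local names, PVC namespaces are ignored, and the real plugin works with the scheduler framework's own status types.

```go
package main

import "fmt"

// Status names the scheduler outcomes used in the text. This is a local,
// hypothetical type; the real scheduler framework has its own status codes.
type Status string

const (
	Success                      Status = "Success"
	Unschedulable                Status = "Unschedulable"
	UnschedulableAndUnresolvable Status = "UnschedulableAndUnresolvable"
)

// Pod is a simplified stand-in for a pod: a name, a priority, and the names
// of the ReadWriteOncePod PVCs it uses (namespaces are ignored for brevity).
type Pod struct {
	Name     string
	Priority int32
	RWOPPVCs []string
}

// sharesPVC reports whether a and b reference at least one common PVC.
func sharesPVC(a, b Pod) bool {
	seen := make(map[string]bool, len(a.RWOPPVCs))
	for _, name := range a.RWOPPVCs {
		seen[name] = true
	}
	for _, name := range b.RWOPPVCs {
		if seen[name] {
			return true
		}
	}
	return false
}

// Decide applies the beta rule: a conflict with a lower-priority pod is
// Unschedulable (preemption may resolve it); a conflict with a pod of equal
// or higher priority is UnschedulableAndUnresolvable.
func Decide(incoming Pod, existing []Pod) Status {
	result := Success
	for _, other := range existing {
		if !sharesPVC(incoming, other) {
			continue
		}
		if incoming.Priority <= other.Priority {
			return UnschedulableAndUnresolvable
		}
		result = Unschedulable
	}
	return result
}

func main() {
	a := Pod{Name: "a", Priority: 100, RWOPPVCs: []string{"data"}}
	b := Pod{Name: "b", Priority: 10, RWOPPVCs: []string{"data"}}
	fmt.Println(Decide(a, []Pod{b})) // a outranks b
	fmt.Println(Decide(b, []Pod{a})) // b does not outrank a
	fmt.Println(Decide(a, nil))      // no conflicting pod
}
```

Returning early on the first pod of equal or higher priority reflects that preempting the remaining lower-priority pods could not free the PVC.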
428+
429+ In the PreFilter phase of the volume restrictions scheduler plugin, we will
430+ build a cache of any existing pods and nodes using the ReadWriteOncePod PVCs on
431+ the pod to be scheduled. This cache will be saved as part of the scheduler's
432+ cycleState and forwarded to the following step. During AddPod and RemovePod we
433+ will add or remove references to the target ReadWriteOncePod PVCs to simulate
434+ preemption. During the Filter phase we will check caches for remaining
435+ references to the PVCs and compare their pod priorities if applicable.
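
The PreFilter, AddPod, RemovePod, and Filter flow described above could look roughly like the following sketch. It assumes a simplified in-memory `state` type in place of the scheduler's cycleState; all names here (`state`, `preFilter`, `addPod`, `removePod`, `filter`) are hypothetical and do not correspond to the actual plugin's identifiers.

```go
package main

import "fmt"

// state is a hypothetical stand-in for the data the plugin would keep in the
// scheduler's cycleState: for each target ReadWriteOncePod PVC, the
// priorities of the pods still referencing it, keyed by pod name.
type state struct {
	users map[string]map[string]int32
}

// preFilter records the ReadWriteOncePod PVCs of the pod being scheduled.
// Only these PVCs are tracked afterwards.
func preFilter(targetPVCs []string) *state {
	s := &state{users: make(map[string]map[string]int32)}
	for _, pvc := range targetPVCs {
		s.users[pvc] = make(map[string]int32)
	}
	return s
}

// addPod adds a reference for every tracked PVC that the given pod uses.
func (s *state) addPod(name string, priority int32, pvcs []string) {
	for _, pvc := range pvcs {
		if pods, ok := s.users[pvc]; ok {
			pods[name] = priority
		}
	}
}

// removePod drops the pod's references, simulating its preemption.
func (s *state) removePod(name string, pvcs []string) {
	for _, pvc := range pvcs {
		if pods, ok := s.users[pvc]; ok {
			delete(pods, name)
		}
	}
}

// filter inspects the remaining references and compares priorities with the
// incoming pod's priority.
func (s *state) filter(incomingPriority int32) string {
	result := "Success"
	for _, pods := range s.users {
		for _, priority := range pods {
			if incomingPriority <= priority {
				return "UnschedulableAndUnresolvable"
			}
			result = "Unschedulable"
		}
	}
	return result
}

func main() {
	s := preFilter([]string{"data"})
	s.addPod("b", 10, []string{"data", "other"})
	fmt.Println(s.filter(100)) // conflict with a lower-priority pod
	s.removePod("b", []string{"data"})
	fmt.Println(s.filter(100)) // no remaining references
}
```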
436+
410437#### Mount Enforcement
411438
412439As an additional precaution this will also be enforced at the time a volume is
@@ -483,18 +510,21 @@ Put more succinctly:
483510CSI clients that will need updating are kubelet, external-provisioner,
484511external-attacher, and external-resizer.
485512
513+ ### Supporting In-Tree Drivers
514+
515+ In-tree storage drivers implement the [`PersistentVolumePlugin`] interface which
516+ specifies a list of supported access modes. For beta, we will update drivers to
517+ also accept the ReadWriteOncePod access mode. Additional updates are required to
518+ the CSI migration libraries (per volume type) to account for the new access
519+ mode.
520+
521+ [`PersistentVolumePlugin`]: https://github.com/kubernetes/kubernetes/blob/v1.25.2/pkg/volume/plugins.go#L200-L201
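
As a rough illustration of the driver update, the sketch below shows a plugin advertising ReadWriteOncePod alongside its existing access modes. It is self-contained and uses local stand-in types: `AccessMode`, `accessModeLister`, `examplePlugin`, and `supports` are assumptions made for this example, modeled loosely on the access mode list that `PersistentVolumePlugin` exposes, and are not the in-tree definitions.

```go
package main

import "fmt"

// AccessMode is a local stand-in for the Kubernetes access mode type.
type AccessMode string

const (
	ReadWriteOnce    AccessMode = "ReadWriteOnce"
	ReadOnlyMany     AccessMode = "ReadOnlyMany"
	ReadWriteOncePod AccessMode = "ReadWriteOncePod"
)

// accessModeLister is a hypothetical, reduced interface covering only the
// part of a volume plugin that lists its supported access modes.
type accessModeLister interface {
	GetAccessModes() []AccessMode
}

// examplePlugin is an illustrative driver, not a real in-tree plugin.
type examplePlugin struct{}

// GetAccessModes returns the existing modes plus the new ReadWriteOncePod.
func (examplePlugin) GetAccessModes() []AccessMode {
	return []AccessMode{ReadWriteOnce, ReadOnlyMany, ReadWriteOncePod}
}

// supports reports whether the plugin lists the requested access mode.
func supports(p accessModeLister, mode AccessMode) bool {
	for _, m := range p.GetAccessModes() {
		if m == mode {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(supports(examplePlugin{}, ReadWriteOncePod))
}
```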
522+
486523### Test Plan
487524
488525<!--
489526**Note:** *Not required until targeted at a release.*
490-
491- Consider the following in developing a test plan for this enhancement:
492- - Will there be e2e and integration tests, in addition to unit tests?
493- - How will it be tested in isolation vs with other components?
494-
495- No need to outline all of the test cases, just the general strategy. Anything
496- that would count as tricky in the implementation, and anything particularly
497- challenging to test, should be called out.
527+ The goal is to ensure that we don't accept enhancements with inadequate testing.
498528
499529All code is expected to have adequate tests (eventually with coverage
500530expectations). Please adhere to the [Kubernetes testing guidelines][testing-guidelines]
@@ -503,6 +533,96 @@ when drafting this test plan.
503533[testing-guidelines]: https://git.k8s.io/community/contributors/devel/sig-testing/testing.md
504534-->
505535
536+ [X] I/we understand the owners of the involved components may require updates to
537+ existing tests to make this code solid enough prior to committing the changes
538+ necessary to implement this enhancement.
539+
540+ ##### Prerequisite testing updates
541+
542+ <!--
543+ Based on reviewers' feedback, describe what additional tests need to be added prior to
544+ implementing this enhancement to ensure the enhancement also has solid foundations.
545+ -->
546+
547+ None. New tests will be added for the transition to beta to support scheduler
548+ changes.
549+
550+ ##### Unit tests
551+
552+ <!--
553+ In principle every added code should have complete unit test coverage, so providing
554+ the exact set of tests will not bring additional value.
555+ However, if complete unit test coverage is not possible, explain the reason of it
556+ together with explanation why this is acceptable.
557+ -->
558+
559+ <!--
560+ Additionally, for Alpha try to enumerate the core package you will be touching
561+ to implement this enhancement and provide the current unit coverage for those
562+ in the form of:
563+ - <package>: <date> - <current test coverage>
564+ The data can be easily read from:
565+ https://testgrid.k8s.io/sig-testing-canaries#ci-kubernetes-coverage-unit
566+
567+ This can inform certain test coverage improvements that we want to do before
568+ extending the production code to implement this enhancement.
569+ -->
570+
571+ In alpha, the following unit tests were updated. See
572+ https://github.com/kubernetes/kubernetes/pull/102028 and
573+ https://github.com/kubernetes/kubernetes/pull/103082 for more context.
574+
575+ - `k8s.io/kubernetes/pkg/apis/core/helper`: `09-22-2022` - `26.2`
576+ - `k8s.io/kubernetes/pkg/apis/core/v1/helper`: `09-22-2022` - `56.9`
577+ - `k8s.io/kubernetes/pkg/apis/core/validation`: `09-22-2022` - `82.3`
578+ - `k8s.io/kubernetes/pkg/controller/volume/persistentvolume`: `09-22-2022` - `79.4`
579+ - `k8s.io/kubernetes/pkg/kubelet/volumemanager/cache`: `09-22-2022` - `66.3`
580+ - `k8s.io/kubernetes/pkg/volume/csi/csi_client.go`: `09-22-2022` - `76.2`
581+ - `k8s.io/kubernetes/pkg/scheduler/apis/config/v1beta2`: `09-22-2022` - `76.8`
582+ - `k8s.io/kubernetes/pkg/scheduler/framework/plugins/volumerestrictions`: `09-22-2022` - `85`
583+ - `k8s.io/kubernetes/pkg/scheduler/framework`: `09-22-2022` - `77.1`
584+
585+ In beta, there will be additional unit test coverage for
586+ `k8s.io/kubernetes/pkg/scheduler/framework/plugins/volumerestrictions` to cover
587+ preemption logic.
588+
589+ ##### Integration tests
590+
591+ <!--
592+ This question should be filled when targeting a release.
593+ For Alpha, describe what tests will be added to ensure proper quality of the enhancement.
594+
595+ For Beta and GA, add links to added tests together with links to k8s-triage for those tests:
596+ https://storage.googleapis.com/k8s-triage/index.html
597+ -->
598+
599+ ##### e2e tests
600+
601+ <!--
602+ This question should be filled when targeting a release.
603+ For Alpha, describe what tests will be added to ensure proper quality of the enhancement.
604+
605+ For Beta and GA, add links to added tests together with links to k8s-triage for those tests:
606+ https://storage.googleapis.com/k8s-triage/index.html
607+
608+ We expect no non-infra related flakes in the last month as a GA graduation criteria.
609+ -->
610+
611+ To test this feature end to end, we will need to check the following cases:
612+
613+ - A ReadWriteOncePod volume will succeed mounting when consumed by a single pod
614+ on a node
615+ - A ReadWriteOncePod volume will fail to mount when consumed by a second pod on
616+ the same node
617+ - A ReadWriteOncePod volume will fail to attach when consumed by a second pod on
618+ a different node
619+
620+ For testing the mapping for ReadWriteOnce, we should update the mock CSI driver
621+ to support the new volume capability access modes and cut a release. The
622+ existing Kubernetes end to end tests will be updated to use this version which
623+ will test the mapping behavior because most storage end to end tests rely on the
624+ ReadWriteOnce access mode.
625+
506626#### Validation of PersistentVolumeSpec Object
507627
508628To test the validation logic of the PersistentVolumeSpec, we need to check the
@@ -538,20 +658,6 @@ well as in CSI sidecars.
538658
539659#### End to End Tests
540660
541- To test this feature end to end, we will need to check the following cases:
542-
543- - A ReadWriteOncePod volume will succeed mounting when consumed by a single pod
544- on a node
545- - A ReadWriteOncePod volume will fail to mount when consumed by a second pod on
546- the same node
547- - A ReadWriteOncePod volume will fail to attach when consumed by a second pod on
548- a different node
549-
550- For testing the mapping for ReadWriteOnce, we should update the mock CSI driver
551- to support the new volume capability access modes and cut a release. The
552- existing Kubernetes end to end tests will be updated to use this version which
553- will test the mapping behavior because most storage end to end tests rely on the
554- ReadWriteOnce access mode.
555661
556662### Graduation Criteria
557663
@@ -622,9 +728,8 @@ in back-to-back releases.
622728
623729- Scheduler enforces ReadWriteOncePod access mode by marking pods as
624730 Unschedulable, preemption logic added
731+ - In-tree drivers support ReadWriteOncePod access mode
625732- ReadWriteOncePod access mode has end to end test coverage
626- - Mock CSI driver supports `SINGLE_NODE_*_WRITER` access modes, relevant end to
627- end tests updated to use this driver
628733- Hostpath CSI driver supports `SINGLE_NODE_*_WRITER` access modes, relevant end
629734 to end tests updated to use this driver
630735
@@ -832,13 +937,31 @@ Try to be as paranoid as possible - e.g., what if some components will restart
832937mid-rollout?
833938-->
834939
940+ Rolling out this feature involves enabling the ReadWriteOncePod feature gate
941+ across kube-apiserver, kube-scheduler, kubelet, and updating CSI driver and
942+ sidecar versions.
943+
944+ The only way this rollout can fail is if a user does not update all components,
945+ in which case the feature will not work. See the above section on version skews
946+ for behavior in this scenario.
947+
948+ Rolling out this feature does not impact any running workloads.
949+
835950###### What specific metrics should inform a rollback?
836951
837952<!--
838953What signals should users be paying attention to when the feature is young
839954that might indicate a serious problem?
840955-->
841956
957+ If pods using ReadWriteOncePod PVCs fail to schedule, you may see an increase in
958+ `scheduler_unschedulable_pods{plugin="VolumeRestrictions"}`.
959+
960+ For enforcement in kubelet, if there are issues you may see changes in metrics for
961+ "volume_mount" operations. For example, an increase in
962+ `storage_operation_duration_seconds_bucket{operation_name="volume_mount"}` for
963+ larger buckets may indicate issues with mount.
964+
842965###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
843966
844967<!--
@@ -847,12 +970,24 @@ Longer term, we may want to require automated upgrade/rollback tests, but we
847970are missing a bunch of machinery and tooling and can't do that now.
848971-->
849972
973+ For alpha, manual tests were performed to:
974+
975+ - Unsuccessfully create workloads using ReadWriteOncePod PVCs prior to upgrade
976+ - Perform the upgrade (enabling feature flags and updating CSI drivers)
977+ - Successfully create workloads using ReadWriteOncePod PVCs
978+ - Perform the downgrade (disabling feature flags and downgrading CSI drivers)
979+ - Successfully delete ReadWriteOncePod PVCs
980+
981+ For beta, similar manual tests will need to be performed once implemented.
982+
850983###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
851984
852985<!--
853986Even if applying deprecation policies, they may still surprise some users.
854987-->
855988
989+ No.
990+
856991### Monitoring Requirements
857992
858993<!--
@@ -867,18 +1002,21 @@ checking if there are objects with field X set) may be a last resort. Avoid
8671002logs or events for this purpose.
8681003-->
8691004
1005+ An operator can query for PersistentVolumeClaims and PersistentVolumes in the
1006+ cluster with the ReadWriteOncePod access mode. If any exist then the feature is
1007+ in use.
1008+
8701009###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
8711010
8721011<!--
8731012Pick one more of these and delete the rest.
8741013-->
8751014
876- - [ ] Metrics
877- - Metric name:
1015+ - [X] Metrics
1016+ - Metric name: `scheduler_unschedulable_pods{plugin="VolumeRestrictions"}`
8781017 - [ Optional] Aggregation method:
8791018 - Components exposing the metric:
880- - [ ] Other (treat as last resort)
881- - Details:
1019+ - kube-scheduler
8821020
8831021###### What are the reasonable SLOs (Service Level Objectives) for the above SLIs?
8841022
@@ -892,13 +1030,17 @@ high level (needs more precise definitions) those may be things like:
8921030 - 99,9% of /health requests per day finish with 200 code
8931031-->
8941032
1033+ Per-day percentage of CSI driver API calls finishing with 5XX errors <= 1%.
1034+
8951035###### Are there any missing metrics that would be useful to have to improve observability of this feature?
8961036
8971037<!--
8981038Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
8991039implementation difficulties, etc.).
9001040-->
9011041
1042+ No.
1043+
9021044### Dependencies
9031045
9041046<!--
@@ -922,6 +1064,18 @@ and creating new ones, as well as about cluster-level services (e.g. DNS):
9221064 - Impact of its degraded performance or high-error rates on the feature:
9231065-->
9241066
1067+ This feature depends on the cluster having CSI drivers and sidecars that use CSI
1068+ spec v1.5.0 at minimum.
1069+
1070+ - [CSI drivers and sidecars]
1071+ - Usage description:
1072+ - Impact of its outage on the feature: Inability to perform CSI storage
1073+ operations on ReadWriteOncePod PVCs and PVs (for example, provisioning
1074+ volumes)
1075+ - Impact of its degraded performance or high-error rates on the feature:
1076+ Increase in latency performing CSI storage operations (due to repeated
1077+ retries)
1078+
9251079### Scalability
9261080
9271081<!--
@@ -1026,6 +1180,9 @@ details). For now, we leave it here.
10261180
10271181###### How does this feature react if the API server and/or etcd is unavailable?
10281182
1183+ Existing ReadWriteOncePod volumes will continue working; however, users will not
1184+ be able to make any changes to them.
1185+
10291186###### What are other known failure modes?
10301187
10311188<!--
@@ -1041,8 +1198,12 @@ For each of them, fill in the following information by copying the below templat
10411198 - Testing: Are there any tests for failure mode? If not, describe why.
10421199-->
10431200
1201+ None.
1202+
10441203###### What steps should be taken if SLOs are not being met to determine the problem?
10451204
1205+ Roll back the feature by disabling the ReadWriteOncePod feature gate.
1206+
10461207## Implementation History
10471208
10481209<!--