@@ -94,17 +94,23 @@ tags, and then generate with `hack/update-toc.sh`.
9494- [ Design Details] ( #design-details )
9595 - [ Kubernetes Changes, Access Mode] ( #kubernetes-changes-access-mode )
9696 - [ Scheduler Enforcement] ( #scheduler-enforcement )
97+ - [ Alpha] ( #alpha )
98+ - [ Beta] ( #beta )
9799 - [ Mount Enforcement] ( #mount-enforcement )
98100 - [ CSI Specification Changes, Volume Capabilities] ( #csi-specification-changes-volume-capabilities )
99101 - [ Test Plan] ( #test-plan )
102+ - [ Prerequisite testing updates] ( #prerequisite-testing-updates )
103+ - [ Unit tests] ( #unit-tests )
104+ - [ Integration tests] ( #integration-tests )
105+ - [ e2e tests] ( #e2e-tests )
100106 - [ Validation of PersistentVolumeSpec Object] ( #validation-of-persistentvolumespec-object )
101107 - [ Mounting and Mapping with ReadWriteOncePod] ( #mounting-and-mapping-with-readwriteoncepod )
102108 - [ Mounting and Mapping with ReadWriteOnce] ( #mounting-and-mapping-with-readwriteonce )
103109 - [ Mapping Kubernetes Access Modes to CSI Volume Capability Access Modes] ( #mapping-kubernetes-access-modes-to-csi-volume-capability-access-modes )
104110 - [ End to End Tests] ( #end-to-end-tests )
105111 - [ Graduation Criteria] ( #graduation-criteria )
106- - [ Alpha] ( #alpha )
107- - [ Beta] ( #beta )
112+ - [ Alpha] ( #alpha-1 )
113+ - [ Beta] ( #beta-1 )
108114 - [ GA] ( #ga )
109115 - [ Upgrade / Downgrade Strategy] ( #upgrade--downgrade-strategy )
110116 - [ Version Skew Strategy] ( #version-skew-strategy )
@@ -389,6 +395,8 @@ This access mode will be enforced in two places:
389395
390396#### Scheduler Enforcement
391397
398+ ##### Alpha
399+
392400First is at the time a pod is scheduled. When scheduling a pod, if another pod
393401is found using the same PVC and the PVC uses ReadWriteOncePod, then scheduling
394402will fail and the pod will be considered UnschedulableAndUnresolvable.
@@ -407,6 +415,24 @@ marked UnschedulableAndUnresolvable.
407415[ volume restrictions plugin ] : https://github.com/kubernetes/kubernetes/blob/v1.21.0/pkg/scheduler/framework/plugins/volumerestrictions/volume_restrictions.go#L29
408416[ node info cache ] : https://github.com/kubernetes/kubernetes/blob/v1.21.0/pkg/scheduler/framework/types.go#L357
409417
418+ ##### Beta
419+
420+ Support for pod preemption is enforced in beta.
421+
422+ When a pod (A) is scheduled, if another pod (B) is found using the same PVC, the
423+ PVC uses ReadWriteOncePod, and pod (A) has higher priority than pod (B), then
424+ return Unschedulable (which will cause pod (B) to be preempted). If pod (A) has
425+ lower or equal priority compared with pod (B), return
426+ UnschedulableAndUnresolvable.
427+
428+ In the PreFilter phase of the volume restrictions scheduler plugin, we will
429+ build a cache of any existing pods and nodes using the ReadWriteOncePod PVCs on
430+ the pod to be scheduled. This cache will be saved as part of the scheduler's
431+ cycleState and forwarded to the following step. During AddPod and RemovePod we
432+ will add or remove references to the target ReadWriteOncePod PVCs to simulate
433+ preemption. During the Filter phase we will check caches for remaining
434+ references to the PVCs and compare their pod priorities if applicable.
435+
410436#### Mount Enforcement
411437
412438As an additional precaution this will also be enforced at the time a volume is
@@ -487,14 +513,7 @@ external-attacher, and external-resizer.
487513
488514<!--
489515**Note:** *Not required until targeted at a release.*
490-
491- Consider the following in developing a test plan for this enhancement:
492- - Will there be e2e and integration tests, in addition to unit tests?
493- - How will it be tested in isolation vs with other components?
494-
495- No need to outline all of the test cases, just the general strategy. Anything
496- that would count as tricky in the implementation, and anything particularly
497- challenging to test, should be called out.
516+ The goal is to ensure that we don't accept enhancements with inadequate testing.
498517
499518All code is expected to have adequate tests (eventually with coverage
500519expectations). Please adhere to the [Kubernetes testing guidelines][testing-guidelines]
@@ -503,6 +522,120 @@ when drafting this test plan.
503522[testing-guidelines]: https://git.k8s.io/community/contributors/devel/sig-testing/testing.md
504523-->
505524
525+ [ X] I/we understand the owners of the involved components may require updates to
526+ existing tests to make this code solid enough prior to committing the changes
527+ necessary to implement this enhancement.
528+
529+ ##### Prerequisite testing updates
530+
531+ <!--
532+ Based on reviewers feedback describe what additional tests need to be added prior
533+ implementing this enhancement to ensure the enhancements have also solid foundations.
534+ -->
535+
536+ None. New tests will be added for the transition to beta to support scheduler
537+ changes.
538+
539+ ##### Unit tests
540+
541+ <!--
542+ In principle every added code should have complete unit test coverage, so providing
543+ the exact set of tests will not bring additional value.
544+ However, if complete unit test coverage is not possible, explain the reason of it
545+ together with explanation why this is acceptable.
546+ -->
547+
548+ <!--
549+ Additionally, for Alpha try to enumerate the core package you will be touching
550+ to implement this enhancement and provide the current unit coverage for those
551+ in the form of:
552+ - <package>: <date> - <current test coverage>
553+ The data can be easily read from:
554+ https://testgrid.k8s.io/sig-testing-canaries#ci-kubernetes-coverage-unit
555+
556+ This can inform certain test coverage improvements that we want to do before
557+ extending the production code to implement this enhancement.
558+ -->
559+
560+ In alpha, the following unit tests were updated. See
561+ https://github.com/kubernetes/kubernetes/pull/102028 and
562+ https://github.com/kubernetes/kubernetes/pull/103082 for more context.
563+
564+ - ` k8s.io/kubernetes/pkg/apis/core/helper ` : ` 09-22-2022 ` - ` 26.2 `
565+ - ` k8s.io/kubernetes/pkg/apis/core/v1/helper ` : ` 09-22-2022 ` - ` 56.9 `
566+ - ` k8s.io/kubernetes/pkg/apis/core/validation ` : ` 09-22-2022 ` - ` 82.3 `
567+ - ` k8s.io/kubernetes/pkg/controller/volume/persistentvolume ` : ` 09-22-2022 ` - ` 79.4 `
568+ - ` k8s.io/kubernetes/pkg/kubelet/volumemanager/cache ` : ` 09-22-2022 ` - ` 66.3 `
569+ - ` k8s.io/kubernetes/pkg/volume/csi/csi_client.go ` : ` 09-22-2022 ` - ` 76.2 `
570+ - ` k8s.io/kubernetes/pkg/scheduler/apis/config/v1beta2 ` : ` 09-22-2022 ` - ` 76.8 `
571+ - ` k8s.io/kubernetes/pkg/scheduler/framework/plugins/volumerestrictions ` : ` 09-22-2022 ` - ` 85 `
572+ - ` k8s.io/kubernetes/pkg/scheduler/framework ` : ` 09-22-2022 ` - ` 77.1 `
573+
574+ In beta, there will be additional unit test coverage for
575+ ` k8s.io/kubernetes/pkg/scheduler/framework/plugins/volumerestrictions ` to cover
576+ preemption logic.
577+
578+ ##### Integration tests
579+
580+ <!--
581+ This question should be filled when targeting a release.
582+ For Alpha, describe what tests will be added to ensure proper quality of the enhancement.
583+
584+ For Beta and GA, add links to added tests together with links to k8s-triage for those tests:
585+ https://storage.googleapis.com/k8s-triage/index.html
586+ -->
587+
588+ Integration tests for scheduler plugin behavior are available here:
589+
590+ - [ test/integration/scheduler/filters/filters_test.go] : [ testgrid]
591+
592+ [ test/integration/scheduler/filters/filters_test.go ] : https://github.com/kubernetes/kubernetes/blob/v1.25.2/test/integration/scheduler/filters/filters_test.go
593+ [ testgrid ] : https://testgrid.k8s.io/presubmits-kubernetes-blocking#pull-kubernetes-integration&include-filter-by-regex=k8s.io/kubernetes/test/integration/scheduler/filters.TestUnschedulablePodBecomesSchedulable/scheduled_pod_uses_read-write-once-pod_pvc
594+
595+ ##### e2e tests
596+
597+ <!--
598+ This question should be filled when targeting a release.
599+ For Alpha, describe what tests will be added to ensure proper quality of the enhancement.
600+
601+ For Beta and GA, add links to added tests together with links to k8s-triage for those tests:
602+ https://storage.googleapis.com/k8s-triage/index.html
603+
604+ We expect no non-infra related flakes in the last month as a GA graduation criteria.
605+ -->
606+
607+ For alpha, to test this feature end to end, we will need to check the following cases:
608+
609+ - A ReadWriteOncePod volume will succeed mounting when consumed by a single pod
610+ on a node
611+ - A ReadWriteOncePod volume will fail to mount when consumed by a second pod on
612+ the same node
613+ - A ReadWriteOncePod volume will fail to attach when consumed by a second pod on
614+ a different node
615+
616+ For testing the mapping for ReadWriteOnce, we should update the mock CSI driver
617+ to support the new volume capability access modes and cut a release. The
618+ existing Kubernetes end to end tests will be updated to use this version which
619+ will test the mapping behavior because most storage end to end tests rely on the
620+ ReadWriteOnce access mode.
621+
622+ E2E tests for alpha behavior can be found here:
623+
624+ - [ test/e2e/storage/testsuites/readwriteoncepod.go ] : [k8s-triage]
625+
626+ For beta, we will want to cover the additional cases for preemption:
627+
628+ - A high-priority pod requesting a ReadWriteOncePod volume that's already in-use
629+ will result in the preemption of the pod previously using the volume
630+ - A high-priority pod requesting multiple ReadWriteOncePod volumes that are
631+ already in-use will result in the preemption of multiple pods previously using
632+ the volumes
633+ - A low-priority (or no priority) pod requesting a ReadWriteOncePod volume
634+ that's already in-use will result in it being UnschedulableAndUnresolvable
635+
636+ [ test/e2e/storage/testsuites/readwriteoncepod.go ] : https://github.com/kubernetes/kubernetes/blob/master/test/e2e/storage/testsuites/readwriteoncepod.go
637+ [ k8s-triage ] : https://storage.googleapis.com/k8s-triage/index.html?pr=1&test=read-write-once-pod
638+
506639#### Validation of PersistentVolumeSpec Object
507640
508641To test the validation logic of the PersistentVolumeSpec, we need to check the
@@ -538,20 +671,6 @@ well as in CSI sidecars.
538671
539672#### End to End Tests
540673
541- To test this feature end to end, we will need to check the following cases:
542-
543- - A ReadWriteOncePod volume will succeed mounting when consumed by a single pod
544- on a node
545- - A ReadWriteOncePod volume will fail to mount when consumed by a second pod on
546- the same node
547- - A ReadWriteOncePod volume will fail to attach when consumed by a second pod on
548- a different node
549-
550- For testing the mapping for ReadWriteOnce, we should update the mock CSI driver
551- to support the new volume capability access modes and cut a release. The
552- existing Kubernetes end to end tests will be updated to use this version which
553- will test the mapping behavior because most storage end to end tests rely on the
554- ReadWriteOnce access mode.
555674
556675### Graduation Criteria
557676
@@ -623,8 +742,6 @@ in back-to-back releases.
623742- Scheduler enforces ReadWriteOncePod access mode by marking pods as
624743 Unschedulable, preemption logic added
625744- ReadWriteOncePod access mode has end to end test coverage
626- - Mock CSI driver supports ` SINGLE_NODE_*_WRITER ` access modes, relevant end to
627- end tests updated to use this driver
628745- Hostpath CSI driver supports ` SINGLE_NODE_*_WRITER ` access modes, relevant end
629746 to end tests updated to use this driver
630747
@@ -832,13 +949,31 @@ Try to be as paranoid as possible - e.g., what if some components will restart
832949mid-rollout?
833950-->
834951
952+ Rolling out this feature involves enabling the ReadWriteOncePod feature gate
953+ across kube-apiserver, kube-scheduler, kubelet, and updating CSI driver and
954+ sidecar versions.
955+
956+ The only way this rollout can fail is if a user does not update all components,
957+ in which case the feature will not work. See the above section on version skews
958+ for behavior in this scenario.
959+
960+ Rolling out this feature does not impact any running workloads.
961+
835962###### What specific metrics should inform a rollback?
836963
837964<!--
838965What signals should users be paying attention to when the feature is young
839966that might indicate a serious problem?
840967-->
841968
969+ If pods using ReadWriteOncePod PVCs fail to schedule, you may see an increase in
970+ ` scheduler_unschedulable_pods{plugin="VolumeRestrictions"} ` .
971+
972+ For enforcement in kubelet, if there are issues may see changes in metrics for
973+ "volume_mount" operations. For example, an increase in
974+ ` storage_operation_duration_seconds_bucket{operation_name="volume_mount"} ` for
975+ larger buckets may indicate issues with mount.
976+
842977###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
843978
844979<!--
@@ -847,12 +982,24 @@ Longer term, we may want to require automated upgrade/rollback tests, but we
847982are missing a bunch of machinery and tooling and can't do that now.
848983-->
849984
985+ For alpha, manual tests were performed to:
986+
987+ - Unsuccessfully create workloads using ReadWriteOncePod PVCs prior to upgrade
988+ - Perform the upgrade (enabling feature flags and updating CSI drivers)
989+ - Successfully create workloads using ReadWriteOncePod PVCs
990+ - Perform the downgrade (disabling feature flags and downgrading CSI drivers)
991+ - Successfully delete ReadWriteOncePod PVCs
992+
993+ For beta, similar manual tests will need to be performed once implemented.
994+
850995###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
851996
852997<!--
853998Even if applying deprecation policies, they may still surprise some users.
854999-->
8551000
1001+ No.
1002+
8561003### Monitoring Requirements
8571004
8581005<!--
@@ -867,18 +1014,21 @@ checking if there are objects with field X set) may be a last resort. Avoid
8671014logs or events for this purpose.
8681015-->
8691016
1017+ An operator can query for PersistentVolumeClaims and PersistentVolumes in the
1018+ cluster with the ReadWriteOncePod access mode. If any exist then the feature is
1019+ in use.
1020+
8701021###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
8711022
8721023<!--
8731024Pick one more of these and delete the rest.
8741025-->
8751026
876- - [ ] Metrics
877- - Metric name:
1027+ - [X ] Metrics
1028+ - Metric name: ` scheduler_unschedulable_pods{plugin="VolumeRestrictions"} `
8781029 - [ Optional] Aggregation method:
8791030 - Components exposing the metric:
880- - [ ] Other (treat as last resort)
881- - Details:
1031+ - kube-scheduler
8821032
8831033###### What are the reasonable SLOs (Service Level Objectives) for the above SLIs?
8841034
@@ -892,13 +1042,17 @@ high level (needs more precise definitions) those may be things like:
8921042 - 99,9% of /health requests per day finish with 200 code
8931043-->
8941044
1045+ Per-day percentage of CSI driver API calls finishing with 5XX errors <= 1%.
1046+
8951047###### Are there any missing metrics that would be useful to have to improve observability of this feature?
8961048
8971049<!--
8981050Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
8991051implementation difficulties, etc.).
9001052-->
9011053
1054+ No.
1055+
9021056### Dependencies
9031057
9041058<!--
@@ -922,6 +1076,18 @@ and creating new ones, as well as about cluster-level services (e.g. DNS):
9221076 - Impact of its degraded performance or high-error rates on the feature:
9231077-->
9241078
1079+ This feature depends on the cluster having CSI drivers and sidecars that use CSI
1080+ spec v1.5.0 at minimum.
1081+
1082+ - [ CSI drivers and sidecars]
1083+ - Usage description:
1084+ - Impact of its outage on the feature: Inability to perform CSI storage
1085+ operations on ReadWriteOncePod PVCs and PVs (for example, provisioning
1086+ volumes)
1087+ - Impact of its degraded performance or high-error rates on the feature:
1088+ Increase in latency performing CSI storage operations (due to repeated
1089+ retries)
1090+
9251091### Scalability
9261092
9271093<!--
@@ -1026,6 +1192,9 @@ details). For now, we leave it here.
10261192
10271193###### How does this feature react if the API server and/or etcd is unavailable?
10281194
1195+ Existing ReadWriteOncePod volumes will continue working, however users will not
1196+ be able to make any changes to them.
1197+
10291198###### What are other known failure modes?
10301199
10311200<!--
@@ -1041,8 +1210,12 @@ For each of them, fill in the following information by copying the below templat
10411210 - Testing: Are there any tests for failure mode? If not, describe why.
10421211-->
10431212
1213+ None.
1214+
10441215###### What steps should be taken if SLOs are not being met to determine the problem?
10451216
1217+ Roll back the feature by disabling the ReadWriteOncePod feature gate.
1218+
10461219## Implementation History
10471220
10481221<!--
0 commit comments