Commit 53250d3

committed
KEP-2485: ReadWriteOncePod beta production readiness
1 parent 1287006 commit 53250d3

3 files changed

+197
-33
lines changed

Lines changed: 2 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -1,3 +1,5 @@
11
kep-number: 2485
22
alpha:
33
approver: "@ehashman"
4+
beta:
5+
approver: "@deads2k"

keps/sig-storage/2485-read-write-once-pod-pv-access-mode/README.md

Lines changed: 191 additions & 30 deletions
@@ -94,17 +94,24 @@ tags, and then generate with `hack/update-toc.sh`.
9494
- [Design Details](#design-details)
9595
- [Kubernetes Changes, Access Mode](#kubernetes-changes-access-mode)
9696
- [Scheduler Enforcement](#scheduler-enforcement)
97+
- [Alpha](#alpha)
98+
- [Beta](#beta)
9799
- [Mount Enforcement](#mount-enforcement)
98100
- [CSI Specification Changes, Volume Capabilities](#csi-specification-changes-volume-capabilities)
101+
- [Supporting In-Tree Drivers](#supporting-in-tree-drivers)
99102
- [Test Plan](#test-plan)
103+
- [Prerequisite testing updates](#prerequisite-testing-updates)
104+
- [Unit tests](#unit-tests)
105+
- [Integration tests](#integration-tests)
106+
- [e2e tests](#e2e-tests)
100107
- [Validation of PersistentVolumeSpec Object](#validation-of-persistentvolumespec-object)
101108
- [Mounting and Mapping with ReadWriteOncePod](#mounting-and-mapping-with-readwriteoncepod)
102109
- [Mounting and Mapping with ReadWriteOnce](#mounting-and-mapping-with-readwriteonce)
103110
- [Mapping Kubernetes Access Modes to CSI Volume Capability Access Modes](#mapping-kubernetes-access-modes-to-csi-volume-capability-access-modes)
104111
- [End to End Tests](#end-to-end-tests)
105112
- [Graduation Criteria](#graduation-criteria)
106-
- [Alpha](#alpha)
107-
- [Beta](#beta)
113+
- [Alpha](#alpha-1)
114+
- [Beta](#beta-1)
108115
- [GA](#ga)
109116
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
110117
- [Version Skew Strategy](#version-skew-strategy)
@@ -389,6 +396,8 @@ This access mode will be enforced in two places:
389396

390397
#### Scheduler Enforcement
391398

399+
##### Alpha
400+
392401
First is at the time a pod is scheduled. When scheduling a pod, if another pod
393402
is found using the same PVC and the PVC uses ReadWriteOncePod, then scheduling
394403
will fail and the pod will be considered UnschedulableAndUnresolvable.
@@ -407,6 +416,24 @@ marked UnschedulableAndUnresolvable.
407416
[volume restrictions plugin]: https://github.com/kubernetes/kubernetes/blob/v1.21.0/pkg/scheduler/framework/plugins/volumerestrictions/volume_restrictions.go#L29
408417
[node info cache]: https://github.com/kubernetes/kubernetes/blob/v1.21.0/pkg/scheduler/framework/types.go#L357
409418

419+
##### Beta
420+
421+
Beta adds support for pod preemption.
422+
423+
When a pod (A) is scheduled, if another pod (B) is found using the same PVC, the
424+
PVC uses ReadWriteOncePod, and pod (A) has higher priority than pod (B), the
425+
plugin returns Unschedulable (which causes pod (B) to be preempted). If pod (A)
426+
has lower or equal priority compared with pod (B), the plugin returns
427+
UnschedulableAndUnresolvable.
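
The priority rule above can be sketched in Go as follows. This is a minimal illustration, not the scheduler's code: the `Status` type, its constants, and `rwopStatus` are hypothetical stand-ins that only mirror the status names used in this KEP.

```go
package main

import "fmt"

// Status is a hypothetical stand-in for the scheduler framework's status codes.
type Status string

const (
	Success                      Status = "Success"
	Unschedulable                Status = "Unschedulable"
	UnschedulableAndUnresolvable Status = "UnschedulableAndUnresolvable"
)

// rwopStatus decides the outcome for an incoming pod (A), given the priorities
// of the pods (B) that already use the same ReadWriteOncePod PVC.
func rwopStatus(incoming int32, existing []int32) Status {
	if len(existing) == 0 {
		return Success // nothing else uses the PVC
	}
	for _, p := range existing {
		if incoming <= p {
			// A conflicting pod has equal or higher priority, so
			// preemption cannot resolve the conflict.
			return UnschedulableAndUnresolvable
		}
	}
	// Every conflicting pod has lower priority, so preemption may help.
	return Unschedulable
}

func main() {
	fmt.Println(rwopStatus(100, []int32{10})) // Unschedulable
	fmt.Println(rwopStatus(10, []int32{10}))  // UnschedulableAndUnresolvable
	fmt.Println(rwopStatus(10, nil))          // Success
}
```

Equal priority is treated the same as lower priority, matching the "lower or equal" wording above.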
428+
429+
In the PreFilter phase of the volume restrictions scheduler plugin, we will
430+
build a cache of any existing pods and nodes using the ReadWriteOncePod PVCs on
431+
the pod to be scheduled. This cache will be saved as part of the scheduler's
432+
cycleState and forwarded to the following step. During AddPod and RemovePod we
433+
will add or remove references to the target ReadWriteOncePod PVCs to simulate
434+
preemption. During the Filter phase we will check caches for remaining
435+
references to the PVCs and compare their pod priorities if applicable.
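
As a rough sketch of the PreFilter / AddPod / RemovePod / Filter flow described above, the following Go program keeps per-PVC references in a small cache and re-evaluates it after a simulated preemption. The `cycleCache` type, its methods, and the verdict strings are all hypothetical names chosen for illustration; the real plugin stores its state in the scheduler's cycleState and uses framework status codes.

```go
package main

import "fmt"

// cycleCache is a hypothetical per-scheduling-cycle cache. It maps each
// ReadWriteOncePod PVC of the pod being scheduled to the pods (name ->
// priority) that still reference it.
type cycleCache struct {
	users map[string]map[string]int32
}

// newCycleCache mimics PreFilter: it tracks only the target pod's PVCs.
func newCycleCache(targetPVCs []string) *cycleCache {
	c := &cycleCache{users: make(map[string]map[string]int32)}
	for _, pvc := range targetPVCs {
		c.users[pvc] = make(map[string]int32)
	}
	return c
}

// addPod records a reference; PVCs outside the tracked set are ignored.
func (c *cycleCache) addPod(pvc, pod string, priority int32) {
	if m, ok := c.users[pvc]; ok {
		m[pod] = priority
	}
}

// removePod drops a reference, simulating preemption of that pod.
func (c *cycleCache) removePod(pvc, pod string) {
	if m, ok := c.users[pvc]; ok {
		delete(m, pod)
	}
}

// filter checks the remaining references against the incoming pod's priority.
func (c *cycleCache) filter(priority int32) string {
	verdict := "schedulable"
	for _, m := range c.users {
		for _, p := range m {
			if priority <= p {
				return "unschedulable-and-unresolvable"
			}
			verdict = "unschedulable"
		}
	}
	return verdict
}

func main() {
	c := newCycleCache([]string{"default/data"})
	c.addPod("default/data", "pod-b", 10)
	fmt.Println(c.filter(100)) // unschedulable
	fmt.Println(c.filter(5))   // unschedulable-and-unresolvable
	c.removePod("default/data", "pod-b")
	fmt.Println(c.filter(100)) // schedulable
}
```

Removing the reference before re-running the filter is what lets the scheduler ask whether preempting pod (B) would make pod (A) schedulable.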
436+
410437
#### Mount Enforcement
411438

412439
As an additional precaution this will also be enforced at the time a volume is
@@ -483,18 +510,21 @@ Put more succinctly:
483510
CSI clients that will need updating are kubelet, external-provisioner,
484511
external-attacher, and external-resizer.
485512

513+
### Supporting In-Tree Drivers
514+
515+
In-tree storage drivers implement the [`PersistentVolumePlugin`] interface which
516+
specifies a list of supported access modes. For beta, we will update drivers to
517+
also accept the ReadWriteOncePod access mode. Additional updates are required to
518+
the CSI migration libraries (per volume type) to account for the new access
519+
mode.
520+
521+
[`PersistentVolumePlugin`]: https://github.com/kubernetes/kubernetes/blob/v1.25.2/pkg/volume/plugins.go#L200-L201
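
To illustrate what accepting the new access mode means for a plugin, here is a minimal Go sketch. It assumes the plugin reports its supported modes through a `GetAccessModes` method; the `AccessMode` type, the `accessModeLister` interface, and `exampleDiskPlugin` are simplified, hypothetical stand-ins rather than the in-tree types.

```go
package main

import "fmt"

// AccessMode is a simplified stand-in for the Kubernetes access mode type.
type AccessMode string

const (
	ReadWriteOnce    AccessMode = "ReadWriteOnce"
	ReadOnlyMany     AccessMode = "ReadOnlyMany"
	ReadWriteOncePod AccessMode = "ReadWriteOncePod"
)

// accessModeLister is a trimmed, hypothetical view of the part of a volume
// plugin that lists supported access modes.
type accessModeLister interface {
	GetAccessModes() []AccessMode
}

// exampleDiskPlugin is a made-up plugin used only for this sketch.
type exampleDiskPlugin struct{}

// GetAccessModes now includes ReadWriteOncePod alongside the existing modes.
func (exampleDiskPlugin) GetAccessModes() []AccessMode {
	return []AccessMode{ReadWriteOnce, ReadOnlyMany, ReadWriteOncePod}
}

// supports reports whether the plugin lists the requested access mode.
func supports(p accessModeLister, mode AccessMode) bool {
	for _, m := range p.GetAccessModes() {
		if m == mode {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(supports(exampleDiskPlugin{}, ReadWriteOncePod)) // true
}
```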
522+
486523
### Test Plan
487524

488525
<!--
489526
**Note:** *Not required until targeted at a release.*
490-
491-
Consider the following in developing a test plan for this enhancement:
492-
- Will there be e2e and integration tests, in addition to unit tests?
493-
- How will it be tested in isolation vs with other components?
494-
495-
No need to outline all of the test cases, just the general strategy. Anything
496-
that would count as tricky in the implementation, and anything particularly
497-
challenging to test, should be called out.
527+
The goal is to ensure that we don't accept enhancements with inadequate testing.
498528
499529
All code is expected to have adequate tests (eventually with coverage
500530
expectations). Please adhere to the [Kubernetes testing guidelines][testing-guidelines]
@@ -503,6 +533,96 @@ when drafting this test plan.
503533
[testing-guidelines]: https://git.k8s.io/community/contributors/devel/sig-testing/testing.md
504534
-->
505535

536+
[X] I/we understand the owners of the involved components may require updates to
537+
existing tests to make this code solid enough prior to committing the changes
538+
necessary to implement this enhancement.
539+
540+
##### Prerequisite testing updates
541+
542+
<!--
543+
Based on reviewers' feedback, describe what additional tests need to be added prior to
544+
implementing this enhancement to ensure the enhancements also have solid foundations.
545+
-->
546+
547+
None. New tests will be added for the transition to beta to support scheduler
548+
changes.
549+
550+
##### Unit tests
551+
552+
<!--
553+
In principle every added code should have complete unit test coverage, so providing
554+
the exact set of tests will not bring additional value.
555+
However, if complete unit test coverage is not possible, explain the reason of it
556+
together with explanation why this is acceptable.
557+
-->
558+
559+
<!--
560+
Additionally, for Alpha try to enumerate the core package you will be touching
561+
to implement this enhancement and provide the current unit coverage for those
562+
in the form of:
563+
- <package>: <date> - <current test coverage>
564+
The data can be easily read from:
565+
https://testgrid.k8s.io/sig-testing-canaries#ci-kubernetes-coverage-unit
566+
567+
This can inform certain test coverage improvements that we want to do before
568+
extending the production code to implement this enhancement.
569+
-->
570+
571+
In alpha, the following unit tests were updated. See
572+
https://github.com/kubernetes/kubernetes/pull/102028 and
573+
https://github.com/kubernetes/kubernetes/pull/103082 for more context.
574+
575+
- `k8s.io/kubernetes/pkg/apis/core/helper`: `09-22-2022` - `26.2`
576+
- `k8s.io/kubernetes/pkg/apis/core/v1/helper`: `09-22-2022` - `56.9`
577+
- `k8s.io/kubernetes/pkg/apis/core/validation`: `09-22-2022` - `82.3`
578+
- `k8s.io/kubernetes/pkg/controller/volume/persistentvolume`: `09-22-2022` - `79.4`
579+
- `k8s.io/kubernetes/pkg/kubelet/volumemanager/cache`: `09-22-2022` - `66.3`
580+
- `k8s.io/kubernetes/pkg/volume/csi/csi_client.go`: `09-22-2022` - `76.2`
581+
- `k8s.io/kubernetes/pkg/scheduler/apis/config/v1beta2`: `09-22-2022` - `76.8`
582+
- `k8s.io/kubernetes/pkg/scheduler/framework/plugins/volumerestrictions`: `09-22-2022` - `85`
583+
- `k8s.io/kubernetes/pkg/scheduler/framework`: `09-22-2022` - `77.1`
584+
585+
In beta, there will be additional unit test coverage for
586+
`k8s.io/kubernetes/pkg/scheduler/framework/plugins/volumerestrictions` to cover
587+
preemption logic.
588+
589+
##### Integration tests
590+
591+
<!--
592+
This question should be filled when targeting a release.
593+
For Alpha, describe what tests will be added to ensure proper quality of the enhancement.
594+
595+
For Beta and GA, add links to added tests together with links to k8s-triage for those tests:
596+
https://storage.googleapis.com/k8s-triage/index.html
597+
-->
598+
599+
##### e2e tests
600+
601+
<!--
602+
This question should be filled when targeting a release.
603+
For Alpha, describe what tests will be added to ensure proper quality of the enhancement.
604+
605+
For Beta and GA, add links to added tests together with links to k8s-triage for those tests:
606+
https://storage.googleapis.com/k8s-triage/index.html
607+
608+
We expect no non-infra related flakes in the last month as a GA graduation criteria.
609+
-->
610+
611+
To test this feature end to end, we will need to check the following cases:
612+
613+
- A ReadWriteOncePod volume will succeed mounting when consumed by a single pod
614+
on a node
615+
- A ReadWriteOncePod volume will fail to mount when consumed by a second pod on
616+
the same node
617+
- A ReadWriteOncePod volume will fail to attach when consumed by a second pod on
618+
a different node
619+
620+
For testing the mapping for ReadWriteOnce, we should update the mock CSI driver
621+
to support the new volume capability access modes and cut a release. The
622+
existing Kubernetes end to end tests will be updated to use this version which
623+
will test the mapping behavior because most storage end to end tests rely on the
624+
ReadWriteOnce access mode.
625+
506626
#### Validation of PersistentVolumeSpec Object
507627

508628
To test the validation logic of the PersistentVolumeSpec, we need to check the
@@ -538,20 +658,6 @@ well as in CSI sidecars.
538658

539659
#### End to End Tests
540660

541-
To test this feature end to end, we will need to check the following cases:
542-
543-
- A ReadWriteOncePod volume will succeed mounting when consumed by a single pod
544-
on a node
545-
- A ReadWriteOncePod volume will fail to mount when consumed by a second pod on
546-
the same node
547-
- A ReadWriteOncePod volume will fail to attach when consumed by a second pod on
548-
a different node
549-
550-
For testing the mapping for ReadWriteOnce, we should update the mock CSI driver
551-
to support the new volume capability access modes and cut a release. The
552-
existing Kubernetes end to end tests will be updated to use this version which
553-
will test the mapping behavior because most storage end to end tests rely on the
554-
ReadWriteOnce access mode.
555661

556662
### Graduation Criteria
557663

@@ -622,9 +728,8 @@ in back-to-back releases.
622728

623729
- Scheduler enforces ReadWriteOncePod access mode by marking pods as
624730
Unschedulable, preemption logic added
731+
- In-tree drivers support ReadWriteOncePod access mode
625732
- ReadWriteOncePod access mode has end to end test coverage
626-
- Mock CSI driver supports `SINGLE_NODE_*_WRITER` access modes, relevant end to
627-
end tests updated to use this driver
628733
- Hostpath CSI driver supports `SINGLE_NODE_*_WRITER` access modes, relevant end
629734
to end tests updated to use this driver
630735

@@ -832,13 +937,31 @@ Try to be as paranoid as possible - e.g., what if some components will restart
832937
mid-rollout?
833938
-->
834939

940+
Rolling out this feature involves enabling the ReadWriteOncePod feature gate
941+
across kube-apiserver, kube-scheduler, kubelet, and updating CSI driver and
942+
sidecar versions.
943+
944+
The only way this rollout can fail is if a user does not update all components,
945+
in which case the feature will not work. See the above section on version skews
946+
for behavior in this scenario.
947+
948+
Rolling out this feature does not impact any running workloads.
949+
835950
###### What specific metrics should inform a rollback?
836951

837952
<!--
838953
What signals should users be paying attention to when the feature is young
839954
that might indicate a serious problem?
840955
-->
841956

957+
If pods using ReadWriteOncePod PVCs fail to schedule, you may see an increase in
958+
`scheduler_unschedulable_pods{plugin="VolumeRestrictions"}`.
959+
960+
For enforcement in kubelet, if there are issues you may see changes in metrics for
961+
"volume_mount" operations. For example, an increase in
962+
`storage_operation_duration_seconds_bucket{operation_name="volume_mount"}` for
963+
larger buckets may indicate issues with mount.
964+
842965
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
843966

844967
<!--
@@ -847,12 +970,24 @@ Longer term, we may want to require automated upgrade/rollback tests, but we
847970
are missing a bunch of machinery and tooling and can't do that now.
848971
-->
849972

973+
For alpha, manual tests were performed to:
974+
975+
- Unsuccessfully create workloads using ReadWriteOncePod PVCs prior to upgrade
976+
- Perform the upgrade (enabling feature flags and updating CSI drivers)
977+
- Successfully create workloads using ReadWriteOncePod PVCs
978+
- Perform the downgrade (disabling feature flags and downgrading CSI drivers)
979+
- Successfully delete ReadWriteOncePod PVCs
980+
981+
For beta, similar manual tests will need to be performed once implemented.
982+
850983
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
851984

852985
<!--
853986
Even if applying deprecation policies, they may still surprise some users.
854987
-->
855988

989+
No.
990+
856991
### Monitoring Requirements
857992

858993
<!--
@@ -867,18 +1002,21 @@ checking if there are objects with field X set) may be a last resort. Avoid
8671002
logs or events for this purpose.
8681003
-->
8691004

1005+
An operator can query for PersistentVolumeClaims and PersistentVolumes in the
1006+
cluster with the ReadWriteOncePod access mode. If any exist then the feature is
1007+
in use.
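
The check itself is a filter on each object's access modes. The Go sketch below runs that filter over an in-memory list; the `claim` struct and `claimsUsingRWOP` are hypothetical, and in a real cluster the list would come from the API server rather than a literal.

```go
package main

import "fmt"

const readWriteOncePod = "ReadWriteOncePod"

// claim is a hypothetical, trimmed view of a PersistentVolumeClaim.
type claim struct {
	Namespace   string
	Name        string
	AccessModes []string
}

// claimsUsingRWOP returns "namespace/name" for every claim that lists the
// ReadWriteOncePod access mode.
func claimsUsingRWOP(claims []claim) []string {
	var out []string
	for _, c := range claims {
		for _, m := range c.AccessModes {
			if m == readWriteOncePod {
				out = append(out, c.Namespace+"/"+c.Name)
				break
			}
		}
	}
	return out
}

func main() {
	claims := []claim{
		{Namespace: "default", Name: "scratch", AccessModes: []string{"ReadWriteOnce"}},
		{Namespace: "db", Name: "data", AccessModes: []string{"ReadWriteOncePod"}},
	}
	fmt.Println(claimsUsingRWOP(claims)) // [db/data]
}
```

A non-empty result means the feature is in use.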
1008+
8701009
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
8711010

8721011
<!--
8731012
Pick one more of these and delete the rest.
8741013
-->
8751014

876-
- [ ] Metrics
877-
- Metric name:
1015+
- [X] Metrics
1016+
- Metric name: `scheduler_unschedulable_pods{plugin="VolumeRestrictions"}`
8781017
- [Optional] Aggregation method:
8791018
- Components exposing the metric:
880-
- [ ] Other (treat as last resort)
881-
- Details:
1019+
- kube-scheduler
8821020

8831021
###### What are the reasonable SLOs (Service Level Objectives) for the above SLIs?
8841022

@@ -892,13 +1030,17 @@ high level (needs more precise definitions) those may be things like:
8921030
- 99,9% of /health requests per day finish with 200 code
8931031
-->
8941032

1033+
Per-day percentage of CSI driver API calls finishing with 5XX errors <= 1%.
1034+
8951035
###### Are there any missing metrics that would be useful to have to improve observability of this feature?
8961036

8971037
<!--
8981038
Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
8991039
implementation difficulties, etc.).
9001040
-->
9011041

1042+
No.
1043+
9021044
### Dependencies
9031045

9041046
<!--
@@ -922,6 +1064,18 @@ and creating new ones, as well as about cluster-level services (e.g. DNS):
9221064
- Impact of its degraded performance or high-error rates on the feature:
9231065
-->
9241066

1067+
This feature depends on the cluster having CSI drivers and sidecars that use CSI
1068+
spec v1.5.0 at minimum.
1069+
1070+
- [CSI drivers and sidecars]
1071+
- Usage description:
1072+
- Impact of its outage on the feature: Inability to perform CSI storage
1073+
operations on ReadWriteOncePod PVCs and PVs (for example, provisioning
1074+
volumes)
1075+
- Impact of its degraded performance or high-error rates on the feature:
1076+
Increase in latency performing CSI storage operations (due to repeated
1077+
retries)
1078+
9251079
### Scalability
9261080

9271081
<!--
@@ -1026,6 +1180,9 @@ details). For now, we leave it here.
10261180

10271181
###### How does this feature react if the API server and/or etcd is unavailable?
10281182

1183+
Existing ReadWriteOncePod volumes will continue working; however, users will not
1184+
be able to make any changes to them.
1185+
10291186
###### What are other known failure modes?
10301187

10311188
<!--
@@ -1041,8 +1198,12 @@ For each of them, fill in the following information by copying the below templat
10411198
- Testing: Are there any tests for failure mode? If not, describe why.
10421199
-->
10431200

1201+
None.
1202+
10441203
###### What steps should be taken if SLOs are not being met to determine the problem?
10451204

1205+
Roll back the feature by disabling the ReadWriteOncePod feature gate.
1206+
10461207
## Implementation History
10471208

10481209
<!--

keps/sig-storage/2485-read-write-once-pod-pv-access-mode/kep.yaml

Lines changed: 4 additions & 3 deletions
@@ -22,17 +22,17 @@ see-also:
2222
replaces:
2323

2424
# The target maturity stage in the current dev cycle for this KEP.
25-
stage: alpha
25+
stage: beta
2626

2727
# The most recent milestone for which work toward delivery of this KEP has been
2828
# done. This can be the current (upcoming) milestone, if it is being actively
2929
# worked on.
30-
latest-milestone: "v1.22"
30+
latest-milestone: "v1.26"
3131

3232
# The milestone at which this feature was, or is targeted to be, at each stage.
3333
milestone:
3434
alpha: "v1.22"
35-
beta: TBD
35+
beta: "v1.26"
3636
stable: TBD
3737

3838
# The following PRR answers are required at alpha release
@@ -47,3 +47,4 @@ disable-supported: true
4747

4848
# The following PRR answers are required at beta release
4949
metrics:
50+
- scheduler_unschedulable_pods{plugin="VolumeRestrictions"}
