# PV monitoring proposal
Status: Pending

Version: Alpha

Implementation Owner: NickrenREN@
## Motivation
Currently, Kubernetes has no way to monitor PVs, which can cause serious problems.
For example, if a volume becomes unhealthy but the pods using it are unaware of that and keep trying to read and write data,
the result can be data loss and service unavailability.
It is therefore necessary to have a mechanism that monitors PVs and reacts when they have problems.
## Proposal
We can separate the proposal into two parts:

* monitoring PVs and marking them if they have problems
* reacting to the unhealthy PVs

For monitoring, we can create a dedicated controller, and each volume plugin should provide its own function to check volume health.
The controller can call these checks periodically. It also needs to watch Node events, because local PVs become unreachable when their node breaks down.

For reacting, different kinds of applications may need different methods, so we can create a separate controller for that as well.

In the first phase, we can focus on monitoring local storage PVs.
## User Experience
### Use Cases
* If the local PV path is deleted, users should be informed, and the local PV should be marked and deleted;
* If the local PV path is no longer a mount point, the local PV should be marked and deleted;
* If a node hosting local PVs breaks down, its local PVs should be marked and deleted (this assumes the application has a data backup and can restore it, or can tolerate data loss; the PV protection feature may help);
* For local PVs, we need to make sure that PV capacity is not greater than device capacity and that PV used bytes are not greater than PV capacity (see the sketch after this list);
* For network storage, if the volume is deleted on the storage backend, the PV object in Kubernetes should be marked and deleted too;
* If we cannot access the PV volume for a certain time (because of network or other problems), we need to mark and delete the PV;
* PV fsType checking? Bad block checking?
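
As an illustration of the size checks in the capacity use case above, a minimal sketch follows; `checkPVSize`, `getDeviceCapacity`, and `getUsedBytes` are hypothetical names for helpers the node agent would provide, not existing code:

```
package monitor

import (
	v1 "k8s.io/api/core/v1"
)

// checkPVSize is a hypothetical sketch of the size checks described above.
// The getDeviceCapacity and getUsedBytes helpers are assumptions: the node
// agent would implement them (e.g. via statfs or filesystem accounting) and
// return sizes in bytes.
func checkPVSize(pv *v1.PersistentVolume, getDeviceCapacity, getUsedBytes func(path string) (int64, error)) (healthy bool, reason string, err error) {
	pvCapacity := pv.Spec.Capacity[v1.ResourceStorage]

	deviceCapacity, err := getDeviceCapacity(pv.Spec.Local.Path)
	if err != nil {
		return false, "", err
	}
	// PV capacity must not be greater than the underlying device capacity.
	if pvCapacity.CmpInt64(deviceCapacity) > 0 {
		return false, "MisMatchedVolSize", nil
	}

	usedBytes, err := getUsedBytes(pv.Spec.Local.Path)
	if err != nil {
		return false, "", err
	}
	// PV used bytes must not be greater than PV capacity.
	if pvCapacity.CmpInt64(usedBytes) < 0 {
		return false, "MisMatchedVolSize", nil
	}
	return true, "", nil
}
```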
## Implementation
As mentioned above, we can split this into two parts and put them in the external repo at first.
### Monitoring controller:
Like the PV controller, the monitoring controller should check PVs' health condition periodically and taint them if they are unhealthy.

The health checking implementation should be per plugin: each volume plugin needs to have its own methods to check its volumes.

At the first stage, we can focus on local storage PVs, and then extend to other networked storage PVs.
#### For local storage:
The local storage PV monitor consists of two parts:

* a DaemonSet running on every node, which is responsible for monitoring the local PVs on that node, no matter whether the PVs were created manually or by a provisioner;
* a monitor controller, which is responsible for watching PV and Node events; PVs may be updated if they are unhealthy, and we also need to react to node failure events.

In the first phase, we can support local storage monitoring.

Taking local storage as an example, the detailed checking method may look like this:
```
// checkStatus checks local pv health condition
func (monitor *LocalPVMonitor) checkStatus(pv *v1.PersistentVolume) {
	// check if PV is local storage
	if pv.Spec.Local == nil {
		glog.Infof("PV: %s is not local storage", pv.Name)
		return
	}
	// check node and pv affinity
	fit, err := CheckNodeAffinity(pv, monitor.Node.Labels)
	if err != nil {
		glog.Errorf("check node affinity error: %v", err)
		return
	}
	if !fit {
		glog.Errorf("pv: %s does not belong to this node: %s", pv.Name, monitor.Node.Name)
		return
	}

	// check if host dir still exists
	mountPath, continueThisCheck := monitor.checkHostDir(pv)
	if !continueThisCheck {
		glog.Errorf("Host dir is modified, PV should be marked")
		return
	}

	// check if it is still a mount point
	continueThisCheck = monitor.checkMountPoint(mountPath, pv)
	if !continueThisCheck {
		glog.Errorf("Retrieving mount points error or %s is not a mount point any more", mountPath)
		return
	}

	// check PV size: PV capacity must not be greater than device capacity and PV used bytes must not be greater than PV capacity
	if pv.Spec.VolumeMode != nil && *pv.Spec.VolumeMode == v1.PersistentVolumeBlock {
		monitor.checkPVAndBlockSize(mountPath, pv)
	} else {
		monitor.checkPVAndFSSize(mountPath, pv)
	}

	// other checks ...
}
```
If the monitor finds that a PV is unhealthy, it will mark the PV by adding annotations, including a timestamp.
The reaction controller can then react to this PV depending on the annotations and timestamp.

When we first mark a PV, we also add an annotation whose key is `FirstMarkTime`.
If a local PV is unhealthy, the annotation keys may include `HostPathNotExist`, `MisMatchedVolSize`, `NotMountPoint`, and so on.
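
A minimal sketch of the marking step follows. The `markPV` helper and the RFC3339 timestamp format are illustrative assumptions (the example output below shows Go's default time formatting), and the `Update` call uses the client-go signature of this proposal's era; newer client-go versions also take a context and `metav1.UpdateOptions`.

```
package monitor

import (
	"time"

	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
)

// markPV is a hypothetical helper showing how the monitor could record an
// unhealthy condition on a PV using the annotation keys from this proposal.
func markPV(client kubernetes.Interface, pv *v1.PersistentVolume, reasonKey string) error {
	pvCopy := pv.DeepCopy()
	if pvCopy.Annotations == nil {
		pvCopy.Annotations = map[string]string{}
	}
	// Record when the PV was first marked so the reaction controller can
	// decide whether the recovery window has elapsed.
	if _, ok := pvCopy.Annotations["FirstMarkTime"]; !ok {
		pvCopy.Annotations["FirstMarkTime"] = time.Now().UTC().Format(time.RFC3339)
	}
	// Record the concrete reason, e.g. HostPathNotExist or NotMountPoint.
	pvCopy.Annotations[reasonKey] = "yes"
	_, err := client.CoreV1().PersistentVolumes().Update(pvCopy)
	return err
}
```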
A marked local PV looks like:
```
Name:            example-local-pv-1
Labels:          <none>
Annotations:     FirstMarkTime=2018-04-17 07:31:02.388570492 +0000 UTC m=+600.033905921
                 HostPathNotExist=yes
                 NotMountPoint=yes
                 volume.alpha.kubernetes.io/node-affinity={ "requiredDuringSchedulingIgnoredDuringExecution": { "nodeSelectorTerms": [ { "matchExpressions": [ { "key": "kubernetes.io/hostname", "operator": "In", "valu...
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    local-disks
Status:          Available
Claim:
Reclaim Policy:  Retain
Access Modes:    RWO
Capacity:        200Mi
Node Affinity:   <none>
Message:
Source:
    Type:  LocalVolume (a persistent volume backed by local storage on a node)
    Path:  /mnt/disks/vol/vol1
Events:
  Type    Reason           Age  From                                                                  Message
  ----    ------           ---  ----                                                                  -------
  Normal  MarkPVSucceeded  1m   local-volume-monitor-127.0.0.1-40a8fb4d-4206-11e8-8e52-080027765304   Mark PV successfully with annotation key: NotMountPoint
  Normal  MarkPVSucceeded  22s  local-volume-monitor-127.0.0.1-40a8fb4d-4206-11e8-8e52-080027765304   Mark PV successfully with annotation key: HostPathNotExist
```
#### For out-of-tree volume plugins (except local storage):
We can implement the monitor in the external repo at first. For the networked storage monitor,
we can create a new controller called MonitorController, similar to ProvisionController,
which is responsible for creating informers, watching Node and PV events, and calling each plugin's monitor functions.
Each volume plugin will create its own monitor to check the status of its volumes.
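
A rough sketch of such a controller skeleton follows; `Monitor`, `MonitorController`, and the callback fields are illustrative names for this proposal, not an existing library API:

```
package monitor

import (
	"time"

	"github.com/golang/glog"
	v1 "k8s.io/api/core/v1"
)

// Monitor is a hypothetical per-plugin interface; each networked storage
// plugin would implement it to check the health of its own volumes.
type Monitor interface {
	// CheckVolumeStatus returns a non-empty reason string when the volume
	// backing the PV is unhealthy.
	CheckVolumeStatus(pv *v1.PersistentVolume) (string, error)
}

// MonitorController periodically lists PVs (e.g. from a PV informer cache)
// and delegates the health check to the plugin's Monitor, marking PVs that
// are reported as unhealthy.
type MonitorController struct {
	monitor     Monitor
	listPVs     func() []*v1.PersistentVolume                      // assumed to be backed by an informer
	markPV      func(pv *v1.PersistentVolume, reason string) error // e.g. the annotation-based marking above
	resyncEvery time.Duration
}

// Run drives the periodic resync loop until stopCh is closed.
func (c *MonitorController) Run(stopCh <-chan struct{}) {
	ticker := time.NewTicker(c.resyncEvery)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			for _, pv := range c.listPVs() {
				reason, err := c.monitor.CheckVolumeStatus(pv)
				if err != nil {
					glog.Errorf("checking PV %s: %v", pv.Name, err)
					continue
				}
				if reason != "" {
					if err := c.markPV(pv, reason); err != nil {
						glog.Errorf("marking PV %s: %v", pv.Name, err)
					}
				}
			}
		case <-stopCh:
			return
		}
	}
}
```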
#### For in-tree volume plugins (except local storage):
We can add a new volume plugin interface: PVHealthCheckingVolumePlugin.
```
type PVHealthCheckingVolumePlugin interface {
	VolumePlugin

	CheckHealthCondition(spec *Spec) (string, error)
}
```
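
As an illustration only, a plugin-side implementation might look roughly like this; `examplePlugin` and its `checkBackend` callback are hypothetical, while `VolumePlugin` and `Spec` come from the existing volume plugin framework:

```
package volume

// examplePlugin is a hypothetical plugin used only to sketch how the new
// interface could be satisfied; checkBackend is an assumed callback into the
// storage backend's API.
type examplePlugin struct {
	VolumePlugin
	checkBackend func(spec *Spec) (healthy bool, err error)
}

// CheckHealthCondition asks the storage backend about the volume referenced
// by the Spec and returns a reason string when it is unhealthy.
func (p *examplePlugin) CheckHealthCondition(spec *Spec) (string, error) {
	healthy, err := p.checkBackend(spec)
	if err != nil {
		return "", err
	}
	if !healthy {
		// The returned string becomes the taint/annotation reason.
		return "BackendVolumeUnhealthy", nil
	}
	return "", nil
}
```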
Each volume plugin will implement this interface. The entire monitoring controller workflow is:

* Fill the PV cache with initial data from etcd
* Resync and check volume status periodically
* Taint the PV if the volume status is abnormal
### PV controller changes:
For unbound PVCs/PVs, PVCs will not be bound to PVs which have taints.
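
Since the PV taint API is listed as later roadmap work, a binding filter could in the meantime be expressed against the marker annotations; the sketch below is illustrative, not part of the PV controller today:

```
package monitor

import v1 "k8s.io/api/core/v1"

// isMarkedUnhealthy is a hypothetical predicate the binding logic could use
// to skip PVs that the monitor has marked, based on the annotation keys used
// in this proposal.
func isMarkedUnhealthy(pv *v1.PersistentVolume) bool {
	for _, key := range []string{"HostPathNotExist", "NotMountPoint", "MisMatchedVolSize"} {
		if _, marked := pv.Annotations[key]; marked {
			return true
		}
	}
	return false
}
```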
### Reaction controller:
166+
The reaction part can be implemented in the second stage and can focus on StatefulSet reaction at first.
The reaction controller will react to PV update events (PVs tainted/marked by the monitoring controller).
Different kinds of applications should have different reactions.

StatefulSet reaction: check the annotation timestamp; if the PV recovers within the predefined time interval,
we do nothing, otherwise we need to delete the PVC bound to the unhealthy volume (PV) as well as the pods referencing it.
Note that the StatefulSet applications must have a data backup and be able to restore it, or must be able to tolerate data loss.
The PV protection feature may help.
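
A minimal sketch of this decision, assuming `FirstMarkTime` stores an RFC3339 timestamp as in the marking sketch above and that `recoveryWindow` is a configured interval:

```
package monitor

import (
	"time"

	v1 "k8s.io/api/core/v1"
)

// shouldReact is a hypothetical sketch of the StatefulSet reaction decision:
// react only when the PV has stayed marked for longer than the recovery window.
func shouldReact(pv *v1.PersistentVolume, recoveryWindow time.Duration) bool {
	ts, marked := pv.Annotations["FirstMarkTime"]
	if !marked {
		// The PV is not marked (or has recovered and been unmarked).
		return false
	}
	firstMark, err := time.Parse(time.RFC3339, ts)
	if err != nil {
		return false
	}
	// Delete the bound PVC and the pods referencing it only after the
	// recovery window has elapsed without the PV becoming healthy again.
	return time.Since(firstMark) > recoveryWindow
}
```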
The reaction controller's workflow is:

* Fill the PV cache from etcd;
* Watch for PV update events;
* Resync and populate periodically;
* Delete the related PVC and pods if needed;
## Roadmap to support PV monitoring
* support local storage PV monitoring (marking PVs);
* support monitoring for out-of-tree networked volume plugins and StatefulSet reaction, and add PV taint API support;
* support in-tree volume plugins and react to other kinds of applications if needed.
## Alternatives considered
