@@ -74,52 +74,49 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
74
74
75
75
## Summary
76
76
77
-
This proposal aims at enabling dynamic node resizing. This will help in resizing cluster resource capacity by just updating resources of nodes rather than adding new node or removing existing node and
78
-
also enable node configurations to be reflected at the node and cluster levels automatically without the need to manually resetting the kubelet
77
+
This proposal aims to enable dynamic node resizing. This will help in updating cluster resource capacity by just resizing the compute resources of nodes rather than adding a new node to or removing an existing node from a cluster.
78
+
The updated node configurations are to be reflected at the node and cluster levels automatically without the need to reset the kubelet.
79
79
80
-
This proposal also aims to improvise the initialisation and reinitialisation of resource managers like cpu manager, memory manager with the dynamic change in machine's CPU and memory configurations.
80
+
This proposal also aims to improve the initialization and reinitialization of resource managers, such as the CPU manager and memory manager, in response to changes in a node's CPU and memory configurations.
81
81
82
82
## Motivation
83
-
In a typical Kubernetes environment, the cluster resources may need to be altered because of various reasons like
84
-
- Incorrect resource assignment while creating a cluster.
85
-
-Workload on cluster is increased over time and leading to add more resources to cluster.
86
-
-Workload on cluster is decreased over time and leading to resources under utilization.
83
+
In a typical Kubernetes environment, the cluster resources may need to be altered for the following reasons:
84
+
- Incorrect resource assignment during cluster creation.
85
+
- Increased workload over time, leading to the need for additional resources in the cluster.
86
+
- Decreased workload over time, leading to resource underutilization in the cluster.
87
87
88
-
To handle these scenarios currently we can
89
-
- Horizontally scale up or down cluster by the addition or removal of compute nodes
90
-
- Vertically scale up or down cluster by increasing or decreasing the node’s capacity, but the current workaround for the node resize to be captured by the cluster is only by the means of restarting Kubelet.
88
+
To handle these scenarios, we can:
89
+
- Horizontally scale up or down the cluster by adding or removing compute nodes.
90
+
- Vertically scale up or down the cluster by increasing or decreasing node capacity. However, the only current workaround for the cluster to capture a node resize is restarting the kubelet.
91
91
92
-
The dynamic node resize will give advantages in case of scenarios like
93
-
- Handling the resource demand with limited set of machines by increasing the capacity of existing machines rather than creating new ones.
94
-
- Creating/Deleting new machine takes more time when compared to increasing/decreasing the capacity of existing ones.
92
+
Dynamic node resizing will provide advantages in scenarios such as:
93
+
- Handling resource demand with a limited set of nodes by increasing the capacity of existing nodes instead of creating new nodes.
94
+
- Avoiding the delay of creating or deleting nodes, which takes more time than increasing or decreasing the capacity of existing nodes.
95
95
96
96
### Goals
97
97
98
-
* Dynamically resize the node without restarting the kubelet
99
-
* Add ability to reinitialize resource managers(cpu manager, memory manager) to adopt changes in machine resource
100
-
98
+
* Dynamically resize the node without restarting the kubelet.
99
+
* Add the ability to reinitialize resource managers (CPU manager, memory manager) to adapt to changes in a node's resources.
101
100
102
101
### Non-Goals
103
102
104
103
* Update the autoscaler to utilize dynamic node resize.
105
104
106
105
## Proposal
107
106
108
-
This KEP adds a polling mechanism in kubelet to fetch the machine-info using cadvisor, The information will be fetched repeatedly based on configured time interval.
109
-
Later node status updater will take care of updating this information at node level.
107
+
This KEP adds a polling mechanism in kubelet to fetch the machine information from cAdvisor's cache. The information will be fetched periodically based on a configured time interval, after which the node status updater is responsible for updating this information at the node level in the cluster.
110
108
111
-
This KEP also improvises the resource managers like memory manager, cpu manager initialization and reinitialization so that these resource managers will
112
-
adapt to the dynamic change in machine configurations.
109
+
Additionally, this KEP aims to improve the initialization and reinitialization of resource managers, such as the memory manager and CPU manager, so that they can adapt to changes in a node's configuration.
113
110
114
111
### User Stories (Optional)
115
112
116
113
#### Story 1
117
114
118
-
As a cluster admin, I want to increase the cluster resource capacity without adding a new node to the cluster.
115
+
As a cluster admin, I must be able to increase the cluster resource capacity without adding a new node to the cluster.
119
116
120
117
#### Story 2
121
118
122
-
As a cluster admin, I want to decrease the cluster resource capacity without removing an existing node from the cluster.
119
+
As a cluster admin, I must be able to decrease the cluster resource capacity without removing an existing node from the cluster.
123
120
124
121
### Notes/Constraints/Caveats (Optional)
125
122
@@ -148,13 +145,13 @@ Consider including folks who also work outside the SIG or subproject.
148
145
149
146
## Design Details
150
147
151
-
Below diagram is shows the interaction between kubelet and cadvisor
148
+
The diagram below shows the interaction between kubelet and cAdvisor.
@@ -177,7 +174,7 @@ Below diagram is shows the interaction between kubelet and cadvisor
177
174
| node status update | | |
178
175
|<-------------------------------| | |
179
176
| | | |
180
-
| | | |
177
+
| if shrink in resource | | |
181
178
| re-run pod admission | | |
182
179
|<-------------------------------| | | |
183
180
| | | |
@@ -188,14 +185,76 @@ Below diagram is shows the interaction between kubelet and cadvisor
188
185
```
189
186
190
187
The interaction sequence is as follows
191
-
1. Kubelet will be polling cadvisor with interval of configured time like one minute to fetch the machine resource information
192
-
2. Cadvisor will fetch and update the machine resource information
193
-
3. kubelet cache will be updated with the latest machine resource information
194
-
4. node status updater will update the node's status with new resource information
195
-
5. In case of shrink in cluster resources will re-run the pod admission to evict pods which lack resources
196
-
6. kubelet will reinitialize the resource managers to keep them up to date with dynamic resource changes
188
+
1. Kubelet will poll at a configured interval to fetch the machine resource information from cAdvisor's cache, which is currently updated every 5 minutes.
189
+
2. Kubelet's cache will be updated with the latest machine resource information.
190
+
3. Node status updater will update the node's status with the latest resource information.
191
+
4. In case of a shrink in cluster resources, kubelet will rerun pod admission, which will evict pods that lack resources.
192
+
5. Kubelet will reinitialize the resource managers to keep them up to date with dynamic resource changes.
193
+
194
+
Note: In case of an increase in cluster resources, the scheduler will automatically schedule any pending pods.
195
+
196
+
**Kubelet Configuration changes**
197
+
198
+
* A new boolean variable `dynamicNodeResize` will be added to kubelet configuration.
199
+
* `dynamicNodeResize` will be false by default.
200
+
* Users need to set `dynamicNodeResize` to true to make use of Dynamic Node Resize.
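
As a rough illustration of the default described above, the sketch below parses a configuration document into a struct with a single boolean field. The `resizeConfig` type and `parseResizeConfig` helper are hypothetical and exist only for this example; they are not the real `KubeletConfiguration` type, and JSON is used here only to keep the example within the Go standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// resizeConfig is a hypothetical excerpt used only for this sketch. It is
// not the real KubeletConfiguration type.
type resizeConfig struct {
	// DynamicNodeResize stays false unless the user sets it explicitly,
	// because false is the zero value of bool.
	DynamicNodeResize bool `json:"dynamicNodeResize,omitempty"`
}

// parseResizeConfig decodes a JSON document into resizeConfig.
func parseResizeConfig(data []byte) (resizeConfig, error) {
	var cfg resizeConfig
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

func main() {
	for _, doc := range []string{`{}`, `{"dynamicNodeResize": true}`} {
		cfg, err := parseResizeConfig([]byte(doc))
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("%s -> DynamicNodeResize=%v\n", doc, cfg.DynamicNodeResize)
	}
}
```

Relying on the zero value means an existing configuration that omits the field keeps the current behavior, which matches the opt-in intent of the bullets above.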
201
+
202
+
**Proposed Code changes**
203
+
204
+
**Dynamic Node resize and Pod Re-admission logic**
205
+
206
+
```go
207
+
if kl.kubeletConfiguration.DynamicNodeResize {
208
+
    // Handle the node dynamic resize
209
+
    machineInfo, err := kl.cadvisor.MachineInfo()
210
+
    if err != nil {
211
+
        klog.ErrorS(err, "Error fetching machine info")
212
+
    } else {
213
+
        cachedMachineInfo, _ := kl.GetCachedMachineInfo()
214
+
215
+
        if !reflect.DeepEqual(cachedMachineInfo, machineInfo) {
216
+
            kl.setCachedMachineInfo(machineInfo)
217
+
218
+
            // Resync the resource managers
219
+
            if err := kl.ResyncComponents(machineInfo); err != nil {
220
+
                klog.ErrorS(err, "Error resyncing the kubelet components with machine info")
221
+
            }
222
+
223
+
            // Rerun pod admission only in case of shrink in cluster resources
224
+
            if machineInfo.NumCores < cachedMachineInfo.NumCores || machineInfo.MemoryCapacity < cachedMachineInfo.MemoryCapacity {
225
+
                klog.InfoS("Observed shrink in node resources, rerunning pod admission")
226
+
                kl.HandlePodAdditions(activePods)
227
+
            }
228
+
        }
229
+
    }
230
+
}
231
+
```
232
+
233
+
**Changes to resource managers to adapt to dynamic resize**
234
+
235
+
1. Adding ResyncComponents() method to ContainerManager interface
236
+
```go
237
+
// Manages the containers running on a machine.
238
+
type ContainerManager interface {
239
+
.
240
+
.
241
+
// ResyncComponents will resync the resource managers like cpu, memory and topology managers