@@ -125,6 +125,10 @@ message CheckpointContainerRequest {
125125 string container_id = 1;
126126 // Location of the checkpoint archive used for export/import
127127 string location = 2;
128+ // Timeout in seconds for the checkpoint to complete.
129+ // Timeout of zero means to use the CRI default.
130+ // Timeout > 0 means to use the user specified timeout.
131+ int64 timeout = 3;
128132}
129133
130134message CheckpointContainerResponse {}
@@ -146,6 +150,16 @@ In its first implementation the risks are low as it tries to be a CRI API
146150change with minimal changes to the kubelet and it is gated by the feature
147151gate `ContainerCheckpoint`.
148152
153+ One possible risk that was identified during Alpha is that the disk of
154+ the node requesting the checkpoints could fill up if too many checkpoints
155+ are created. One approach to solve this was some kind of garbage collection
156+ of checkpoint archives. A pull request to implement garbage collection
157+ was opened ([#115888](https://github.com/kubernetes/kubernetes/pull/115888))
158+ but during review it became clear that the kubelet might not be the right
159+ place to implement checkpoint archive garbage collection and the pull request
160+ was closed again. Currently the most likely solution seems to be to implement
161+ the garbage collection in an operator.
162+
149163## Design Details
150164
151165The feature gate `ContainerCheckpoint` will ensure that the API
@@ -244,13 +258,29 @@ We expect no non-infra related flakes in the last month as a GA graduation crite
244258 Once CRI implementations provide the relevant RPC calls
245259 the e2e tests will not fail but need to be extended.
246260
261+ - After the initial Alpha release, CRI-O added support for the
262+ `CheckpointContainer` CRI RPC, and the tests were
263+ enhanced to support CRI implementations that implement
264+ the `CheckpointContainer` CRI RPC.
265+
266+ - After Kubernetes was released with the `CheckpointContainer` CRI RPC,
267+ CRI-O was updated to support the new CRI RPC.
268+ The tests have been enhanced to work with CRI implementations
269+ that support the `CheckpointContainer` CRI RPC as well as
270+ CRI implementations that do not support it. The tests also handle
271+ the corresponding feature gate being either enabled or disabled:
272+ <https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/checkpoint_container.go>
273+
247274### Graduation Criteria
248275
249276#### Alpha
250277
251- - [ ] Implement the new feature gate and kubelet implementation
252- - [ ] Ensure proper tests are in place
253- - [ ] Update documentation to make the feature visible
278+ - [X] Implement the new feature gate and kubelet implementation
279+ - [X] Ensure proper tests are in place
280+ - [X] Update documentation to make the feature visible
281+ - <https://kubernetes.io/docs/reference/node/kubelet-checkpoint-api/>
282+ - <https://kubernetes.io/blog/2022/12/05/forensic-container-checkpointing-alpha/>
283+ - <https://kubernetes.io/blog/2023/03/10/forensic-container-analysis/>
254284
255285#### Alpha to Beta Graduation
256286
@@ -350,6 +380,7 @@ does not compress the checkpoint archive on disk.
350380* 2022-01-20: Reworked based on review and renamed feature gate to `ContainerCheckpoint`
351381* 2022-04-05: Added CRI API section and targeted 1.25
352382* 2022-05-17: Remove *restore* RPC from the CRI API
383+ * 2023-10-09: Beta graduation in 1.30
353384
354385## Drawbacks
355386