Commit 0ff3da6

Graduate "Forensic Container Checkpointing" to Beta
As defined in the existing KEP the steps to graduate from Alpha to Beta are

At least one container engine has to have implemented the corresponding CRI APIs to introduce e2e test for checkpointing.

- [ ] Enable the feature per default
- [ ] No major bugs reported in the previous cycle

CRI-O implemented the corresponding CRI RPC and no major bugs have been reported since the initial release in 1.25.

Signed-off-by: Adrian Reber <[email protected]>
1 parent f451a19 commit 0ff3da6

3 files changed: +40 -7 lines changed

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
 kep-number: 2008
 alpha:
   approver: "@ehashman"
+beta:
+  approver: "@deads2k"

keps/sig-node/2008-forensic-container-checkpointing/README.md

Lines changed: 34 additions & 3 deletions
@@ -125,6 +125,10 @@ message CheckpointContainerRequest {
     string container_id = 1;
     // Location of the checkpoint archive used for export/import
     string location = 2;
+    // Timeout in seconds for the checkpoint to complete.
+    // Timeout of zero means to use the CRI default.
+    // Timeout > 0 means to use the user specified timeout.
+    int64 timeout = 3;
 }

 message CheckpointContainerResponse {}
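
The comments on the new `timeout` field describe a simple rule: zero selects the CRI default, and a positive value is used as given. A minimal Python sketch of that rule follows. The dataclass only mirrors the protobuf message for illustration, `effective_timeout` is a hypothetical helper, and `ASSUMED_CRI_DEFAULT_TIMEOUT_SECONDS` is a made-up placeholder, since the real default is chosen by the CRI implementation. Rejecting negative values is also an assumption: the field comments do not say how they are handled.

```python
from dataclasses import dataclass

# Hypothetical placeholder: the real default is chosen by the CRI implementation.
ASSUMED_CRI_DEFAULT_TIMEOUT_SECONDS = 120


@dataclass(frozen=True)
class CheckpointContainerRequest:
    """Illustrative Python mirror of the protobuf message in the hunk above."""

    container_id: str
    location: str
    timeout: int = 0  # seconds; 0 selects the CRI default


def effective_timeout(request, cri_default=ASSUMED_CRI_DEFAULT_TIMEOUT_SECONDS):
    """Return the timeout in seconds that would apply to this request."""
    if request.timeout < 0:
        # Assumption: negative values are not defined by the field comments,
        # so this sketch rejects them.
        raise ValueError("timeout must be zero or positive")
    if request.timeout == 0:
        return cri_default
    return request.timeout
```

With this helper, a request that leaves `timeout` unset falls back to whatever default is passed in, and a request with `timeout=30` waits 30 seconds.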
@@ -146,6 +150,16 @@ In its first implementation the risks are low as it tries to be a CRI API
 change with minimal changes to the kubelet and it is gated by the feature
 gate `ContainerCheckpoint`.

+One possible risk that was identified during Alpha is that the disk of
+the node requesting the checkpoints could fill up if too many checkpoints
+are created. One approach to solve this was some kind of garbage collection
+of checkpoint archives. A pull request to implement garbage collection
+was opened ([#115888](https://github.com/kubernetes/kubernetes/pull/115888))
+but during review it became clear that the kubelet might not be the right
+place to implement checkpoint archive garbage collection and the pull request
+was closed again. Currently the most likely solution seems to be to implement
+the garbage collection in an operator.
+
 ## Design Details

 The feature gate `ContainerCheckpoint` will ensure that the API
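
The hunk above leaves garbage collection of checkpoint archives to something outside the kubelet, most likely an operator. As a rough illustration of what the pruning step of such a component could look like, here is a sketch that keeps only the newest archives in a directory. It is not the approach from the closed pull request. `prune_checkpoints` is a hypothetical helper, and the `checkpoint-*.tar` file pattern and the "keep the newest N by modification time" policy are assumptions made for this example.

```python
from pathlib import Path


def prune_checkpoints(directory, keep, pattern="checkpoint-*.tar"):
    """Delete all but the `keep` newest archives in `directory`.

    Returns the paths that were removed, oldest last.
    The file pattern and the newest-N policy are assumptions of this sketch.
    """
    if keep < 0:
        raise ValueError("keep must be zero or positive")
    # Newest first, judged by modification time.
    archives = sorted(
        Path(directory).glob(pattern),
        key=lambda path: path.stat().st_mtime,
        reverse=True,
    )
    removed = []
    for stale in archives[keep:]:
        stale.unlink()
        removed.append(stale)
    return removed
```

A real operator would also need a policy per pod or per namespace and some protection against deleting an archive that is still being written; both are left out here.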
@@ -244,13 +258,29 @@ We expect no non-infra related flakes in the last month as a GA graduation crite
 Once CRI implementation provide the relevant RPC calls
 the e2e tests will not fail but need to be extended.

+- Once the initial Alpha release CRI-O supports the
+  `CheckpointContainer` CRI RPC and tests have been
+  enhanced to support CRI implementation that implement
+  the `CheckpointContainer` CRI RPC
+
+- Once Kubernetes was released with the `CheckpointContainer` CRI RPC
+  CRI-O has been updated to support the new CRI RPC.
+  The tests have been enhanced to work with CRI implementations
+  that support the `CheckpointContainer` CRI RPC as well as
+  CRI implementations that do not support it. The tests also handle
+  if the corresponding feature gate is disabled or enabled:
+  <https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/checkpoint_container.go>
+
 ### Graduation Criteria

 #### Alpha

-- [ ] Implement the new feature gate and kubelet implementation
-- [ ] Ensure proper tests are in place
-- [ ] Update documentation to make the feature visible
+- [X] Implement the new feature gate and kubelet implementation
+- [X] Ensure proper tests are in place
+- [X] Update documentation to make the feature visible
+  - <https://kubernetes.io/docs/reference/node/kubelet-checkpoint-api/>
+  - <https://kubernetes.io/blog/2022/12/05/forensic-container-checkpointing-alpha/>
+  - <https://kubernetes.io/blog/2023/03/10/forensic-container-analysis/>

 #### Alpha to Beta Graduation
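
The test notes in this hunk say the e2e tests cope with three situations: the checkpoint works, the CRI implementation lacks the `CheckpointContainer` RPC, or the feature gate is disabled. The sketch below shows one way a client could tell these apart from the HTTP status of the kubelet checkpoint endpoint. It is not taken from `checkpoint_container.go`. The URL layout, the `timeout` query parameter, port 10250, and the status-code meanings follow my reading of the kubelet checkpoint API reference linked in this diff and should be treated as assumptions; `checkpoint_url` and `interpret_status` are hypothetical helpers.

```python
from urllib.parse import quote, urlencode


def checkpoint_url(node, namespace, pod, container, timeout=0, port=10250):
    """Build the kubelet checkpoint URL (layout assumed from the API reference)."""
    path = "/".join(quote(part, safe="") for part in (namespace, pod, container))
    url = f"https://{node}:{port}/checkpoint/{path}"
    if timeout > 0:
        # Zero means "use the CRI default", so the parameter is omitted.
        url += "?" + urlencode({"timeout": timeout})
    return url


def interpret_status(status_code):
    """Map an HTTP status to a coarse outcome (mapping is an assumption)."""
    if status_code == 200:
        return "checkpoint created"
    if status_code == 404:
        return "feature gate disabled, or pod/container not found"
    if status_code == 500:
        return "CRI implementation lacks the RPC, or checkpointing failed"
    return "unexpected status"
```

A test written along these lines would skip rather than fail on the 404 and 500 outcomes, which matches the behaviour the hunk describes for disabled feature gates and unsupported CRI implementations.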

@@ -350,6 +380,7 @@ does not compress the checkpoint archive on disk.
 * 2022-01-20: Reworked based on review and renamed feature gate to `ContainerCheckpoint`
 * 2022-04-05: Added CRI API section and targeted 1.25
 * 2022-05-17: Remove *restore* RPC from the CRI API
+* 2023-10-09: Beta graduation in 1.30

 ## Drawbacks

keps/sig-node/2008-forensic-container-checkpointing/kep.yaml

Lines changed: 4 additions & 4 deletions
@@ -15,18 +15,18 @@ approvers:
   - "@dchen1107"

 # The target maturity stage in the current dev cycle for this KEP.
-stage: alpha
+stage: beta

 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.25"
+latest-milestone: "v1.30"

 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
   alpha: "v1.25"
-  beta: "v1.26"
-  stable: "v1.28"
+  beta: "v1.30"
+  stable: "v1.33"

 # The following PRR answers are required at alpha release
 # List the feature gate name and the components for which it must be enabled
