Commit 679d5d3

Called out UNRESOLVED issues.
More on the SP service and sidecar.
1 parent 20645b9 commit 679d5d3

File tree

1 file changed

+68
-5
lines changed
  • keps/sig-storage/3314-csi-changed-block-tracking

keps/sig-storage/3314-csi-changed-block-tracking/README.md

Lines changed: 68 additions & 5 deletions
@@ -785,9 +785,27 @@ The following conditions are well defined:
785785
#### Capability Flags
786786

787787
The gRPC service is optional and its availability is indicated by ...
788-
> @TODO How should the availability of this service be advertised through CSI interfaces?
789-
> Is this mandatory?
790-
> It is pretty obviously sensed in K8s by looking for SnapshotSessionConfiguration objects.
788+
```
789+
<<[UNRESOLVED How should the availability of this service be advertised through CSI interfaces?]>>
790+
Is it mandatory to provide such a mechanism via the CSI interfaces?
791+
It makes no sense to add a capability flag to the `GetPluginCapabilities` RPC of the
792+
`Identity` service, because a backup application cannot access this service.
793+
794+
It is pretty easy for a backup application to look for the SnapshotSessionConfiguration object
795+
of the CSI driver of the volume snapshots.
796+
A session would eventually expire if a SnapshotServiceConfiguration for the CSI driver were not found.
797+
Should the manager fail such sessions immediately?
798+
This would probably be more convenient from the backup application perspective,
799+
especially if there is a well defined error message returned.
800+
801+
Thinking ahead to the possibility of a companion data service, then a Capability RPC
802+
in the SnapshotMetadata service makes sense, because it can be used later to advertise
803+
the existence of this data service.
804+
It is also possible that, if block vs. extent is fixed for a given driver, then that
805+
could be returned here, though I'm not sure about block size - it could vary
806+
by volume.
807+
<<[/UNRESOLVED]>>
808+
```
791809

792810

793811
### Kubernetes Components
@@ -858,7 +876,8 @@ individually transition each such CR to a terminal state as follows:
858876
from their associated VolumeSnapshotContent objects.
859877
If a VolumeSnapshot or associated VolumeSnapshotContent cannot be found, or
860878
if more than one CSI driver is involved then the manager will
861-
change the state of the CR to `Failed`.
879+
change the state of the CR to `Failed`
880+
and provide an appropriate error message in the `error` field.
862881
- The manager searches for the
863882
[SnapshotSessionConfiguration CR](#snapshotserviceconfiguration)
864883
created by the CSI driver. When it finds this object it will create a
@@ -876,10 +895,24 @@ the
876895
[SnapshotSessionConfiguration CR](#snapshotserviceconfiguration)
877896
to the `Failed` state.
878897

879-
Periodically, the manager will scan all
898+
Periodically, the manager will scan for all
880899
[SnapshotSessionConfiguration CRs](#snapshotserviceconfiguration)
900+
and all [SnapshotSessionData CRs](#snapshotsessiondata)
881901
and delete those that have passed their expiry time.
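
The periodic scan described above could be sketched as follows. This is only an illustration under stated assumptions: `SessionObject` and `partitionExpired` are hypothetical names, the CRs are modelled as plain in-memory structs rather than fetched through a Kubernetes client, and an object is treated as expired once the current time reaches its expiry time.

```go
package main

import (
	"fmt"
	"time"
)

// SessionObject is a hypothetical, simplified view of a session-related CR
// (SnapshotSessionConfiguration or SnapshotSessionData). Only the fields
// needed for the expiry scan are modelled.
type SessionObject struct {
	Kind   string
	Name   string
	Expiry time.Time
}

// partitionExpired splits objs into those that are still valid at time now
// and those whose expiry time has been reached and should be deleted.
func partitionExpired(objs []SessionObject, now time.Time) (keep, expired []SessionObject) {
	for _, o := range objs {
		if now.Before(o.Expiry) {
			keep = append(keep, o)
		} else {
			expired = append(expired, o)
		}
	}
	return keep, expired
}

func main() {
	now := time.Now()
	objs := []SessionObject{
		{Kind: "SnapshotSessionConfiguration", Name: "cfg-1", Expiry: now.Add(-time.Minute)},
		{Kind: "SnapshotSessionData", Name: "data-1", Expiry: now.Add(-time.Minute)},
		{Kind: "SnapshotSessionData", Name: "data-2", Expiry: now.Add(time.Hour)},
	}
	keep, expired := partitionExpired(objs, now)
	fmt.Printf("keep=%d delete=%d\n", len(keep), len(expired))
	for _, o := range expired {
		fmt.Println("would delete", o.Kind, o.Name)
	}
}
```

A real manager would list both CR kinds from the API server and issue deletes for the expired set; how the expiry time itself is computed is left open in the text that follows.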
882902

903+
```
904+
<<[UNRESOLVED Is there a guaranteed minimum time for a session?]>>
905+
How is a session expiry time to be computed?
906+
907+
- Is it fixed, say by a ConfigMap the manager could reference?
908+
(One size-fits-all-CSI-drivers)
909+
- Does it depend on the number of snapshots specified in the session?
910+
- Does it depend on the size of the underlying volumes?
911+
912+
The ConfigMap would be the easiest to implement. The other variants
913+
would require interaction with the SP.
914+
<<[/UNRESOLVED]>>
915+
```
883916

884917
### The External Snapshot Session Sidecar
885918

@@ -909,6 +942,26 @@ in this proposal, including
909942
- Validating individual RPC arguments.
910943
- Translating RPC arguments from the Kubernetes domain to the SP domain at runtime.
911944

945+
The sidecar will attempt to load the
946+
[SnapshotSessionData CR](#snapshotsessiondata) in its namespace with the name
947+
provided by the value of the `session_token` input argument of an RPC call.
948+
If the object is not found or is found to have expired, or if the
949+
VolumeSnapshots specified in the RPC call
950+
are not mentioned in the [SnapshotSessionData CR](#snapshotsessiondata),
951+
then the RPC call will fail.
952+
953+
The sidecar will attempt to load the VolumeSnapshots specified in an RPC call,
954+
along with their associated VolumeSnapshotContent objects, to ensure that they still
955+
exist and to obtain the SP identifiers for the snapshots.
956+
Additional checks may be performed depending on the RPC.
957+
(For example, in the case of a [GetDelta](#getdelta-rpc) RPC, it will check
958+
that all the snapshots come from the same volume and that the
959+
snapshot order is correct.)
960+
If all checks are successful, the RPC call is proxied to the
961+
[SP Snapshot Session Service](#the-sp-snapshot-session-service)
962+
over the UNIX domain socket, with its input parameters appropriately
963+
translated. The metadata result stream is proxied back to the calling client
964+
without any transformation.
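
The session checks described above might look roughly like the sketch below. It covers only the token lookup, the expiry test and the snapshot membership test; the RPC-specific checks and the proxying are omitted. All type, function and error names are hypothetical, and the CR store is modelled as an in-memory map keyed by the `session_token` value rather than a Kubernetes client.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// SessionData is a hypothetical, simplified stand-in for the
// SnapshotSessionData CR loaded by the sidecar.
type SessionData struct {
	Name      string
	Expiry    time.Time
	Snapshots map[string]bool // VolumeSnapshot names covered by the session
}

// Hypothetical error values; the proposal does not define these.
var (
	ErrSessionNotFound      = errors.New("session not found")
	ErrSessionExpired       = errors.New("session expired")
	ErrSnapshotNotInSession = errors.New("snapshot not in session")
)

// validateSession fails the call if the session named by token is missing,
// has expired, or does not mention every requested VolumeSnapshot.
func validateSession(store map[string]SessionData, token string, snapshots []string, now time.Time) error {
	s, ok := store[token]
	if !ok {
		return ErrSessionNotFound
	}
	if !now.Before(s.Expiry) {
		return ErrSessionExpired
	}
	for _, name := range snapshots {
		if !s.Snapshots[name] {
			return fmt.Errorf("%w: %s", ErrSnapshotNotInSession, name)
		}
	}
	return nil
}

func main() {
	now := time.Now()
	store := map[string]SessionData{
		"token-1": {
			Name:      "token-1",
			Expiry:    now.Add(time.Hour),
			Snapshots: map[string]bool{"snap-a": true, "snap-b": true},
		},
	}
	fmt.Println(validateSession(store, "token-1", []string{"snap-a", "snap-b"}, now))
	fmt.Println(validateSession(store, "token-1", []string{"snap-c"}, now))
	fmt.Println(validateSession(store, "missing", nil, now))
}
```

Only when such a check returns no error would the sidecar go on to load the VolumeSnapshot and VolumeSnapshotContent objects and proxy the call.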
912965

913966
### The SP Snapshot Session Service
914967

@@ -917,6 +970,16 @@ This container must be configured to communicate with the community provided
917970
[external-snapshot-session sidecar](#the-external-snapshot-session-sidecar)
918971
over a UNIX domain socket.
919972

973+
The SP service decides whether the metadata is returned in *block-based* format
974+
(`block_metadata_type` is `FIXED_LENGTH`)
975+
or *extent-based* format (`block_metadata_type` is `VARIABLE_LENGTH`).
976+
In any given RPC call, the `block_metadata_type` and `volume_size_bytes` return
977+
properties should be constant;
978+
likewise the `size_bytes` of all `BlockMetadata` entries if the `block_metadata_type`
979+
value returned is `FIXED_LENGTH`.
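
As an illustration of these constraints, a client or test harness could verify a received metadata stream along the following lines. This is a sketch under assumptions: the Go types are hypothetical stand-ins for the generated gRPC messages, the `ByteOffset` field is assumed and not taken from this section, and the stream is modelled as a slice of already-received responses.

```go
package main

import "fmt"

// BlockMetadataType is a hypothetical Go mirror of the block_metadata_type enum.
type BlockMetadataType int

const (
	FixedLength BlockMetadataType = iota
	VariableLength
)

// BlockMetadata is a simplified stand-in for one metadata entry.
// ByteOffset is an assumed field; SizeBytes mirrors size_bytes.
type BlockMetadata struct {
	ByteOffset int64
	SizeBytes  int64
}

// Response is a simplified stand-in for one message of the result stream.
type Response struct {
	BlockMetadataType BlockMetadataType
	VolumeSizeBytes   int64
	BlockMetadata     []BlockMetadata
}

// checkStream reports an error if block_metadata_type or volume_size_bytes
// changes within one RPC call, or if size_bytes varies between entries when
// the type is FIXED_LENGTH.
func checkStream(responses []Response) error {
	if len(responses) == 0 {
		return nil
	}
	first := responses[0]
	var blockSize int64
	seenBlock := false
	for i, r := range responses {
		if r.BlockMetadataType != first.BlockMetadataType {
			return fmt.Errorf("response %d: block_metadata_type changed", i)
		}
		if r.VolumeSizeBytes != first.VolumeSizeBytes {
			return fmt.Errorf("response %d: volume_size_bytes changed", i)
		}
		if r.BlockMetadataType != FixedLength {
			continue // extent sizes are allowed to vary
		}
		for j, b := range r.BlockMetadata {
			if !seenBlock {
				blockSize, seenBlock = b.SizeBytes, true
			}
			if b.SizeBytes != blockSize {
				return fmt.Errorf("response %d entry %d: size_bytes %d differs from %d", i, j, b.SizeBytes, blockSize)
			}
		}
	}
	return nil
}

func main() {
	stream := []Response{
		{FixedLength, 1 << 30, []BlockMetadata{{0, 4096}, {8192, 4096}}},
		{FixedLength, 1 << 30, []BlockMetadata{{16384, 4096}}},
	}
	fmt.Println(checkStream(stream))
}
```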
980+
981+
The SP service should ignore the `session_token` input argument in the RPC calls,
982+
though it may include the value in log records for correlation with the sidecar logs.
920983

921984
### Test Plan
922985
