Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
130 changes: 130 additions & 0 deletions docs/design/proposals/destructive-operations/destructive-operations.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,130 @@
# Semantics of destructive operations

## Summary

This design document stems from the initial discussion on the [original Github issue #82](https://github.com/aws-controllers-k8s/community/issues/82).

When deleting an ACK custom resource, the underlying AWS resource is first deleted. Users should expect this to be the default behaviour, as creating and updating CR similarly have the same respective actions in AWS. However users have requested that some resources should continue to live on after the ACK CR has been deleted. Resources with stateful data, such as S3 buckets, RDS instance or EBS volumes, may want to be retained so that they may be migrated between ACK installations or off ACK entirely without requiring they be backed up, deleted and then restored.

Similar to the CDK[`RemovalPolicy` type](https://docs.aws.amazon.com/cdk/api/v2//docs/aws-cdk-lib.RemovalPolicy.html#members) and the native [K8s PV reclaim policy](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#reclaiming), ACK should support the ability to leave the AWS resource alive while still deleting the custom resource object.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll be honest here... any time I see a reference to CloudFormation's (this is CDK, yes, but it's all CloudFormation...) DeletionPolicy, I shudder. I really despise the entire user experience of configuring some tool to behave different ways when the user takes a specific action (the action in this case is the DELETE action. This is a non-obvious interface and I am opposed to it. Instead, we should encourage a user experience that requires the user to explicitly call a different action in order to have the tool behave differently.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After chatting with @RedbackThomson a bit more about this, I'm going to take a slightly less rigid approach to this. Kubernetes API unfortunately does not have a "control channel" for which to send instructions to a controller about how to handle a particular action. Instead, we have the metadata.annotations Hammer which is unfortunate IMHO because this is on the resource and not where it should be: as decorative data on the request.

Alas, the metadata.annotations Hammer is all we have so I will reconsider this proposal in light of that and see if I can come up with a variation that doesn't offend my sense of poor coupling.


## AWS Removal Policy

### Requirements

1. Multiple levels of granularity over resource protection (controller-wide, namespace-wide and resource-wide)
2. Protection should be defined per-service (if namespace-wide)
3. Users are able to define either of:
1. The AWS resource should be deleted along with the CR
2. The AWS resource is left in-tact after the CR is deleted

### Investigation

A number of K8s resources already have similar functionality:

* Helm uses an annotation: `helm.sh/resource-policy: keep`
* Google Config Connector uses an annotation: `cnrm.cloud.google.com/deletion-policy: abandon`
* Skaffold uses a label: `skaffold.dev/cleanup: "false"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were also proposals originally to use an annotation/label for our resource adoption functionality. Instead of doing the annotation/label on a CR, we chose a user experience where the user needs to explicitly adopt an AWS resource by creating a services.k8s.aws/AdoptedResource CR. I continue to believe that we made the right decision in this regard: an explicit adoption action (via the kubectl create -f <services.k8s.aws/AdoptedResource> process) is the least surprising, hardest to misuse interface.

For the reverse of adoption action, I would prefer to have a similarly explicit action, essentially an "unadopt" or "unmanage" action. We could do this in a couple ways:

  • Create a new kubectl plugin/verb called kubectl unmanage that the user could call to delete the CR but keep the AWS resource around
  • kubectl delete services.k8s.aws/AdoptedResource/<resource arn> and have the AdoptedResourceReconciler run some "delete the CR but keep the AWS resource around" code

* K8s PV use a spec field: `persistentVolumeReclaimPolicy``: Retain`


Existing AWS products have their own variants:

* AWS CDK uses a spec field: `removalPolicy: RemovalPolicy.RETAIN`
* CloudFormation uses a metadata field on the resource: `DeletionPolicy: Retain`

CloudFormation and CDK both provide the ability to snapshot given resources before deleting them (mostly databases). While this is outside of the scope for these requirements, it’s not outside the scope of ACK in the future.

## Proposed Option

The following proposed option will introduce:

* Two new annotations, one service-wide and one ACK-wide, to be applied on namespace and ACK resources
* One new optional command line argument for the controller binary

(Requirements 1) Following the current CARM design, which also requires the same levels of granularity, it is consistent to mark the policies in the annotation metadata for the namespace and resources, and to create a new command line argument on the controller for the controller-wide policies. The controller would place precedence on the resource policy, then the namespace policy and finally the controller policy - the annotation on the object with the highest precedence will set the policy for the given resource.

(Requirements 2) The full annotation path needs to consist of an annotation prefix (typically a URL) before the annotation name. One variation should concern every ACK controller regardless of service, and one should discriminate for controllers only concerned with a single service. Cross-service annotations, finalizers and CRDS are already configured with the `services.k8s.aws/` prefix. For this option, prepending the service identifier to the beginning would identify that the annotation applies only to the given service, eg. `s3.services.k8s.aws/` . Hence, there would be two prefixes:

* `services.k8s.aws/`: All ACK controllers will read the policy of this namespace/resource
* `<service>.services.k8s.aws/`: Only the <service> ACK controller will read the policy of this namespace (taking precedence over the former option)


The name of this annotation could be either:

* Follow the AWS variations (`RemovalPolicy` or `DeletionPolicy`) which would put it inline with expectations of current CDK and CloudFormation users
* Follow existing K8s variations (`ResourcePolicy` or `ReclaimPolicy`) which would put it inline with expectations of current K8s users

Users familiar with K8s would find the existing K8s variations more recognizable but may incorrectly assume that the annotation has the same values and behaves identically to native K8s resources. Therefore, it may be better to introduce the AWS terminology into the annotation - as the proposed functionality will behave identically to CDK and CloudFormation. Considering this annotation takes effect after the user calls “Delete”, I propose using `DeletionPolicy` as the annotation name.

(Requirement 3) In order to account for a future possibility of adding additional policy options, the value of this annotation should be an enumeration. Considering the name already references the existing CloudFormation `DeletionPolicy` field, the values should similarly follow those values:

* `Delete` - deletes the AWS resource alongside the CR
* `Retain` - retains the AWS resource, but deletes the CR

The default value for this annotation will be `Delete` as it is in-line with the existing ACK functionality of deleting the underlying AWS resource.

#### Examples

Definition at the controller - all resources reconciled by this controller will use the “retain” policy (unless overridden by an annotation):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think "retain" should be the default use-case. We can keep current implementation as default i.e. Deleting a resource cleans things up on AWS side as well. In my opinion this will be default thinking of most users.

And option to override this cli-option for production use, stateful resources and databases will give users ability to prevent accidental deletion.


Another less popular option can be if we get opinionated about AWS service controllers and store metadata about which service controllers will have "retain" as default option. Then helm/kustomization artifacts will only have deletion policy "retain" for those service controllers. However this bring inconsistency among service controllers by default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think "retain" should be the default use-case

"delete" is the default. All default ACK functionality will remain the same for all considerations of this proposal. Only in this particular example do we override the default and set it to be "retain" at the command line argument level.

And option to override this cli-option for production use, stateful resources and databases will give users ability to prevent accidental deletion.

That is what this example is attempting to show.


```yaml
containers:
- command:
- ./bin/controller
args:
- --aws-account-id
- "$(AWS_ACCOUNT_ID)"
- --aws-region
- "$(AWS_REGION)"
- --enable-development-logging
- "$(ACK_ENABLE_DEVELOPMENT_LOGGING)"
- --log-level
- "$(ACK_LOG_LEVEL)"
- --resource-tags
- "$(ACK_RESOURCE_TAGS)"
+ - --deletion-policy
+ - "retain"
Comment on lines +86 to +87
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the drift remediation feature, we went for resource level configuration. On top of the --deletion-policy flag, do we want to add another flag to have the similar experience for deletion protection? thinking:

--resource-deletion-policy Function=Retain \
--resource-deletion-policy EventSourceMappings=Retain

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I actually have some issues with the resource level configuration of drift remediation, in that it's a non-standard format and seems a little bit convoluted. But I'll leave those comments on your PR.

For now I'll stick to having a global deletion policy flag, and leave the door open to more granular CLI flags later

```

Definition at the namespace (protecting all ACK resources across any service):

```yaml
apiVersion: v1
kind: Namespace
metadata:
annotations:
services.k8s.aws/deletion-policy: retain
name: my-retained-namespace
```

Definition at the namespace (protecting only S3 ACK resources):

```yaml
apiVersion: v1
kind: Namespace
metadata:
annotations:
s3.services.k8s.aws/deletion-policy: retain
name: my-retained-namespace
```

Definition at the resource:

```yaml
apiVersion: s3.services.k8s.aws/v1alpha1
kind: Bucket
metadata:
name: test-bucket
annotations:
s3.services.k8s.aws/deletion-policy: retain
spec:
name: retained-bucket
```

## Backwards Compatibility and Risks
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might out of scope for this proposal: Currently resources are deleted right after a DELETE API Call is successful. Some resources still hang for multiple minutes, and even days.. e.g RDS::DBInstance, CloudTrail::EDS ... few open questions:

  • Do we still want to continue having same user experience in the future?
  • If a user sets the deletion policy to Delete, do we want to keep the CR alive reflecting the real state of a resource?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a user sets the deletion policy to Delete, do we want to keep the CR alive reflecting the real state of a resource?

A deletion policy of Delete should be identical to the current implementation. A deletion policy of Retain will immediately remove the CR from K8s (since it takes no synchronous action on the underlying AWS resource)


The proposed solution introduces two new annotations and an optional command line argument. The default behaviour for any resource without these annotations, or for the controller binary without this optional command, is backwards compatible with all current ACK controllers. There are no backwards compatibility issues with this solution.

TODO: Identify possible risks with this proposal