Returning the latest resource from the Delete() call #21

kumargauravsharma · 2021-06-16T02:34:01Z

Related issue: aws-controllers-k8s/community#836

This PR contains code changes to patch the custom resource with the latest details during a long-running delete.
It does so if the resource manager's Delete function returns a requeue-related error.
The expectation is that the resource manager's ReadOne method sets appropriate conditions (for example, ACK.ResourceSynced to false with a message) and provides the latest details about the resource that is being deleted.
Conditions can also be set in a common place where shared condition logic should reside.

Testing:
make test passed for runtime.

An alternate approach was considered but not taken up, as it involved intrusive changes to the resource manager's APIs. Details follow:
Currently, the resource manager's Delete API returns a single value of type error; it does not return the resource.
Also, the sdk.go::sdkDelete() logic does not parse the output of the service's delete API.

It can be updated so that the sdk.go::sdkDelete() logic parses the output response and allows a sdk_delete_post_set_output hook for service controllers to add custom logic as needed. The resulting resource reference would then be returned from this method.
If a requeue is required, both return values from this method (the resource and the requeue-type error) will be non-nil.
The reconciler:cleanup() logic would then patch the current resource with the resource returned from the rm.Delete() call and return the error (if any).

Though this change is similar to the approach taken for the resource manager's ReadOne, it is more intrusive because it changes the Delete API's return type.
However, it has the advantage that the reconciler does not need another ReadOne call, and it gives service controllers the flexibility to implement a custom hook specific to the delete scenario.
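
As a rough illustration of the alternate approach described above, the following Go sketch shows a Delete that returns both the latest resource and an error, with a cleanup step that patches before propagating the error. Every name in it (Resource, Manager, asyncManager, errRequeue, cleanup, demo) is a hypothetical stand-in chosen for this sketch and not the runtime's actual types or functions.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Resource is a hypothetical, stripped-down stand-in for a custom resource.
type Resource struct {
	Name   string
	Status string
}

// errRequeue is a hypothetical stand-in for a requeue-type error.
var errRequeue = errors.New("requeue needed")

// Manager mirrors the proposed shape: Delete returns the latest view of the
// resource together with an error.
type Manager interface {
	Delete(ctx context.Context, r *Resource) (*Resource, error)
}

// asyncManager simulates a service whose delete completes asynchronously.
type asyncManager struct{}

func (asyncManager) Delete(_ context.Context, r *Resource) (*Resource, error) {
	latest := *r
	latest.Status = "deleting"
	// Both return values are non-nil when a requeue is required.
	return &latest, errRequeue
}

// cleanup patches the stored resource with whatever Delete returned, then
// propagates the error so that the caller can requeue.
func cleanup(ctx context.Context, rm Manager, current *Resource, patch func(latest *Resource)) error {
	latest, err := rm.Delete(ctx, current)
	if latest != nil {
		patch(latest)
	}
	return err
}

// demo runs one cleanup pass and reports the stored status and whether a
// requeue was requested.
func demo() (string, bool) {
	stored := &Resource{Name: "rg-1", Status: "available"}
	err := cleanup(context.Background(), asyncManager{}, stored, func(latest *Resource) {
		stored.Status = latest.Status
	})
	return stored.Status, errors.Is(err, errRequeue)
}

func main() {
	status, requeued := demo()
	fmt.Println(status, requeued)
}
```

In this sketch no second read call is needed after Delete, which is the advantage noted above; the cost is that every Manager implementation has to adopt the two-value return.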

[EDIT]: The alternate approach mentioned above is the final approach taken, per the discussion and comments in this PR.
Related PR in code-generator: aws-controllers-k8s/code-generator#114

a-hilaly · 2021-06-16T18:08:07Z

pkg/runtime/reconciler.go

+		if errors.As(err, &requeueNeededAfter) ||
+			errors.As(err, &requeueNeeded) {


I'm not in favor of using errors.As here, first because it uses the reflect library and second because it is code that can panic if misused https://golang.org/src/errors/wrap.go?s=2494:2537#L67

Another small remark: &requeueNeededAfter is a "pointer to a pointer to a requeue.RequeueNeededAfter"

You can use Go type assertions/switches for this use case. Something similar to:

_, ok := err.(*requeue.RequeueNeededAfter)
if ok {
	// ...
}

Updated code per the comment.
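
For readers following this thread, a minimal sketch of the type-switch alternative suggested above might look like the following. RequeueNeeded and RequeueNeededAfter are defined locally as simplified stand-ins (their fields are assumptions), and isRequeue is a hypothetical helper name.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// RequeueNeeded and RequeueNeededAfter are local, simplified stand-ins for
// the requeue error types discussed in this thread (fields are assumptions).
type RequeueNeeded struct{ Cause error }

func (e *RequeueNeeded) Error() string { return "requeue needed" }

type RequeueNeededAfter struct {
	Cause    error
	Duration time.Duration
}

func (e *RequeueNeededAfter) Error() string {
	return fmt.Sprintf("requeue needed after %s", e.Duration)
}

// isRequeue reports whether err is one of the two requeue error types,
// using a type switch instead of errors.As.
func isRequeue(err error) bool {
	switch err.(type) {
	case *RequeueNeeded, *RequeueNeededAfter:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(isRequeue(&RequeueNeededAfter{Duration: 30 * time.Second}))
	fmt.Println(isRequeue(&RequeueNeeded{}))
	fmt.Println(isRequeue(errors.New("access denied")))
	// A plain type switch inspects only the outermost error value.
	fmt.Println(isRequeue(fmt.Errorf("delete: %w", &RequeueNeeded{})))
}
```

One trade-off worth keeping in mind: unlike errors.As, a type assertion or type switch does not unwrap, so a requeue error wrapped with %w (the last call in main) is not detected.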

a-hilaly · 2021-06-16T18:08:49Z

pkg/runtime/reconciler.go

+			observed, _ := rm.ReadOne(ctx, current)
+			if observed != nil {
+				_ = r.patchResource(ctx, current, observed)
+			}


Why ignore the returned errors here?

This block of code executes when the Delete API returned an error and a requeue is requested. The ReadOne error is ignored because the correct course of action would be taken during the next reconciler loop run (triggered by the requeue).
Errors from patchResource are ignored elsewhere in the reconciler as well, so the same pattern was followed here.

@kumargauravsharma I think @a-hilaly was referring to ignoring the returned error from rm.ReadOne(), not patchResource... (I think?) :)

I thought the comment was for both.
For the error during ReadOne:
There are two errors available at that point: a) the error returned from the Delete() method and b) the error returned from the ReadOne call.
The code returns the error received from the Delete call, as that is the primary operation performed in this method.
The ReadOne error is ignored because the correct course of action would be taken during the next reconciler loop run (triggered by the requeue).

IMO we should at least log these errors. It would be hard to track them if the metrics server is reporting a few 4XX/5XX but the logs are not showing any.

nvm this was added in the recent logging PR :)

jaypipes · 2021-06-17T00:32:05Z

@kumargauravsharma if a call to DeleteReplicationGroup returns success, can the ReplicationGroup ever end up in any state other than deleted?

kumargauravsharma · 2021-06-21T16:26:39Z

@jaypipes when the DeleteReplicationGroup API call returns no error, the replication group state is set to 'deleting'. It may take some time for the deletion to complete; meanwhile, the describe API continues to return the resource details as the delete progresses.

jaypipes · 2021-06-21T21:43:14Z

(apologies for the late reply, I'm on PTO this week...)

@kumargauravsharma ok, thanks for the info. I just want to make sure I'm understanding your use cases clearly...you want this functionality in order to match the Kubernetes API experience to the Elasticache API experience: that the replication group resource continues to appear in a DescribeReplicationGroup API call -- and the ReplicationGroup CR will continue to appear in calls to kubectl describe replicationgroups/{name} with a Status.Status == deleting -- even after a DeleteReplicationGroup API call returns no error and the replication group will only ever transition to a deleted state (and not appear in a DescribeReplicationGroup API call). This is about allowing the ReplicationGroup CR to exist for that period of time while it is in progress of being deleted, yes?

jaypipes

@kumargauravsharma please see inline for a suggested alternative approach that I think would align our Delete method more with the other ReadOne/Create/Update resource manager methods...

jaypipes · 2021-06-21T21:47:10Z

pkg/runtime/reconciler.go

+			observed, _ := rm.ReadOne(ctx, current)
+			if observed != nil {
+				_ = r.patchResource(ctx, current, observed)
+			}


@kumargauravsharma I think @a-hilaly was referring to ignoring the returned error from rm.ReadOne(), not patchResource... (I think?) :)

jaypipes · 2021-06-21T21:54:37Z

pkg/runtime/reconciler.go

 		}
 		return err
 	}
 	if err = rm.Delete(ctx, observed); err != nil {


Instead of adding in the logic below to call rm.ReadOne(), how about changing the rm.Delete() method signature to return (acktypes.AWSResource, error) similar to the acktypes.AWSResourceManager:ReadOne(), Create() and Update() methods?

That way, rm.Delete() implementations for asynchronously-completing delete operations (like Elasticache's RGs) could simply return a modified AWSResource containing the fields that should be saved via patchResource()? Then the below code could be simplified to this:

latest, err = rm.Delete(ctx, observed)
if latest != nil {
	// The Delete operation is likely asynchronous and has likely set a Status
	// field on the returned CR to something like `deleting`. Here, we patchResource()
	// in order to save these Status field modifications.
	_ = r.patchResource(ctx, latest)
}
if err != nil {
	// NOTE: Delete() implementations that have asynchronously-completing
	// deletions should return a RequeueNeededAfter.
	return err
}

thoughts?

I considered this approach while creating this PR and described its pros and cons in the PR description at that time; please refer to the section of the PR description that begins:

Alternate approach that was considered ...

The downside of this approach was the scope of the changes it brings: changes to the API signature of the resource manager's Delete and to the sdkDelete method, plus controllers would need to be regenerated along with the unit tests for delete and the code-generator templates.

However, I prefer this approach of changing API return type and this PR can be updated for it.

I considered this approach while creating this PR and described its pros and cons in the PR description at that time; please refer to the section of the PR description that begins:

Alternate approach that was considered ...

Indeed you did :) Sorry I missed that.

However, I prefer this approach of changing API return type and this PR can be updated for it.

OK, cool. Let's cut a release of runtime (and probably code-generator) with the enhancements made over the last couple days and then modify this PR to change the Delete() method signature.

Work for you?

Sure, thanks Jay!

+1 for this approach for two reasons. SageMaker has resources where:

Delete = Stop, so readOne will never return NotFound (InProgress -(call delete)-> Stopping -> Stopped)

Delete resource returns success, but the resource can end up in a DeleteFailed state, so we can add custom code to retry and keep returning a requeue error until the resource is actually cleaned up
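
To make the second case concrete, here is a small Go sketch of a Delete that keeps returning a requeue error, and re-issues the delete after a DeleteFailed, until the resource is really gone. The fakeService type, the status strings other than DeleteFailed, and the run loop are all assumptions made for illustration; they do not model any real service API.

```go
package main

import (
	"errors"
	"fmt"
)

// Resource is a hypothetical, stripped-down custom resource.
type Resource struct{ Name, Status string }

// errRequeue is a hypothetical stand-in for a requeue-type error.
var errRequeue = errors.New("requeue needed")

// fakeService is an in-memory stand-in for a service API. Each describe call
// consumes the next scripted status; an empty script means "not found".
type fakeService struct {
	statuses []string
	deletes  int // number of delete calls issued so far
}

func (s *fakeService) describe() (string, bool) {
	if len(s.statuses) == 0 {
		return "", false
	}
	st := s.statuses[0]
	s.statuses = s.statuses[1:]
	return st, true
}

// Delete returns the latest observed resource plus a requeue error until the
// resource no longer exists. It (re)issues the delete whenever the resource
// is not currently being deleted, which covers the DeleteFailed case.
func Delete(svc *fakeService, r *Resource) (*Resource, error) {
	status, found := svc.describe()
	if !found {
		return nil, nil // resource is gone; nothing left to patch or retry
	}
	latest := *r
	latest.Status = status
	if status != "Deleting" {
		svc.deletes++
	}
	return &latest, errRequeue
}

// run mimics repeated reconcile passes and returns how many were needed.
func run(svc *fakeService, r *Resource) int {
	rounds := 0
	for {
		rounds++
		latest, err := Delete(svc, r)
		if latest != nil {
			*r = *latest // stands in for patching the custom resource
		}
		if err == nil {
			return rounds
		}
	}
}

func main() {
	svc := &fakeService{statuses: []string{"InService", "Deleting", "DeleteFailed", "Deleting"}}
	r := &Resource{Name: "job-1"}
	rounds := run(svc, r)
	fmt.Println(rounds, svc.deletes)
}
```

With the scripted statuses in main, the delete is issued twice (once initially and once after DeleteFailed) across five passes, and the custom resource reflects each intermediate status along the way.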

@kumargauravsharma any plan to update this PR soon?

I have updated this PR with the approach that we concluded.
Corresponding code-generator changes are available in PR: aws-controllers-k8s/code-generator#114

kumargauravsharma · 2021-06-22T08:55:40Z

PR aws-controllers-k8s/elasticache-controller#34 ensures that the ReplicationGroup CR exists for the period of time while the delete is in progress.
However, the ReplicationGroup CR falls out of sync with its latest state on the service side. This PR (#21) fixes that reconciler behavior.

nit: once the ReplicationGroup is deleted on the service side, sdkFind returns a not-found exception, as there is no 'deleted' status.

jaypipes

thanks for your patience @kumargauravsharma.

jaypipes · 2021-07-12T17:40:48Z

/lgtm

ack-bot · 2021-07-12T17:40:53Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jaypipes, kumargauravsharma, surajkota

Needs approval from an approver in each of these files:

~~OWNERS~~ [jaypipes]

ack-bot requested review from jaypipes and vijtrip2 June 16, 2021 02:34

a-hilaly reviewed Jun 16, 2021

echen-98 mentioned this pull request Jun 17, 2021

address regression with User code, fix RG deletion test aws-controllers-k8s/elasticache-controller#35

Merged

kumargauravsharma force-pushed the delete-progress branch from 5d76a72 to 6b04199 Compare June 21, 2021 16:21

kumargauravsharma requested a review from a-hilaly June 21, 2021 16:26

jaypipes reviewed Jun 21, 2021

Provide custom resource delete progress for long running delete

4eeee42

kumargauravsharma force-pushed the delete-progress branch from 6b04199 to 4eeee42 Compare July 2, 2021 05:33

kumargauravsharma mentioned this pull request Jul 2, 2021

Returning the latest resource from the Delete() call aws-controllers-k8s/code-generator#114

Merged

kumargauravsharma requested review from jaypipes and surajkota July 7, 2021 18:56

surajkota approved these changes Jul 8, 2021

kumargauravsharma changed the title ~~Provide custom resource delete progress for long running delete~~ Returning the latest resource from the Delete() call Jul 12, 2021

jaypipes approved these changes Jul 12, 2021

ack-bot added the approved label Jul 12, 2021

ack-bot assigned jaypipes Jul 12, 2021

ack-bot added the lgtm Indicates that a PR is ready to be merged. label Jul 12, 2021

ack-bot merged commit 308bc45 into aws-controllers-k8s:main Jul 12, 2021
