-
Notifications
You must be signed in to change notification settings - Fork 181
Returning the latest resource from the Delete() call #21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Returning the latest resource from the Delete() call #21
Conversation
pkg/runtime/reconciler.go
Outdated
if errors.As(err, &requeueNeededAfter) || | ||
errors.As(err, &requeueNeeded) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not in favor of using errors.As
here, first because it's using the reflect
library and second because it's a code that can panic if misused https://golang.org/src/errors/wrap.go?s=2494:2537#L67
Another small remark: &requeueNeededAfter
is a "pointer to a pointer to a requeue.RequeueNeededAfter
"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use Go type assertions/switches for this use case. Something similar to:
_, ok := err.(*requeue.RequeueNeededAfter)
if ok {
// ...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated code per the comment.
pkg/runtime/reconciler.go
Outdated
observed, _ := rm.ReadOne(ctx, current) | ||
if observed != nil { | ||
_ = r.patchResource(ctx, current, observed) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why ignore the returned errors here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block of code executes when Delete api returned error and requeue is requested. ReadOne error is ignored as during the next reconciler loop run (due to requeue) the correct course of action would be taken.
Error from path resources are ignored throughout in the reconciler at other places as well, thus followed the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block of code executes when Delete api returned error and requeue is requested. ReadOne error is ignored as during the next reconciler loop run (due to requeue) the correct course of action would be taken.
Error from path resources are ignored throughout in the reconciler at other places as well, thus followed the same.
@kumargauravsharma I think @a-hilaly was referring to ignoring the returned error from rm.ReadOne()
, not patchResource... (I think?) :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the comment was for both.
For error during read one:
There are two errors available at that time a) error returned from Delete() method b) error returned from Delete call.
The code returns the error received from Delete call as that is the primary operation being done in this method.
ReadOne error is ignored as during the next reconciler loop run (due to requeue) the correct course of action would be taken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO we should at least log these errors. It would be hard to track them if the metrics server is reporting few 4XX/5XX but logs are not showing any
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO we should at least log these errors. It would be hard to track them if the metrics server is reporting few 4XX/5XX but logs are not showing any
nvm this was added in the recent logging PR :)
@kumargauravsharma if a call to DeleteReplicationGroup returns success, can the ReplicationGroup ever end up in any state other than deleted? |
5d76a72
to
6b04199
Compare
@jaypipes no error from DeleteReplicationGroup API call set the replication group state as 'deleting'. It may take some time for it to get deleted, meanwhile the resource describe api does return the resource details as delete progresses. |
(apologies for the late reply, I'm on PTO this week...) @kumargauravsharma ok, thanks for the info. I just want to make sure I'm understanding your use cases clearly...you want this functionality in order to match the Kubernetes API experience to the Elasticache API experience: that the replication group resource continues to appear in a DescribeReplicationGroup API call -- and the ReplicationGroup CR will continue to appear in calls to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kumargauravsharma please see inline for a suggested alternative approach that I think would align our Delete method more with the other ReadOne/Create/Update resource manager methods...
pkg/runtime/reconciler.go
Outdated
observed, _ := rm.ReadOne(ctx, current) | ||
if observed != nil { | ||
_ = r.patchResource(ctx, current, observed) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block of code executes when Delete api returned error and requeue is requested. ReadOne error is ignored as during the next reconciler loop run (due to requeue) the correct course of action would be taken.
Error from path resources are ignored throughout in the reconciler at other places as well, thus followed the same.
@kumargauravsharma I think @a-hilaly was referring to ignoring the returned error from rm.ReadOne()
, not patchResource... (I think?) :)
pkg/runtime/reconciler.go
Outdated
} | ||
return err | ||
} | ||
if err = rm.Delete(ctx, observed); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of adding in the logic below to call rm.ReadOne()
, how about changing the rm.Delete()
method signature to return (acktypes.AWSResource, error)
similar to the acktypes.AWSResourceManager:ReadOne()
, Create()
and Update()
methods?
That way, rm.Delete()
implementations for asynchronously-completing delete operations (like Elasticache's RGs) could simply return a modified AWSResource
containing the fields that should be saved via patchResource()
? Then the below code could be simplified to this:
latest, err = rm.Delete(ctx, observed)
if latest != nil {
// The Delete operation is likely asynchronous and has likely set a Status
// field on the returned CR to something like `deleting`. Here, we patchResource()
// in order to save these Status field modifications.
_ = r.patchResource(ctx, latest)
}
if err != nil {
// NOTE: Delete() implementations that have asynchronously-completing
// deletions should return a RequeueNeededAfter.
return err
}
thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gave this approach a thought while creating this PR and mentioned its pros, cons in the PR description at that time, please refer section with text in PR description:
Alternate approach that was considered ...
Downside of this approach was the scope of the changes that it brings: changes to the API signature of resource manager delete, sdkDelete method, and controllers would need to be regenerated with unit tests for delete and code generator templates.
However, I prefer this approach of changing API return type and this PR can be updated for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gave this approach a thought while creating this PR and mentioned its pros, cons in the PR description at that time, please refer section with text in PR description:
Alternate approach that was considered ...
Indeed you did :) Sorry I missed that.
However, I prefer this approach of changing API return type and this PR can be updated for it.
OK, cool. Let's cut a release of runtime (and probably code-generator) with the enhancements made over the last couple days and then modify this PR to change the Delete() method signature.
Work for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, thanks Jay!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for this approach for two reasons. SageMaker has resources where:
- Delete = Stop so readOne will never be NotFound (InProgress -(call delete)-> Stopping -> Stopped)
- Delete resource returns success, but can end up
DeleteFailed
state so we can add custom code to retry and keep returning requeueerror until the resource is actually cleaned up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kumargauravsharma any plan to update this PR soon?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated this PR with the approach that we concluded.
Corresponding code-generator changes are available in PR: aws-controllers-k8s/code-generator#114
The PR: aws-controllers-k8s/elasticache-controller#34 takes care of ensuring that the ReplicationGroup CR exists for the period of time while delete is in progress. nit: once the ReplicationGroup gets deleted on service side, the sdkFind returns not found exception, as there is no ' deleted' status. |
6b04199
to
4eeee42
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for your patience @kumargauravsharma.
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jaypipes, kumargauravsharma, surajkota The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Related issue: aws-controllers-k8s/community#836
This PR contains code changes to patch custom resource with latest details during long running delete.
It does so if resource manager's Delete function return requeue related errors.
The expectation is that resource manager's ReadOne method would set appropriate conditions (example:
ACK.ResourceSynced
to false with message) and provide latest details about the resource which is being deleted.Conditions can also be set at some common place where common conditions logic should reside.
Testing:
make test
passed for runtime.Alternate approach that was considered, but was not taken up as it involved non-intrusive changes to resource manager's APIs. Following are the details:
Currently, resource manager's
Delete
API returns only one data field of typeerror
, it does not returnresource
.Also, the
sdk.go::sdkDelete()
logic does not parse the return result output of service delete API.It can be updated such that
sdk.go::sdkDelete()
logic parses the output response, and allows asdk_delete_post_set_output
hook for service controllers to have custom logic (as needed). The resultant resource reference would then be returned from this method.In case, requeue is required then the return values from this method will have both resource, error (requeue type) as not nil.
And the reconciler:cleanup() logic here, would patch the current resource with the returned resource from
rm.Delete()
call and return the error (if any).Though this change is similar to the approach taken for resource manager ReadOne here, this is more intrusive as it changes the Delete API's return type.
However, it does have advantage that there is no need for another ReadOne call from reconciler, and also provides flexibility to service controller's to implement custom hook specific to delete scenario.
[EDIT]: the alternate approach that is mentioned above, is the final approach taken per the discussion/comments in this PR.
Related PR in code-generator: aws-controllers-k8s/code-generator#114