echen-98
Contributor

Description of changes:

Local e2e test run:

============================================================================ test session starts =============================================================================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /elasticache-controller/tests/e2e
plugins: xdist-2.2.0, forked-1.3.0
[gw0] Python 3.8.10 (default, May  4 2021, 19:07:02)  -- [GCC 10.2.1 20201203]
[gw1] Python 3.8.10 (default, May  4 2021, 19:07:02)  -- [GCC 10.2.1 20201203]
[gw2] Python 3.8.10 (default, May  4 2021, 19:07:02)  -- [GCC 10.2.1 20201203]
[gw3] Python 3.8.10 (default, May  4 2021, 19:07:02)  -- [GCC 10.2.1 20201203]
[gw4] Python 3.8.10 (default, May  4 2021, 19:07:02)  -- [GCC 10.2.1 20201203]
[gw5] Python 3.8.10 (default, May  4 2021, 19:07:02)  -- [GCC 10.2.1 20201203]
gw0 [9] / gw1 [9] / gw2 [9] / gw3 [9] / gw4 [9] / gw5 [9]
scheduling tests via LoadScheduling

tests/test_replicationgroup.py::TestReplicationGroup::test_rg_input_coverage 
tests/test_replicationgroup.py::TestReplicationGroup::test_rg_cmd_update 
tests/test_replicationgroup.py::TestReplicationGroup::test_rg_delete 
tests/test_replicationgroup_largecluster.py::TestReplicationGroupLargeCluster::test_rg_largecluster 
tests/test_replicationgroup.py::TestReplicationGroup::test_rg_cmd_fromsnapshot 
tests/test_replicationgroup.py::TestReplicationGroup::test_rg_auth_token 
[gw5] [ 11%] SKIPPED tests/test_replicationgroup_largecluster.py::TestReplicationGroupLargeCluster::test_rg_largecluster 
[gw3] [ 22%] SKIPPED tests/test_replicationgroup.py::TestReplicationGroup::test_rg_auth_token 
[gw1] [ 33%] PASSED tests/test_replicationgroup.py::TestReplicationGroup::test_rg_cmd_fromsnapshot 
[gw0] [ 44%] PASSED tests/test_replicationgroup.py::TestReplicationGroup::test_rg_input_coverage 
[gw4] [ 55%] PASSED tests/test_replicationgroup.py::TestReplicationGroup::test_rg_delete 
tests/test_user.py::TestUser::test_CRUD 
[gw1] [ 66%] PASSED tests/test_user.py::TestUser::test_CRUD 
tests/test_snapshot.py::TestSnapshot::test_snapshot_kms 
[gw2] [ 77%] PASSED tests/test_replicationgroup.py::TestReplicationGroup::test_rg_cmd_update 
[gw0] [ 88%] PASSED tests/test_snapshot.py::TestSnapshot::test_snapshot_kms 
tests/test_usergroup.py::TestUserGroup::test_user_group_create_update 
[gw2] [100%] PASSED tests/test_usergroup.py::TestUserGroup::test_user_group_create_update 

================================================================= 7 passed, 2 skipped in 1688.28s (0:28:08) ==================================================================

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@ack-bot ack-bot requested review from jaypipes and nmvk June 16, 2021 23:03
@echen-98
Contributor Author

/ok-to-test
/test elasticache-kind-e2e

@ack-bot ack-bot added the ok-to-test label Jun 16, 2021
Contributor

@nmvk nmvk left a comment

LGTM


//TODO: revert the call to Spec.AccessString once removeFromDelta implementation changes
removeFromDelta(delta, "AccessString")
removeFromDelta(delta, "Spec.AccessString")
Contributor

Why do you have multiple "AccessString" fields in the Status struct?

LastRequestedAccessString:
is_read_only: true
from:
operation: CreateUser
path: AccessString
ExpandedAccessString:
is_read_only: true
from:
operation: CreateUser
path: AccessString

Contributor Author

The access string provided by the ElastiCache API differs from the specified access string (there's some behind-the-scenes expansion).

Because there's no straightforward way to translate between the two representations, adding a state-keeping field Status.LastRequestedAccessString allows us to determine when the desired access string has changed. This field is only updated upon a successful create or update (see the set_output methods).

Since the generated delta code will always detect a difference between desired.Spec.AccessString and latest.Spec.AccessString, there's a hook after it that removes the difference if desired.Spec.AccessString is still equal to desired.Status.LastRequestedAccessString. See filterDelta() in pkg/resource/user/delta_util.go.

Status.ExpandedAccessString is purely informational and was added because that's what the user would see if they were directly using the ElastiCache API.
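
To make the hook described above concrete, here is a minimal sketch of the idea in Python. It is not the controller's Go code: the `User` class, `compute_delta`, and `filter_delta` are hypothetical stand-ins, and the expanded string is a shortened, made-up version of what the server returns.

```python
from dataclasses import dataclass
from typing import Set

ACCESS_STRING_PATH = "Spec.AccessString"


@dataclass
class User:
    """Hypothetical, simplified stand-in for the User resource."""
    spec_access_string: str
    last_requested_access_string: str = ""  # models Status.LastRequestedAccessString


def compute_delta(desired: User, latest: User) -> Set[str]:
    """Naive comparison: flag the path whenever the two strings differ."""
    delta: Set[str] = set()
    if desired.spec_access_string != latest.spec_access_string:
        delta.add(ACCESS_STRING_PATH)
    return delta


def filter_delta(delta: Set[str], desired: User) -> Set[str]:
    """Drop the access-string difference if the requested value has not changed."""
    if desired.spec_access_string == desired.last_requested_access_string:
        return delta - {ACCESS_STRING_PATH}
    return delta


requested = "on ~app::* -@all +@read"
expanded = requested + " +@hash +@bitmap +@geo"  # assumed, shortened expansion

# The user has not touched the access string since the last successful call.
desired = User(requested, last_requested_access_string=requested)
latest = User(expanded)
unchanged = filter_delta(compute_delta(desired, latest), desired)

# The user edits the access string; the difference must survive the filter.
desired_edited = User("on ~app::* -@all +@write", last_requested_access_string=requested)
changed = filter_delta(compute_delta(desired_edited, latest), desired_edited)
```

In this sketch `unchanged` ends up empty while `changed` still contains `Spec.AccessString`, which is the distinction the state-keeping field is meant to provide.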

Contributor

@echen-98 why aren't you simply updating Spec.AccessString to the actual expanded value upon return from CreateUser and just get rid of the two other fields in the Status?

Contributor Author

@jaypipes do you mean updating desired.Spec.AccessString or latest.Spec.AccessString? We're already doing the latter, and unless I'm missing something big, I thought the former wasn't possible?

Contributor Author

i.e. I thought patchResource() only patched Status because desired.Spec should not be modified

Contributor

@jaypipes

desired.Spec.AccessString: on ~app::* -@all +@read
latest.Spec.AccessString: on ~app::* -@all +@read +@hash +@bitmap +@geo -setbit -bitfield -hset -hsetnx -hmset -hincrby -hincrbyfloat -hdel -bitop -geoadd -georadius -georadiusbymember

There is always a Spec difference in AccessString. Without any custom code, this path gets added to the delta, and an Update is always invoked. So the default behavior of the generated code is to make (mostly useless) Update calls in an infinite loop.

If we were saving Spec.AccessString as the value returned by the server after a CreateUser or UpdateUser call, there would be no infinite loop.

Contributor Author

So in the Create case, it was my impression that in the reconciler, after a resource is found to not exist and a Create is called, then a ReadOne will be called next. If this ReadOne populates latest.Spec.AccessString with the expanded access string, there will be a Spec difference (since desired.Spec has the user-requested access string). And as long as there is a Spec difference, then Update will be invoked.

What I'm trying to say is it doesn't matter whether we save the server-returned value in latest.Spec.AccessString in a Create or Update call (especially since latest.Spec is thrown away upon patching in these two cases). The infinite loop depends solely upon what happens in ReadOne, and ReadOne will always populate latest.Spec.AccessString with a value different from desired.Spec.AccessString.
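
A rough simulation of that flow, under simplifying assumptions: every reconcile pass does a read followed by a plain string comparison, and the server expands whatever it is given. `FakeUserAPI`, its `_expand` rule, and `reconcile` are invented for illustration and are not part of the ACK runtime or the ElastiCache API.

```python
from typing import Optional


class FakeUserAPI:
    """Hypothetical server that stores an expanded form of the requested string."""

    def __init__(self) -> None:
        self.stored: Optional[str] = None
        self.update_calls = 0

    @staticmethod
    def _expand(access_string: str) -> str:
        # Made-up expansion rule; the real service's expansion is more involved.
        return access_string + " +@hash +@bitmap"

    def create_user(self, access_string: str) -> None:
        self.stored = self._expand(access_string)

    def update_user(self, access_string: str) -> None:
        self.update_calls += 1
        self.stored = self._expand(access_string)

    def read_one(self) -> Optional[str]:
        return self.stored


def reconcile(api: FakeUserAPI, desired: str, passes: int) -> int:
    """Run several reconcile passes with no delta filtering; return Update count."""
    for _ in range(passes):
        latest = api.read_one()
        if latest is None:
            api.create_user(desired)
        elif latest != desired:
            # The read always returns the expanded string, so this always fires.
            api.update_user(desired)
    return api.update_calls


api = FakeUserAPI()
updates = reconcile(api, "on ~app::* -@all +@read", passes=4)
```

Here the first pass creates the user and each of the remaining three passes issues an Update, regardless of what the Create or Update call itself stored.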

Contributor Author

Sorry, I should add that there may not be an infinite loop (that was the initial behavior when I submitted my first iteration of the User code before the split repos, and there have been several runtime changes since then). However, this would still cause an unnecessary Update call after the creation of the resource, because the ReadOne right after the Create would pick up the Spec difference.

Regardless, the point is that we need to store the last requested access string. Otherwise we won't know when the desired access string has changed. If the user updates a field other than the access string, we want to know whether or not to add the access string to the Update payload (I don't think it's best practice to add a field unless it has actually changed).
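
As a sketch of what that means for the payload (an illustration only: `build_update_payload` is a hypothetical helper, and the field names are assumed to mirror the `UserId`, `AccessString`, and `NoPasswordRequired` request parameters):

```python
from typing import Dict


def build_update_payload(user_id: str,
                         desired_access_string: str,
                         last_requested_access_string: str,
                         other_changes: Dict[str, object]) -> Dict[str, object]:
    """Include AccessString only when the requested value actually changed."""
    payload: Dict[str, object] = {"UserId": user_id}
    payload.update(other_changes)
    if desired_access_string != last_requested_access_string:
        payload["AccessString"] = desired_access_string
    return payload


requested = "on ~app::* -@all +@read"

# Only another field changed: the access string stays out of the payload.
other_only = build_update_payload(
    "app-user", requested, requested, {"NoPasswordRequired": True})

# The access string itself changed: it is added to the payload.
with_access = build_update_payload(
    "app-user", "on ~app::* -@all +@write", requested, {})
```

Without the stored last-requested value, the comparison in the helper would have to use the expanded string from the server and would report a change every time.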

This is the approach that we decided on before the split repos, and we decided that we should keep it once the User resource was brought back into scope. It works, and if it's sub-optimal, that just reflects that we sometimes have to do sub-optimal things when dealing with the ElastiCache API.

The real point of this discussion is the more general question of: how do you handle the case where an API returns a value different from the one requested? I think that's a very important question to answer, but I also think that this PR is not the place to have this discussion. The point of this PR was to finally have succeeding Prow runs, and to continue to block it until we have a clear answer to that question doesn't make sense to me.

Also, even if we were to settle on a different solution, that would be in a follow-up PR anyway.

echen-98/ack-community@f059064
#32

Contributor

I agree with @echen-98.
Let's let this PR go through and open separate issues to track the discussion on Spec and Status population, Delta computation, and resource patching for special fields (read-only, server-modifiable, not specified in the Spec but defaulted by the server, etc.).

Contributor

@jaypipes please let us know whether it's fine to merge this PR in its current state.

resource = k8s.get_resource(reference)
assert resource['metadata']['deletionTimestamp'] is not None
# uncomment when reconciler->cleanup() invokes patchResource()
# TODO: uncomment when reconciler->cleanup() invokes patchResource()
Contributor

reconciler->cleanup() does not invoke patchResource() because it finalizes the deletion of the resource from Kubernetes.

Contributor Author

That discussion is out of scope of this PR, I just added a TODO here since it felt like the comment needed one.

More details:
aws-controllers-k8s/runtime#21
#34

@kumargauravsharma can elaborate if needed.



@pytest.fixture(scope="module")
def rg_delete(rg_delete_input, make_replication_group):
Contributor

this function doesn't delete anything...

Contributor Author

Ok, changed it to the noun form "deletion" to avoid further confusion

@echen-98 echen-98 changed the title from "address regression with User code, fix RG delete test" to "address regression with User code, fix RG deletion test" Jun 22, 2021
@echen-98 echen-98 force-pushed the test-coverage branch 2 times, most recently from f05544d to 46bd4d4 Compare June 22, 2021 19:16
Contributor

@jaypipes jaypipes left a comment

👍

@ack-bot
Collaborator

ack-bot commented Jun 22, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: echen-98, jaypipes, kumargauravsharma, nmvk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [echen-98,jaypipes,kumargauravsharma,nmvk]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@echen-98
Contributor Author

Missed some references to renamed fixtures, addressing those and waiting for tests to pass

@echen-98
Contributor Author

previous run failed during Docker build step?
/test elasticache-kind-e2e

Labels

approved ok-to-test Indicates a non-member PR verified by an org member that is safe to test.

5 participants