Fix session surviving cluster purge and recreate through cache #4162

roehrijn · 2023-03-22T14:38:34Z

What type of PR is this?

/kind bug

What this PR does / why we need it:

AWS sessions need to be removed from the cache when retrieval from the credentials providers fails. Otherwise, a defective session can survive the deletion of a cluster and its recreation under the same name with new credentials.

Steps to reproduce:

1. Create credentials for an AWS account
2. Create a cluster with name XYZ using multi-tenancy for these AWS credentials
3. Delete this cluster with name XYZ
4. Create new credentials for the AWS account
5. Create a cluster with the same name XYZ using multi-tenancy for these new AWS credentials

Result: Due to the two-layered caching approach in sessionForClusterWithRegion() (pkg/cloud/scope/session.go:108), the session created with the credentials from step #1 survives and is still used for the newly created cluster, which results in various 401/403 errors from the AWS API.

This PR makes sure that the old session is explicitly evicted from the cache as soon as it is no longer possible to retrieve the credentials from the cached credential providers.
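The eviction behaviour described above can be illustrated with a minimal, self-contained Go sketch. It is not the code from pkg/cloud/scope/session.go: the names `credentialsProvider`, `staticProvider`, `session`, `sessionFor`, and `sessionKey` are hypothetical stand-ins, and a single `sync.Map` is assumed for the session layer of the cache. The point is only the ordering: check whether credentials can still be retrieved, and delete the cached session when they cannot.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// credentialsProvider is a hypothetical stand-in for a cached credentials provider.
type credentialsProvider interface {
	Retrieve() (string, error)
}

// staticProvider returns a fixed access key, or an error once its
// underlying credentials are no longer available.
type staticProvider struct {
	accessKey string
	err       error
}

func (p staticProvider) Retrieve() (string, error) { return p.accessKey, p.err }

// session is a hypothetical stand-in for an AWS session bound to credentials.
type session struct{ accessKey string }

// sessionCache maps "region/clusterName" to *session (assumed key layout).
var sessionCache sync.Map

var errCredentialsGone = errors.New("credentials no longer available")

func sessionKey(region, clusterName string) string { return region + "/" + clusterName }

// sessionFor returns the cached session while the provider can still supply
// credentials, and evicts the cached session when it cannot.
func sessionFor(region, clusterName string, p credentialsProvider) (*session, error) {
	key := sessionKey(region, clusterName)

	accessKey, err := p.Retrieve()
	if err != nil {
		// Evict, so that a later cluster with the same name does not
		// inherit a session built from the old credentials.
		sessionCache.Delete(key)
		return nil, fmt.Errorf("failed to retrieve identity credentials: %w", err)
	}

	if cached, ok := sessionCache.Load(key); ok {
		return cached.(*session), nil
	}

	s := &session{accessKey: accessKey}
	sessionCache.Store(key, s)
	return s, nil
}

func main() {
	// Cluster XYZ is created with the first set of credentials.
	first, _ := sessionFor("eu-central-1", "XYZ", staticProvider{accessKey: "OLD"})
	fmt.Println("first session uses:", first.accessKey)

	// The cluster is deleted; the old provider can no longer retrieve credentials.
	_, err := sessionFor("eu-central-1", "XYZ", staticProvider{err: errCredentialsGone})
	fmt.Println("after delete:", err)

	// A cluster with the same name is recreated with new credentials.
	second, _ := sessionFor("eu-central-1", "XYZ", staticProvider{accessKey: "NEW"})
	fmt.Println("second session uses:", second.accessKey)
}
```

Without the `sessionCache.Delete(key)` call, the third invocation in this sketch would hit the cache and hand back the session built from the old credentials, which mirrors the failure mode in the steps to reproduce.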

Checklist:

squashed commits
includes documentation
adds unit tests
adds or updates e2e tests

Release note:

Fix surviving defective session on cluster recreate with new AWS credentials

_{Jan Röhrich <[email protected]>, Mercedes-Benz Tech Innovation GmbH (Provider Information)}

linux-foundation-easycla · 2023-03-22T14:38:38Z

The committers listed above are authorized under a signed CLA.

✅ login: roehrijn / name: Jan Röhrich (3636d8f)

k8s-ci-robot · 2023-03-22T14:38:43Z

Welcome @roehrijn!

It looks like this is your first PR to kubernetes-sigs/cluster-api-provider-aws 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/cluster-api-provider-aws has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot · 2023-03-22T14:38:44Z

Hi @roehrijn. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot · 2023-03-22T14:44:13Z

@roehrijn: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/test EasyCLA

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Skarlso · 2023-03-23T08:23:46Z

/ok-to-test

* session needs to be removed from cache in case of faulty retrieval from credentials providers

Skarlso · 2023-03-29T04:50:29Z

Hello @roehrijn. Would you please update the description to include release notes? Thanks.

/test ?

k8s-ci-robot · 2023-03-29T04:50:31Z

@Skarlso: The following commands are available to trigger required jobs:

/test pull-cluster-api-provider-aws-build
/test pull-cluster-api-provider-aws-test
/test pull-cluster-api-provider-aws-verify

The following commands are available to trigger optional jobs:

/test pull-cluster-api-provider-aws-apidiff-main
/test pull-cluster-api-provider-aws-e2e
/test pull-cluster-api-provider-aws-e2e-blocking
/test pull-cluster-api-provider-aws-e2e-clusterclass
/test pull-cluster-api-provider-aws-e2e-conformance
/test pull-cluster-api-provider-aws-e2e-conformance-with-ci-artifacts
/test pull-cluster-api-provider-aws-e2e-eks
/test pull-cluster-api-provider-aws-e2e-eks-gc
/test pull-cluster-api-provider-aws-e2e-eks-testing

Use /test all to run the following jobs that were automatically triggered:

pull-cluster-api-provider-aws-apidiff-main
pull-cluster-api-provider-aws-build
pull-cluster-api-provider-aws-test
pull-cluster-api-provider-aws-verify

In response to this:

Hello @roehrijn. Would you please update the description to include release notes? Thanks.

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Skarlso · 2023-03-29T04:51:16Z

/test pull-cluster-api-provider-aws-e2e-eks

roehrijn · 2023-03-30T12:27:16Z

Hi @Skarlso,

Hello @roehrijn. Would you please update the description to include release notes? Thanks.

I thought I had already done this, probably before your comment. I have now also added the release notes prompt. Is it correct now?

Looking forward to getting this into main. By the way, we have this fix in our own fork and have been using it successfully in production for a few weeks.

Skarlso · 2023-04-03T19:56:25Z

Yep looks alright now. Thanks.

I'll take a look later on. I've been sick with Covid these past couple of days.

Skarlso · 2023-04-03T20:07:57Z

/lgtm
/approve

k8s-ci-robot · 2023-04-03T20:08:05Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Skarlso

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [Skarlso]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

roehrijn · 2023-04-05T07:05:48Z

Hi @Skarlso, hope you're well again. No worries, I also had several gaps of multiple days where I had no time to look into this PR. Now I'm happy about my first merged contribution to a CNCF project. More to come ...

k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Mar 22, 2023

k8s-ci-robot added needs-priority cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Mar 22, 2023

k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 22, 2023

k8s-ci-robot requested review from Skarlso and luthermonson March 22, 2023 14:38

k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 23, 2023

Fix session surviving cluster purge and recreate through cache

3636d8f

* session needs to be removed from cache in case of faulty retrieval from credentials providers

roehrijn force-pushed the roehrijn/credentials-cache-fix branch from 1014a29 to 3636d8f Compare March 23, 2023 08:57

k8s-ci-robot assigned Skarlso Apr 3, 2023

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 3, 2023

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 3, 2023

k8s-ci-robot merged commit d9b62d2 into kubernetes-sigs:main Apr 3, 2023

roehrijn deleted the roehrijn/credentials-cache-fix branch April 5, 2023 07:09
