Prevent leaking EIP when creating machines with BYO IPv4 Pool #5038

@mtulio

/kind bug

What steps did you take and what happened:

There are situations in which a machine deployed with a BYO IPv4 pool leaks Elastic IPs (EIPs):

  • 1/ When the instance is created, the AssociatePublicIpAddress flag must be set to false on the primary network interface. Otherwise the instance is created with an Amazon-provided public IP address, which it keeps until the reconciliation loop reaches the BYO IP step and the custom EIP is allocated and associated with the instance.
  • 2/ The machine reconciliation loop hits a race condition, or an inconsistency in the AWS API, that makes the controller create two EIPs for each machine. It alternates* between the two EIPs until reconciliation finishes, leaving one of them leaked (unused/disassociated) for the rest of the instance life cycle. The delete flow removes both.

*Alternating is expected, because the algorithm looks up the EIP by role and tries to reuse a disassociated one.
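
To illustrate situation 1/, here is a minimal Go sketch of building the primary network interface with the public IP explicitly disabled when a BYO IPv4 pool is configured. The type networkInterfaceSpec and the function primaryInterfaceFor are hypothetical stand-ins written for this example, not the provider's actual code; only the AssociatePublicIpAddress field name comes from the description above, and the subnet and pool IDs are placeholders.

```go
package main

import "fmt"

// networkInterfaceSpec is a hypothetical, trimmed-down stand-in for the
// launch-time network interface specification of an instance.
type networkInterfaceSpec struct {
	DeviceIndex              int
	SubnetID                 string
	AssociatePublicIpAddress *bool // nil means "use the subnet default"
}

// primaryInterfaceFor builds the spec for the primary interface (device
// index 0). When a BYO IPv4 pool is configured, it explicitly disables the
// Amazon-provided public IP, so the only public address the instance ever
// gets is the EIP associated later by the reconciliation loop.
func primaryInterfaceFor(subnetID, byoIPv4Pool string) networkInterfaceSpec {
	spec := networkInterfaceSpec{DeviceIndex: 0, SubnetID: subnetID}
	if byoIPv4Pool != "" {
		disabled := false
		spec.AssociatePublicIpAddress = &disabled
	}
	return spec
}

func main() {
	// Placeholder IDs, for illustration only.
	spec := primaryInterfaceFor("subnet-example", "ipv4pool-ec2-example")
	fmt.Println("AssociatePublicIpAddress:", *spec.AssociatePublicIpAddress)
}
```

Leaving the field nil when no pool is configured keeps the existing behavior (subnet default) for machines that do not use BYO IP.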

Furthermore, the following failure occurs because the BYO IP reconciliation loop tries to associate the EIP with an instance that is not yet running:

time="2024-05-08T15:49:33-03:00" level=debug msg="I0508 15:49:33.785472 2878400 recorder.go:104] 
\"Failed to associate Elastic IP for \\\"ec2-i-03de70744825f25c5\\\": InvalidInstanceID: 
The pending instance 'i-03de70744825f25c5' is not in a valid state for this operation.\\n\\tstatus code: 
400, request id: 7582391c-b35e-44b9-8455-e68663d90fed\" logger=\"events\" type=\"Warning\" 
object=[...]\"name\":\"mrb-byoip-32-kbcz9\",\"[...] reason=\"FailedAssociateEIP\""

time="2024-05-08T15:49:33-03:00" level=debug msg="E0508 15:49:33.803742 2878400 controller.go:329] \"Reconciler error\" err=<"

time="2024-05-08T15:49:33-03:00" level=debug msg="\tfailed to reconcile EIP: failed to associate Elastic IP 
\"eipalloc-08faccab2dbb28d4f\" to instance \"i-03de70744825f25c5\": 
InvalidInstanceID: The pending instance 'i-03de70744825f25c5' is not in a valid state for this operation."
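
The failure in the log above could be avoided by checking the instance state before calling the association. A minimal Go sketch, assuming a hypothetical eipPlan result type and planEIPAssociation helper, and an arbitrary 10-second requeue delay (none of these are taken from the provider's code; "pending" and "running" are the EC2 instance state names):

```go
package main

import (
	"fmt"
	"time"
)

// EC2 instance state names relevant to this sketch.
const (
	statePending = "pending"
	stateRunning = "running"
)

// requeueDelay is an assumed value, chosen only for illustration.
const requeueDelay = 10 * time.Second

// eipPlan is a hypothetical result describing what the reconciler should do.
type eipPlan struct {
	Associate    bool          // call the EIP association now
	RequeueAfter time.Duration // retry later without reporting an error
}

// planEIPAssociation defers the association while the instance is still
// pending, which is the case that triggers InvalidInstanceID in the log.
// Other states are passed through unchanged; handling them is out of scope.
func planEIPAssociation(instanceState string) eipPlan {
	if instanceState == statePending {
		return eipPlan{RequeueAfter: requeueDelay}
	}
	return eipPlan{Associate: true}
}

func main() {
	for _, state := range []string{statePending, stateRunning} {
		plan := planEIPAssociation(state)
		fmt.Printf("state=%s associate=%t requeueAfter=%s\n",
			state, plan.Associate, plan.RequeueAfter)
	}
}
```

Requeueing instead of returning an error keeps the expected "instance still booting" case out of the warning events and the reconciler error log.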

What did you expect to happen:

  • The machine is created successfully and allocates a single EIP when using a BYO IPv4 pool
  • The machine reconciliation loop waits for the instance to leave the pending state before trying to associate the EIP, so that expected behavior does not produce error messages in the logs
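
For the first expectation, one possible way to keep the EIP choice stable across reconciliations is to prefer an EIP that is already associated with the instance, then a disassociated one, and to allocate only when neither exists. The Go sketch below is an assumption about how that could look, not the controller's actual lookup code; the address type, the pickEIP function, and the allocation IDs are made up for illustration. It would not by itself fix an inconsistent listing from the AWS API, but it avoids alternating between two EIPs once both are visible.

```go
package main

import "fmt"

// address is a hypothetical, trimmed-down view of an Elastic IP that was
// found by looking up the machine's role.
type address struct {
	AllocationID string
	InstanceID   string // empty when the EIP is not associated
}

// pickEIP returns the EIP to use for instanceID and reports whether a new
// allocation is needed. Preference order:
//  1. an EIP already associated with this instance (nothing to change),
//  2. the first disassociated EIP (reuse it),
//  3. none available: the caller should allocate a new one.
func pickEIP(candidates []address, instanceID string) (address, bool) {
	var free *address
	for i := range candidates {
		c := &candidates[i]
		if c.InstanceID == instanceID {
			return *c, false
		}
		if c.InstanceID == "" && free == nil {
			free = c
		}
	}
	if free != nil {
		return *free, false
	}
	return address{}, true
}

func main() {
	// Made-up allocation IDs: one free EIP and one already on the instance.
	candidates := []address{
		{AllocationID: "eipalloc-free"},
		{AllocationID: "eipalloc-inuse", InstanceID: "i-example"},
	}
	chosen, allocate := pickEIP(candidates, "i-example")
	fmt.Println(chosen.AllocationID, allocate)
}
```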

Anything else you would like to add:

Environment:

  • Cluster-api-provider-aws version: v2.5.2
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release): RHCOS

Labels: kind/bug, needs-priority, needs-triage
