Conversation


@Antanukas Antanukas commented Mar 7, 2022

Adds support for shrinking a cluster by multiple instances at a time.

@Antanukas Antanukas changed the base branch from master to antanas/use-batch-placement-remove March 7, 2022 09:57
Base automatically changed from antanas/use-batch-placement-remove to master March 8, 2022 06:03
@Antanukas Antanukas marked this pull request as ready for review March 8, 2022 11:15
}

- // findPodInstanceToRemove returns the pod (and associated placement instance)
+ // findPodInstancesToRemove returns the pod (and associated placement instance)
Collaborator

super nit: s/pod/pods

Collaborator

@schallert schallert left a comment

LGTM

}

- // findPodInstanceToRemove returns the pod (and associated placement instance)
+ // findPodInstancesToRemove returns the pod (and associated placement instance)
Collaborator

Nit: can you update the comment to reflect that this will return multiple pods / instances now?

}

- return nil, nil, errNoPodsInPlacement
+ return podsToRemove, instancesToRemove, nil
Collaborator

To ensure this is doing what the user intends, do we want to return an error if len(podsToRemove) != removeCount? Not sure if there's a case where that could happen.

Collaborator Author

I think the only case where this can happen is if we somehow want to remove more instances than exist. There is a test case where removeCount is 4 but we only have 3 instances in the placement.

In practice this should not happen, because we only remove when the placement contains more instances than desired:

		if inPlacement > desired {
			setLogger.Info("remove instance from placement for set")
			return c.shrinkPlacementForSet(cluster, set, placement, int(inPlacement-desired))
		}

Maybe if we spam expand/shrink we could enter some weird state here, but I think it's hard to reason about what would happen anyway.

Another approach that could potentially be safer: instead of passing "removeCount" we could pass in "desired", so that we try to reach a target state. We could then assert that the target state makes sense; if "desired" > "currentPodCount" it looks like an expansion rather than a shrink.

@Antanukas
Collaborator Author

Refactored the PR so that instead of computing removeCount == inPlacement - desired, we just pass desired to the shrink method, which tries to reach the desired count. If desired is negative it bails. If the current count is lower than or equal to desired it does not do any shrinking. I find this approach easier to understand and reason about.
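
For illustration only, a minimal, self-contained sketch of that desired-count based guard; the function and variable names here are assumptions for this example, not the operator's actual code:

package main

import "fmt"

// shrinkToDesired sketches the approach described above: bail on a negative
// target, do nothing when the current count is already at or below the target,
// and otherwise report how many instances would need to be removed.
func shrinkToDesired(currentCount, desiredInstanceCount int) (int, error) {
	if desiredInstanceCount < 0 {
		return 0, fmt.Errorf("desired instance count is negative: %d", desiredInstanceCount)
	}
	if currentCount <= desiredInstanceCount {
		// Nothing to shrink; treat as a no-op rather than an error.
		return 0, nil
	}
	return currentCount - desiredInstanceCount, nil
}

func main() {
	toRemove, err := shrinkToDesired(5, 3)
	if err != nil {
		panic(err)
	}
	fmt.Println("instances to remove:", toRemove) // instances to remove: 2
}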

Collaborator

@jeromefroe jeromefroe left a comment

LGTM x2. I also prefer the new approach of just passing the desired number of instances to shrinkPlacementForSet.

	podsToRemove      []*corev1.Pod
	instancesToRemove []placement.Instance
)
for i := len(podIDs) - 1; i >= desiredInstanceCount; i-- {
Collaborator

nit: Maybe it's worth moving the check that desiredInstanceCount is greater than zero into this function just in case we call it from other places in the future?
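
For context, a minimal sketch of what folding that validation into the selection function could look like; it is simplified to plain string IDs rather than the operator's pod and placement types, and all names and IDs here are illustrative:

package main

import (
	"fmt"
	"sort"
)

// findIDsToRemove walks the sorted IDs from the end, as in the loop above,
// collecting instances until only desiredInstanceCount remain. The negative
// count check lives inside the function, per the suggestion in this thread.
func findIDsToRemove(podIDs []string, desiredInstanceCount int) ([]string, error) {
	if desiredInstanceCount < 0 {
		return nil, fmt.Errorf("desired instance count is negative: %d", desiredInstanceCount)
	}
	sort.Strings(podIDs)
	var toRemove []string
	for i := len(podIDs) - 1; i >= desiredInstanceCount; i-- {
		toRemove = append(toRemove, podIDs[i])
	}
	return toRemove, nil
}

func main() {
	ids := []string{"cluster-rep0-0", "cluster-rep0-1", "cluster-rep0-2"}
	toRemove, err := findIDsToRemove(ids, 1)
	if err != nil {
		panic(err)
	}
	fmt.Println(toRemove) // [cluster-rep0-2 cluster-rep0-1]
}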

@Antanukas Antanukas enabled auto-merge (squash) March 10, 2022 09:27
@Antanukas Antanukas merged commit 0194136 into master Mar 10, 2022
@Antanukas Antanukas deleted the antanas/batch-node-removal branch March 10, 2022 09:37
Comment on lines +593 to +594
if desiredInstanceCount < 0 {
	msg := fmt.Sprintf("desired instance count is negative: %d", desiredInstanceCount)
Collaborator

Shouldn't we also validate against desiredInstanceCount = 0?

Collaborator Author

@Antanukas Antanukas Mar 14, 2022

I think 0 should be a valid option, like scaling to 0 with kubectl. Haven't tried it though, so I'm not sure it will work.

Collaborator

How can it be valid in terms of placement - where will the shards be relocated to?

Collaborator Author

Hmm yeah maybe it does not make sense after all.
