
@DaveCTurner
Contributor

If shutting down half or more of the master-eligible nodes, their votes must
first be explicitly withdrawn to ensure that the cluster doesn't lose its
quorum. This works via voting tombstones, stored in the cluster state, which
tell the reconfigurator to remove nodes from the voting configuration.
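To show why the votes must be withdrawn *before* the shutdown, here is a minimal Java sketch of the quorum arithmetic. It assumes a simplified model: the voting configuration is a plain set of node IDs, a quorum is a strict majority of that set, and the reconfigurator just drops tombstoned nodes. The names `VotingQuorumSketch`, `hasQuorum` and `reconfigure` are hypothetical and are not Elasticsearch classes. The sketch also glosses over the fact that the shrunken configuration has to be committed while the departing nodes are still running.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model: the voting configuration is a set of node IDs, and the
// cluster has a quorum when strictly more than half of those voters are alive.
final class VotingQuorumSketch {

    // True if the live voters form a strict majority of the voting configuration.
    static boolean hasQuorum(Set<String> votingConfig, Set<String> liveNodes) {
        int liveVoters = 0;
        for (String voter : votingConfig) {
            if (liveNodes.contains(voter)) {
                liveVoters++;
            }
        }
        return 2 * liveVoters > votingConfig.size();
    }

    // Hypothetical reconfiguration step: drop every tombstoned node from the
    // voting configuration. (In a real cluster this must be committed while
    // the tombstoned nodes are still alive.)
    static Set<String> reconfigure(Set<String> votingConfig, Set<String> tombstones) {
        Set<String> result = new HashSet<>(votingConfig);
        result.removeAll(tombstones);
        return result;
    }

    public static void main(String[] args) {
        Set<String> config = Set.of("node-a", "node-b");
        Set<String> afterShutdown = Set.of("node-a"); // node-b has been stopped

        // Stopping one of two voters (half of them) loses the quorum: 1 of 2 is not a majority.
        System.out.println(hasQuorum(config, afterShutdown)); // false

        // Withdrawing node-b's vote first shrinks the configuration to {node-a},
        // so the remaining node is a majority on its own (1 of 1).
        Set<String> shrunk = reconfigure(config, Set.of("node-b"));
        System.out.println(hasQuorum(shrunk, afterShutdown)); // true
    }
}
```

The same arithmetic explains the "half or more" threshold: with four voters, stopping two leaves 2 of 4, which is not a strict majority.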

This change introduces voting tombstones to the cluster state, together with
transport APIs for adding and removing them, and makes use of these APIs in
InternalTestCluster to support tests which remove at least half of the
master-eligible nodes at once (e.g. shrinking from two master-eligible nodes to
one).

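The test-cluster workflow described above can be sketched in memory roughly as follows: if the nodes being stopped make up half or more of the voting configuration, tombstone them first, then stop them, then clear the tombstones. Everything here (`TombstoneShutdownSketch`, `addTombstones`, `clearTombstones`, `stopNodes`) is a hypothetical stand-in, not the transport APIs or `InternalTestCluster` methods added in this PR, and reconfiguring immediately inside `addTombstones` is a simplification.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory stand-in for a test cluster. Method names are
// illustrative only and are not the real InternalTestCluster or transport APIs.
final class TombstoneShutdownSketch {
    private final Set<String> liveNodes = new HashSet<>();
    private final Set<String> votingConfig = new HashSet<>();
    private final Set<String> tombstones = new HashSet<>();

    TombstoneShutdownSketch(Set<String> masterEligibleNodes) {
        liveNodes.addAll(masterEligibleNodes);
        votingConfig.addAll(masterEligibleNodes);
    }

    // Stand-in for an "add voting tombstones" action.
    void addTombstones(Set<String> nodeIds) {
        tombstones.addAll(nodeIds);
        // Simplification: the reconfigurator removes tombstoned nodes from the
        // voting configuration straight away, while they are still running.
        votingConfig.removeAll(tombstones);
    }

    // Stand-in for a "clear voting tombstones" action.
    void clearTombstones() {
        tombstones.clear();
    }

    boolean hasQuorum() {
        long liveVoters = votingConfig.stream().filter(liveNodes::contains).count();
        return 2 * liveVoters > votingConfig.size();
    }

    // Stops the given nodes, withdrawing their votes first if they make up
    // half or more of the current voting configuration.
    void stopNodes(Set<String> toStop) {
        long stoppingVoters = toStop.stream().filter(votingConfig::contains).count();
        boolean needsTombstones = 2 * stoppingVoters >= votingConfig.size();
        if (needsTombstones) {
            addTombstones(toStop);
        }
        liveNodes.removeAll(toStop);
        if (needsTombstones) {
            // In this model the stopped nodes have left, so their tombstones can go.
            clearTombstones();
        }
    }

    public static void main(String[] args) {
        // Shrink from two master-eligible nodes to one, as in the description.
        TombstoneShutdownSketch cluster = new TombstoneShutdownSketch(Set.of("node-a", "node-b"));
        cluster.stopNodes(Set.of("node-b"));
        System.out.println(cluster.hasQuorum()); // true
    }
}
```

Stopping fewer than half of the voters (e.g. one of three) skips the tombstone step entirely, since the survivors still form a majority.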
@DaveCTurner DaveCTurner added >enhancement v7.0.0 :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. labels Nov 12, 2018
@DaveCTurner DaveCTurner requested a review from ywelsch November 12, 2018 11:53
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

Contributor

@ywelsch ywelsch left a comment

I've left some smaller comments. Looks very good already.

@DaveCTurner
Contributor Author

@elasticmachine test this please

@ywelsch ywelsch mentioned this pull request Nov 12, 2018
@DaveCTurner DaveCTurner requested a review from ywelsch November 13, 2018 08:18
@DaveCTurner DaveCTurner requested a review from ywelsch November 13, 2018 13:59
Contributor

@ywelsch ywelsch left a comment

LGTM. I think you need to merge the latest zen2 branch here.

```java
final int nodesToRemain = internalCluster().size() - 1;
logger.info("--> reducing to [{}] nodes", nodesToRemain);
internalCluster().ensureAtMostNumDataNodes(nodesToRemain);
assertThat(internalCluster().size(), lessThanOrEqualTo(nodesToRemain));
```
Contributor

I'm not sure what you're testing here with this assertion. Why not just repeatedly call `stopRandomNode()`, and maybe check that the cluster is alive and healthy before shutting down the last node?
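The suggested test shape, sketched against a hypothetical `ClusterOps` interface rather than the real `InternalTestCluster` API: stop one random node at a time and check health before each shutdown, including the last one. `ClusterOps`, `shrinkToOneNode` and `InMemoryCluster` are illustrative names only, and the in-memory "health" check is a toy that merely verifies some node is still running.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of the suggested shrink-one-node-at-a-time test loop.
final class ShrinkLoopSketch {

    // Minimal hypothetical view of a test cluster; not the real InternalTestCluster API.
    interface ClusterOps {
        int size();
        boolean isHealthy();
        void stopRandomNode();
    }

    // Stops nodes one at a time until a single node remains, checking that the
    // cluster is healthy before each shutdown (including the last one).
    static void shrinkToOneNode(ClusterOps cluster) {
        while (cluster.size() > 1) {
            if (!cluster.isHealthy()) {
                throw new IllegalStateException("cluster unhealthy with " + cluster.size() + " nodes");
            }
            cluster.stopRandomNode();
        }
    }

    // Toy implementation: "healthy" only means at least one node is running.
    static final class InMemoryCluster implements ClusterOps {
        private final List<String> nodes = new ArrayList<>();
        private final Random random = new Random(42);

        InMemoryCluster(int n) {
            for (int i = 0; i < n; i++) {
                nodes.add("node-" + i);
            }
        }

        public int size() { return nodes.size(); }

        public boolean isHealthy() { return !nodes.isEmpty(); }

        // Removes the node at a random index.
        public void stopRandomNode() { nodes.remove(random.nextInt(nodes.size())); }
    }

    public static void main(String[] args) {
        InMemoryCluster cluster = new InMemoryCluster(3);
        shrinkToOneNode(cluster);
        System.out.println(cluster.size()); // 1
    }
}
```

In a real integration test the health check would be a cluster health request rather than this placeholder.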

@DaveCTurner
Contributor Author

The CI failure was due to running on a CentOS worker (fixed by #35453), so I merged master into zen2 and thence into this branch.

@DaveCTurner DaveCTurner merged commit 8e40a2b into elastic:zen2 Nov 13, 2018
@DaveCTurner DaveCTurner deleted the 2018-11-09-voting-tombstones branch November 13, 2018 19:32

Labels

:Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >enhancement v7.0.0-beta1
