@DaveCTurner
Contributor

Today if a leader is not discovered or elected then nodes are essentially
silent at INFO and above, and log copiously at DEBUG and below. A short delay
when electing a leader is not unusual, for instance if other nodes have not yet
started, but a persistent failure to elect a leader is a problem worthy of log
messages in the default configuration.

With this change, while there is no leader each node outputs a WARN-level log
message every 10 seconds (by default) saying so and describing the
current discovery state and the current quorum(s).
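
As a rough illustration of the behaviour described above, here is a minimal sketch of a periodic "no leader" warning. It is not the code added by this PR: `NoLeaderWarningScheduler`, `stateDescription`, and `warnSink` are hypothetical names, and a plain `ScheduledExecutorService` stands in for whatever scheduling the real implementation uses.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical helper (an assumption for illustration, not a class from this PR). */
final class NoLeaderWarningScheduler {
    private final ScheduledExecutorService executor;
    private final Supplier<String> stateDescription; // assembles discovery state and quorum info
    private final Consumer<String> warnSink;         // stand-in for logger::warn
    private ScheduledFuture<?> task;

    NoLeaderWarningScheduler(ScheduledExecutorService executor,
                             Supplier<String> stateDescription,
                             Consumer<String> warnSink) {
        this.executor = executor;
        this.stateDescription = stateDescription;
        this.warnSink = warnSink;
    }

    /** Call when the node loses (or has not yet found) a leader. */
    synchronized void start(long interval, TimeUnit unit) {
        if (task == null) {
            // The first warning fires only after a full interval, so a short
            // election delay at startup produces no output.
            task = executor.scheduleAtFixedRate(this::warnOnce, interval, interval, unit);
        }
    }

    /** Call once a leader has been discovered or elected. */
    synchronized void stop() {
        if (task != null) {
            task.cancel(false);
            task = null;
        }
    }

    /** Emits a single warning describing the current state. */
    void warnOnce() {
        warnSink.accept("leader not discovered or elected yet: " + stateDescription.get());
    }
}
```

With the default described above, such a helper would be started with `start(10, TimeUnit.SECONDS)` and stopped as soon as a leader is known.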

@DaveCTurner DaveCTurner added >enhancement v7.0.0 :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. labels Nov 28, 2018
@DaveCTurner DaveCTurner requested a review from ywelsch November 28, 2018 12:34
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@DaveCTurner DaveCTurner changed the title from "Add warning if cluster fails to form fast enough" to "[Zen2] Add warning if cluster fails to form fast enough" Nov 28, 2018
@ywelsch ywelsch mentioned this pull request Nov 29, 2018
61 tasks
@DaveCTurner DaveCTurner changed the base branch from zen2 to master December 6, 2018 08:27
Contributor

@ywelsch ywelsch left a comment

It feels like the logic here is spread over too many classes. Can we just expose the info from PeerFinder that we need (i.e. lastResolvedAddresses)? Coordinator takes care of scheduling this and can then just assemble the information from the various components into a log output, not requiring a callback.

return new VotingConfiguration(Arrays.stream(nodes).map(DiscoveryNode::getId).collect(Collectors.toSet()));
}

public String getQuorumDescription() {
Contributor

This method (and the other describe methods) lacks context in its class. I would prefer to have the full construction of the output in warnClusterFormationFailed.

foundPeers.forEach(possibleVotes::addVote);
final String isQuorumOrNot = coordinationState.get().isElectionQuorum(possibleVotes) ? "is a quorum" : "is not a quorum";

logger.warn("leader not discovered or elected yet: election requires {}, have discovered {} which {}; discovery " +
Contributor

Let's swap "leader" for "master".


public String getBootstrapDescription() {
if (initialMasterNodeCount == 0) {
return "external cluster bootstrapping";
Contributor

what is meant by "external cluster bootstrapping"?

foundPeers.forEach(possibleVotes::addVote);
final String isQuorumOrNot = coordinationState.get().isElectionQuorum(possibleVotes) ? "is a quorum" : "is not a quorum";

logger.warn("leader not discovered or elected yet: election requires {}, have discovered {} which {}; discovery " +
Contributor

If this is not a master-eligible node, does it even make sense to talk about elections here? Maybe it should state that it is a non-master-eligible node and that it cannot find a master.
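
One way to read this suggestion, together with the quorum check in the snippet above, is sketched below. This is an assumption-laden illustration, not the merged code: `ClusterFormationMessage`, `isQuorum`, and `describe` are hypothetical names, the message wording is invented, and the quorum test is simplified to a strict majority of a single voting configuration.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper (an assumption for illustration, not a class from this PR). */
final class ClusterFormationMessage {

    /** Simplified quorum rule: a strict majority of the voting configuration has been discovered. */
    static boolean isQuorum(Set<String> votingConfiguration, Set<String> discovered) {
        long votes = votingConfiguration.stream().filter(discovered::contains).count();
        return votes * 2 > votingConfiguration.size();
    }

    /** Builds the warning text; non-master-eligible nodes do not mention elections at all. */
    static String describe(boolean masterEligible,
                           Set<String> votingConfiguration,
                           Set<String> discovered) {
        // TreeSet gives a stable, sorted rendering of the node ids.
        if (!masterEligible) {
            return "master not discovered yet: this node is not master-eligible; have discovered "
                + new TreeSet<>(discovered);
        }
        String isQuorumOrNot = isQuorum(votingConfiguration, discovered) ? "is a quorum" : "is not a quorum";
        return "master not discovered or elected yet: election requires a majority of "
            + new TreeSet<>(votingConfiguration) + ", have discovered "
            + new TreeSet<>(discovered) + " which " + isQuorumOrNot;
    }
}
```

Keeping the whole message in one place also matches the earlier request to build the full output in a single method rather than spreading describe methods across classes.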

@DaveCTurner DaveCTurner requested a review from ywelsch December 7, 2018 14:53
Contributor

@ywelsch ywelsch left a comment

LGTM

@DaveCTurner DaveCTurner merged commit 9d41798 into elastic:master Dec 7, 2018
@DaveCTurner DaveCTurner deleted the 2018-11-28-cluster-formation-timeout-warning branch December 7, 2018 17:23
Labels

:Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >enhancement v7.0.0-beta1
