node didn't join cluster #832

@jackfrancis

/kind bug

What steps did you take and what happened:

Hi there, I built a cluster with 3 control plane nodes and 20 worker nodes, and only 19 of the worker nodes joined. On the VM of the node that didn't join, the cloud-final journalctl logs suggest that the kubeadm node registration hit an etcd leader election change during the kubelet-start phase ("etcdserver: leader changed") and did not recover. See:

$ sudo journalctl -u cloud-final --no-pager
-- Logs begin at Mon 2020-07-27 17:57:52 UTC, end at Mon 2020-07-27 19:43:10 UTC. --
Jul 27 17:58:26 francis-test-md-0-8qlrg systemd[1]: Starting Execute cloud user/final scripts...
Jul 27 17:58:29 francis-test-md-0-8qlrg cloud-init[1815]: W0727 17:58:29.758593    1860 join.go:346] [preflight] WARNING: JoinControlPane.controlPlane settings will be ignored when control-plane flag is not set.
Jul 27 17:58:29 francis-test-md-0-8qlrg cloud-init[1815]: W0727 17:58:29.776696    1860 common.go:77] your configuration file uses a deprecated API spec: "kubeadm.k8s.io/v1beta1". Please use 'kubeadm config migrate --old-config old.yaml --new-config new.yaml', which will write the new, similar spec using a newer API version.
Jul 27 17:58:29 francis-test-md-0-8qlrg cloud-init[1815]: [preflight] Running pre-flight checks
Jul 27 17:58:31 francis-test-md-0-8qlrg cloud-init[1815]: [preflight] Reading configuration from the cluster...
Jul 27 17:58:31 francis-test-md-0-8qlrg cloud-init[1815]: [preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
Jul 27 17:58:31 francis-test-md-0-8qlrg cloud-init[1815]: [kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.17" ConfigMap in the kube-system namespace
Jul 27 17:59:04 francis-test-md-0-8qlrg cloud-init[1815]: error execution phase kubelet-start: etcdserver: leader changed
Jul 27 17:59:04 francis-test-md-0-8qlrg cloud-init[1815]: To see the stack trace of this error execute with --v=5 or higher
Jul 27 17:59:04 francis-test-md-0-8qlrg cloud-init[1815]: Cloud-init v. 19.4-33-gbb4131a2-0ubuntu1~18.04.1 running 'modules:final' at Mon, 27 Jul 2020 17:58:26 +0000. Up 44.44 seconds.
Jul 27 17:59:04 francis-test-md-0-8qlrg cloud-init[1815]: 2020-07-27 17:59:04,045 - util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/runcmd [1]
Jul 27 17:59:04 francis-test-md-0-8qlrg cloud-init[1815]: 2020-07-27 17:59:04,064 - cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
Jul 27 17:59:04 francis-test-md-0-8qlrg cloud-init[1815]: 2020-07-27 17:59:04,065 - util.py[WARNING]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_scripts_user.py'>) failed
Jul 27 17:59:04 francis-test-md-0-8qlrg ec2[1913]: 
Jul 27 17:59:04 francis-test-md-0-8qlrg ec2[1913]: #############################################################
Jul 27 17:59:04 francis-test-md-0-8qlrg ec2[1913]: -----BEGIN SSH HOST KEY FINGERPRINTS-----
<truncated>
Jul 27 17:59:04 francis-test-md-0-8qlrg ec2[1913]: -----END SSH HOST KEY FINGERPRINTS-----
Jul 27 17:59:04 francis-test-md-0-8qlrg ec2[1913]: #############################################################
Jul 27 17:59:04 francis-test-md-0-8qlrg cloud-init[1815]: Cloud-init v. 19.4-33-gbb4131a2-0ubuntu1~18.04.1 finished at Mon, 27 Jul 2020 17:59:04 +0000. Datasource DataSourceAzure [seed=/dev/sr0].  Up 81.69 seconds
Jul 27 17:59:04 francis-test-md-0-8qlrg systemd[1]: cloud-final.service: Main process exited, code=exited, status=1/FAILURE
Jul 27 17:59:04 francis-test-md-0-8qlrg systemd[1]: cloud-final.service: Failed with result 'exit-code'.
Jul 27 17:59:04 francis-test-md-0-8qlrg systemd[1]: Failed to start Execute cloud user/final scripts.
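
Since "etcdserver: leader changed" is typically a transient error, one possible mitigation is to retry the join instead of failing on the first error. The following is only a minimal sketch of that idea, not what the provider's bootstrap script actually does: the `retry` helper is hypothetical, the config path in the usage comment is a placeholder, and the log above does not establish whether rerunning `kubeadm join` after a partial attempt is safe without cleanup.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it exits 0 or MAX_ATTEMPTS
# attempts have been made. Returns the exit status of the last attempt.
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while true; do
        status=0
        # Capture the exit status without tripping `set -e`, if it is enabled.
        "$@" || status=$?
        if [ "$status" -eq 0 ]; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts (last exit status $status)" >&2
            return "$status"
        fi
        echo "attempt $attempt failed (exit status $status); retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}

# Example usage (assumption: the join is driven by a kubeadm config file;
# the path below is a placeholder, not taken from the logs):
# retry 5 10 kubeadm join --config /path/to/kubeadm-join-config.yaml
```

The helper retries on any non-zero exit, so it would also repeat failures that are not transient; a stricter variant could inspect the command output before retrying.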

What did you expect to happen:

Anything else you would like to add:

Environment:

  • cluster-api-provider-azure version:
  • Kubernetes version: (use kubectl version): v1.17.8
  • OS (e.g. from /etc/os-release): Ubuntu 18.04-LTS

    Labels

    kind/bug: Categorizes issue or PR as related to a bug.
