Description
What happened?
I created Deployments whose pods set hostname and subdomain, together with a headless Service. Each pod queried its own DNS records and logged the results.
It typically took ~30 seconds for name resolution to return the correct address, though in some cases it was much faster. Name resolution also seems to fail occasionally, returning NXDOMAIN. When a pod is deleted and recreated, the results are similar, though the previous A record may also be returned.
I initially ran very short tests that stopped after observing a single intermittent failure. I later ran longer tests and observed that these intermittent failures continue to occur and vary in duration.
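For readers without the attachments, the probe each pod runs can be sketched roughly as follows. This is not the attached script: `classify_lookup` and `poll_dns` are hypothetical names, the `stale` label is my own shorthand for "resolved to a different (possibly previous) address", and the sketch assumes `getent` is available in the container image.

```bash
# Hypothetical helper: classify a single lookup.
# $1 = the pod's own IP, $2 = the address the resolver returned (empty on NXDOMAIN).
classify_lookup() {
  local expected="$1" resolved="$2"
  if [ -z "$resolved" ]; then
    echo "failure"    # nothing returned, e.g. NXDOMAIN
  elif [ "$resolved" = "$expected" ]; then
    echo "success"    # record matches the pod's current IP
  else
    echo "stale"      # record points at some other address
  fi
}

# Hypothetical polling loop: resolve NAME COUNT times, INTERVAL seconds apart.
poll_dns() {
  local name="$1" expected="$2" count="${3:-10}" interval="${4:-1}"
  local i=0 resolved
  while [ "$i" -lt "$count" ]; do
    # getent prints "ADDRESS NAME [ALIAS...]"; keep only the first address.
    resolved="$(getent hosts "$name" | awk 'NR == 1 { print $1 }')"
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) ${resolved:-none} $(classify_lookup "$expected" "$resolved")"
    sleep "$interval"
    i=$((i + 1))
  done
}
```

Inside a pod this might be invoked as `poll_dns example-9.headless.name-space.svc.cluster.local 10.244.0.165 60 1`, with the IP taken from the pod's own status.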
First, some results of short-duration tests after creating resources with reproducer-1.yml.
Fairly typical: it took ~30 seconds for the first successful response.
+ kubectl -n name-space logs --timestamps pod/example-9-85b9cf77c4-74dpb
2024-09-09T23:05:14.821639151Z ip address: 10.244.0.165
2024-09-09T23:05:46.613337669Z 10.244.0.165 example-9.headless.name-space.svc.cluster.local example-9.headless.name-space.svc.cluster.local example-9.headless
2024-09-09T23:05:46.613549795Z success
2024-09-09T23:06:45.000865071Z 10.244.0.165 example-9.headless.name-space.svc.cluster.local example-9.headless.name-space.svc.cluster.local example-9.headless
2024-09-09T23:06:45.001438197Z done
Note the intermittent failure ~30 seconds after the pod started:
+ kubectl -n name-space logs --timestamps pod/example-46-5f88d64f8f-fkkmf
2024-09-09T23:05:26.142870309Z ip address: 10.244.0.205
2024-09-09T23:05:56.156259513Z 10.244.0.205 example-46.headless.name-space.svc.cluster.local example-46.headless.name-space.svc.cluster.local example-46.headless
2024-09-09T23:05:56.156283138Z success
2024-09-09T23:05:56.159916727Z failure
2024-09-09T23:05:56.166636988Z 10.244.0.205 example-46.headless.name-space.svc.cluster.local example-46.headless.name-space.svc.cluster.local example-46.headless
2024-09-09T23:05:56.166718155Z success
2024-09-09T23:05:56.166743405Z done
Note the intermittent failure ~60 seconds after the pod started:
+ kubectl -n name-space logs --timestamps pod/example-8-7d65cf6595-5kgm8
2024-09-09T23:05:15.292906336Z ip address: 10.244.0.164
2024-09-09T23:05:45.299029486Z 10.244.0.164 example-8.headless.name-space.svc.cluster.local example-8.headless.name-space.svc.cluster.local example-8.headless
2024-09-09T23:05:45.299171153Z success
2024-09-09T23:06:12.934370064Z 10.244.0.164 example-8.headless.name-space.svc.cluster.local example-8.headless.name-space.svc.cluster.local example-8.headless
2024-09-09T23:06:12.934805231Z failure
2024-09-09T23:06:12.935585357Z 10.244.0.164 example-8.headless.name-space.svc.cluster.local example-8.headless.name-space.svc.cluster.local example-8.headless
2024-09-09T23:06:12.935622357Z success
2024-09-09T23:06:12.935626232Z done
Correct name resolution nearly ten times faster than the typical ~30 seconds.
+ kubectl -n name-space logs --timestamps pod/example-50-5d7bd9f7dd-ldm9z
2024-09-09T23:05:25.994666234Z ip address: 10.244.0.203
2024-09-09T23:05:28.507822913Z 10.244.0.203 example-50.headless.name-space.svc.cluster.local example-50.headless.name-space.svc.cluster.local example-50.headless
2024-09-09T23:05:28.507850872Z success
2024-09-09T23:05:28.512090337Z failure
2024-09-09T23:05:28.517020845Z 10.244.0.203 example-50.headless.name-space.svc.cluster.local example-50.headless.name-space.svc.cluster.local example-50.headless
2024-09-09T23:05:28.517073470Z success
2024-09-09T23:05:28.517076387Z done
Next, some short-duration examples after deleting the pod created with reproducer-1.yml.
Resolved to the previous IP address until ~30 seconds, when name resolution became correct.
+ kubectl -n name-space logs --timestamps pod/example-32-94f578796-ggr5r
2024-09-09T23:28:58.365987901Z ip address: 10.244.0.189
2024-09-09T23:28:58.366879902Z 10.244.0.135 example-32.headless.name-space.svc.cluster.local example-32.headless.name-space.svc.cluster.local example-32.headless
2024-09-09T23:28:58.366941985Z success
2024-09-09T23:29:28.372333094Z 10.244.0.135 example-32.headless.name-space.svc.cluster.local example-32.headless.name-space.svc.cluster.local example-32.headless
2024-09-09T23:30:29.538217372Z 10.244.0.189 example-32.headless.name-space.svc.cluster.local example-32.headless.name-space.svc.cluster.local example-32.headless
2024-09-09T23:30:29.540434333Z done
In this case name resolution failed briefly ~90 seconds after the pod started.
+ kubectl -n name-space logs --timestamps pod/example-33-5c94568945-zwhk8
2024-09-09T23:28:56.224260600Z ip address: 10.244.0.177
2024-09-09T23:28:56.225344685Z 10.244.0.136 example-33.headless.name-space.svc.cluster.local example-33.headless.name-space.svc.cluster.local example-33.headless
2024-09-09T23:28:56.225403852Z success
2024-09-09T23:29:26.229834542Z 10.244.0.136 example-33.headless.name-space.svc.cluster.local example-33.headless.name-space.svc.cluster.local example-33.headless
2024-09-09T23:30:22.876064342Z 10.244.0.177 example-33.headless.name-space.svc.cluster.local example-33.headless.name-space.svc.cluster.local example-33.headless
2024-09-09T23:30:22.878206470Z failure
2024-09-09T23:30:22.881135099Z 10.244.0.177 example-33.headless.name-space.svc.cluster.local example-33.headless.name-space.svc.cluster.local example-33.headless
2024-09-09T23:30:22.881363766Z success
2024-09-09T23:30:22.881381516Z done
Here name resolution alternated between the old and new addresses, and failed briefly before correctly providing the new address.
+ kubectl -n name-space logs --timestamps pod/example-34-5f8cdcc9bb-zp2xj
2024-09-09T23:28:56.311666142Z ip address: 10.244.0.178
2024-09-09T23:28:58.814520539Z 10.244.0.178 example-34.headless.name-space.svc.cluster.local example-34.headless.name-space.svc.cluster.local example-34.headless
2024-09-09T23:28:58.814552873Z success
2024-09-09T23:28:58.816665626Z 10.244.0.178 example-34.headless.name-space.svc.cluster.local example-34.headless.name-space.svc.cluster.local example-34.headless
2024-09-09T23:28:58.845242167Z 10.244.0.146 example-34.headless.name-space.svc.cluster.local example-34.headless.name-space.svc.cluster.local example-34.headless
2024-09-09T23:28:58.845250417Z 10.244.0.178 example-34.headless.name-space.svc.cluster.local example-34.headless.name-space.svc.cluster.local example-34.headless
2024-09-09T23:28:58.845252042Z 10.244.0.146 example-34.headless.name-space.svc.cluster.local example-34.headless.name-space.svc.cluster.local example-34.headless
2024-09-09T23:28:58.845253708Z 10.244.0.178 example-34.headless.name-space.svc.cluster.local example-34.headless.name-space.svc.cluster.local example-34.headless
2024-09-09T23:28:58.845255333Z 10.244.0.146 example-34.headless.name-space.svc.cluster.local example-34.headless.name-space.svc.cluster.local example-34.headless
2024-09-09T23:28:58.845256750Z 10.244.0.178 example-34.headless.name-space.svc.cluster.local example-34.headless.name-space.svc.cluster.local example-34.headless
2024-09-09T23:28:58.845258125Z 10.244.0.146 example-34.headless.name-space.svc.cluster.local example-34.headless.name-space.svc.cluster.local example-34.headless
2024-09-09T23:29:03.874183954Z 10.244.0.178 example-34.headless.name-space.svc.cluster.local example-34.headless.name-space.svc.cluster.local example-34.headless
2024-09-09T23:29:03.874214370Z 10.244.0.146 example-34.headless.name-space.svc.cluster.local example-34.headless.name-space.svc.cluster.local example-34.headless
2024-09-09T23:29:03.874216745Z 10.244.0.178 example-34.headless.name-space.svc.cluster.local example-34.headless.name-space.svc.cluster.local example-34.headless
2024-09-09T23:29:03.874218495Z 10.244.0.146 example-34.headless.name-space.svc.cluster.local example-34.headless.name-space.svc.cluster.local example-34.headless
...
2024-09-09T23:29:28.801997330Z 10.244.0.178 example-34.headless.name-space.svc.cluster.local example-34.headless.name-space.svc.cluster.local example-34.headless
2024-09-09T23:29:28.801998872Z 10.244.0.146 example-34.headless.name-space.svc.cluster.local example-34.headless.name-space.svc.cluster.local example-34.headless
2024-09-09T23:29:28.802000372Z 10.244.0.178 example-34.headless.name-space.svc.cluster.local example-34.headless.name-space.svc.cluster.local example-34.headless
2024-09-09T23:29:28.802001830Z 10.244.0.146 example-34.headless.name-space.svc.cluster.local example-34.headless.name-space.svc.cluster.local example-34.headless
2024-09-09T23:30:04.697987578Z 10.244.0.178 example-34.headless.name-space.svc.cluster.local example-34.headless.name-space.svc.cluster.local example-34.headless
2024-09-09T23:30:04.698009453Z 10.244.0.146 example-34.headless.name-space.svc.cluster.local example-34.headless.name-space.svc.cluster.local example-34.headless
2024-09-09T23:30:04.698011870Z 10.244.0.178 example-34.headless.name-space.svc.cluster.local example-34.headless.name-space.svc.cluster.local example-34.headless
2024-09-09T23:30:04.698013661Z 10.244.0.146 example-34.headless.name-space.svc.cluster.local example-34.headless.name-space.svc.cluster.local example-34.headless
2024-09-09T23:30:04.698017411Z 10.244.0.178 example-34.headless.name-space.svc.cluster.local example-34.headless.name-space.svc.cluster.local example-34.headless
2024-09-09T23:30:04.699209288Z failure
2024-09-09T23:30:04.701096499Z 10.244.0.178 example-34.headless.name-space.svc.cluster.local example-34.headless.name-space.svc.cluster.local example-34.headless
2024-09-09T23:30:04.701108082Z success
2024-09-09T23:30:04.701110207Z done
Finally, an example of a longer-duration test with reproducer-2.yml.
Failures continue to occur, and there appears to be a failure lasting ~5 seconds from 00:58:06 to 00:58:11.
+ kubectl -n name-space logs --timestamps pod/example-44-7b4c6db795-jnpx4
2024-09-10T00:53:45.606690224Z ip address: 10.244.0.148
2024-09-10T00:54:15.645760871Z 10.244.0.148 example-44.headless.name-space.svc.cluster.local example-44.headless.name-space.svc.cluster.local example-44.headless
2024-09-10T00:54:15.645830788Z success
2024-09-10T00:54:21.547652384Z 10.244.0.148 example-44.headless.name-space.svc.cluster.local example-44.headless.name-space.svc.cluster.local example-44.headless
2024-09-10T00:54:21.549761139Z failure
2024-09-10T00:54:21.551739643Z 10.244.0.148 example-44.headless.name-space.svc.cluster.local example-44.headless.name-space.svc.cluster.local example-44.headless
2024-09-10T00:54:21.551747935Z success
2024-09-10T00:55:08.732400872Z 10.244.0.148 example-44.headless.name-space.svc.cluster.local example-44.headless.name-space.svc.cluster.local example-44.headless
2024-09-10T00:55:08.733033040Z failure
2024-09-10T00:55:08.733892585Z 10.244.0.148 example-44.headless.name-space.svc.cluster.local example-44.headless.name-space.svc.cluster.local example-44.headless
2024-09-10T00:55:08.733955002Z success
2024-09-10T00:55:13.974409562Z 10.244.0.148 example-44.headless.name-space.svc.cluster.local example-44.headless.name-space.svc.cluster.local example-44.headless
2024-09-10T00:55:13.974909230Z failure
2024-09-10T00:55:13.975787899Z 10.244.0.148 example-44.headless.name-space.svc.cluster.local example-44.headless.name-space.svc.cluster.local example-44.headless
2024-09-10T00:55:13.975875275Z success
2024-09-10T00:55:19.037338608Z 10.244.0.148 example-44.headless.name-space.svc.cluster.local example-44.headless.name-space.svc.cluster.local example-44.headless
2024-09-10T00:55:19.037842693Z failure
2024-09-10T00:55:19.038832488Z 10.244.0.148 example-44.headless.name-space.svc.cluster.local example-44.headless.name-space.svc.cluster.local example-44.headless
2024-09-10T00:55:19.038843155Z success
2024-09-10T00:55:29.117418683Z 10.244.0.148 example-44.headless.name-space.svc.cluster.local example-44.headless.name-space.svc.cluster.local example-44.headless
2024-09-10T00:55:29.117966018Z failure
2024-09-10T00:55:29.118763229Z 10.244.0.148 example-44.headless.name-space.svc.cluster.local example-44.headless.name-space.svc.cluster.local example-44.headless
2024-09-10T00:55:29.118802687Z success
2024-09-10T00:55:44.373593325Z 10.244.0.148 example-44.headless.name-space.svc.cluster.local example-44.headless.name-space.svc.cluster.local example-44.headless
2024-09-10T00:55:44.374101410Z failure
2024-09-10T00:55:44.374768245Z 10.244.0.148 example-44.headless.name-space.svc.cluster.local example-44.headless.name-space.svc.cluster.local example-44.headless
2024-09-10T00:55:44.374816787Z success
2024-09-10T00:57:10.127070908Z 10.244.0.148 example-44.headless.name-space.svc.cluster.local example-44.headless.name-space.svc.cluster.local example-44.headless
2024-09-10T00:57:10.128437322Z failure
2024-09-10T00:57:10.133007897Z 10.244.0.148 example-44.headless.name-space.svc.cluster.local example-44.headless.name-space.svc.cluster.local example-44.headless
2024-09-10T00:57:10.134041104Z success
2024-09-10T00:57:51.941566804Z 10.244.0.148 example-44.headless.name-space.svc.cluster.local example-44.headless.name-space.svc.cluster.local example-44.headless
2024-09-10T00:57:51.941980888Z failure
2024-09-10T00:57:51.942865181Z 10.244.0.148 example-44.headless.name-space.svc.cluster.local example-44.headless.name-space.svc.cluster.local example-44.headless
2024-09-10T00:57:51.942923889Z success
2024-09-10T00:58:06.189370517Z 10.244.0.148 example-44.headless.name-space.svc.cluster.local example-44.headless.name-space.svc.cluster.local example-44.headless
2024-09-10T00:58:06.189968810Z failure
2024-09-10T00:58:11.194641064Z 10.244.0.148 example-44.headless.name-space.svc.cluster.local example-44.headless.name-space.svc.cluster.local example-44.headless
2024-09-10T00:58:11.194660897Z success
2024-09-10T00:58:31.853444367Z 10.244.0.148 example-44.headless.name-space.svc.cluster.local example-44.headless.name-space.svc.cluster.local example-44.headless
2024-09-10T00:58:31.854188701Z failure
2024-09-10T00:58:31.855070078Z 10.244.0.148 example-44.headless.name-space.svc.cluster.local example-44.headless.name-space.svc.cluster.local example-44.headless
2024-09-10T00:58:31.855520870Z success
2024-09-10T01:00:14.012500903Z 10.244.0.148 example-44.headless.name-space.svc.cluster.local example-44.headless.name-space.svc.cluster.local example-44.headless
2024-09-10T01:00:14.013042733Z failure
2024-09-10T01:00:14.014231435Z 10.244.0.148 example-44.headless.name-space.svc.cluster.local example-44.headless.name-space.svc.cluster.local example-44.headless
2024-09-10T01:00:14.014433351Z success
2024-09-10T01:00:25.099501939Z 10.244.0.148 example-44.headless.name-space.svc.cluster.local example-44.headless.name-space.svc.cluster.local example-44.headless
2024-09-10T01:00:25.100104894Z failure
2024-09-10T01:00:25.101076931Z 10.244.0.148 example-44.headless.name-space.svc.cluster.local example-44.headless.name-space.svc.cluster.local example-44.headless
2024-09-10T01:00:25.101219472Z success
2024-09-10T01:01:28.106594377Z 10.244.0.148 example-44.headless.name-space.svc.cluster.local example-44.headless.name-space.svc.cluster.local example-44.headless
2024-09-10T01:01:28.108120337Z failure
2024-09-10T01:01:28.109668172Z 10.244.0.148 example-44.headless.name-space.svc.cluster.local example-44.headless.name-space.svc.cluster.local example-44.headless
2024-09-10T01:01:28.109720922Z success
2024-09-10T01:01:47.173618917Z 10.244.0.148 example-44.headless.name-space.svc.cluster.local example-44.headless.name-space.svc.cluster.local example-44.headless
2024-09-10T01:01:47.176251003Z failure
2024-09-10T01:01:47.178596298Z 10.244.0.148 example-44.headless.name-space.svc.cluster.local example-44.headless.name-space.svc.cluster.local example-44.headless
2024-09-10T01:01:47.178728923Z success
2024-09-10T01:02:34.829412651Z 10.244.0.148 example-44.headless.name-space.svc.cluster.local example-44.headless.name-space.svc.cluster.local example-44.headless
2024-09-10T01:02:34.829895652Z failure
2024-09-10T01:02:34.831115611Z 10.244.0.148 example-44.headless.name-space.svc.cluster.local example-44.headless.name-space.svc.cluster.local example-44.headless
2024-09-10T01:02:34.831143778Z success
2024-09-10T01:02:45.651782651Z 10.244.0.148 example-44.headless.name-space.svc.cluster.local example-44.headless.name-space.svc.cluster.local example-44.headless
2024-09-10T01:02:45.652329402Z failure
2024-09-10T01:02:45.653055861Z 10.244.0.148 example-44.headless.name-space.svc.cluster.local example-44.headless.name-space.svc.cluster.local example-44.headless
2024-09-10T01:02:45.653182319Z success
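The length of each failure window in a log like the one above can be measured instead of read off by eye. The following is a minimal sketch, not part of the attached scripts: `failure_windows` is a hypothetical helper that reads `kubectl logs --timestamps` output on stdin and prints, for each `failure` line, its timestamp and the number of seconds until the next `success` line. It assumes the log does not cross midnight UTC.

```bash
# Hypothetical helper: print "<failure timestamp> <seconds until next success>".
failure_windows() {
  awk '
    # Convert the HH:MM:SS.fraction part of an RFC 3339 timestamp
    # to seconds since midnight.
    function secs(ts,   t, p) {
      t = substr(ts, index(ts, "T") + 1)   # drop the date
      sub(/Z$/, "", t)                      # drop the trailing Z
      split(t, p, ":")
      return p[1] * 3600 + p[2] * 60 + p[3]
    }
    # Remember the first failure of a window.
    $2 == "failure" && !in_fail { start = secs($1); stamp = $1; in_fail = 1 }
    # Close the window at the next success and report its length.
    $2 == "success" && in_fail  { printf "%s %.3f\n", stamp, secs($1) - start; in_fail = 0 }
  '
}
```

For example, `kubectl -n name-space logs --timestamps pod/example-44-7b4c6db795-jnpx4 | failure_windows` should report a window of roughly 5 seconds starting at 00:58:06 and windows of a few milliseconds for the other failures shown.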
What did you expect to happen?
I expected that once name resolution returned the correct address, it would not fail intermittently or subsequently return the wrong (old) address.
It would also be nice if name resolution became correct sooner more often, and ideally never returned the wrong (old) address.
How can we reproduce it (as minimally and precisely as possible)?
I initially used this configuration and script:
reproducer-1.yml.txt
script-1.sh.txt
The script above stopped after the first failure; to see more failures I used these:
reproducer-2.yml.txt
script-2.sh.txt
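The attachments are not reproduced here. As a rough sketch of the kind of resources described in this report (not the contents of reproducer-1.yml), the helper below prints a headless Service plus one Deployment whose pod sets `hostname` and `subdomain`. The names `headless` and `name-space` come from the logs above; `render_manifest`, the labels, the placeholder port, and the `busybox` image are assumptions made for illustration.

```bash
# Hypothetical helper: print a headless Service and one Deployment as YAML.
# $1 = deployment/host name, $2 = namespace, $3 = Service name (used as subdomain).
render_manifest() {
  local name="$1" namespace="${2:-name-space}" subdomain="${3:-headless}"
  cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: ${subdomain}
  namespace: ${namespace}
spec:
  clusterIP: None              # headless: DNS returns pod addresses directly
  selector:
    group: ${subdomain}        # assumed label shared by all test pods
  ports:
    - name: placeholder        # not used by the test
      port: 1234
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${name}
  namespace: ${namespace}
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ${name}
  template:
    metadata:
      labels:
        app: ${name}
        group: ${subdomain}
    spec:
      hostname: ${name}        # gives ${name}.${subdomain}.${namespace}.svc.<cluster domain>
      subdomain: ${subdomain}  # must match the headless Service name
      containers:
        - name: probe
          image: busybox:1.36  # assumption; any image with a resolver tool
          command: ["sleep", "3600"]
EOF
}
```

Something like `render_manifest example-9 | kubectl apply -f -` would then create one Service and one Deployment; a real run with many Deployments would emit the Service only once.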
Anything else we need to know?
This may be related to some existing issues.
The mostly ~30 second delay: #92559
The intermittent failures: coredns/coredns#6518
Assuming I need to work around the intermittent failure, are there any downsides to modifying the hosts file directly (not through HostAliases) other than losing changes when the container exits?
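To make the question concrete, the direct edit I have in mind looks roughly like this. It is only a sketch: `pin_host_entry` is a hypothetical helper, and it rewrites the file in place rather than renaming a temporary file over it, on the assumption that the hosts file inside the container is a mounted file that cannot be replaced by rename.

```bash
# Hypothetical helper: map FQDN to IP in a hosts file, replacing any older entry.
# $1 = hosts file path, $2 = IP address, $3 = fully qualified name.
pin_host_entry() {
  local hosts_file="$1" ip="$2" fqdn="$3" tmp
  tmp="$(mktemp)" || return 1
  # Keep every line that does not already list this name.
  awk -v name="$fqdn" '{ for (i = 2; i <= NF; i++) if ($i == name) next; print }' \
    "$hosts_file" > "$tmp"
  printf '%s\t%s\n' "$ip" "$fqdn" >> "$tmp"
  # Overwrite the contents in place instead of moving the temp file over it.
  cat "$tmp" > "$hosts_file"
  rm -f "$tmp"
}
```

A pod would call it at startup with its own address, for example `pin_host_entry /etc/hosts 10.244.0.189 example-32.headless.name-space.svc.cluster.local`.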
Kubernetes version
$ kubectl version
Client Version: v1.30.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.2
Cloud provider
OS version
# On Linux:
$ cat /etc/os-release
NAME="Fedora Linux"
VERSION="40.20240529.0 (Silverblue)"
ID=fedora
VERSION_ID=40
VERSION_CODENAME=""
PLATFORM_ID="platform:f40"
PRETTY_NAME="Fedora Linux 40.20240529.0 (Silverblue)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:40"
DEFAULT_HOSTNAME="fedora"
HOME_URL="https://silverblue.fedoraproject.org"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora-silverblue/"
SUPPORT_URL="https://ask.fedoraproject.org/"
BUG_REPORT_URL="https://github.com/fedora-silverblue/issue-tracker/issues"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=40
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=40
SUPPORT_END=2025-05-13
VARIANT="Silverblue"
VARIANT_ID=silverblue
OSTREE_VERSION='40.20240529.0'
$ uname -a
Linux fedora 6.8.10-300.fc40.aarch64 #1 SMP PREEMPT_DYNAMIC Fri May 17 21:52:12 UTC 2024 aarch64 GNU/Linux