Description
What happened:
I have had 5 CoreDNS pods running for about 2 months, and I found that 2 of them cannot resolve new services created in Kubernetes.
~/allen# kubectl -n core-dns-wlcb-prod get svc,po -o wide
NAME           TYPE        CLUSTER-IP   EXTERNAL-IP     PORT(S)                    AGE   SELECTOR
svc/core-dns   ClusterIP   172.29.8.8   192.168.2.184   49153/TCP,53/UDP,53/TCP    88d   name=core-dns

NAME                                READY   STATUS    RESTARTS   AGE   IP              NODE
po/core-dns-bb32dbc62e244ff-6zzb5   1/1     Running   1          67d   172.16.84.14    192.168.2.41
po/core-dns-bb32dbc62e244ff-89rdx   1/1     Running   1          67d   172.16.117.10   192.168.2.203
po/core-dns-bb32dbc62e244ff-pbfqm   1/1     Running   1          67d   172.16.84.15    192.168.2.41
po/core-dns-bb32dbc62e244ff-rfqbt   1/1     Running   1          67d   172.16.202.12   192.168.2.79
po/core-dns-bb32dbc62e244ff-vx5dx   1/1     Running   1          67d   172.16.83.20    192.168.2.184
The problem pod (172.16.83.20) returns NXDOMAIN:
root@dns-check-e5b984de89ac472-dd98j:/# dig @172.16.83.20 core.titan-dev-open.svc.cluster.local
; <<>> DiG 9.10.3-P4-Ubuntu <<>> @172.16.83.20 core.titan-dev-open.svc.cluster.local
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 23894
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;core.titan-dev-open.svc.cluster.local. IN A
;; AUTHORITY SECTION:
cluster.local. 5 IN SOA ns.dns.cluster.local. hostmaster.cluster.local. 1578707575 7200 1800 86400 5
;; Query time: 0 msec
;; SERVER: 172.16.83.20#53(172.16.83.20)
;; WHEN: Sat Jan 11 15:38:37 CST 2020
;; MSG SIZE rcvd: 159
A healthy pod (172.16.202.12) resolves the same name correctly:
root@dns-check-e5b984de89ac472-dd98j:/# dig @172.16.202.12 core.titan-dev-open.svc.cluster.local
; <<>> DiG 9.10.3-P4-Ubuntu <<>> @172.16.202.12 core.titan-dev-open.svc.cluster.local
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55436
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;core.titan-dev-open.svc.cluster.local. IN A
;; ANSWER SECTION:
core.titan-dev-open.svc.cluster.local. 5 IN A 172.29.82.96
;; Query time: 1 msec
;; SERVER: 172.16.202.12#53(172.16.202.12)
;; WHEN: Sat Jan 11 15:38:57 CST 2020
;; MSG SIZE rcvd: 119
The bad pod (172.16.83.20) is not completely broken: it can still resolve kubernetes.default and other services that were created earlier. However, it cannot resolve newly created Kubernetes services, so it seems something is wrong with the syncing mechanism.
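One hypothesis consistent with these symptoms: as far as I understand, the kubernetes plugin answers queries from an in-memory cache that it keeps current by watching the API server (the list-watch pattern behind client-go informers). If one replica's watch stopped delivering events and its cache was never re-listed, that replica would keep answering for old services and return NXDOMAIN for new ones. The toy Go model below only illustrates that failure mode. It is not CoreDNS's actual code; the serviceCache type, its methods, and the IP used for kubernetes.default are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// serviceCache is a hypothetical, simplified list-watch cache.
// It is NOT CoreDNS's implementation.
type serviceCache struct {
	mu       sync.RWMutex
	services map[string]string // "name.namespace" -> ClusterIP
	watching bool              // false once the watch has silently stopped
}

// newServiceCache seeds the cache from a snapshot, like an initial LIST.
func newServiceCache(snapshot map[string]string) *serviceCache {
	c := &serviceCache{services: make(map[string]string), watching: true}
	for k, v := range snapshot {
		c.services[k] = v
	}
	return c
}

// stopWatch simulates the watch breaking with no error and no re-list.
func (c *serviceCache) stopWatch() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.watching = false
}

// onAdd applies an "ADDED" watch event; it is dropped if the watch is dead.
func (c *serviceCache) onAdd(name, ip string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if !c.watching {
		return // the event never reaches this replica
	}
	c.services[name] = ip
}

// lookup answers only from the local cache, never from the API server.
func (c *serviceCache) lookup(name string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	ip, ok := c.services[name]
	return ip, ok
}

func main() {
	// Hypothetical ClusterIP for kubernetes.default.
	initial := map[string]string{"kubernetes.default": "172.29.0.1"}
	healthy := newServiceCache(initial)
	stale := newServiceCache(initial)
	stale.stopWatch()

	// A new Service is created after the stale replica's watch broke.
	for _, c := range []*serviceCache{healthy, stale} {
		c.onAdd("core.titan-dev-open", "172.29.82.96")
	}

	for label, c := range map[string]*serviceCache{"healthy": healthy, "stale": stale} {
		_, oldOK := c.lookup("kubernetes.default")
		_, newOK := c.lookup("core.titan-dev-open")
		fmt.Printf("%s: old service found=%v, new service found=%v\n", label, oldOK, newOK)
	}
}
```

The stale replica reports the old service as found and the new one as missing, matching what 172.16.83.20 does. The healthy replica finds both. The two lines may print in either order.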
The logs contain many domain-resolution entries, but nothing helpful related to Kubernetes.
What you expected to happen:
- CoreDNS works normally and syncs all changes from Kubernetes.
- I would also like to know how to get CoreDNS logs about Kubernetes resource syncing.
How to reproduce it (as minimally and precisely as possible):
I think restarting the pod would resolve the problem, but I have not yet found steps to reproduce it.
Anything else we need to know?:
Environment:
- the version of CoreDNS: 1.6.4
- Corefile:
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        endpoint http://xxxx.k8s.com:8080
        pods insecure
        upstream
        fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    forward . 10.1.1.1 {
        max_fails 2
        expire 10s
        policy sequential
        health_check 1s
    }
    log {
        class denial
        class error
    }
    cache 30
    loop
    reload
    loadbalance
}
- logs, if applicable:
- OS (e.g. cat /etc/os-release): Ubuntu 16.04
- Others: