
kubernetes plugin occasionally misses pod updates/deletions #7119

@bartebor

Description

What happened:
In my k8s cluster I am using the kubernetes plugin with pods verified. CoreDNS should resolve pod names like 1-2-3-4.ns.pod.cluster.local. only for pods that exist in the given namespace.

I found that from time to time CoreDNS resolves non-existent (deleted) pods. I get successful responses for IPs that do not belong to any pod at the time of the query.

What you expected to happen:
CoreDNS should respond with NXDOMAIN when queried for an IP that does not belong to any pod.

How to reproduce it (as minimally and precisely as possible):
Unfortunately, this is not easy to reproduce. I can only reproduce it in a production environment, and even there it does not always happen.

Anything else we need to know?:
I spent some time debugging this and I think I found the cause. In informer.go, there is an error check:

obj, err := convert(d.Object.(meta.Object))
if err != nil {
	return err
}

which exits the loop in case of an error. One of the errors returned by convert() is errPodTerminating, which is caused by a pod having deletionTimestamp set. It is normal for a pod to have deletionTimestamp set in a cache.Updated delta before the cache.Deleted delta arrives.
The problem is that such a pod exits the loop and the following deltas are not processed. Usually there is only one delta and everything works, but from time to time the informer provides more deltas and we drop events.

In my case, a cache.Updated delta is followed by a cache.Deleted delta for the same pod. When this happens, CoreDNS never removes this pod from its internal state.
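
To make the failure mode concrete, here is a simplified sketch of the delta-processing loop (not a verbatim copy of informer.go; update() and remove() are stand-ins for the real store operations):

for _, d := range deltas {
	switch d.Type {
	case cache.Sync, cache.Added, cache.Updated:
		obj, err := convert(d.Object.(meta.Object))
		if err != nil {
			// A terminating pod (deletionTimestamp set) takes this branch;
			// returning aborts the whole loop, so any later deltas in the
			// same batch, including the cache.Deleted one, are dropped.
			return err
		}
		update(obj) // stand-in for the real store update
	case cache.Deleted:
		remove(d.Object) // never reached once an earlier delta has returned an error
	}
}
return nil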

A simple patch resolves the issue and the server works as expected:

obj, err := convert(d.Object.(meta.Object))
if err != nil {
	if err == errPodTerminating {
		continue
	}
	return err
}

Note that a similar check is present in the cache.Deleted case below, so pods with deletionTimestamp set are removed correctly as long as they are not preceded by a delta that returns an error.
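
For comparison, the tolerant check in the cache.Deleted branch looks roughly like this (a sketch based on the behaviour described above, not the exact source; remove() is again a stand-in):

case cache.Deleted:
	obj, err := convert(d.Object.(meta.Object))
	if err != nil && err != errPodTerminating {
		return err
	}
	// A pod with deletionTimestamp set is still removed from the internal state.
	remove(obj)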

There are more ways of exiting the aforementioned loop with an error, although I have never encountered them. I am not sure we should ever return an error at all. From my observation, returning an error just drops the remaining deltas silently and nothing else happens: watches are not restarted and no error messages are logged. Maybe in case of any error we should just continue and process all deltas?
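
A sketch of that alternative (the logging call is illustrative and assumes the plugin's logger is available; it is not part of the patch above):

obj, err := convert(d.Object.(meta.Object))
if err != nil {
	// Skip only this delta instead of aborting the whole batch, so a single
	// conversion error cannot hide later updates or deletions.
	log.Warningf("dropping delta for object: %v", err)
	continue
}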

I can provide a PR with the above patch.

Environment:

  • the version of CoreDNS: 1.11.4 (the same code is present in master)
  • Corefile: (relevant fragment)
cluster.local:53 {
    kubernetes {
        pods verified
    }
    errors
    loadbalance round_robin
}
  • logs, if applicable: CoreDNS emits no logs relevant to this issue
  • OS (e.g: cat /etc/os-release): Docker image based on AlmaLinux 9; OS is not relevant
