
tolerations: [{operator: Exists}] on operator prevents node drain #28549

@sterlingbates

Description


Is there an existing issue for this?

  • I have searched the existing issues

What happened?

Problem

A cilium-operator pod drained from a node is automatically rescheduled back onto that drained node. This may very well be a Kubernetes oversight, but Cilium's helm chart can be easily patched against it.

The helm chart's default toleration clause is interpreted by Kubernetes to mean a global match for any key or value:

Helm chart

  tolerations:
  - operator: Exists

Kubernetes code

// An empty key with Exists operator means match all keys & values.
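
For context on why this interacts with draining: cordoning a node sets spec.unschedulable: true, and the node lifecycle controller then taints it with node.kubernetes.io/unschedulable:NoSchedule. The rendered toleration above matches that taint, because an empty key with Exists tolerates every taint. A minimal sketch of the two objects involved (field layout follows upstream Kubernetes; not copied from the chart):

  # Taint placed on a cordoned/drained node by the node lifecycle controller
  taints:
  - key: node.kubernetes.io/unschedulable
    effect: NoSchedule

  # Toleration rendered for cilium-operator by the default chart values;
  # no key plus operator: Exists matches the taint above (and any other taint)
  tolerations:
  - operator: Exists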

Possible Fix
Default to an empty tolerations list, which plenty of other areas in the helm chart already use:

  tolerations: []
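
Until the default changes, a user-side workaround is to override the operator tolerations with something narrower at install time. A hedged sketch, assuming the chart exposes them under operator.tolerations (verify against your chart version's values.yaml):

  # values override: tolerate only control-plane taints instead of everything
  operator:
    tolerations:
    - key: node-role.kubernetes.io/control-plane
      operator: Exists
      effect: NoSchedule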

Similar

#18995 is very close, but was resolved with an HA operator config. I have two operators running, but still observe the problem.

Steps to Reproduce

  1. Drain a node with cilium-operator that has only tolerations: [{operator: Exists}] in the spec.
  2. Observe that the node now carries the NoSchedule taint and is marked unschedulable: true.
  3. Observe the operator pod deleted from the node.
  4. Observe a new operator pod immediately scheduled to the same drained node.
  5. The pod is never scheduled to another node, no matter how long you wait.

Expected

The cilium-operator pod is scheduled to another node.

Actual

The pod remains hauntingly, aggravatingly, and most deeply annoyingly fixated on the drained node. The scheduler log shows this:

I1011 18:49:14.299093       1 schedule_one.go:265] "Successfully bound pod to node" pod="kube-system/cilium-operator-59d78d96f4-khndr" node="lab1-qz2-sr1-rk18-s24-mstr001" evaluatedNodes=3 feasibleNodes=3

The feasibleNodes=3 entry means that the scheduler's taint/toleration and affinity filters did not rule out the unschedulable node as a candidate for the operator pod.

Demo

Notes:

  1. The primary2 node is already set to SchedulingDisabled.
  2. Ignoring the fact that the pod shouldn't be there now, it certainly shouldn't be after another drain.
  3. Drain the node.
  4. Re-list the pods, and note that the operator scheduled right back onto the unschedulable node.
root@primary1:~# kubectl get nodes -o wide
NAME                            STATUS                     ROLES           AGE     VERSION    INTERNAL-IP      EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION       CONTAINER-RUNTIME
primary1   Ready                      control-plane   7h23m   v1.24.9    192.168.50.195   <none>        Ubuntu 18.04.5 LTS   4.15.0-1080-ibm-gt   containerd://1.6.8
primary2   Ready,SchedulingDisabled   <none>          7h19m   v1.23.11   192.168.50.197   <none>        Ubuntu 18.04.5 LTS   4.15.0-1080-ibm-gt   containerd://1.6.8
primary3   Ready                      <none>          7h19m   v1.24.9    192.168.50.199   <none>        Ubuntu 18.04.5 LTS   4.15.0-1080-ibm-gt   containerd://1.6.8

root@primary1:~# kubectl get pods -o wide -A
NAMESPACE       NAME                                 READY   STATUS    RESTARTS   AGE     IP               NODE       NOMINATED NODE   READINESS GATES
kube-system     cilium-operator-59d78d96f4-blgx4     1/1     Running   0          5h48m   192.168.50.199   primary3   <none>           <none>
kube-system     cilium-operator-59d78d96f4-khndr     1/1     Running   0          3h52m   192.168.50.197   primary2   <none>           <none>

root@primary1:~# kubectl drain primary2 --ignore-daemonsets
node/primary2 already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/cilium-phlcx, kube-system/kube-proxy-rqfn2
evicting pod kube-system/cilium-operator-59d78d96f4-khndr
pod/cilium-operator-59d78d96f4-khndr evicted
node/primary2 drained

root@primary1:~# kubectl get pods -o wide -A
NAMESPACE       NAME                                 READY   STATUS    RESTARTS   AGE     IP               NODE       NOMINATED NODE   READINESS GATES
kube-system     cilium-operator-59d78d96f4-blgx4     1/1     Running   0          5h52m   192.168.50.199   primary3   <none>           <none>
kube-system     cilium-operator-59d78d96f4-sclwl     1/1     Running   0          98s     192.168.50.197   primary2   <none>           <none>

Cilium Version

I'm running an old 1.11.1, but the Helm chart problem is present in all releases.

Kernel Version

n/a

Kubernetes Version

1.24.9, but, as with Cilium, pretty much every version released in the past four years is affected.

Sysdump

No response

Relevant log output

No response

Anything else?

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct

    Labels

  • area/agent: Cilium agent related.
  • area/helm: Impacts helm charts and user deployment experience.
  • area/operator: Impacts the cilium-operator component.
  • good-first-issue: Good starting point for new developers, which requires minimal understanding of Cilium.
  • kind/bug: This is a bug in the Cilium logic.
  • kind/community-report: This was reported by a user in the Cilium community, eg via Slack.
