helm: Remove default toleration on hubble-relay deployment #13237
This removes the default toleration of the hubble-relay deployment, which allowed it to be scheduled on any node. In contrast to cilium and cilium-operator, which are capable of running in the host network namespace, Hubble Relay requires pod connectivity to function.

The existing catch-all toleration was intended to provide cluster-wide network visibility in cases where nodes are unavailable. However, it can cause Hubble Relay to be scheduled on these unhealthy nodes, where it cannot run correctly, even though untainted nodes would have been available.

Single-node clusters (such as minikube) intended to run workloads will not have any tainted nodes and are thus unaffected by this change. Users who have taints on every node in their cluster will have to use the newly introduced `hubble-relay.tolerations` Helm value to add custom tolerations for Hubble Relay.

Fixes: #13166

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
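For clusters where every node is tainted, a values override along the following lines might be used. This is a minimal sketch, not taken from the PR itself: it assumes the dotted name `hubble-relay.tolerations` maps to the nested YAML keys shown, and the taint key `example.com/dedicated` with value `infra` is purely hypothetical. The entries follow the standard Kubernetes toleration fields (`key`, `operator`, `value`, `effect`).

```yaml
# Hypothetical values override for Hubble Relay tolerations.
# The taint key and value are placeholders; substitute the taints
# actually present on the nodes in your cluster.
hubble-relay:
  tolerations:
    - key: "example.com/dedicated"   # assumed taint key, for illustration only
      operator: "Equal"
      value: "infra"                 # assumed taint value
      effect: "NoSchedule"

# To get back the earlier catch-all behaviour, a single toleration
# with no key and operator "Exists" tolerates every taint:
#
# hubble-relay:
#   tolerations:
#     - operator: "Exists"
```

Listing only the taints that are known to leave pod networking intact keeps Relay off nodes that are tainted because they are unhealthy, which is the failure mode this change addresses.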
Note: In contrast to what was discussed in #13166, I opted to remove the toleration from Relay by default. While we could define
test-me-please
@gandro Should this be backported to 1.8?
It wouldn't hurt. Should be fairly straightforward to backport. Adding the label.