loadbalancer: Keep backends around that are terminating and not serving #40969
Conversation
/test
Force-pushed 4ab9dc9 to 3a56eac
Force-pushed 3a56eac to 5fe283a
/test
Force-pushed 5fe283a to 42fd2c8
/test
Here is a more detailed explanation of what's going on, using a simple HTTP server that gracefully shuts down on SIGTERM and is set up with a readiness probe.
I'm curl'ing the server from a pod on another node:
When I scale down the deployment to 0 replicas, the HTTP server receives a SIGTERM and closes its listening socket. Cilium without this fix removes the backend from both the services and backends maps, while Cilium with this fix marks the backend as quarantined as soon as it stops serving, keeping it in the backends map.
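For illustration, a minimal sketch in Go of such a server, assuming a /readyz readiness endpoint and a 30-second drain timeout rather than the exact test setup used above: on SIGTERM it starts failing its readiness probe and then drains in-flight requests before exiting.

```go
// Sketch of a server that fails its readiness probe and drains connections
// on SIGTERM. The /readyz path and timeouts are illustrative, not the exact
// test server used in this PR.
package main

import (
	"context"
	"net/http"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

func main() {
	var terminating atomic.Bool

	mux := http.NewServeMux()
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
		if terminating.Load() {
			http.Error(w, "shutting down", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hello\n"))
	})

	srv := &http.Server{Addr: ":8080", Handler: mux}

	// On SIGTERM: readiness starts failing first, then the listener is
	// closed and in-flight requests are allowed to finish.
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM)
	drained := make(chan struct{})
	go func() {
		<-sigs
		terminating.Store(true)
		ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
		defer cancel()
		_ = srv.Shutdown(ctx)
		close(drained)
	}()

	_ = srv.ListenAndServe()
	<-drained
}
```

With a readiness probe pointed at /readyz, the endpoint's conditions flip to {ready: false, serving: false, terminating: true} shortly after the SIGTERM, which is exactly the state whose handling this PR changes.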
Force-pushed 47faf02 to 3189234
Thanks, looks great now.
/test
Force-pushed 2dc7912 to d85ce8f
/test
It's a clean implementation, mostly LGTM!
The main change in this PR is to exclude backends that are terminating but not serving from load balancing, even if they are the only available backend.
I wish Kubernetes integrated such conditions in the service endpoint updates, eliminating the need to add another "terminating-not-serving" state. A couple of follow-ups:
(1) Do we need to persist this new state in the BPF LB map?
(2) Can you extend the documentation - https://docs.cilium.io/en/latest/network/kubernetes/kubeproxy-free/#graceful-termination?
Force-pushed d85ce8f to c7c0dd6
lgtm! thanks Jussi
Force-pushed c7c0dd6 to 803742a
Saw at least one failure around https://github.com/kubernetes/kubernetes/blob/790393ae92e97262827d4f1fba24e8ae65bbada0/test/e2e/network/service.go#L3041C1-L3042C1 in the k8s e2e, which seems very suspicious given the changes in this PR. I think we might be implementing the terminating endpoints fallback incorrectly. Though looking at the test, there is a chance it tries to connect before Cilium has processed the EndpointSlice change, in which case it wouldn't error out. Will need to dive into this before I dare merge this PR (even though it seems unrelated).

EDIT: Ah, looks like we started seeing this failure when we bumped the version: https://github.com/cilium/cilium/actions/runs/16915128150. The failing test case was added three years ago, so it's something else they've changed that started introducing this issue. It could still be that we have a bug in how we handle the terminating endpoint fallback, but it's not related to this PR. I'll look into that issue separately and not have it block this one.
/test
A terminating pod that no longer signals readiness will be seen as an endpoint with {ready: false, serving: false, terminating: true}. These were skipped, leading to the backend being removed from the backends map, which causes connection disruptions. Fix this by keeping the non-serving terminating backends in the backends BPF map, but not using them as fallback backends if only terminating backends exist. This is implemented by:

- Including all conditions in [k8s.Backend], not just Terminating.
- Adding a new backend state (TerminatingNotServing) used for the Terminating && !Serving conditions.
- Not considering TerminatingNotServing backends when selecting the backends.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
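The condition-to-state mapping described in the commit message could look roughly like the following sketch. The EndpointConditions struct and the stateFromConditions helper are hypothetical stand-ins for illustration, not Cilium's actual code; only the TerminatingNotServing state name comes from this PR.

```go
package loadbalancer

// BackendState is a simplified stand-in for the backend states referenced
// in this PR; the real Cilium type lives elsewhere.
type BackendState int

const (
	BackendStateActive BackendState = iota
	BackendStateQuarantined
	BackendStateTerminating
	BackendStateTerminatingNotServing
)

// EndpointConditions mirrors the ready/serving/terminating conditions
// carried by a Kubernetes EndpointSlice endpoint.
type EndpointConditions struct {
	Ready       bool
	Serving     bool
	Terminating bool
}

// stateFromConditions is an illustrative mapping, not the actual
// implementation: terminating-and-serving backends remain usable as a
// fallback, terminating-and-not-serving backends are kept in the BPF maps
// but never selected, and unexpected combinations are quarantined.
func stateFromConditions(c EndpointConditions) BackendState {
	switch {
	case c.Ready:
		return BackendStateActive
	case c.Terminating && c.Serving:
		return BackendStateTerminating
	case c.Terminating && !c.Serving:
		return BackendStateTerminatingNotServing
	default:
		// e.g. {ready: false, serving: false, terminating: false}:
		// keep the backend around but do not select it.
		return BackendStateQuarantined
	}
}
```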
Force-pushed 803742a to 1059edd
/test
Dropped the needs-backport label for the moment, given #41199.
This fixes handling of terminating endpoints that are marked as not serving. These were skipped by ParseEndpointSliceV1, which led to them being removed from the backend BPF maps and disrupted connections too early when a terminating pod stopped signalling readiness (conditions became {ready: false, serving: false, terminating: true}).

To fix this we:

- Add a new backend state BackendStateTerminatingNotServing for terminating backends that are to be kept in the BPF maps, but not load-balanced to.
- Include all conditions in ParseEndpointSliceV1. In case the conditions are the unexpected {ready: false, serving: false, terminating: false}, the backend is simply quarantined. A sketch of the resulting selection policy follows below.
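Building on the previous sketch (same hypothetical package and BackendState type), the selection policy described here could look roughly like this; selectBackends and the Backend struct are illustrative, not Cilium's actual implementation:

```go
// Backend is a minimal illustrative backend representation.
type Backend struct {
	Addr  string
	State BackendState
}

// selectBackends shows the policy described above: prefer active backends,
// fall back to terminating backends that are still serving, and never pick
// backends that are terminating and not serving (they stay in the BPF maps
// only so that existing connections are not disrupted).
func selectBackends(all []Backend) []Backend {
	var active, terminatingServing []Backend
	for _, be := range all {
		switch be.State {
		case BackendStateActive:
			active = append(active, be)
		case BackendStateTerminating:
			terminatingServing = append(terminatingServing, be)
			// BackendStateTerminatingNotServing and BackendStateQuarantined
			// are intentionally never selected.
		}
	}
	if len(active) > 0 {
		return active
	}
	return terminatingServing
}
```

The key point is that TerminatingNotServing backends never appear in the selection result, yet they remain known to the datapath, so established connections keep their backend entry until the pod is actually removed.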