
Conversation

joamaki
Contributor

@joamaki joamaki commented Aug 6, 2025

This fixes handling of terminating endpoints that are marked as not serving. These were skipped by ParseEndpointSliceV1, which led to them being removed from the backend BPF maps and connections being disrupted too early when a terminating pod stopped signalling readiness (conditions became {ready: false, serving: false, terminating: true}).

To fix this we:

  1. introduce BackendStateTerminatingNotServing for terminating backends that are to be kept in BPF maps, but not load-balanced to.
  2. remove the skipping of non-serving backends from ParseEndpointSliceV1. In case the conditions are the unexpected {ready: false, serving: false, terminating: false}, the backend is simply quarantined.
Kubernetes endpoints that are terminating are retained in the backends BPF state regardless of the "serving" condition to avoid connection disruptions when a pod no longer signals readiness to process new connections.
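
For illustration, here is a minimal, self-contained sketch of the condition-to-state mapping described above. This is not Cilium's actual code: only BackendStateTerminatingNotServing is a name introduced by this PR, and stateForConditions is a hypothetical helper (the real logic lives in ParseEndpointSliceV1 and Cilium's loadbalancer package).

package main

import "fmt"

// Illustrative state constants; the real ones live in Cilium's loadbalancer package.
type BackendState string

const (
	BackendStateActive                BackendState = "active"
	BackendStateTerminating           BackendState = "terminating"
	BackendStateTerminatingNotServing BackendState = "terminating-not-serving" // added by this PR
	BackendStateQuarantined           BackendState = "quarantined"
)

// EndpointConditions mirrors the EndpointSlice v1 endpoint conditions.
type EndpointConditions struct {
	Ready, Serving, Terminating bool
}

// stateForConditions is a hypothetical helper sketching the mapping described
// above; the actual logic is part of ParseEndpointSliceV1 and may differ.
func stateForConditions(c EndpointConditions) BackendState {
	switch {
	case c.Terminating && c.Serving:
		return BackendStateTerminating // still eligible as a fallback backend
	case c.Terminating:
		// Kept in the backends BPF map so existing connections survive,
		// but never load-balanced to.
		return BackendStateTerminatingNotServing
	case c.Ready:
		return BackendStateActive
	default:
		// Unexpected {ready: false, serving: false, terminating: false}:
		// quarantine instead of dropping the backend.
		return BackendStateQuarantined
	}
}

func main() {
	fmt.Println(stateForConditions(EndpointConditions{Terminating: true}))
	// prints: terminating-not-serving
}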

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Aug 6, 2025
@joamaki
Contributor Author

joamaki commented Aug 6, 2025

/test

@joamaki joamaki force-pushed the pr/joamaki/lb-graceful-termination-of-not-serving branch 6 times, most recently from 4ab9dc9 to 3a56eac on August 6, 2025 at 16:45
@joamaki joamaki force-pushed the pr/joamaki/lb-graceful-termination-of-not-serving branch from 3a56eac to 5fe283a on August 8, 2025 at 10:04
@joamaki joamaki added the release-note/bug This PR fixes an issue in a previous release of Cilium. label Aug 8, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Aug 8, 2025
@joamaki joamaki added the needs-backport/1.18 This PR / issue needs backporting to the v1.18 branch label Aug 8, 2025
@joamaki
Contributor Author

joamaki commented Aug 8, 2025

/test

@joamaki joamaki force-pushed the pr/joamaki/lb-graceful-termination-of-not-serving branch from 5fe283a to 42fd2c8 on August 8, 2025 at 12:37
@joamaki
Contributor Author

joamaki commented Aug 8, 2025

/test

@joamaki joamaki marked this pull request as ready for review August 11, 2025 09:30
@joamaki joamaki requested review from a team as code owners August 11, 2025 09:30
@joamaki joamaki enabled auto-merge August 11, 2025 09:31
@borkmann borkmann requested a review from aojea August 11, 2025 15:10
@joamaki
Contributor Author

joamaki commented Aug 12, 2025

Here is a more detailed explanation of what's going on, using a simple HTTP server that gracefully shuts down on SIGTERM, deployed with a readiness probe.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: unready-pod
  labels:
    app: unready
spec:
  replicas: 1
  selector:
    matchLabels:
      app: unready
  template:
    metadata:
      labels:
        app: unready
    spec:
      terminationGracePeriodSeconds: 120
      containers:
      - name: unready
        image: test.jussi/testpod:0.1
        readinessProbe:
          periodSeconds: 1
          httpGet:
            path: /ready
            port: 8080
        livenessProbe:
          # always succeeds, so only the readiness probe changes during termination
          exec:
            command:
            - "true"
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: svc-unready
spec:
  selector:
    app: unready
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080

The HTTP server:

package main

import (
        "context"
        "fmt"
        "log/slog"
        "net/http"
        "os"
        "os/signal"
        "syscall"
        "time"
)

func main() {
        srv := &http.Server{Addr: ":8080"}
        http.HandleFunc("/sleep", sleeper)
        http.HandleFunc("/ready", func(w http.ResponseWriter, r *http.Request) {
                w.WriteHeader(http.StatusOK)
        })
        slog.Info("Listening on :8080")
        go srv.ListenAndServe()

        sigs := make(chan os.Signal, 1)
        signal.Notify(sigs, syscall.SIGTERM)
        <-sigs
        slog.Info("SIGTERM received, gracefully shutting down")
        // Shutdown closes the listener first (which makes the readiness probe
        // fail) and then waits up to 60 seconds for in-flight requests.
        ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
        defer cancel()
        srv.Shutdown(ctx)
}

func sleeper(w http.ResponseWriter, r *http.Request) {
        slog.Info("Request received", "method", r.Method, "url", r.URL, "remote", r.RemoteAddr)
        fmt.Fprintf(w, "sleeping 60 seconds...\n")
        w.(http.Flusher).Flush()
        time.Sleep(60 * time.Second)
}

I'm curl'ing the server from a pod on another node:

kubectl run --image=gcr.io/kubernetes-e2e-test-images/agnhost:2.6 --overrides='{"apiVersion": "v1", "spec": {"nodeSelector": { "kubernetes.io/hostname": "kind-control-plane" }}}'  test
time kubectl exec test -- curl --keepalive-time 1 'svc-unready:80/sleep'

When I scale down the deployment to 0 replicas, the HTTP server receives a SIGTERM, which closes the listening
socket and thus fails the readiness probe. This in turn flips serving to false (evidenced by kubectl get endpointslices/svc-unready-hdt4z -o yaml -w | grep -A3 'conditions:').

Cilium without this fix removes the backend from both the services and backends maps when serving=false, leaving a dangling reference in the conntrack entry; this results in an RST packet when the TCP keepalive hits it, e.g. curl terminates before 60s with curl: (56) Recv failure: Connection refused.

Cilium with this fix marks the backend as quarantined as soon as it has serving=false and keeps it in the backends map without changes, which allows the existing connections to continue.
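
To make the difference concrete, here is a toy model of the lookup path, with plain Go maps standing in for the BPF services, backends and conntrack maps. This is not Cilium's datapath code, just an illustration of why deleting the backend entry resets an established connection while merely excluding it from selection does not.

package main

import "fmt"

type BackendID uint32

// Toy stand-ins for the BPF maps involved.
type Datapath struct {
	ServicesMap  map[string][]BackendID // service -> backends eligible for new connections
	BackendsMap  map[BackendID]string   // backend ID -> pod address
	ConntrackMap map[string]BackendID   // established flow -> chosen backend
}

// ForwardEstablished models a packet on an existing connection: the conntrack
// entry only records a backend ID, so the backends map entry must still exist.
func (d *Datapath) ForwardEstablished(flow string) (string, bool) {
	id, ok := d.ConntrackMap[flow]
	if !ok {
		return "", false
	}
	addr, ok := d.BackendsMap[id]
	return addr, ok // !ok means a dangling reference: the connection is reset
}

func main() {
	d := &Datapath{
		ServicesMap:  map[string][]BackendID{"svc-unready": {1}},
		BackendsMap:  map[BackendID]string{1: "10.0.0.1:8080"},
		ConntrackMap: map[string]BackendID{"client->svc-unready/sleep": 1},
	}

	// With this fix: the backend is only removed from service selection, so
	// new connections won't pick it, but the established flow keeps working.
	d.ServicesMap["svc-unready"] = nil
	fmt.Println(d.ForwardEstablished("client->svc-unready/sleep")) // 10.0.0.1:8080 true

	// Without the fix the backends map entry was deleted too, leaving the
	// conntrack entry dangling; the next keepalive is then answered with a reset.
	delete(d.BackendsMap, 1)
	fmt.Println(d.ForwardEstablished("client->svc-unready/sleep")) // "" false
}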

@joamaki joamaki force-pushed the pr/joamaki/lb-graceful-termination-of-not-serving branch 3 times, most recently from 47faf02 to 3189234 on August 12, 2025 at 09:23
Contributor

@marseel marseel left a comment


Thanks, looks great now.

@joamaki
Contributor Author

joamaki commented Aug 12, 2025

/test

@joamaki joamaki force-pushed the pr/joamaki/lb-graceful-termination-of-not-serving branch 2 times, most recently from 2dc7912 to d85ce8f on August 12, 2025 at 14:57
@joamaki
Contributor Author

joamaki commented Aug 12, 2025

/test

Member

@aditighag aditighag left a comment


It's a clean implementation, mostly LGTM!
The main change in this PR is to exclude backends that are terminating but not serving from load balancing, even if they are the only available backend.
I wish Kubernetes integrated such conditions in the service endpoint updates, eliminating the need to add another "terminating-not-serving" state. Couple of follow-ups:
(1) Do we need to persist this new state in the BPF LB map?
(2) Can you extend the documentation - https://docs.cilium.io/en/latest/network/kubernetes/kubeproxy-free/#graceful-termination?

@joamaki
Contributor Author

joamaki commented Aug 13, 2025

  1. No need to persist anything differently. Essentially the difference from "terminating" is just that we don't pick the backend up as a fallback if no active backends exist (i.e. it won't go into the services map); see the sketch below.

  2. Ah good point! I'll update it with the new information learned during this.
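
For (1), a sketch of the fallback rule with hypothetical names (this is not the actual selection code in Cilium): active backends are preferred, terminating-but-serving backends are used only as a fallback, and terminating-not-serving backends are never selected and remain only in the backends map.

package main

import "fmt"

type BackendState string

const (
	BackendStateActive                BackendState = "active"
	BackendStateTerminating           BackendState = "terminating"
	BackendStateTerminatingNotServing BackendState = "terminating-not-serving"
)

type Backend struct {
	Addr  string
	State BackendState
}

// selectServiceBackends is a hypothetical helper, not Cilium's code: it picks
// the backends that should go into the services map.
func selectServiceBackends(all []Backend) []Backend {
	var active, fallback []Backend
	for _, b := range all {
		switch b.State {
		case BackendStateActive:
			active = append(active, b)
		case BackendStateTerminating:
			fallback = append(fallback, b) // serving while terminating: usable as fallback
		}
		// TerminatingNotServing backends are intentionally never selected.
	}
	if len(active) > 0 {
		return active
	}
	return fallback
}

func main() {
	fmt.Println(selectServiceBackends([]Backend{
		{Addr: "10.0.0.1:8080", State: BackendStateTerminatingNotServing},
		{Addr: "10.0.0.2:8080", State: BackendStateTerminating},
	}))
	// only 10.0.0.2:8080 is eligible for new connections
}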

@joamaki joamaki force-pushed the pr/joamaki/lb-graceful-termination-of-not-serving branch from d85ce8f to c7c0dd6 on August 13, 2025 at 07:18
@joamaki joamaki requested a review from a team as a code owner August 13, 2025 07:18
@joamaki joamaki requested a review from hemanthmalla August 13, 2025 07:18
Member

@borkmann borkmann left a comment


lgtm! thanks Jussi

@joamaki joamaki force-pushed the pr/joamaki/lb-graceful-termination-of-not-serving branch from c7c0dd6 to 803742a on August 13, 2025 at 14:45
@joamaki
Contributor Author

joamaki commented Aug 13, 2025

Saw at least one failure around https://github.com/kubernetes/kubernetes/blob/790393ae92e97262827d4f1fba24e8ae65bbada0/test/e2e/network/service.go#L3041C1-L3042C1 in the k8s e2e which seems very suspicious given the changes in this PR. I think we might be implementing the terminating endpoints fallback incorrectly. Though looking at the test, I think there's a chance that it would try to connect before Cilium has processed the EndpointSlice change, in which case it wouldn't error out. Will need to dive into this before I dare merge this PR (even though it seems that it is unrelated).

EDIT: Ah, looks like we started seeing this failure when we bumped the version: https://github.com/cilium/cilium/actions/runs/16915128150. The failing test case was added 3 years ago already, so it's something else they've changed that started introducing this issue. It could still be that we have a bug in how we handle the terminating endpoint fallback, but it's not related to this PR. I'll look into that issue separately rather than have it block this one.

@joamaki
Contributor Author

joamaki commented Aug 14, 2025

/test

A terminating pod that no longer signals readiness will be seen as an
endpoint with {ready: false, serving: false, terminating: true}. These
were skipped, leading to the backend being removed from the backends map,
which caused connection disruptions.

Fix this by keeping the non-serving terminating backends in the backends BPF
map, but not using them as fallback backends if only terminating backends exist.

This is implemented by:
- Including all conditions in [k8s.Backend], not just Terminating.
- Adding a new backend state (TerminatingNotServing) used with Terminating&!Serving conditions.
- Not considering TerminatingNotServing backends when selecting the backends.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
@joamaki joamaki force-pushed the pr/joamaki/lb-graceful-termination-of-not-serving branch from 803742a to 1059edd on August 15, 2025 at 09:31
@joamaki
Contributor Author

joamaki commented Aug 15, 2025

/test

@joamaki joamaki added this pull request to the merge queue Aug 15, 2025
Merged via the queue into cilium:main with commit 6f41c98 Aug 15, 2025
68 checks passed
@joamaki joamaki deleted the pr/joamaki/lb-graceful-termination-of-not-serving branch August 15, 2025 12:57
@giorio94 giorio94 removed the needs-backport/1.18 This PR / issue needs backporting to the v1.18 branch label Aug 18, 2025
@giorio94
Member

Dropped the needs-backport label for the moment, given #41199
