loadbalancer: Implement UDP socket termination #39012
Conversation
Force-pushed from 7b9b15d to 58afb77
Force-pushed from 58afb77 to c4cede4
Force-pushed from ad75187 to 26593a1
Force-pushed from 26593a1 to 2f3b136
/test
I cherry-picked my changes from #38693 on top of this branch and tested socket termination manually. Everything seemed to work; I saw the UDP socket get destroyed repeatedly.
ping @aditighag @tommyp1ckles
Force-pushed from 2f3b136 to d0fcf23
/test
ping @aditighag @tommyp1ckles
Sorry for the delay. Posted some comments from the first pass.
(1) Can you split the 2nd commit into preparatory changes (e.g., changes related to sock rev nat map creation/utility functions, etc) and changes porting the socket termination logic to the new LB control plane?
(2) How do we ensure the socket termination loop runs only on the relevant backend DB updates?
/test
There are multiple backend attributes, state being one of them - https://github.com/cilium/cilium/blob/main/pkg/loadbalancer/backend.go#L25-L25. Do we want to run the socket termination loop for backend updates corresponding to changes to attributes such as weight, clusterID, UnhealthyUpdatedAt, etc. that don't change its state?
It waits 50ms for changes to accumulate and then does a tree traversal to find the relevant ones. This is fast and lockless, so that should be fine. If we want to reduce the overhead of context switching and tree traversal, we can bump the wait up a bit to make this even cheaper. Alternatively, we could tie this into the reconciler so that when it deletes a backend it also queues up the socket destruction. You've got me questioning now whether this might be burning enough cycles looking through irrelevant changes that it's worth saving them. I'll run a benchmark today to quantify it.
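For illustration, here is a minimal sketch of the batch-and-filter pattern described above, using a plain channel and ticker in place of StateDB's change iterator and rate limiter; the `backendChange` type, its fields and `processChanges` are hypothetical names, not Cilium's actual API:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// backendChange is a stand-in for a StateDB change event on the backends table.
type backendChange struct {
	Deleted  bool
	Protocol string // "TCP" or "UDP"
	Address  string
}

// processChanges accumulates backend changes and handles them in batches,
// rate-limited to one pass per interval (e.g. 50ms). Only deleted UDP
// backends are acted upon; everything else is skipped cheaply.
func processChanges(ctx context.Context, interval time.Duration, events <-chan backendChange) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	var pending []backendChange
	for {
		select {
		case <-ctx.Done():
			return
		case ev := <-events:
			pending = append(pending, ev)
		case <-ticker.C:
			for _, ev := range pending {
				// Cheap filter first: only deleted UDP backends matter.
				if ev.Protocol != "UDP" || !ev.Deleted {
					continue
				}
				// Placeholder for the actual socket termination
				// (SOCK_DESTROY via the sock_diag netlink interface).
				fmt.Println("would terminate UDP sockets connected to", ev.Address)
			}
			pending = pending[:0]
		}
	}
}

func main() {
	events := make(chan backendChange, 16)
	ctx, cancel := context.WithTimeout(context.Background(), 200*time.Millisecond)
	defer cancel()
	events <- backendChange{Deleted: true, Protocol: "UDP", Address: "10.0.0.1:53"}
	processChanges(ctx, 50*time.Millisecond, events)
}
```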
0006d2f
to
df83588
Compare
Ok, I've added a benchmark for the iteration. We're spending 300ns per backend change, so for, say, 100 backend changes every 50ms that works out to roughly 600us of CPU time per second, which is about 0.06% of one CPU core (see the comment for more details). So even in a very, very busy cluster we'd spend very little on this, and it's unlikely that we'd have this sort of constant churn.

My argument here is that it's better to sacrifice a few CPU cycles to have an implementation that is decoupled: the other code has no control coupling to the UDP socket termination, and the more we implement in this style the easier the code is to work with. We strive to have each component in the system have a single concern, referential transparency and less control coupling. This is in fact the core design principle behind StateDB: we move away from tight coupling of components and imperative callbacks so that we don't end up with a big ball of mud (e.g. see pkg/service, pkg/endpoint, etc.) or reduce latency/throughput in unexpected ways, and also so that we avoid production incidents like deadlocks when one component directly calls into the black box of another (control coupling). That is why I find the trade-off of doing a little bit of extra work here much better than the alternatives (e.g. the BPF reconciler being directly coupled to this and triggering it when it deletes a backend).

EDIT: Ah, it's the backend instances lookup that IsAlive() does that takes most of the time. Adding a check for the protocol and skipping non-UDP backends brings the time down to 24ns/backend. So this can now process 41 million backend changes per second if the changes are all non-UDP. That seems fine?
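A quick back-of-the-envelope check of the figures quoted above (300ns per change before the protocol check, 24ns after), assuming 100 backend changes per 50ms batch; purely illustrative:

```go
package main

import "fmt"

func main() {
	const (
		changesPerBatch = 100.0
		batchesPerSec   = 1000.0 / 50.0 // one batch of changes every 50ms
	)
	for _, c := range []struct {
		name        string
		nsPerChange float64
	}{
		{"before the protocol check (300ns/change)", 300},
		{"after the protocol check (24ns/change)", 24},
	} {
		// CPU seconds spent per wall-clock second on processing changes.
		cpuSecondsPerSecond := c.nsPerChange * 1e-9 * changesPerBatch * batchesPerSec
		fmt.Printf("%s: %.0fus of CPU per second, %.4f%% of one core\n",
			c.name, cpuSecondsPerSecond*1e6, cpuSecondsPerSecond*100)
	}
}
```

This prints roughly 600us/s (0.06% of a core) for the original figure and 48us/s (0.0048%) after the protocol fast path, matching the numbers in the comment and the benchmark commit message.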
Force-pushed from df83588 to 0fa647a
/test |
The Current() and All() methods were missing, which made the linter unhappy. Signed-off-by: Jussi Maki <jussi@isovalent.com>
For implementing socket termination we'll need access to the SockRevNat BPF map. Add it to the LBMaps interface and expand implementations. Signed-off-by: Jussi Maki <jussi@isovalent.com>
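Purely as an illustration of the shape of such a change, this is roughly what exposing a BPF map through a maps interface can look like; the real LBMaps interface and the actual SockRevNat key/value layout in Cilium differ, and all names below are simplified stand-ins:

```go
// Package lbmapsketch is an illustrative sketch, not Cilium's real code.
package lbmapsketch

// sockRevNatKey and sockRevNatValue stand in for the real SockRevNat
// key/value types; the actual layout in Cilium is different.
type sockRevNatKey struct {
	Cookie uint64 // socket cookie of the connected socket
}

type sockRevNatValue struct {
	Address string
	Port    uint16
}

// lbMaps abstracts the load-balancer BPF maps. A real implementation wraps
// the kernel maps; a fake implementation can back them with in-memory maps
// for tests.
type lbMaps interface {
	// ...existing accessors elided...

	// IterateSockRevNat walks the socket reverse-NAT entries maintained by
	// the socket-LB.
	IterateSockRevNat(cb func(sockRevNatKey, sockRevNatValue)) error

	// DeleteSockRevNat removes a single entry.
	DeleteSockRevNat(key sockRevNatKey) error
}
```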
Add IsAlive() for checking whether a backend is "alive", that is, it has instances that are either active and healthy, or, if there are no active instances, a terminating one that is healthy. This will be used by a follow-up commit on socket termination to check whether a backend should be considered dead and the sockets connected to it terminated. Signed-off-by: Jussi Maki <jussi@isovalent.com>
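A minimal sketch of that liveness rule, read literally from the wording above; the instance type, states and field names are illustrative and the real Cilium implementation may differ in details:

```go
// Package livenesssketch is an illustrative sketch, not Cilium's real code.
package livenesssketch

type instanceState int

const (
	stateActive instanceState = iota
	stateTerminating
)

type instance struct {
	State   instanceState
	Healthy bool
}

// isAlive reports whether a backend should still be considered alive: it has
// at least one healthy active instance, or, when it has no active instances
// at all, at least one healthy terminating instance.
func isAlive(instances []instance) bool {
	hasActive := false
	for _, inst := range instances {
		if inst.State == stateActive {
			hasActive = true
			if inst.Healthy {
				return true
			}
		}
	}
	if hasActive {
		// Active instances exist but none are healthy; reading the rule
		// above literally, we do not fall back to terminating instances.
		return false
	}
	for _, inst := range instances {
		if inst.State == stateTerminating && inst.Healthy {
			return true
		}
	}
	return false
}
```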
This ports pkg/service/connections.go to the new control-plane to terminate connected UDP sockets when the backend is deleted or marked unhealthy. Signed-off-by: Jussi Maki <jussi@isovalent.com>
Port the 'TestTerminateUDPConnectionsToBackend' from pkg/loadbalancer/legacy/service/service_test.go to test the datapath side of the socket termination. Signed-off-by: Jussi Maki <jussi@isovalent.com>
Add a benchmark for validating the overhead of processing every backend change in the socket termination loop. Results on my machine:

goos: linux
goarch: amd64
pkg: github.com/cilium/cilium/pkg/loadbalancer/reconciler
cpu: 13th Gen Intel(R) Core(TM) i9-13950HX
BenchmarkChangeIteration_TCP
BenchmarkChangeIteration_TCP-32    50   24211174 ns/op   41303242 backends/sec   24.21 ns/backend
BenchmarkChangeIteration_UDP
BenchmarkChangeIteration_UDP-32     5  247585791 ns/op    4039004 backends/sec   247.6 ns/backend
PASS

We're spending 24ns looking at each backend change. If we assume a busy cluster with, say, 100 backends changing every 50 milliseconds, we would be using 24ns * 100 * (1000/50) = 48000ns = 48us of CPU time every second, or in other words 100*(48000 / 10^9) = 0.0048% of a single CPU core. If all changes were for UDP backends we'd hit the IsAlive() check and we'd get 247ns * 100 * (1000/50) = 494us per second, which is 0.049% of a CPU core.

Increasing the rate limit interval to, say, 500ms or 1s would reduce the overhead of the revision and graveyard revision index tree traversals significantly, further reducing the CPU cost, but would increase the latency at which we react to dead backends. This seems like an acceptable trade-off to me, considering that the termination logic is now completely decoupled from everything else. The alternative would be to have the BPF reconciler queue up the terminations, but this would increase overall complexity.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
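For reference, a benchmark of this shape can be structured with the standard testing package roughly as in the sketch below (placed in a _test.go file); the change type and the filtering logic are stand-ins for the real StateDB change iteration in pkg/loadbalancer/reconciler:

```go
package sketch_test

import "testing"

// change is a stand-in for a backend change event.
type change struct {
	protocol string
	deleted  bool
}

// relevant applies the cheap protocol check first so that non-UDP churn
// stays on the fast path.
func relevant(c change) bool {
	return c.protocol == "UDP" && c.deleted
}

func BenchmarkChangeIteration(b *testing.B) {
	// All-TCP churn: the worst case for wasted work in the termination loop.
	changes := make([]change, 100_000)
	for i := range changes {
		changes[i] = change{protocol: "TCP"}
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		n := 0
		for _, c := range changes {
			if relevant(c) {
				n++
			}
		}
		_ = n
	}
	// Report throughput in backends processed per second, similar to the
	// custom metric in the commit message above.
	b.ReportMetric(float64(len(changes))*float64(b.N)/b.Elapsed().Seconds(), "backends/sec")
}
```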
Force-pushed from 0fa647a to 332a720
@aditighag PTAL
/test
Looks good to me; any concerns were already brought up by other reviewers.
Missed following up -- mainly wanted to highlight the potential overhead of iterating kernel sockets unnecessarily. We can revisit this if users start reporting noticeable overhead.
This ports pkg/service/connections.go to the new control-plane.