Conversation

joamaki
Contributor

@joamaki joamaki commented Apr 17, 2025

This ports pkg/service/connections.go to the new load-balancer control plane.

@maintainer-s-little-helper maintainer-s-little-helper bot added dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Apr 17, 2025
@joamaki joamaki force-pushed the pr/joamaki/lb-connection-termination branch from 7b9b15d to 58afb77 Compare April 17, 2025 13:34
@joamaki joamaki force-pushed the pr/joamaki/lb-connection-termination branch from 58afb77 to c4cede4 Compare April 17, 2025 13:35
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Apr 17, 2025
@joamaki joamaki force-pushed the pr/joamaki/lb-connection-termination branch 3 times, most recently from ad75187 to 26593a1 Compare April 17, 2025 15:11
@joamaki joamaki added the release-note/misc This PR makes changes that have no direct user impact. label Apr 17, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Apr 17, 2025
@joamaki joamaki force-pushed the pr/joamaki/lb-connection-termination branch from 26593a1 to 2f3b136 Compare April 22, 2025 13:27
@joamaki joamaki requested a review from tommyp1ckles April 22, 2025 13:58
@joamaki
Contributor Author

joamaki commented Apr 22, 2025

/test

@joamaki joamaki marked this pull request as ready for review April 22, 2025 14:04
@joamaki joamaki requested review from a team as code owners April 22, 2025 14:04
@joamaki joamaki requested a review from aditighag April 22, 2025 14:04
@joamaki joamaki enabled auto-merge April 23, 2025 08:09
Contributor

@jrife jrife left a comment

I cherry-picked my changes from #38693 on top of this branch and tested socket termination manually. Everything seemed to work; I saw the UDP socket get destroyed repeatedly.

@joamaki joamaki requested a review from tommyp1ckles April 24, 2025 07:51
@joamaki
Contributor Author

joamaki commented Apr 28, 2025

ping @aditighag @tommyp1ckles

@joamaki joamaki force-pushed the pr/joamaki/lb-connection-termination branch from 2f3b136 to d0fcf23 Compare April 30, 2025 08:47
@joamaki
Contributor Author

joamaki commented Apr 30, 2025

/test

@joamaki
Contributor Author

joamaki commented Apr 30, 2025

ping @aditighag @tommyp1ckles

@joamaki joamaki requested a review from dylandreimerink April 30, 2025 14:24
Copy link
Member

@aditighag aditighag left a comment

Sorry for the delay. Posted some comments from the first pass.
(1) Can you split the 2nd commit into preparatory changes (e.g. changes related to sock rev nat map creation/utility functions, etc.) and changes porting the socket termination logic to the new LB control plane?
(2) How do we ensure the socket termination loop runs only on the relevant backend DB updates?

@joamaki joamaki requested a review from aditighag May 5, 2025 13:02
@joamaki
Contributor Author

joamaki commented May 5, 2025

/test

@aditighag
Member

> (2) Can you expand on this? Maybe give some examples of non-relevant updates? Why would we not be interested in every backend that either has been deleted or has become non-viable due to health checking? The only thing I can think of that's "not relevant" is when a backend has become non-viable (e.g. it only has quarantined backends) and then there's a change that still keeps it non-viable, e.g. one of the quarantined backends is removed. But is there any harm in that situation in going and trying to find sockets to terminate?

There are multiple backend attributes, state being one of them - https://github.com/cilium/cilium/blob/main/pkg/loadbalancer/backend.go#L25-L25. Do we want to run the socket termination loop for backend updates corresponding to changes to attributes such as weight, clusterID, UnhealthyUpdatedAt, etc. that don't change its state?

@joamaki
Contributor Author

joamaki commented May 6, 2025

> > (2) Can you expand on this? Maybe give some examples of non-relevant updates? Why would we not be interested in every backend that either has been deleted or has become non-viable due to health checking? The only thing I can think of that's "not relevant" is when a backend has become non-viable (e.g. it only has quarantined backends) and then there's a change that still keeps it non-viable, e.g. one of the quarantined backends is removed. But is there any harm in that situation in going and trying to find sockets to terminate?
>
> There are multiple backend attributes, state being one of them - https://github.com/cilium/cilium/blob/main/pkg/loadbalancer/backend.go#L25-L25. Do we want to run the socket termination loop for backend updates corresponding to changes to attributes such as weight, clusterID, UnhealthyUpdatedAt, etc. that don't change its state?

It waits 50ms to accumulate changes and then does a tree traversal over them to find the relevant ones. This is fast and lockless, so that should be fine. If we want to reduce the overhead of context switching and tree traversal we can bump the wait up a bit to make this even cheaper.

Alternatively, we could tie this into the reconciler: when it deletes a backend it could also queue up the socket destruction.

You've got me questioning now whether this might be burning enough cycles looking through irrelevant changes that it's worth saving them. I'll do a benchmark today to quantify it.
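
For illustration, here is a minimal sketch of the kind of rate-limited change loop described above. The helper names (nextChanges, isRelevant, terminateSockets) and the backendChange type are hypothetical stand-ins for the actual StateDB change iteration and socket termination code:

```go
// Sketch only: the real implementation uses StateDB change iteration;
// these types and function parameters are placeholders.
package termination

import (
	"context"
	"time"
)

type backendChange struct {
	Deleted  bool
	Protocol string
}

func processLoop(
	ctx context.Context,
	nextChanges func() []backendChange, // drain the changes accumulated so far
	isRelevant func(backendChange) bool, // e.g. UDP backend deleted or no longer alive
	terminateSockets func(backendChange),
) {
	// Rate limit: wake up every 50ms and traverse the accumulated
	// changes in one lockless pass.
	ticker := time.NewTicker(50 * time.Millisecond)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
		for _, change := range nextChanges() {
			// Skip changes that cannot affect connected UDP sockets.
			if !isRelevant(change) {
				continue
			}
			terminateSockets(change)
		}
	}
}
```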

@joamaki joamaki force-pushed the pr/joamaki/lb-connection-termination branch from 0006d2f to df83588 Compare May 8, 2025 09:21
@joamaki
Contributor Author

joamaki commented May 8, 2025

> > > (2) Can you expand on this? Maybe give some examples of non-relevant updates? Why would we not be interested in every backend that either has been deleted or has become non-viable due to health checking? The only thing I can think of that's "not relevant" is when a backend has become non-viable (e.g. it only has quarantined backends) and then there's a change that still keeps it non-viable, e.g. one of the quarantined backends is removed. But is there any harm in that situation in going and trying to find sockets to terminate?
> >
> > There are multiple backend attributes, state being one of them - https://github.com/cilium/cilium/blob/main/pkg/loadbalancer/backend.go#L25-L25. Do we want to run the socket termination loop for backend updates corresponding to changes to attributes such as weight, clusterID, UnhealthyUpdatedAt, etc. that don't change its state?
>
> It waits 50ms to accumulate changes and then does a tree traversal over them to find the relevant ones. This is fast and lockless, so that should be fine. If we want to reduce the overhead of context switching and tree traversal we can bump the wait up a bit to make this even cheaper.
>
> Alternatively, we could tie this into the reconciler: when it deletes a backend it could also queue up the socket destruction.
>
> You've got me questioning now whether this might be burning enough cycles looking through irrelevant changes that it's worth saving them. I'll do a benchmark today to quantify it.

Ok, I've added a benchmark for the iteration. We're spending about 300ns per backend change; with, say, 100 backend changes per 50ms interval, that's 300ns * 100 * 20 = 600us of CPU time per second, or about 0.06% of one CPU core (see the comment for more details). So even in a very, very busy cluster very little time is spent on this, and it's unlikely that we'd have that sort of constant churn.

My argument here is that it's better to sacrifice a few CPU cycles to have an implementation that is decoupled: the other code has no control coupling to the UDP socket termination, and the more we implement in this style the easier the code is to work with. We strive to have each component in the system have a single concern, referential transparency and less control coupling. This is in fact the core design principle behind StateDB: we move away from tight coupling of components and imperative callbacks so that we don't end up with a big ball of mud (e.g. see pkg/service, pkg/endpoint, etc.) or reduce latency/throughput in unexpected ways, and also so that we avoid production incidents like deadlocks when one component calls directly into the black box of another (control coupling). That is why I find the trade-off of doing a little extra work here much better than the alternatives (e.g. the BPF reconciler being directly coupled to this when it deletes a backend).

EDIT: Ah, it's the backend instances lookup that IsAlive() does that takes most of the time. Adding a check for the protocol and skipping non-UDP backends brings the time down to 24ns/backend. So this can now process 41 million backend changes per second if the changes are all non-UDP. That seems fine?
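
Roughly, the fast path from the EDIT looks like the following; the field and method names are assumptions for illustration, not the exact Cilium types:

```go
package termination

// Backend is a minimal stand-in for the load-balancer backend type; only the
// pieces needed to show the fast path are modelled here.
type Backend struct {
	Protocol string
	alive    bool // in the real code this is derived from the backend instances
}

// IsAlive stands in for the (relatively expensive) instances lookup.
func (b *Backend) IsAlive() bool { return b.alive }

// shouldTerminateSockets checks the cheap protocol field first so that
// non-UDP changes never pay for the IsAlive() lookup.
func shouldTerminateSockets(b *Backend) bool {
	if b.Protocol != "UDP" {
		return false
	}
	return !b.IsAlive()
}
```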

@joamaki joamaki force-pushed the pr/joamaki/lb-connection-termination branch from df83588 to 0fa647a Compare May 8, 2025 09:36
@joamaki
Contributor Author

joamaki commented May 9, 2025

/test

joamaki added 6 commits May 9, 2025 17:12
Current() and All() were missing, which made the linter unhappy.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
For implementing socket termination we'll need access to the SockRevNat BPF map.
Add it to the LBMaps interface and expand implementations.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
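
As a rough illustration of what this commit describes, an LBMaps-style interface extended with sock rev nat operations might look like the sketch below; the method names, key/value layout, and signatures are assumptions, not Cilium's actual LBMaps API:

```go
package lbmaps

// SockRevNatKey/Value approximate the BPF sock rev nat map entries: a socket
// cookie plus an address/port mapping back to the backend the socket is
// connected to. The exact field layout here is illustrative.
type SockRevNatKey struct {
	Cookie  uint64
	Address [16]byte
	Port    uint16
}

type SockRevNatValue struct {
	Address     [16]byte
	Port        uint16
	RevNatIndex uint16
}

// sockRevNatOps is the hypothetical addition to the LBMaps interface: the
// socket termination logic needs to dump entries to find sockets connected
// to a dead backend and to delete stale entries.
type sockRevNatOps interface {
	DumpSockRevNat(cb func(SockRevNatKey, SockRevNatValue)) error
	DeleteSockRevNat(key SockRevNatKey) error
}
```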
Add IsAlive() for checking whether the backend is "alive", that is, whether it has instances
that are active and healthy, or, if there are no active instances, a terminating instance
that is healthy.

This will be used by a follow-up commit on socket termination to check whether a backend
should be considered dead and thus the sockets connected to it should be terminated.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
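
An illustrative (not the actual) implementation of the IsAlive() semantics described above, using a simplified instance representation:

```go
package backend

type instanceState int

const (
	stateActive instanceState = iota
	stateTerminating
)

type instance struct {
	state   instanceState
	healthy bool
}

type Backend struct {
	instances []instance
}

// IsAlive returns true if the backend has an active and healthy instance, or,
// when there are no active instances at all, a terminating instance that is
// still healthy.
func (b *Backend) IsAlive() bool {
	hasActive := false
	for _, inst := range b.instances {
		if inst.state == stateActive {
			hasActive = true
			if inst.healthy {
				return true
			}
		}
	}
	if hasActive {
		// Active instances exist, but none of them are healthy.
		return false
	}
	for _, inst := range b.instances {
		if inst.state == stateTerminating && inst.healthy {
			return true
		}
	}
	return false
}
```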
This ports pkg/service/connections.go to the new control-plane to terminate
connected UDP sockets when the backend is deleted or marked unhealthy.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
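
A very rough sketch of the termination flow being ported, under the assumption that the sock rev nat entries and a socket-destroy helper are available; all names here are placeholders rather than the real connections.go code:

```go
package termination

import "log"

type backendAddr struct {
	Addr string
	Port uint16
}

type sockRevNatEntry struct {
	Cookie      uint64 // socket cookie identifying the connected socket
	BackendAddr string
	BackendPort uint16
}

// terminateUDPConnections destroys sockets still connected to a backend that
// has been deleted or become unhealthy, so clients reconnect to a live one.
func terminateUDPConnections(
	dead backendAddr,
	entries []sockRevNatEntry, // dumped from the sock rev nat BPF map
	destroySocket func(cookie uint64) error, // e.g. a SOCK_DESTROY-style helper
) {
	for _, e := range entries {
		if e.BackendAddr != dead.Addr || e.BackendPort != dead.Port {
			continue
		}
		if err := destroySocket(e.Cookie); err != nil {
			log.Printf("failed to destroy socket %d: %v", e.Cookie, err)
		}
	}
}
```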
Port the 'TestTerminateUDPConnectionsToBackend' from pkg/loadbalancer/legacy/service/service_test.go
to test the datapath side of the socket termination.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
Add benchmark for validating the overhead of processing every backend
change in the socket termination loop.

Results on my machine:

goos: linux
goarch: amd64
pkg: github.com/cilium/cilium/pkg/loadbalancer/reconciler
cpu: 13th Gen Intel(R) Core(TM) i9-13950HX
BenchmarkChangeIteration_TCP
BenchmarkChangeIteration_TCP-32               50          24211174 ns/op          41303242 backends/sec         24.21 ns/backend
BenchmarkChangeIteration_UDP
BenchmarkChangeIteration_UDP-32                5         247585791 ns/op           4039004 backends/sec        247.6 ns/backend
PASS

We're spending 24ns looking at each backend change. If we assume a busy cluster with, say, 100 backends
changing every 50 milliseconds, we would be using 24ns * 100 * (1000/50) = 48000ns = 48us of CPU time every
second, or in other words 100 * (48000 / 10^9) = 0.0048% of a single CPU core.

If all changes were for UDP backends we'd hit the IsAlive() check and we'd get:
247ns * 100 * (1000/50) = 494us per second, which is 0.049% of a CPU core.

Increasing the rate-limit interval to, say, 500ms or 1s would significantly reduce the overhead of the
revision and graveyard revision index tree traversals, further reducing the CPU cost, but would increase
the latency at which we react to dead backends.

This seems like an acceptable trade-off to me considering that now the termination logic is completely
decoupled from everything else. The alternative way to implement this would be to have the BPF reconciler
queue up the terminations, but this would increase overall complexity.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
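
For reference, a benchmark of this general shape (a minimal sketch under assumed types, not the benchmark added in this PR) reports the per-backend cost the same way as the ns/backend metric in the output above:

```go
package reconciler

import "testing"

type backendChange struct{ udp bool }

// processChanges stands in for the change-iteration code under test:
// it just counts the UDP changes.
func processChanges(cs []backendChange) int {
	relevant := 0
	for _, c := range cs {
		if c.udp {
			relevant++
		}
	}
	return relevant
}

func BenchmarkChangeIteration(b *testing.B) {
	changes := make([]backendChange, 100000)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		processChanges(changes)
	}
	// Report the per-backend-change cost in nanoseconds.
	b.ReportMetric(float64(b.Elapsed().Nanoseconds())/float64(b.N*len(changes)), "ns/backend")
}
```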
@joamaki joamaki force-pushed the pr/joamaki/lb-connection-termination branch from 0fa647a to 332a720 Compare May 9, 2025 15:16
@joamaki
Contributor Author

joamaki commented May 12, 2025

@aditighag PTAL

@joamaki
Contributor Author

joamaki commented May 12, 2025

/test

@joamaki joamaki dismissed aditighag’s stale review May 13, 2025 08:04

Comments addressed

Member

@dylandreimerink dylandreimerink left a comment

Looks good to me; any concerns were already brought up by other reviewers.

@joamaki joamaki added this pull request to the merge queue May 13, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 13, 2025
@joamaki joamaki added this pull request to the merge queue May 13, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 13, 2025
@joamaki joamaki added this pull request to the merge queue May 14, 2025
Merged via the queue into cilium:main with commit 0dc2bda May 14, 2025
66 checks passed
@joamaki joamaki deleted the pr/joamaki/lb-connection-termination branch May 14, 2025 07:50
@aditighag
Member

aditighag commented Jun 11, 2025

> There are multiple backend attributes, state being one of them - https://github.com/cilium/cilium/blob/main/pkg/loadbalancer/backend.go#L25-L25. Do we want to run the socket termination loop for backend updates corresponding to changes to attributes such as weight, clusterID, UnhealthyUpdatedAt, etc. that don't change its state?

Missed following up -- this was mainly to highlight the potential overhead of unnecessarily iterating kernel sockets. We can revisit this if users start reporting noticeable overhead.
The benchmarks you added should also help us keep track of the processing overhead in Cilium.
