@joamaki joamaki commented Aug 8, 2025

Cilium's topology-aware routing implementation did not correctly implement the following safeguards [1]:

  4. If no fitting backend is found for the current zone, use backends from all zones
  5. Use all backends if one or more endpoints do not have zone hints

It did correctly implement:

  1. Insufficient endpoints (endpointslice controller responsibility)
  2. Unbalanced allocation (endpointslice controller responsibility)
  3. Use all backends if the local node is missing its zone label

This is fixed by doing an additional iteration over the backends to check
safeguards 4 and 5 before trying to apply the zone hints.
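
The fallback order described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Backend` type, its `ForZones` field, and the `selectBackends` and `hasZone` functions are hypothetical names chosen for the example, and the real load-balancer data structures in Cilium differ.

```go
package main

// Backend is a hypothetical, simplified backend record (not Cilium's type).
type Backend struct {
	Addr     string
	ForZones []string // zone hints; empty means the endpoint had no hint
}

// hasZone reports whether the backend is hinted for the given zone.
func hasZone(be Backend, zone string) bool {
	for _, z := range be.ForZones {
		if z == zone {
			return true
		}
	}
	return false
}

// selectBackends returns the backends a node in localZone should use.
func selectBackends(backends []Backend, localZone string) []Backend {
	// Safeguard 3: the local node has no zone label, so hints cannot apply.
	if localZone == "" {
		return backends
	}

	// Additional pass over the backends to check safeguards 4 and 5
	// before any zone filtering takes place.
	foundLocal := false
	for _, be := range backends {
		if len(be.ForZones) == 0 {
			// Safeguard 5: at least one endpoint lacks zone hints.
			return backends
		}
		if hasZone(be, localZone) {
			foundLocal = true
		}
	}
	if !foundLocal {
		// Safeguard 4: no backend fits the current zone.
		return backends
	}

	// All checks passed: apply the zone hints.
	filtered := make([]Backend, 0, len(backends))
	for _, be := range backends {
		if hasZone(be, localZone) {
			filtered = append(filtered, be)
		}
	}
	return filtered
}
```

With two backends hinted for zones "a" and "b", a node in zone "a" would get only the first backend from this sketch, while a node in zone "c" (or a node with no zone label) would get both.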

Fixes: e35c099 ("experimental: Implement support for topology-aware routing")
Fixes: #41022

Add missing safeguards to topology-aware routing: use all backends when no backend matching the zone hints is found or when any backend lacks a zone hint.

[1]: https://kubernetes.io/docs/concepts/services-networking/topology-aware-routing/#enabling-topology-aware-routing.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
@joamaki joamaki requested a review from a team as a code owner August 8, 2025 13:31
@joamaki joamaki added release-note/bug This PR fixes an issue in a previous release of Cilium. needs-backport/1.18 This PR / issue needs backporting to the v1.18 branch labels Aug 8, 2025
@joamaki joamaki requested review from aditighag and brb August 8, 2025 13:31

joamaki commented Aug 8, 2025

/test

@joamaki joamaki enabled auto-merge August 8, 2025 13:34
@aanm aanm added the release-blocker/1.18 This issue will prevent the release of the next version of Cilium. label Aug 11, 2025

@brb brb left a comment

Thanks!

@joamaki joamaki added this pull request to the merge queue Aug 11, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 11, 2025
@joamaki joamaki added this pull request to the merge queue Aug 11, 2025
Merged via the queue into cilium:main with commit 586061f Aug 11, 2025
76 checks passed
@joamaki joamaki deleted the pr/joamaki/lb-fix-topology-aware-fallback branch August 11, 2025 09:36
@github-project-automation github-project-automation bot moved this from Proposed to Done in Release blockers Aug 11, 2025
@YutaroHayakawa YutaroHayakawa mentioned this pull request Aug 11, 2025
10 tasks
@joestringer joestringer added the backport/author The backport will be carried out by the author of the PR. label Aug 12, 2025
@aanm aanm removed the release-blocker/1.18 This issue will prevent the release of the next version of Cilium. label Aug 12, 2025

aanm commented Aug 13, 2025

@joamaki FYI this PR has not been backported to v1.18 even tho it has the needs-backport/1.18 label.

@julianwiedmann

Looks like the backport happened here: #41116

@julianwiedmann julianwiedmann added backport-done/1.18 The backport for Cilium 1.18.x for this PR is done. and removed needs-backport/1.18 This PR / issue needs backporting to the v1.18 branch labels Sep 4, 2025
Labels
backport/author The backport will be carried out by the author of the PR. backport-done/1.18 The backport for Cilium 1.18.x for this PR is done. release-note/bug This PR fixes an issue in a previous release of Cilium.
Development

Successfully merging this pull request may close these issues.

trafficDistribution failing when no endpoint in current zone