Skip to content

bgpv1: fix reconciliation of services with shared VIPs #35333

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Oct 21, 2024

Conversation

rastislavs
Copy link
Contributor

As multiple services can share the same VIP (e.g. using lbipam.cilium.io/sharing-key), the service reconciliation logic needs to be adapted to not withdraw a route to VIP that is still in use by some other service.

To address that, this change introduces resource reference counting by reconciling service advertisements.

This fix is specific to BGPv1 service reconciliation, BGPv2 was fixed in #35166.

Fixes: #35018

As multiple services can share the same VIP (e.g. using
lbipam.cilium.io/sharing-key), the service reconciliation logic
needs to be adapted to not withdraw a route to VIP that is
still in use by some other service.
To address that, this change introduces resource reference
counting by reconciling service advertisements.
This fix is specific to BGPv1 service reconciliation only.

Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>
@rastislavs rastislavs added release-note/bug This PR fixes an issue in a previous release of Cilium. area/bgp Impacts the Border Gateway Protocol feature. needs-backport/1.15 needs-backport/1.16 This PR / issue needs backporting to the v1.16 branch labels Oct 10, 2024
Updates the BGPv1 advertisement tests to test a case where
multiple services share the same LoadBalancer VIP.

Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>
@rastislavs
Copy link
Contributor Author

/test

@rastislavs rastislavs marked this pull request as ready for review October 10, 2024 12:17
@rastislavs rastislavs requested a review from a team as a code owner October 10, 2024 12:17
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Oct 21, 2024
@julianwiedmann julianwiedmann added this pull request to the merge queue Oct 21, 2024
Merged via the queue into cilium:main with commit b3a3108 Oct 21, 2024
67 checks passed
@tklauser tklauser mentioned this pull request Oct 22, 2024
10 tasks
@tklauser tklauser added backport-pending/1.16 The backport for Cilium 1.16.x for this PR is in progress. and removed needs-backport/1.16 This PR / issue needs backporting to the v1.16 branch labels Oct 22, 2024
@tklauser tklauser mentioned this pull request Oct 22, 2024
4 tasks
@tklauser
Copy link
Member

Backport to v1.15 in #35469 lead to non-trivial merge conflicts, this will need a manual backport. Marking as backport/author.

@tklauser tklauser added the backport/author The backport will be carried out by the author of the PR. label Oct 22, 2024
@rastislavs rastislavs added backport-pending/1.15 The backport for Cilium 1.15.x for this PR is in progress. and removed needs-backport/1.15 labels Oct 22, 2024
@joestringer
Copy link
Member

joestringer commented Oct 22, 2024

I'm curious whether the backport criteria for older releases was considered for this fix, was this considered a major bugfix? The reason I ask is that it seems like we have BGPv2 which is where the primary focus for future BGP effort is.. the more effort we spend in backporting fixes for old code to older branches, the less time we have to make the new version better. I see that the automatic backporting failed and was rejected by the backporter, which suggests that there's significant changes going on in this area of the codebase. We have the option to only backport fixes to the latest release and avoid the extra maintenance effort of backporting to older releases, since users still have the option to upgrade to the latest stable release to address this issue.

I think that we may as well take the backport into v1.15 since we already did the work for that backport, no point to revisit for this specific change. But for future changes I think it's healthy for us to have discussions about the bar for how significant a bug is before we spend the additional effort in fixing it in the older releases. If there's a common mitigation, especially if BGP control plane is beta in v1.15 then maybe we don't need to be putting the extra effort into making the v1.15 version of this more stable.

@rastislavs
Copy link
Contributor Author

The reason why I marked this for backport to 1.15 is that there is no workaround for the issue in v1.15 (BGP CP misbehaves if lbipam.cilium.io/sharing-key is used and this feature therefore can not be used with BGP in 1.15).

But I agree that upgrading to 1.16 is always an option, and we should rather invest the time to the new codebase (BGPv2). Will reconsider this in similar cases next time.

@github-actions github-actions bot added backport-done/1.16 The backport for Cilium 1.16.x for this PR is done. and removed backport-pending/1.16 The backport for Cilium 1.16.x for this PR is in progress. labels Oct 22, 2024
@joestringer
Copy link
Member

(BGP CP misbehaves if lbipam.cilium.io/sharing-key is used and this feature therefore can not be used with BGP in 1.15).

FWIW this sounds like a feature request to me, not a major bugfix. If it never worked due to this issue then we haven't introduced a regression on v1.15 for users. Given there is the option to upgrade, I think we'd generally prefer users to upgrade to help us make the newer versions better.

@rastislavs
Copy link
Contributor Author

FWIW this sounds like a feature request to me, not a major bugfix

Makes sense, thanks for the guidance. I closed the backport PR to 1.15, as based on this discussion it does not meet the backport criteria.

@rastislavs rastislavs removed the backport-pending/1.15 The backport for Cilium 1.15.x for this PR is in progress. label Oct 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/bgp Impacts the Border Gateway Protocol feature. backport/author The backport will be carried out by the author of the PR. backport-done/1.16 The backport for Cilium 1.16.x for this PR is done. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cilium BGP stops announcing shared IP
6 participants