-
Notifications
You must be signed in to change notification settings - Fork 3.4k
bgpv1: fix reconciliation of services with shared VIPs #35333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
As multiple services can share the same VIP (e.g. using lbipam.cilium.io/sharing-key), the service reconciliation logic needs to be adapted to not withdraw a route to VIP that is still in use by some other service. To address that, this change introduces resource reference counting by reconciling service advertisements. This fix is specific to BGPv1 service reconciliation only. Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>
Updates the BGPv1 advertisement tests to test a case where multiple services share the same LoadBalancer VIP. Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>
cc68a3c
to
e5a40fd
Compare
/test |
Backport to v1.15 in #35469 lead to non-trivial merge conflicts, this will need a manual backport. Marking as |
I'm curious whether the backport criteria for older releases was considered for this fix, was this considered a major bugfix? The reason I ask is that it seems like we have BGPv2 which is where the primary focus for future BGP effort is.. the more effort we spend in backporting fixes for old code to older branches, the less time we have to make the new version better. I see that the automatic backporting failed and was rejected by the backporter, which suggests that there's significant changes going on in this area of the codebase. We have the option to only backport fixes to the latest release and avoid the extra maintenance effort of backporting to older releases, since users still have the option to upgrade to the latest stable release to address this issue. I think that we may as well take the backport into v1.15 since we already did the work for that backport, no point to revisit for this specific change. But for future changes I think it's healthy for us to have discussions about the bar for how significant a bug is before we spend the additional effort in fixing it in the older releases. If there's a common mitigation, especially if BGP control plane is beta in v1.15 then maybe we don't need to be putting the extra effort into making the v1.15 version of this more stable. |
The reason why I marked this for backport to 1.15 is that there is no workaround for the issue in v1.15 (BGP CP misbehaves if But I agree that upgrading to 1.16 is always an option, and we should rather invest the time to the new codebase (BGPv2). Will reconsider this in similar cases next time. |
FWIW this sounds like a feature request to me, not a major bugfix. If it never worked due to this issue then we haven't introduced a regression on v1.15 for users. Given there is the option to upgrade, I think we'd generally prefer users to upgrade to help us make the newer versions better. |
Makes sense, thanks for the guidance. I closed the backport PR to 1.15, as based on this discussion it does not meet the backport criteria. |
As multiple services can share the same VIP (e.g. using
lbipam.cilium.io/sharing-key
), the service reconciliation logic needs to be adapted to not withdraw a route to VIP that is still in use by some other service.To address that, this change introduces resource reference counting by reconciling service advertisements.
This fix is specific to BGPv1 service reconciliation, BGPv2 was fixed in #35166.
Fixes: #35018