Skip to content

Conversation

rastislavs
Copy link
Contributor

@rastislavs rastislavs commented Jun 2, 2025

As of #38008, BGPRouterManager.Stop() is called when BGP CP cell is being destroyed to properly clean up all servers when the agent is stopped. This change however affected Graceful Restart behavior and should be revisited:

GoBGP sends a Cease notification to all BGP peers as part of the server Stop() implementation (code ref), which makes Graceful Restart on the peered routers ineffective.

From rfc4724, section 4 - Graceful Restart Mechanism for BGP:

"normal BGP procedures MUST be followed when the TCP session terminates
due to the sending or receiving of a BGP NOTIFICATION message"

To make Graceful Restart work, we allow stopping the router without destroying GoBGP server, at least until GoBGP provides a way to not send Cease notifications upon server Stop() to GR-enabled peers.

The Router's Stop() method now takes a StopRequest argument, which can be used to specify whether full destroy of the Router is needed or not. Full destroy is helpful for tests, but for now it should not be used for standard Cilium agent use-case to make GR work upon agent termination.

bgp: Fix router Stop handling to not affect Graceful Restart

@rastislavs rastislavs added release-note/misc This PR makes changes that have no direct user impact. area/bgp Impacts the Border Gateway Protocol feature. labels Jun 2, 2025
GoBGP sends a Cease notification to all BGP peers as part of
the server Stop() implementation, which terminates
Graceful Restart progress on the peered routers.

To make Graceful Restart work, allow stopping the router
without destroying GoBGP server, at least until GoBGP provides
a way to not send Cease notifications upon server Stop()
to GR-enabled peers.

The Router Stop() method now takes a StopRequest argument, which
can be used to specify whether full destroy of the Router is needed or not.
Full destroy is helpful for tests, but for now it should not be used
for standard Cilium agent use-case to make GR work upon agent termination.

Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>
@rastislavs rastislavs added the affects/v1.17 This issue affects v1.17 branch label Jun 2, 2025
@rastislavs
Copy link
Contributor Author

/test

@rastislavs rastislavs marked this pull request as ready for review June 2, 2025 14:29
@rastislavs rastislavs requested a review from a team as a code owner June 2, 2025 14:29
@harsimran-pabla
Copy link
Contributor

harsimran-pabla commented Jun 2, 2025

Should the label be release-note/bug?

@rastislavs rastislavs added release-note/bug This PR fixes an issue in a previous release of Cilium. and removed release-note/misc This PR makes changes that have no direct user impact. labels Jun 2, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 2, 2025
@aditighag aditighag added this pull request to the merge queue Jun 2, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jun 2, 2025
@joestringer joestringer added this pull request to the merge queue Jun 2, 2025
Merged via the queue into cilium:main with commit 172b0e8 Jun 2, 2025
71 of 72 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects/v1.17 This issue affects v1.17 branch area/bgp Impacts the Border Gateway Protocol feature. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants