cilium: Unconditionally upsert neighbor entries #37352
Merged
Conversation
(👀 into runtime ci issue)
Some of the service runtime tests set up the service handling with a nil s.backendDiscovery: m.svc = newService(&FakeMonitorAgent{}, lbmap, nil, nil, true). Therefore, add a nil check in s.upsertBackendNeighbors and s.deleteBackendNeighbors so that the tests do not panic. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Starting with Cilium 1.18, we unconditionally upsert backend neighbor entries. This means that if backends are in a different L2 domain, we always keep the gateway entry "fresh" as a managed neighbor, and if backends are in the same L2 domain, the kernel always has an up-to-date hardware address. "Unconditionally" because even for services with externalIPs, the external IPs could be in the same L2 domain. We already keep track of in-cluster nodes to have them as managed neighbors, so this small change additionally keeps track of external endpoints, which is not expected to add much to the total set of managed neighbors. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Force-pushed from fdc2c46 to 6e88460
/test
ysksuzuki
reviewed
Feb 4, 2025
ysksuzuki
approved these changes
Feb 5, 2025
dylandreimerink
added a commit
to dylandreimerink/cilium
that referenced
this pull request
Jun 10, 2025
PR cilium#37352 made it so backends for services unconditionally get neighbor entries. This introduced a performance regression due to the number of netlink calls needed to resolve next hops. Strictly speaking, we do not need to create the neighbor entries unless we are running XDP, so let's disable L2 neighbor discovery by default. Forwardable IPs will still be created, but no further processing will be done. We already have guards in place in the desired neighbor calculator that throw an error to notify the user when they are running with XDP enabled but without L2 neighbor discovery enabled. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
dylandreimerink
added a commit
that referenced
this pull request
Jun 11, 2025
PR #37352 made it so backends for services unconditionally get neighbor entries. This introduced a performance regression due to the number of netlink calls needed to resolve next hops. We have inherited this performance regression, which still exists in this new implementation. Strictly speaking, we do not need to create the neighbor entries unless we are running XDP, so let's disable L2 neighbor discovery by default. Forwardable IPs will still be created, but no further processing will be done. We already have guards in place in the desired neighbor calculator that throw an error to notify the user when they are running with XDP enabled but without L2 neighbor discovery enabled. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
dylandreimerink
added a commit
that referenced
this pull request
Jun 11, 2025
PR #37352 made it so backends for services unconditionally get neighbor entries. This introduced a performance regression due to the number of netlink calls needed to resolve next hops. We have inherited this performance regression, which still exists in this new implementation. Strictly speaking, we do not need to create the neighbor entries unless we are running XDP, so let's disable L2 neighbor discovery by default. Forwardable IPs will still be created, but no further processing will be done. We already have guards in place in the desired neighbor calculator that throw an error to notify the user when they are running with XDP enabled but without L2 neighbor discovery enabled. This commit also explicitly enables L2 neighbor discovery during testing in various places so we still get coverage for it. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
dylandreimerink
added a commit
that referenced
this pull request
Jun 12, 2025
This commit introduces the reworked neighbor subsystem logic. The neighbor logic that currently exists in the Linux node handler was not designed with resilience in mind and was missing a few key features. The interface of the neighbor package towards the rest of the agent is in the form of "Forwardable IPs". Any component that wishes to guarantee that XDP can forward traffic to a given IP address can register such a forwardable IP. Since the same IP could be "owned" by multiple components (e.g. a service and a node), we need some logic to manage that. The `ForwardableIPManager` is responsible for this, and all components have to go through it to register/remove their forwardable IPs. The `DesiredNeighborCalculator` takes forwardable IPs, L2 devices, and the routing table as inputs and calculates the "Desired Neighbors". This works by checking the next hop for all permutations of forwardable IPs and L2 devices, then deduplicating the results. We can do a partial update when new forwardable IPs are added. We periodically do a full update to deal with the removal of forwardable IPs. When the set of L2 devices changes or the contents of the routing table change, we also do a full update at once. Desired neighbors are reconciled to the kernel via a generic reconciler. The existing logic for creating neighbor table entries is reused. Entries can be kernel-managed or agent-managed, depending on the probed kernel support. In addition, we have a "Neighbor refresher" job. It watches the neighbor StateDB table for changes. When an agent-managed entry goes stale, it triggers a refresh of the entry via the reconciler. This creates an active feedback loop instead of recreating entries on a timer. The refresher also monitors for deletions of neighbor entries. If we still desire a neighbor entry, it will be recreated.
As a final touch, we use the table initializer mechanism to ensure that pruning of existing neighbor entries on startup only happens once all "sources" of forwardable IPs confirm that they have added their initial list of forwardable IPs. This ensures that we do not prune from the kernel neighbor table until we have an initial set of desired neighbors to reconcile. All of these features together eliminate a set of issues that happen when devices, routes, and neighbors change outside of Cilium's control. We also eliminate issues where some entries were not being refreshed. PR #37352 made it so backends for services unconditionally get neighbor entries. This introduced a performance regression due to the number of netlink calls needed to resolve next hops. We have inherited this performance regression, which still exists in this new implementation. Strictly speaking, we do not need to create the neighbor entries unless we are running XDP, so let's disable L2 neighbor discovery by default. When XDP is enabled, we automatically also enable neighbor discovery. When XDP is disabled, the user can still enable neighbor discovery by passing the `--enable-l2-neighbor-discovery` flag to the agent. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
github-merge-queue bot
pushed a commit
that referenced
this pull request
Jun 14, 2025
This commit introduced the reworked neighbor subsystem logic. The neighbor logic that currently exists in the linux node handler was not designed with resilience in mind and was missing a few key features. The interface of the neighbor package towards the rest of the agent is in the form of "Forwardable IPs". Any component that wishes to guarantee that XDP can forward traffic to a given IP address can register such a forwardable IP. Since the same IP could be "owned" by multiple components (e.g. a service and node) we need some logic to manage that. The `ForwardableIPManager` is responsible for that, and all components have to go through it to register/remove their forwardable IPs. The `DesiredNeighborCalculator` takes forwardable IPs, L2 devices and the routing table as inputs and calculates the "Desired Neighbors". This works by checking the next hop for all permutations of forwardable IPs and L2 devices, then deduplicating the results. We can do a partial update when new forwardable IPs are added. We periodically do a full update to deal with removal of forwardable IPs. When the set of L2 devices changes or the contents of the routing table change, we also do a full update at once. Desired neighbors are reconciled to the kernel via a generic reconciler. The existing logic for creating neighbor table entries is reused. Entries can be kernel managed or agent managed, depending on the probed kernel support. In addition we have a "Neighbor refresher" job. It watches the neighbor statedb table for changes. When a agent managed entry goes stale, it triggers a refresh of the entry via the reconciler. This creates an active feedback loop instead of a recreating on a timer. The refresher also monitors for deletions of the neighbor entries. If we still desire a neighbor entry, it will be recreated. 
As a final touch, we use the table initializer mechanism to ensure that pruning of existing neighbor entries on startup only happens once all "sources" of forwardable IPs confirm that they have added their initial list of forwardable IPs. This ensures that we do not prune from the kernel neighbor table until we have an initial set of desired neighbors to reconcile.

Together, these features eliminate a set of issues that occur when devices, routes, and neighbors change outside of Cilium's control. We also eliminate issues where some entries were not being refreshed.

PR #37352 made backends for services unconditionally get neighbor entries. This introduced a performance regression due to the number of netlink calls needed to resolve next hops. We have inherited this performance regression, which still exists in this new implementation. Strictly speaking, we do not need to create the neighbor entries unless we are running XDP, so let's disable L2 neighbor discovery by default. When XDP is enabled, we automatically also enable neighbor discovery. When XDP is disabled, the user can still enable neighbor discovery by passing the `--enable-l2-neighbor-discovery` flag to the agent.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
Labels
area/datapath
Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
release-note/misc
This PR makes changes that have no direct user impact.
(see commit desc)