bpf: fib: always use bpf_redirect_neigh() when available #37725
Conversation
Force-pushed from a97bbea to b342aed.
Force-pushed from e72bf8a to d176280.
Thank you for the PR! I have two questions.
- Have you considered making this behavior, where bpf_redirect_neigh() is always used when available, opt-in instead of enabling it by default? Or have you already evaluated that the potential performance impact would be trivial?
- Does this issue occur only when the destination is a host outside the cluster? If so, would it be worth considering an optimization where bpf_redirect_neigh() is always used only when the destination is outside the cluster?
Hey Jordan, just a question for clarification/better understanding: In the kernel's BPF fib lookup helper, we test nud_state for NUD_VALID (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/core/filter.c#n6180):
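```c
/* Paraphrased from bpf_ipv4_fib_lookup() in net/core/filter.c; the exact
 * surrounding code varies by kernel version. */
if (!neigh || !(READ_ONCE(neigh->nud_state) & NUD_VALID))
	return BPF_FIB_LKUP_RET_NO_NEIGH;
```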
NUD_VALID is defined as (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/net/neighbour.h#n39):
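```c
/* include/net/neighbour.h; note that NUD_STALE counts as "valid". */
#define NUD_VALID (NUD_PERMANENT | NUD_NOARP | NUD_REACHABLE | \
		   NUD_PROBE | NUD_STALE | NUD_DELAY)
```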
The kernel has a ...
Dumb q: Would it rather make sense to tweak the base reachable time instead? Is that mostly for the NUD_REACHABLE state, or does it also relate to NUD_STALE somehow? Are you saying that with redirect_neigh() the NUD_STALE state automatically triggers a new resolution to go back into NUD_REACHABLE?

Edit: found the state diagram. Ok, so in the kernel the transition from NUD_REACHABLE -> NUD_STALE does trigger re-resolution:
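```c
/* Neighbor FSM transitions relevant here, paraphrased from
 * net/core/neighbour.c (not an exhaustive diagram):
 *
 *   NUD_REACHABLE --(reachable_time expires)------------> NUD_STALE
 *   NUD_STALE     --(tx via neigh_event_send())---------> NUD_DELAY
 *   NUD_DELAY     --(delay_first_probe_time elapses,
 *                    no confirmation seen)--------------> NUD_PROBE
 *   NUD_PROBE     --(ARP/NS probe answered)-------------> NUD_REACHABLE
 *
 * The STALE -> DELAY kick happens on the neigh_output() /
 * neigh_event_send() path; plain bpf_redirect() bypasses it entirely.
 */
```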
Is it correct to assume that the above entries are the backend IPs? If yes, would this PR, which is already in main, have helped? #37352
Thank you for taking a look! In my view this should simply be the default behavior if there are no ill effects, as it works for a broader set of circumstances. I've run some iperf3 benchmarks at 100 Gbps and did not experience any noticeable changes in throughput or CPU utilization between this and the old approach. I will try to find the measurements somewhere and provide them here. I'm happy to run some additional benchmarks if you are not convinced though :).
The issue I experienced does, yes: outside the cluster but in the same L2 domain as the cluster nodes. I believe it may further complicate things in ...
If my read of the docs and code is correct, I'm not sure tweaking this solves the underlying issue. Seems like it just takes longer for the entry to go stale? A long enough timeout may mask the issue I described in one environment, but as with any timeout value, what works in one environment may not work for another.
Right, any subsequent calls to `neigh_event_send()`:

```c
static inline int neigh_event_send(struct neighbour *neigh, struct sk_buff *skb)
{
	return neigh_event_send_probe(neigh, skb, true);
}
```
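For context (paraphrasing `include/net/neighbour.h`, so details may differ by kernel version), the wrapper it calls only kicks the state machine when the entry isn't already connected or being probed:

```c
static inline int neigh_event_send_probe(struct neighbour *neigh,
					 struct sk_buff *skb,
					 const bool immediate_ok)
{
	unsigned long now = jiffies;

	/* Refresh the "last used" timestamp on every send. */
	if (READ_ONCE(neigh->used) != now)
		WRITE_ONCE(neigh->used, now);
	/* NUD_STALE is in none of these states, so a send over a STALE
	 * entry falls through to __neigh_event_send(), which starts the
	 * STALE -> DELAY transition. */
	if (!(READ_ONCE(neigh->nud_state) & (NUD_CONNECTED | NUD_DELAY | NUD_PROBE)))
		return __neigh_event_send(neigh, skb, immediate_ok);
	return 0;
}
```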
To clarify, ... All in all, ... Are there performance reasons for the current preference for `bpf_redirect()`?
@jrife Just to confirm, the issue occurs when the datapath handles the reply traffic from the backends at ... We can rely on the ...
This is correct.
Bumping this thread again, since I'm not sure what the verdict was here, @ysksuzuki.
@borkmann @ysksuzuki Looks like this one went stale. I'm wondering what the resolution is here. Is there anything in particular you are concerned about or data you would like to see?
My immediate reaction to this was "ugh, this feels like a workaround for something that should be improved in the kernel, and it won't help ...". At the same time, once #38308 is merged, I feel it's a good moment to take a fresh look at all our FIB-driven redirect handling, and see whether we can simplify the logic more, or extend it to also cover scenarios like this.
From my point of view, it's less a kernel issue and more that the behavior of ...
Yes, we should just stick to ...
Started here; would be cool to rebase once this lands and see how easy the code looks overall.
Sure, I can wait for this to merge. This PR probably needs a rebase in general, since it's a bit old.
/ci-ginkgo
/ci-gateway-api
At a glance, this seems unrelated to this PR, as this is the first time I'm seeing it, and it started happening after I rebased. Inspecting the most recent runs, I see several with the same failure:
I'll do a bit more investigation and perhaps file a CI bug.
FWIW, looking at the above links, if you browse to "Echo Workflow Dispatch Inputs" and expand the step inside, it's clear that these results are each from different PRs. The last link skipped that step, which implies the workflow was triggered directly (likely due to schedule) rather than via ...
@joestringer Thanks. I just discovered the "Echo Workflow Dispatch Inputs" step after desperately searching for a way to correlate the run with its origin. I updated my description above to point to another example I found. Since this seems like a CI regression and not localized to this PR, I'll open an issue.
Force-pushed from 6ab39c0 to d101914.
/test
Only left one more nit; otherwise this LGTM for the loader changes. Thanks for your patience, Jordan!
In some environments, neighbor entries for non-cluster hosts remain stale forever, breaking connectivity in cases where their IP is reassigned to a new host with a different MAC address. This isn't a problem for hosts within the cluster; Cilium sets the `managed` and `extern_learn` flags for those neighbor entries, so the kernel actively refreshes them.

In our particular case, we have a set of VMs forming our Kubernetes cluster running a NodePort service and another VM in the same L2 domain talking to that service. This set of VMs is upgraded as a unit by spinning up new VMs, draining the old ones, and reassigning the IP addresses, although not necessarily in that order. At some point during this process, the MAC address assigned to the IP for that external VM changes. The network switches in our setup enable ARP suppression which, as I understand it, suppresses GARPs if the ARP suppression cache on the switch already has an entry for a given MAC address. This, combined with Cilium's current behavior, leads to a situation where we need to apply some workarounds to get things talking again. What we hope would occur is that cluster nodes would send an ARP request for any STALE neighbor entries in their ARP table, but currently this never happens.

fib_redirect() and others call bpf_fib_lookup() and pass its result along to fib_do_redirect(). Currently, fib_do_redirect() only uses bpf_redirect_neigh() if the FIB lookup was unsuccessful, returning BPF_FIB_LKUP_RET_NO_NEIGH. Otherwise, it "manually" populates the L2 header of the packet and uses bpf_redirect() to send it out through the host interface. However, bpf_fib_lookup() returns BPF_FIB_LKUP_RET_SUCCESS regardless of the state of the neighbor entry, and bpf_redirect() doesn't trigger a neighbor event in the kernel. The net result is that stale entries can remain stale, which is problematic in setups like the one described above.

Always use bpf_redirect_neigh() if available to trigger a refresh of stale entries and avoid corner cases like ours where invalid entries can remain stale forever.

Without This Change
-------------------

The neighbor entry for 192.168.4.225, a VM in the same L2 domain as my cluster nodes, remains STALE after making an HTTP request to a NodePort service hosted on that node.

    jordan@crow:~$ ip neigh | grep 225
    192.168.4.225 dev enp5s0 lladdr bc:24:11:93:7f:d4 STALE
    ...
    jordan@crow:~$ ip neigh | grep 225
    192.168.4.225 dev enp5s0 lladdr bc:24:11:93:7f:d4 STALE

With This Change
----------------

The HTTP request from 192.168.4.225 triggers a neighbor event on the cluster node, so the neighbor entry transitions from STALE to REACHABLE.

    jordan@crow:~$ ip neigh | grep 192.168.4.225
    192.168.4.225 dev enp5s0 lladdr bc:24:11:93:7f:d4 STALE
    ...
    jordan@crow:~$ ip neigh | grep 192.168.4.225
    192.168.4.225 dev enp5s0 lladdr bc:24:11:93:7f:d4 DELAY
    ...
    jordan@crow:~$ ip neigh | grep 192.168.4.225
    192.168.4.225 dev enp5s0 lladdr bc:24:11:93:7f:d4 REACHABLE

Fixes: cilium#34503
Signed-off-by: Jordan Rife <jrife@google.com>
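In code terms, the change amounts to something like the sketch below. This is a minimal illustration using plain BPF helpers rather than Cilium's actual `lib/fib.h` code, and `HAVE_REDIRECT_NEIGH` stands in for whatever mechanism the loader really uses to detect `bpf_redirect_neigh()` support:

```c
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/pkt_cls.h>
#include <stddef.h>
#include <bpf/bpf_helpers.h>

static __always_inline int redirect_sketch(struct __sk_buff *skb,
					   const struct bpf_fib_lookup *fib,
					   int fib_ret)
{
#ifdef HAVE_REDIRECT_NEIGH
	/* After this change: unconditionally defer L2 resolution to the
	 * kernel. Previously this path was taken only for
	 * BPF_FIB_LKUP_RET_NO_NEIGH, so a SUCCESS result with a STALE
	 * neighbor entry went down the manual path below forever. */
	return bpf_redirect_neigh(fib->ifindex, NULL, 0, 0);
#else
	/* Kernels without bpf_redirect_neigh(): rewrite the Ethernet
	 * header from the FIB result ourselves. Note that bpf_redirect()
	 * never updates the neighbor entry's state. */
	if (fib_ret != BPF_FIB_LKUP_RET_SUCCESS)
		return TC_ACT_SHOT;
	if (bpf_skb_store_bytes(skb, offsetof(struct ethhdr, h_dest),
				fib->dmac, ETH_ALEN, 0) < 0 ||
	    bpf_skb_store_bytes(skb, offsetof(struct ethhdr, h_source),
				fib->smac, ETH_ALEN, 0) < 0)
		return TC_ACT_SHOT;
	return bpf_redirect(fib->ifindex, 0);
#endif
}
```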
Prepare for the next patch, which passes BPF_FIB_LOOKUP_SKIP_NEIGH to bpf_fib_lookup() in cases where we want to defer to bpf_redirect_neigh() for neighbor resolution, by adding a probe to see if the current system supports it.

Fixes: cilium#34503
Signed-off-by: Jordan Rife <jrife@google.com>
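The kernel-side behavior a probe like this can key off of: `bpf_fib_lookup()` rejects flag bits it doesn't know about with `-EINVAL`, so a test run with `BPF_FIB_LOOKUP_SKIP_NEIGH` set cleanly distinguishes old kernels from new ones. Roughly, from `net/core/filter.c` as of the commit that added the flag:

```c
	if (plen < sizeof(*params))
		return -EINVAL;

	/* Kernels predating BPF_FIB_LOOKUP_SKIP_NEIGH hit this check and
	 * return -EINVAL when the flag is set. */
	if (flags & ~(BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT |
		      BPF_FIB_LOOKUP_SKIP_NEIGH))
		return -EINVAL;
```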
Use BPF_FIB_LOOKUP_SKIP_NEIGH in cases where the system supports it and we want to defer to bpf_redirect_neigh for neighbor resolution.

Fixes: cilium#34503
Signed-off-by: Jordan Rife <jrife@google.com>
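A hypothetical sketch of what this looks like at the call site; `supports_fib_lookup_skip_neigh` mirrors the load-time `NODE_CONFIG` value discussed further down, and the surrounding names are illustrative rather than Cilium's actual code:

```c
#include <linux/bpf.h>
#include <stdbool.h>
#include <bpf/bpf_helpers.h>

/* Load-time config constant, filled in by the loader before attach. */
volatile const bool supports_fib_lookup_skip_neigh;

static __always_inline long fib_lookup_sketch(struct __sk_buff *skb,
					      struct bpf_fib_lookup *params)
{
	__u32 flags = 0;

	/* No point resolving the neighbor during the FIB lookup when
	 * bpf_redirect_neigh() will do that work right afterwards. */
	if (supports_fib_lookup_skip_neigh)
		flags |= BPF_FIB_LOOKUP_SKIP_NEIGH;

	return bpf_fib_lookup(skb, params, sizeof(*params), flags);
}
```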
Force-pushed from d101914 to a7aeb35.
/test
/ci-multi-pool
/ci-eks
```c
NODE_CONFIG(bool, supports_fib_lookup_skip_neigh,
	    "Whether or not BPF_FIB_LOOKUP_SKIP_NEIGH is supported.")
```
For my own education: how would we accurately test this in `bpf/complexity-tests`? In the past we would simply add the relevant `HAVE_FEATURE_FOO` macro to the configs for relevant kernel versions. But for load-time configs that's no longer possible...?
See #40367.
👍 ok, so on a casual look the new verifier-loader still needs to learn (for instance here) how to set up `SupportsFibLookupSkipNeigh` according to the tested kernel version. And that matches the fact that `ci-verifier` currently runs all its permutations with `SupportsFibLookupSkipNeigh: false`, even on `bpf-next`.
Yes, when we add a new load-time config option or convert one from compile time, we should add it to the `...Permutations` function(s). We still have to figure out which combinations we want to run and which we don't care about as much (we are likely most interested in the combinations that result in higher complexity, not all possible permutations).

Since that logic is in Go, we should indeed be able to use probes instead of hard-coding settings to certain kernel versions.
In some environments, neighbor entries for non-cluster hosts remain stale forever, breaking connectivity in cases where their IP is reassigned to a new host with a different MAC address. This isn't a problem for hosts within the cluster; Cilium sets the `managed` and `extern_learn` flags for those neighbor entries, so the kernel actively refreshes them.

In our particular case, we have a set of VMs forming our Kubernetes cluster running a NodePort service and another VM in the same L2 domain talking to that service. This set of VMs is upgraded as a unit by spinning up new VMs, draining the old ones, and reassigning the IP addresses, although not necessarily in that order. At some point during this process, the MAC address assigned to the IP for that external VM changes. The network switches in our setup enable ARP suppression which, as I understand it, suppresses GARPs if the ARP suppression cache on the switch already has an entry for a given MAC address. This, combined with Cilium's current behavior, leads to a situation where we need to apply some workarounds to get things talking again. What we hope would occur is that cluster nodes would send an ARP request for any STALE neighbor entries in their ARP table, but currently this never happens.

`fib_redirect()` and others call `bpf_fib_lookup()` and pass its result along to `fib_do_redirect()`. Currently, `fib_do_redirect()` only uses `bpf_redirect_neigh()` if the FIB lookup was unsuccessful, returning `BPF_FIB_LKUP_RET_NO_NEIGH`. Otherwise, it "manually" populates the L2 header of the packet and uses `bpf_redirect()` to send it out through the host interface. However, `bpf_fib_lookup()` returns `BPF_FIB_LKUP_RET_SUCCESS` regardless of the state of the neighbor entry, and `bpf_redirect()` doesn't trigger a neighbor event in the kernel. The net result is that stale entries can remain stale, which is problematic in setups like the one described above.

The first commit makes it so Cilium always uses `bpf_redirect_neigh()` when available to trigger a refresh of stale entries and avoid corner cases like ours where invalid entries can remain stale forever.

The last two commits use the `BPF_FIB_LOOKUP_SKIP_NEIGH` [1] flag in preceding calls to `bpf_fib_lookup()` to avoid duplicate work in cases where we want to defer to `bpf_redirect_neigh()` for the neighbor resolution. Of these, the first introduces a probe that ensures that `BPF_FIB_LOOKUP_SKIP_NEIGH` is supported on the current system, as some older kernels may support `bpf_fib_lookup()` but not the `BPF_FIB_LOOKUP_SKIP_NEIGH` flag. The second plumbs through the probe result to let the FIB code decide whether or not to set it.

[1]: bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup
Without This Change
The neighbor entry for 192.168.4.225, a VM in the same L2 domain as my cluster nodes, remains STALE after making an HTTP request to a NodePort service hosted on that node.
With This Change
The HTTP request from 192.168.4.225 triggers a neighbor event on the cluster node, so the neighbor entry transitions from STALE to REACHABLE.
Fixes: #34503