Skip to content

bpf:classifiers: add Overlay VXLAN/Geneve classifiers #39650

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jun 17, 2025

Conversation

smagnani96
Copy link
Contributor

@smagnani96 smagnani96 commented May 21, 2025

With this patch, the datapath is now able to signal Overlay packets during Trace/Drop notification events.
This is useful for two main reasons (treated in separate upcoming PRs):

  1. Hubble/Monitor do not need to rely on fixed ports to decode overlay packets, they simply check this flag;
  2. Datapath can then send more bytes during such events in case of Overlay packets when not needed

These new flags are solely used for tracing events and they do not interfere with bpf metrics computation.
The VXLAN/GENEVE flags can be computed either:

  1. via ctx->mark (SKB only): we do set MARK_MAGIC_OVERLAY on the egress path of a packet.
  2. via parsing L2/L3/L4 headers: matching for UDP with TUNNEL_PORT.

Please do refer to commit messages for a detailed description and considerations of the introduced changes.

Fixes: #39032

 Introduce traffic classifiers CLS_FLAG_{VXLAN,GENEVE} for tracing overlay traffic.

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 21, 2025
@smagnani96 smagnani96 added kind/enhancement This would improve or streamline existing functionality. area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/misc This PR makes changes that have no direct user impact. labels May 21, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 21, 2025
@smagnani96 smagnani96 force-pushed the pr/smagnani96/bpf-overlay-classifiers branch from cb56786 to 36eb42c Compare May 21, 2025 08:58
@smagnani96
Copy link
Contributor Author

/test

@smagnani96 smagnani96 force-pushed the pr/smagnani96/bpf-overlay-classifiers branch 2 times, most recently from 2f5096c to 11a2aa6 Compare May 21, 2025 10:16
@smagnani96
Copy link
Contributor Author

/test

@smagnani96 smagnani96 self-assigned this May 21, 2025
@smagnani96
Copy link
Contributor Author

smagnani96 commented May 21, 2025

verifier-complexity-.zip

This zip contains charts of all the ci-verifier runs, comparing verified instruction on main vs this run.
The most notable stuff to point out is the increase of complexity (around +140k insns verified) in the following configs: kernels 5.15/5.10/6.6 programs bpf_host5/bpf_host6 functions tail_handle_nat_fwd_ipv6/tail_handle_snat_fwd_ipv6.

Copy link
Member

@julianwiedmann julianwiedmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool! Right now this only gets used deep down in the trace/drop notification, correct? And the drop path is boring from a complexity point-of-view, because it lives in its own tailcall.

Could you motivate the distinction between VXLAN and GENEVE? Is it so that hubble can easily parse the optional headers of the GENEVE packet?

@smagnani96 smagnani96 force-pushed the pr/smagnani96/bpf-overlay-classifiers branch 2 times, most recently from d8091f8 to 74aa402 Compare May 26, 2025 17:03
@smagnani96
Copy link
Contributor Author

/ci-verifier

@smagnani96 smagnani96 force-pushed the pr/smagnani96/bpf-overlay-classifiers branch from 74aa402 to 252897a Compare May 27, 2025 10:39
@smagnani96
Copy link
Contributor Author

/ci-verifier

@smagnani96 smagnani96 force-pushed the pr/smagnani96/bpf-overlay-classifiers branch from 252897a to 9fa5345 Compare May 29, 2025 08:06
@smagnani96 smagnani96 requested a review from brb May 29, 2025 08:06
@smagnani96
Copy link
Contributor Author

/test

smagnani96 added a commit to kaworu/cilium that referenced this pull request May 29, 2025
We introduced in cilium#39650 the initial support to signal from the datapath
trace/drop notify events related to overlay packets.
This patch allows our Monitor tool to decode overlay packets and output
additional information regarding the encap'd packet.

Prior to these changes:

```bash
-> network flow 0x4e41ca37 , identity host->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.2.63:4240 -> 10.244.1.6:41068 tcp ACK
<- network flow 0xb55b35e9 , identity remote-node->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.1.6:41068 -> 10.244.2.63:4240 tcp ACK
```

After applying the patch:

```bash
-> network flow 0x4e41ca37 , identity host->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.2.63:4240 -> 10.244.1.6:41068 tcp ACK [tunnel 172.18.0.4:47375 -> 172.18.0.2:8472 vxlan]
<- network flow 0xb55b35e9 , identity remote-node->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.1.6:41068 -> 10.244.2.63:4240 tcp ACK [tunnel 172.18.0.2:46960 -> 172.18.0.4:8472 vxlan]
```

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
@smagnani96 smagnani96 marked this pull request as ready for review May 29, 2025 09:33
@smagnani96 smagnani96 requested review from a team as code owners May 29, 2025 09:33
@julianwiedmann julianwiedmann self-requested a review May 29, 2025 09:53
Copy link
Member

@julianwiedmann julianwiedmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👋 had a first look, some comments in-line.

One structural suggestion - I'd split off all the send_trace_notifcy() refactors where you need to touch the whole tree, and push them into their own PR. They are innocent enough (a bit of renaming + macro magic), but have much higher conflict potential. So let's get those in first, and then do a second PR that only focuses on extending the inner mechanics of the classifier.

kaworu pushed a commit to kaworu/cilium that referenced this pull request Jun 2, 2025
We introduced in cilium#39650 the initial support to signal from the datapath
trace/drop notify events related to overlay packets.
This patch allows our Monitor tool to decode overlay packets and output
additional information regarding the encap'd packet.

Prior to these changes:

```bash
-> network flow 0x4e41ca37 , identity host->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.2.63:4240 -> 10.244.1.6:41068 tcp ACK
<- network flow 0xb55b35e9 , identity remote-node->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.1.6:41068 -> 10.244.2.63:4240 tcp ACK
```

After applying the patch:

```bash
-> network flow 0x4e41ca37 , identity host->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.2.63:4240 -> 10.244.1.6:41068 tcp ACK [tunnel 172.18.0.4:47375 -> 172.18.0.2:8472 vxlan]
<- network flow 0xb55b35e9 , identity remote-node->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.1.6:41068 -> 10.244.2.63:4240 tcp ACK [tunnel 172.18.0.2:46960 -> 172.18.0.4:8472 vxlan]
```

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
@smagnani96
Copy link
Contributor Author

smagnani96 commented Jun 4, 2025

I just pushed new changes based on Julian's suggestions (marked them as solved, PTAL for further details).

We significantly preserve verifier complexity wrt the previous implementation. This also translates into less performance degradation, given the additional logic in ctx_classify now is executed only on very specific codepaths.
Before this last push, we used to always run for IS_BPF_WIREGUARD and IS_BPF_HOST, while now:

  1. the logic that looks at packet mark is executed only in to-wireguard and to-network
  2. the logic that parses packet headers runs only in from-wireguard and from-network. There's an additional edge case for VinE that is to-stack in bpf_host: before forwarding the packet for encryption, we perform a trace notification. The new changes account also for this.

Testing this PR on top of #37634 seems to provide good output both when:

  1. IPSec
  2. Wireguard
  3. Tunneling w/o encryption

Let's proceed and then enhance/introduce if we notice we're missing some bits.

With this commit, we prepare our test suite for classifiers for the
new values introduced in subsequent commits.
This allows us to write reusable tests for both IPv4/IPv6, while also
specifying subtests for each possible category of flags (CLS_FLAG_*).

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
In #39794, we enabled passing the carried L3 protocol when calling
`send_trace_notify`. Let's use this argument for `ctx_classify`.
This allows to set the CLS_FLAG_IPV6 whenever we detect an IPV6 packet,
and not only when having a layer 3 IPv6 packet. In addition, this is
useful in subsequent commits where we extend classifiers: the new logic
will save verifier complexity and packet processing time given we have
already the protocol computed. For some edge case in which the variable
protocol is left to zero and not being set to the current protocol
(ex. from drop notification or when being called after a failing validate_ethertype),
we can compute the protocol from `ctx_classify` itself.

Here we also initialize the proto variable in bpf_lxc.c to avoid
compilation errors due to variable not being initialized. This was not
needed before as it was not being used, therefore removed during compilation.

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
This commits introduces two new classifiers CLS_FLAG_{VXLAN,GENEVE}.
This are used to signal from datapath that the current packet being
traced/dropped is using one of these two protocols as underlay.
This is not used for computing bpf metrics so far, but will be useful
in a subsequent PR for decoding encap'd layer from Hubble/Monitor.
Having different signals for VXLAN and Geneve traffic is actually useful
so that Hubble/Monitor does not need to check the Cilium configuration
to retrieve which tunnel protocol is in use. This way, the datapath
automatically marks every overlay notification, and Hubble can easily
chose between the next decoding layer to pick. In addition, we can start
sending more/less packet during notification events based on the packet
itself, if it is native or overlay.

With this patch, we introduce two possible ways of determining a packet
is overlay:

1. ctx->mark:   available only with SKB, we look whether the mark contains
                MARK_MAGIC_OVERLAY. That value is usually set on our egress
                path. This has the advantage that is fast, as it does not
                require to further inspect packet L2/L3/L4 headers.
2. raw headers: this method inspects the whole packet layers up to L4 UDP.
                By looking at protocol and known ports (TUNNEL_PORT in use),
                we can determine a packet is overlay.

While (2) can always work in every condition, (1) works only when the mark
is set. This means that in all codepaths where the mark is cleared it can
fail. It would be still backed up by logic in (2), given that it will
always run in case (1) was not informative enough.

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
Simply moving the definition of available `enum trace_point` into a
different file to avoid circular dependencies in subsequent commits.

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
This patch introduces a new `enum trace_point` TRACE_POINT_UNKNOWN.
This is not intended to be used for `send_trace_notify` notification events,
as it will be translated into a compile error. The new observation point
is for special cases, such as the one enabled in the next commit.
Given that drop notification (1) do not account for the observation point
and (2) are executed in custom tailcalls, we can use the new TRACE_POINT_UNKNOWN
in the next commit where we optimize classifiers based on the trace
observation point. The TRACE_POINT_UNKNOWN allows to run all the possible
classification logic, both via mark and packet parsing.

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
This commit patches `ctx_classify` to account for the current trace
observation point. This allows to preserve both (1) packet processing
performance in all the codepaths where such overlay flags would never
be computed, and (2) preserve verifier complexity with a minimal impact.

The codepaths where we can encounter and trace overlay packets are:

1. to-{wireguard,netdev}: ctx->mark set to MARK_MAGIC_OVERLAY
2. from-{netdev,wireguard}: no mark, need to check UDP + tunnel port.
3. to-stack: with IPSec enabled, we trace TO_STACK events when forwarding
             overlay packets to the stack for encryption (VinE).

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
@smagnani96 smagnani96 force-pushed the pr/smagnani96/bpf-overlay-classifiers branch from 4ba82a9 to fab75ff Compare June 4, 2025 14:37
@smagnani96 smagnani96 removed the dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. label Jun 4, 2025
@smagnani96
Copy link
Contributor Author

/test

@smagnani96 smagnani96 marked this pull request as ready for review June 4, 2025 17:07
kaworu pushed a commit to kaworu/cilium that referenced this pull request Jun 16, 2025
We introduced in cilium#39650 the initial support to signal from the datapath
trace/drop notify events related to overlay packets.
This patch allows our Monitor tool to decode overlay packets and output
additional information regarding the encap'd packet.

Prior to these changes:

```bash
-> network flow 0x4e41ca37 , identity host->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.2.63:4240 -> 10.244.1.6:41068 tcp ACK
<- network flow 0xb55b35e9 , identity remote-node->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.1.6:41068 -> 10.244.2.63:4240 tcp ACK
```

After applying the patch:

```bash
-> network flow 0x4e41ca37 , identity host->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.2.63:4240 -> 10.244.1.6:41068 tcp ACK [tunnel 172.18.0.4:47375 -> 172.18.0.2:8472 vxlan]
<- network flow 0xb55b35e9 , identity remote-node->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.1.6:41068 -> 10.244.2.63:4240 tcp ACK [tunnel 172.18.0.2:46960 -> 172.18.0.4:8472 vxlan]
```

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 16, 2025
@kaworu kaworu added this pull request to the merge queue Jun 17, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jun 17, 2025
@kaworu kaworu added this pull request to the merge queue Jun 17, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jun 17, 2025
@kaworu kaworu added this pull request to the merge queue Jun 17, 2025
Merged via the queue into main with commit 66b60bc Jun 17, 2025
340 of 362 checks passed
@kaworu kaworu deleted the pr/smagnani96/bpf-overlay-classifiers branch June 17, 2025 09:38
kaworu pushed a commit to kaworu/cilium that referenced this pull request Jun 17, 2025
We introduced in cilium#39650 the initial support to signal from the datapath
trace/drop notify events related to overlay packets.
This patch allows our Monitor tool to decode overlay packets and output
additional information regarding the encap'd packet.

Prior to these changes:

```bash
-> network flow 0x4e41ca37 , identity host->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.2.63:4240 -> 10.244.1.6:41068 tcp ACK
<- network flow 0xb55b35e9 , identity remote-node->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.1.6:41068 -> 10.244.2.63:4240 tcp ACK
```

After applying the patch:

```bash
-> network flow 0x4e41ca37 , identity host->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.2.63:4240 -> 10.244.1.6:41068 tcp ACK [tunnel 172.18.0.4:47375 -> 172.18.0.2:8472 vxlan]
<- network flow 0xb55b35e9 , identity remote-node->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.1.6:41068 -> 10.244.2.63:4240 tcp ACK [tunnel 172.18.0.2:46960 -> 172.18.0.4:8472 vxlan]
```

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
@smagnani96 smagnani96 linked an issue Jun 17, 2025 that may be closed by this pull request
smagnani96 added a commit to kaworu/cilium that referenced this pull request Jun 19, 2025
We introduced in cilium#39650 the initial support to signal from the datapath
trace/drop notify events related to overlay packets.
This patch allows our Monitor tool to decode overlay packets and output
additional information regarding the encap'd packet.

Prior to these changes:

```bash
-> network flow 0x4e41ca37 , identity host->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.2.63:4240 -> 10.244.1.6:41068 tcp ACK
<- network flow 0xb55b35e9 , identity remote-node->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.1.6:41068 -> 10.244.2.63:4240 tcp ACK
```

After applying the patch:

```bash
-> network flow 0x4e41ca37 , identity host->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.2.63:4240 -> 10.244.1.6:41068 tcp ACK [tunnel 172.18.0.4:47375 -> 172.18.0.2:8472 vxlan]
<- network flow 0xb55b35e9 , identity remote-node->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.1.6:41068 -> 10.244.2.63:4240 tcp ACK [tunnel 172.18.0.2:46960 -> 172.18.0.4:8472 vxlan]
```

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
smagnani96 added a commit to kaworu/cilium that referenced this pull request Jun 20, 2025
We introduced in cilium#39650 the initial support to signal from the datapath
trace/drop notify events related to overlay packets.
This patch allows our Monitor tool to decode overlay packets and output
additional information regarding the encap'd packet.

Prior to these changes:

```bash
-> network flow 0x4e41ca37 , identity host->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.2.63:4240 -> 10.244.1.6:41068 tcp ACK
<- network flow 0xb55b35e9 , identity remote-node->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.1.6:41068 -> 10.244.2.63:4240 tcp ACK
```

After applying the patch:

```bash
-> network flow 0x4e41ca37 , identity host->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.2.63:4240 -> 10.244.1.6:41068 tcp ACK [tunnel 172.18.0.4:47375 -> 172.18.0.2:8472 vxlan]
<- network flow 0xb55b35e9 , identity remote-node->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.1.6:41068 -> 10.244.2.63:4240 tcp ACK [tunnel 172.18.0.2:46960 -> 172.18.0.4:8472 vxlan]
```

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
github-merge-queue bot pushed a commit that referenced this pull request Jun 21, 2025
We introduced in #39650 the initial support to signal from the datapath
trace/drop notify events related to overlay packets.
This patch allows our Monitor tool to decode overlay packets and output
additional information regarding the encap'd packet.

Prior to these changes:

```bash
-> network flow 0x4e41ca37 , identity host->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.2.63:4240 -> 10.244.1.6:41068 tcp ACK
<- network flow 0xb55b35e9 , identity remote-node->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.1.6:41068 -> 10.244.2.63:4240 tcp ACK
```

After applying the patch:

```bash
-> network flow 0x4e41ca37 , identity host->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.2.63:4240 -> 10.244.1.6:41068 tcp ACK [tunnel 172.18.0.4:47375 -> 172.18.0.2:8472 vxlan]
<- network flow 0xb55b35e9 , identity remote-node->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.1.6:41068 -> 10.244.2.63:4240 tcp ACK [tunnel 172.18.0.2:46960 -> 172.18.0.4:8472 vxlan]
```

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. kind/enhancement This would improve or streamline existing functionality. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bpf: datapath support for signaling overlay packets while tracing
4 participants