-
Notifications
You must be signed in to change notification settings - Fork 3.4k
bpf:classifiers: add Overlay VXLAN/Geneve classifiers #39650
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cb56786
to
36eb42c
Compare
/test |
2f5096c
to
11a2aa6
Compare
/test |
This zip contains charts of all the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool! Right now this only gets used deep down in the trace/drop notification, correct? And the drop path is boring from a complexity point-of-view, because it lives in its own tailcall.
Could you motivate the distinction between VXLAN and GENEVE? Is it so that hubble can easily parse the optional headers of the GENEVE packet?
d8091f8
to
74aa402
Compare
/ci-verifier |
74aa402
to
252897a
Compare
/ci-verifier |
252897a
to
9fa5345
Compare
/test |
We introduced in cilium#39650 the initial support to signal from the datapath trace/drop notify events related to overlay packets. This patch allows our Monitor tool to decode overlay packets and output additional information regarding the encap'd packet. Prior to these changes: ```bash -> network flow 0x4e41ca37 , identity host->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.2.63:4240 -> 10.244.1.6:41068 tcp ACK <- network flow 0xb55b35e9 , identity remote-node->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.1.6:41068 -> 10.244.2.63:4240 tcp ACK ``` After applying the patch: ```bash -> network flow 0x4e41ca37 , identity host->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.2.63:4240 -> 10.244.1.6:41068 tcp ACK [tunnel 172.18.0.4:47375 -> 172.18.0.2:8472 vxlan] <- network flow 0xb55b35e9 , identity remote-node->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.1.6:41068 -> 10.244.2.63:4240 tcp ACK [tunnel 172.18.0.2:46960 -> 172.18.0.4:8472 vxlan] ``` Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👋 had a first look, some comments in-line.
One structural suggestion - I'd split off all the send_trace_notifcy()
refactors where you need to touch the whole tree, and push them into their own PR. They are innocent enough (a bit of renaming + macro magic), but have much higher conflict potential. So let's get those in first, and then do a second PR that only focuses on extending the inner mechanics of the classifier.
We introduced in cilium#39650 the initial support to signal from the datapath trace/drop notify events related to overlay packets. This patch allows our Monitor tool to decode overlay packets and output additional information regarding the encap'd packet. Prior to these changes: ```bash -> network flow 0x4e41ca37 , identity host->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.2.63:4240 -> 10.244.1.6:41068 tcp ACK <- network flow 0xb55b35e9 , identity remote-node->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.1.6:41068 -> 10.244.2.63:4240 tcp ACK ``` After applying the patch: ```bash -> network flow 0x4e41ca37 , identity host->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.2.63:4240 -> 10.244.1.6:41068 tcp ACK [tunnel 172.18.0.4:47375 -> 172.18.0.2:8472 vxlan] <- network flow 0xb55b35e9 , identity remote-node->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.1.6:41068 -> 10.244.2.63:4240 tcp ACK [tunnel 172.18.0.2:46960 -> 172.18.0.4:8472 vxlan] ``` Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
I just pushed new changes based on Julian's suggestions (marked them as solved, PTAL for further details). We significantly preserve verifier complexity wrt the previous implementation. This also translates into less performance degradation, given the additional logic in
Testing this PR on top of #37634 seems to provide good output both when:
Let's proceed and then enhance/introduce if we notice we're missing some bits. |
With this commit, we prepare our test suite for classifiers for the new values introduced in subsequent commits. This allows us to write reusable tests for both IPv4/IPv6, while also specifying subtests for each possible category of flags (CLS_FLAG_*). Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
In #39794, we enabled passing the carried L3 protocol when calling `send_trace_notify`. Let's use this argument for `ctx_classify`. This allows to set the CLS_FLAG_IPV6 whenever we detect an IPV6 packet, and not only when having a layer 3 IPv6 packet. In addition, this is useful in subsequent commits where we extend classifiers: the new logic will save verifier complexity and packet processing time given we have already the protocol computed. For some edge case in which the variable protocol is left to zero and not being set to the current protocol (ex. from drop notification or when being called after a failing validate_ethertype), we can compute the protocol from `ctx_classify` itself. Here we also initialize the proto variable in bpf_lxc.c to avoid compilation errors due to variable not being initialized. This was not needed before as it was not being used, therefore removed during compilation. Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
This commits introduces two new classifiers CLS_FLAG_{VXLAN,GENEVE}. This are used to signal from datapath that the current packet being traced/dropped is using one of these two protocols as underlay. This is not used for computing bpf metrics so far, but will be useful in a subsequent PR for decoding encap'd layer from Hubble/Monitor. Having different signals for VXLAN and Geneve traffic is actually useful so that Hubble/Monitor does not need to check the Cilium configuration to retrieve which tunnel protocol is in use. This way, the datapath automatically marks every overlay notification, and Hubble can easily chose between the next decoding layer to pick. In addition, we can start sending more/less packet during notification events based on the packet itself, if it is native or overlay. With this patch, we introduce two possible ways of determining a packet is overlay: 1. ctx->mark: available only with SKB, we look whether the mark contains MARK_MAGIC_OVERLAY. That value is usually set on our egress path. This has the advantage that is fast, as it does not require to further inspect packet L2/L3/L4 headers. 2. raw headers: this method inspects the whole packet layers up to L4 UDP. By looking at protocol and known ports (TUNNEL_PORT in use), we can determine a packet is overlay. While (2) can always work in every condition, (1) works only when the mark is set. This means that in all codepaths where the mark is cleared it can fail. It would be still backed up by logic in (2), given that it will always run in case (1) was not informative enough. Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
Simply moving the definition of available `enum trace_point` into a different file to avoid circular dependencies in subsequent commits. Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
This patch introduces a new `enum trace_point` TRACE_POINT_UNKNOWN. This is not intended to be used for `send_trace_notify` notification events, as it will be translated into a compile error. The new observation point is for special cases, such as the one enabled in the next commit. Given that drop notification (1) do not account for the observation point and (2) are executed in custom tailcalls, we can use the new TRACE_POINT_UNKNOWN in the next commit where we optimize classifiers based on the trace observation point. The TRACE_POINT_UNKNOWN allows to run all the possible classification logic, both via mark and packet parsing. Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
This commit patches `ctx_classify` to account for the current trace observation point. This allows to preserve both (1) packet processing performance in all the codepaths where such overlay flags would never be computed, and (2) preserve verifier complexity with a minimal impact. The codepaths where we can encounter and trace overlay packets are: 1. to-{wireguard,netdev}: ctx->mark set to MARK_MAGIC_OVERLAY 2. from-{netdev,wireguard}: no mark, need to check UDP + tunnel port. 3. to-stack: with IPSec enabled, we trace TO_STACK events when forwarding overlay packets to the stack for encryption (VinE). Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
4ba82a9
to
fab75ff
Compare
/test |
We introduced in cilium#39650 the initial support to signal from the datapath trace/drop notify events related to overlay packets. This patch allows our Monitor tool to decode overlay packets and output additional information regarding the encap'd packet. Prior to these changes: ```bash -> network flow 0x4e41ca37 , identity host->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.2.63:4240 -> 10.244.1.6:41068 tcp ACK <- network flow 0xb55b35e9 , identity remote-node->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.1.6:41068 -> 10.244.2.63:4240 tcp ACK ``` After applying the patch: ```bash -> network flow 0x4e41ca37 , identity host->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.2.63:4240 -> 10.244.1.6:41068 tcp ACK [tunnel 172.18.0.4:47375 -> 172.18.0.2:8472 vxlan] <- network flow 0xb55b35e9 , identity remote-node->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.1.6:41068 -> 10.244.2.63:4240 tcp ACK [tunnel 172.18.0.2:46960 -> 172.18.0.4:8472 vxlan] ``` Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
We introduced in cilium#39650 the initial support to signal from the datapath trace/drop notify events related to overlay packets. This patch allows our Monitor tool to decode overlay packets and output additional information regarding the encap'd packet. Prior to these changes: ```bash -> network flow 0x4e41ca37 , identity host->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.2.63:4240 -> 10.244.1.6:41068 tcp ACK <- network flow 0xb55b35e9 , identity remote-node->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.1.6:41068 -> 10.244.2.63:4240 tcp ACK ``` After applying the patch: ```bash -> network flow 0x4e41ca37 , identity host->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.2.63:4240 -> 10.244.1.6:41068 tcp ACK [tunnel 172.18.0.4:47375 -> 172.18.0.2:8472 vxlan] <- network flow 0xb55b35e9 , identity remote-node->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.1.6:41068 -> 10.244.2.63:4240 tcp ACK [tunnel 172.18.0.2:46960 -> 172.18.0.4:8472 vxlan] ``` Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
We introduced in cilium#39650 the initial support to signal from the datapath trace/drop notify events related to overlay packets. This patch allows our Monitor tool to decode overlay packets and output additional information regarding the encap'd packet. Prior to these changes: ```bash -> network flow 0x4e41ca37 , identity host->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.2.63:4240 -> 10.244.1.6:41068 tcp ACK <- network flow 0xb55b35e9 , identity remote-node->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.1.6:41068 -> 10.244.2.63:4240 tcp ACK ``` After applying the patch: ```bash -> network flow 0x4e41ca37 , identity host->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.2.63:4240 -> 10.244.1.6:41068 tcp ACK [tunnel 172.18.0.4:47375 -> 172.18.0.2:8472 vxlan] <- network flow 0xb55b35e9 , identity remote-node->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.1.6:41068 -> 10.244.2.63:4240 tcp ACK [tunnel 172.18.0.2:46960 -> 172.18.0.4:8472 vxlan] ``` Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
We introduced in cilium#39650 the initial support to signal from the datapath trace/drop notify events related to overlay packets. This patch allows our Monitor tool to decode overlay packets and output additional information regarding the encap'd packet. Prior to these changes: ```bash -> network flow 0x4e41ca37 , identity host->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.2.63:4240 -> 10.244.1.6:41068 tcp ACK <- network flow 0xb55b35e9 , identity remote-node->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.1.6:41068 -> 10.244.2.63:4240 tcp ACK ``` After applying the patch: ```bash -> network flow 0x4e41ca37 , identity host->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.2.63:4240 -> 10.244.1.6:41068 tcp ACK [tunnel 172.18.0.4:47375 -> 172.18.0.2:8472 vxlan] <- network flow 0xb55b35e9 , identity remote-node->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.1.6:41068 -> 10.244.2.63:4240 tcp ACK [tunnel 172.18.0.2:46960 -> 172.18.0.4:8472 vxlan] ``` Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
We introduced in #39650 the initial support to signal from the datapath trace/drop notify events related to overlay packets. This patch allows our Monitor tool to decode overlay packets and output additional information regarding the encap'd packet. Prior to these changes: ```bash -> network flow 0x4e41ca37 , identity host->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.2.63:4240 -> 10.244.1.6:41068 tcp ACK <- network flow 0xb55b35e9 , identity remote-node->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.1.6:41068 -> 10.244.2.63:4240 tcp ACK ``` After applying the patch: ```bash -> network flow 0x4e41ca37 , identity host->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.2.63:4240 -> 10.244.1.6:41068 tcp ACK [tunnel 172.18.0.4:47375 -> 172.18.0.2:8472 vxlan] <- network flow 0xb55b35e9 , identity remote-node->unknown state unknown ifindex eth0 orig-ip 0.0.0.0: 10.244.1.6:41068 -> 10.244.2.63:4240 tcp ACK [tunnel 172.18.0.2:46960 -> 172.18.0.4:8472 vxlan] ``` Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
With this patch, the datapath is now able to signal Overlay packets during Trace/Drop notification events.
This is useful for two main reasons (treated in separate upcoming PRs):
These new flags are solely used for tracing events and they do not interfere with bpf metrics computation.
The VXLAN/GENEVE flags can be computed either:
Please do refer to commit messages for a detailed description and considerations of the introduced changes.
Fixes: #39032