Skip to content

Conversation

NikAleksandrov
Copy link

@NikAleksandrov NikAleksandrov commented Mar 10, 2023

In order to workaround systemd's bad recent changes where they decided to manage "foreign" rules and to flush them on certain events (e.g. device flap), we should add our rules as "proto kernel" so systemd will just skip them and leave them in place in such events instead of wiping them from underneath us if a networkd policy like below is not installed (https://docs.cilium.io/en/stable/operations/system_requirements/#systemd-based-distributions):

ManageForeignRoutes=no
ManageForeignRoutingPolicyRules=no

Normal rules with Cilium deployed look like:
$ ip ru
9: from all fwmark 0x200/0xf00 lookup 2004
10: from all fwmark 0xa00/0xf00 lookup 2005
100: from all lookup local
32766: from all lookup main
32767: from all lookup default

After a network event we see systemd flushing all unspec rules (9, 10 and 100, the last one being critical):
$ ip rule list
32766: from all lookup main
32767: from all lookup default

With this change the rules remain in place and everything continues working as expected.

The change also affects routes installed by Cilium, they are now with proto kernel.
For example:

$ ip -d r sh
unicast 10.0.0.0/24 via 10.0.2.245 dev cilium_host proto kernel scope global src 10.0.2.245 mtu 1373 
unicast 10.0.1.0/24 via 10.0.2.245 dev cilium_host proto kernel scope global src 10.0.2.245 mtu 1373 
unicast 10.0.2.0/24 via 10.0.2.245 dev cilium_host proto kernel scope global src 10.0.2.245 
unicast 10.0.2.245 dev cilium_host proto kernel scope link

$ ip -d rule sh
1:      from all fwmark 0xd00/0xf00 lookup 200 proto kernel
1:      from all fwmark 0xe00/0xf00 lookup 200 proto kernel
9:      from all fwmark 0x200/0xf00 lookup 2004 proto kernel
10:     from all fwmark 0xa00/0xf00 lookup 2005 proto kernel
100:    from all lookup local proto kernel
32766:  from all lookup main proto kernel
32767:  from all lookup default proto kernel

$ ip -d rou sh table 200
unicast 10.0.0.0/24 dev cilium_host proto kernel scope global mtu 1450 
unicast 10.0.1.0/24 dev cilium_host proto kernel scope global mtu 1450 
local 10.0.2.0/24 dev cilium_vxlan proto kernel scope host 

$ ip -d rou sh table 2004
local default dev lo proto kernel scope host 

$ ip -d rou sh table 2005
unicast default via 10.0.2.245 dev cilium_host proto kernel scope global 
unicast 10.0.2.245 dev cilium_host proto kernel scope link 

Note that we rely on the fact that route deletes with proto == 0 cause the kernel to not check if
the route protocol matches (it is ignored). Note also that there is one exception for the conversion done
which is about node-to-node encryption routes because those overload the route protocol and use it
to recognize routes installed by Cilium for node-to-node encryption. That can be revisited additionally.

Patches overview:
Patch 01 - adds a new RTProto constant in linux_defaults to be used for fib rules and routes
Patch 02 - adds a new RulePriorityLocalLookup constant in linux_defaults to be used for fib rules and routes
Patch 03 - changes rules and routes in bpf/init.sh (also default rt proto is taken as an argument)
Patch 04 - changes init.sh to take the local lookup rule priority as an argument
Patch 05 - updates the netlink library to get RTA_PROTOCOL support for fib rules
Patch 06 - adds support for rule protocol to datapath/linux/route
Patch 07 - updates datapath/linux/route
Patch 08 - updates egress gateway
Patch 09 - updates datapath/loader
Patch 10 - updates datapath/linux/routing (skips migration code, more in commit msg)
Patch 11 - updates datapath/linux/node (the patch doesn't modify node-to-node encryption routes)
Patch 12 - removes RouteProtocolIPSec and uses proto kernel instead
Patch 13 - adds "-d" to bugtool when dumping fib rules so we can log their protocol
Patch 14 - ensures that local lookup IP rules are installed on agent init and also cleans up old local lookup rules with a
different rt proto only if they're before the new ones with proto kernel

I have tested locally IPv4, IPv6 with and without tunnels, with and without ipsec. Routes and rules are installed
and cleaned up properly. This set also fixes another bug when we restore ip rules on init.sh failure, we'd
install rule #0 with proto unspec instead of kernel as the original rule.

Closes: #18706

Thanks,
Nik

@NikAleksandrov NikAleksandrov requested a review from borkmann March 10, 2023 11:02
@NikAleksandrov NikAleksandrov self-assigned this Mar 10, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Mar 10, 2023
@brandshaide
Copy link
Contributor

brandshaide commented Mar 14, 2023

@NikAleksandrov any update on this PR since it's still in status Draft but seems to be ready to be shipped according to @borkmann ?We've stumbled upon this issue on our AKS cluster and this fix would've saved us a lot of time ;-)

@NikAleksandrov
Copy link
Author

@NikAleksandrov any update on this PR since it's still in status Draft but seems to be ready to be shipped according to @borkmann ?We've stumbled upon this issue on our AKS cluster and this fix would've saved us a lot of time ;-)

@brandshaide it is incomplete, to fix the issue fully we need to change all routes and rules to use proto kernel
but we're currently blocked by merging support for the proto attribute in the netlink library (PR).
Once that is merged I can push the additional changes that will complete the fix.

@brandshaide
Copy link
Contributor

@NikAleksandrov any update on this PR since it's still in status Draft but seems to be ready to be shipped according to @borkmann ?We've stumbled upon this issue on our AKS cluster and this fix would've saved us a lot of time ;-)

@brandshaide it is incomplete, to fix the issue fully we need to change all routes and rules to use proto kernel but we're currently blocked by merging support for the proto attribute in the netlink library (PR). Once that is merged I can push the additional changes that will complete the fix.

@NikAleksandrov aweseome! and thanks a lot for your efforts 👍

@NikAleksandrov NikAleksandrov force-pushed the pr/rules-rtprot-kernel branch 3 times, most recently from 49efd48 to bbf0bc3 Compare March 17, 2023 14:36
@NikAleksandrov NikAleksandrov marked this pull request as ready for review March 17, 2023 15:09
@NikAleksandrov NikAleksandrov requested review from a team as code owners March 17, 2023 15:09
@NikAleksandrov NikAleksandrov changed the title init.sh: install ip rules with proto kernel Install fib rules and routes with proto kernel to avoid systemd messing with them Mar 18, 2023
@NikAleksandrov NikAleksandrov force-pushed the pr/rules-rtprot-kernel branch 2 times, most recently from c43f8e1 to 0943194 Compare March 20, 2023 09:09
@NikAleksandrov
Copy link
Author

/test

@NikAleksandrov NikAleksandrov force-pushed the pr/rules-rtprot-kernel branch from 0943194 to bef2ff1 Compare March 21, 2023 13:50
Copy link
Member

@YutaroHayakawa YutaroHayakawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks nice to me. A non-blocking suggestion. I think it's better to define Go constant like DefaultRTProto (and the equivalent shell-script variable) and use that instead of hard-coding kernel protocol. So that when this kernel protocol comes at a problem, we can easily change that instead of finding all the places we set protocol.

@NikAleksandrov
Copy link
Author

Overall, looks nice to me. A non-blocking suggestion. I think it's better to define Go constant like DefaultRTProto (and the equivalent shell-script variable) and use that instead of hard-coding kernel protocol. So that when this kernel protocol comes at a problem, we can easily change that instead of finding all the places we set protocol.

Yep, good point. Coming right up.

@NikAleksandrov
Copy link
Author

@YutaroHayakawa I've requested a new review, the code is basically the same just using a constant as you suggested.

@NikAleksandrov NikAleksandrov force-pushed the pr/rules-rtprot-kernel branch from 8f5ee73 to d353287 Compare March 22, 2023 09:45
@pchaigno
Copy link
Member

This PR introduced a CI-only regression that was fixed by #24577. We'll thus need to backport that second PR wherever we backport this one.

@gandro
Copy link
Member

gandro commented Mar 27, 2023

This PR introduced a CI-only regression that was fixed by #24577. We'll thus need to backport that second PR wherever we backport this one.

Why did CI not catch the regression in this PR?

@pchaigno
Copy link
Member

@gandro CI wasn't run after the rebase. I don't know if the test passed before the last rebase.

@tklauser tklauser mentioned this pull request Mar 28, 2023
9 tasks
@tklauser tklauser added affects/v1.11 This issue affects v1.11 branch affects/v1.12 This issue affects v1.12 branch needs-backport/1.13 affects/v1.13 This issue affects v1.13 branch backport/author The backport will be carried out by the author of the PR. and removed backport-pending/1.13 labels Mar 29, 2023
@jibi jibi removed affects/v1.11 This issue affects v1.11 branch affects/v1.12 This issue affects v1.12 branch backport/author The backport will be carried out by the author of the PR. needs-backport/1.13 affects/v1.13 This issue affects v1.13 branch labels Mar 31, 2023
@jibi
Copy link
Member

jibi commented Mar 31, 2023

Dropped the needs-backport labels as this PR is likely to be reverted

@vethastanley
Copy link

I see the fix is reverted. Do we know whether this issue will be fixed anytime recently?

atykhyy added a commit to Apptane/cilium that referenced this pull request Jun 28, 2025
atykhyy added a commit to Apptane/cilium that referenced this pull request Jun 28, 2025
Signed-off-by: Anton Tykhyy <atykhyy@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Host network broken after one of the underlying interfaces of a bond goes down