-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Always migrate cilium_calls_* during ELF load #28740
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
/test |
I have no opinions on the code, but regarding the release notes, remember that the readers of the release notes are our users, that don't necessarily understand the inner workings of Cilium. Particularly since this is tagged "bug", it's important to let users know what kind of bug is getting fixed and when they would be affected by it. I suggest something like: |
Good idea, thanks 👍 |
143cada
to
0e7762c
Compare
/test |
0e7762c
to
66287a3
Compare
/test |
66287a3
to
b0985dc
Compare
/test |
b0985dc
to
8734d8b
Compare
/test |
See code comments, as the change is subtle. This sets things up for subsequent commits. As a minor benefit, make the code slightly less unreadable. Signed-off-by: Timo Beckers <timo@isovalent.com>
/test |
8734d8b
to
c66d06a
Compare
Note: this is obviously a hack. There's a long-standing issue with Cilium's loader infrastructure where program arrays like cilium_calls_* are reused if the map's properties in the ELF are compatible with the pinned version present on the system. This has been the case since the early iproute2 days, and ebpf-go has the same behaviour by default. This is unsound, but doesn't always cause issues in practice. However, as Cilium's eBPF code base grew, along with its contributor base, increasing code churn and the need to backport fixes, this method started falling apart. Pinned tail call maps are actively used for handling traffic, and should be considered fixed, as they are an integral part of the program's control flow. When another BPF object needs to be attached to a kernel hook, it's incorrect to replace entries in an existing program array used by a prior version of the BPF object, since logic in some tail calls may have been enabled or disabled in the new one, or may have been removed altogether. A prog array's contents cannot be replaced atomically, meaning that while the prog array is being repopulated by a new ELF, a packet may be processed by a chain of programs that are partially old and partially new. This is bad. This patch fixes the specific case of cilium_calls_*, since it's the only prog array across all ELFs that consistently acts as the central tail call map. The main motivation for this approach is backportability as far back as 1.12, since we're seeing increased connectivity interruptions (leading to test flakes) during up/downgrades and restarts in that branch. This is caused by other backports creating code churn, as well as the stark difference in the contents of tail calls between major Cilium versions. This code will be replaced by a 'commit' system that does away with the re- pinning behaviour and keeps new map fd's in memory; only committing them to bpffs when the necessary endpoint prog(s) were successfully replaced. Fixes: cilium#20691 Signed-off-by: Timo Beckers <timo@isovalent.com>
c66d06a
to
a137ea6
Compare
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again for the code walk :)
Discussed offline with Timo. We won't backport to v1.12 as it's too big of a backport and thus risky. Instead, we'll skip this specific error in v1.12 CI (and v1.13 downgrade tests). |
The following upstream change unconditionally repin the map breaks the functionality for L2 interfaces of Kubevirt. cilium#28740 The bpf map for L2 interfaces are manually created in pkg/datapath/connector/utils.go and remain static, making repinning unnecessary. This change make sure the tail calls are not disturbed during graft process. It should not affect other endpoints since only L2 endpoints are calling graftDatapath. BUG: 333792124, 334106213 Change-Id: Icf2a2bf9629d79761782bd3c2ce3c9fa89d5f42f Reviewed-on: https://gke-internal-review.googlesource.com/c/third_party/cilium/+/990344 Tested-by: Prow_Bot_V2 <425329972751-compute@developer.gserviceaccount.com> Reviewed-by: Zang Li <zangli@google.com> Lint: Lint 🤖 <android-build-ayeaye@system.gserviceaccount.com> Reviewed-by: Prow_Bot_V2 <425329972751-compute@developer.gserviceaccount.com> Reviewed-by: Sugang Li <sugangli@google.com> Unit-Verified: Prow_Bot_V2 <425329972751-compute@developer.gserviceaccount.com>
There was an upstream PR (cilium#28740) which changed the way tail call map re-pin works during datapath reload. As part of this they updated the interface to the code that initiates reload, so that instead of calling reloadDatapath() twice for each interface (once per direction), the new code now requires calling reloadDatapath() once and have the single call handle both directions. Our multi-nic datapath reloader was using the old model so was exposed to tail call map issues now that the underlying function implementation has changed. This change updates the google_loader code to match the new upstream model. Bug: b/339324159, b/338122169 Change-Id: I54b0324f1be61f07cc4b2e2d69232f0c9bceb584 Reviewed-on: https://gke-internal-review.googlesource.com/c/third_party/cilium/+/1018651 Lint: Lint 🤖 <android-build-ayeaye@system.gserviceaccount.com> Reviewed-by: Sugang Li <sugangli@google.com> Reviewed-by: Yifeng Shen <yfshen@google.com> Unit-Verified: Prow_Bot_V2 <425329972751-compute@developer.gserviceaccount.com> Tested-by: Prow_Bot_V2 <425329972751-compute@developer.gserviceaccount.com> Reviewed-by: Prow_Bot_V2 <425329972751-compute@developer.gserviceaccount.com> Reviewed-by: Maciej Skrocki <maciejskrocki@google.com>
Note: this is obviously a hack. There's a long-standing issue with Cilium's
loader infrastructure where program arrays like cilium_calls_* are reused if
the map's properties in the ELF are compatible with the pinned version present
on the system. This has been the case since the early iproute2 days, and
ebpf-go has the same behaviour by default. This is unsound, but doesn't always
cause issues in practice.
However, as Cilium's eBPF code base grew, along with its contributor base, and
increasing code churn and the need to backport fixes, this method started
falling apart. Pinned tail call maps are actively used for handling traffic,
and should be considered fixed, as they are an integral part of the program's
control flow. When another BPF object needs to be attached to a kernel hook,
it's incorrect to replace entries in an existing program array used by a prior
version of the BPF object, since logic in some tail calls may have been enabled
or disabled in the new one, or may have been removed altogether.
A prog array's contents cannot be replaced atomically, meaning that while the
prog array is being repopulated by a new ELF, a packet may be processed by a
chain of programs that are partially old and partially new. This is bad.
This patch fixes the specific case of cilium_calls_*, since it's the only
prog array across all ELFs that consistently acts as the central tail call map.
The main motivation for this approach is backportability as far back as 1.12,
since we're seeing increased connectivity interruptions (leading to test flakes)
during up/downgrades and restarts in that branch. This is caused by other
backports creating code churn, as well as the stark difference in the contents
of tail calls between major Cilium versions.
This code will be replaced by a 'commit' system that does away with the re-
pinning behaviour and keeps new map fd's in memory; only committing them to
bpffs when the necessary endpoint prog(s) were successfully replaced.
Fixes: #20691