v1.17 Backports 2025-01-21 #37126

rastislavs · 2025-01-21T12:41:34Z

Once this PR is merged, a GitHub action will update the labels of these PRs:

 36788 35718 36598 36827 36858 36861 36856 36878 36727 36852 36859 36905 36913 36921 36769 36899 36917 36924 36892 36364 36932 36935 36953 36966 36819 36929 36371 36433 36982 36853 36649 36602 37008 36798 37005 37054 36980 37048 36823 37042 37081 37053 37059 36875 37086

[ upstream commit cdecbcb ] GNU make on the host may use --jobserver-style=fifo (default on my machine). It also implies --jobserver-auth=fifo:/tmp/GMfifo$MAKE_PID, an undocumented flag, used internally by make and passed to the child instances of make. This flag appears in $(MAKEFLAGS). The cilium-build target in Documentation/Makefile passes MAKEFLAGS to another make instance, called in a docker image. The problem is that --jobserver-auth passed to make inside docker points to a file that doesn't exist in the container filesystem namespace, and make fails with an error like this: make: *** internal error: invalid --jobserver-auth string 'fifo:/tmp/GMfifo361142'. Stop. make: *** [Makefile:48: cilium-build] Error 2 make: Leaving directory '/home/max/.opt/go/src/github.com/cilium/cilium-snat/Documentation' Fix this by filtering out --jobserver-auth=... from MAKEFLAGS when passing it to make inside docker. Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

[ upstream commit 5a4fdc9 ] Fixes: #34754 Currently, the cilium-operator xRoute reconcilers reconcile all (non-GAMMA) xRoutes regardless of their parentage. This can lead to race conditions where the Cilium operator can clobber metadata updates made by other operatores (ie, the Kong Kubernetes Ingress Controller). This PR adjusts the xRoute reconcilers to skip routes associated with the Cilium gateway. Signed-off-by: Andrew E. Timmes <atimmes@seatgeek.com> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

[ upstream commit ed4772f ] In preparation of the subsequent commits, rework the errors usage in rules sanitization. Specifically: - fix some linting errors (capitalized error strings, punctuation in error strings, useless usage of error formatting and so on) - use errors.Join to propagate errUnsupportedICMPWithToPorts alongside other possible validation errors - fix tests to rely on require.ErrorIs when checking for a specific error Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

[ upstream commit 947b8d3 ] Extract IngressCommonRule sanitization into its own method. This will be useful in a subsequent commit, where the sanitization for IngressDeny rules will be added. Since IngressCommonRule is embedded into both Ingress and IngressDeny rules, the new method can be used while sanitizing both rule types. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

[ upstream commit bd170ba ] Extract EgressCommonRule sanitization into its own method. This will be useful in a subsequent commit, where the sanitization for EgressDeny rules will be added. Since EgressCommonRule is embedded into both Egress and EgressDeny rules, the new method can be used while sanitizing both rule types. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

[ upstream commit 557498b ] Add sanitization for IngressDeny rules. The sanitization applies the same logic used for Ingress rules, dropping the parts that are not relevant for the IngressDeny ones (e.g: IngressDeny has no L7 rules). Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

[ upstream commit 91cd89e ] Add sanitization for EgressDeny rules. The sanitization applies the same logic used for Egress rules, dropping the parts that are not relevant for the EgressDeny ones (e.g: EgressDeny has no L7 rules). Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

[ upstream commit 0e97e01 ] Add helper methods to {Egress,EgressDeny,EgressCommon}Rule to get the list of L3 members and the list of L3 members that are not compatible with ToPorts rules. This simplifies the extension and reuse of the sanitization logic in EgressCommonRule to EgressRule and EgressDenyRule. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

[ upstream commit 1fa46b2 ] Extend the current unit tests to take into account the sanitization of both IngressDeny and EgressDeny rules. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

[ upstream commit 04bd9a1 ] This was missed in #36534 Signed-off-by: Tam Mach <tam.mach@cilium.io> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

[ upstream commit 963b5b2 ] 439f558 ("CLI: cilium upgrade preserver previous configurations") changed the default behavior of `cilium upgrade` to preserve the previous configuration upon upgrade. While this is the expected behavior in most situations, it turned out being problematic in the ipsec upgrade downgrade tests (specifically in case of patch matrix entries). Indeed, the images are not overwritten for the stable configuration, leading in turn to not actually performing the downgrade. Let's fix this by explicitly resetting the values when performing the upgrade and downgrade operations. Reported-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

[ upstream commit ab10262 ] When *any* routing mode is supported, it makes no sense to list this as requirement. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

[ upstream commit 19909d0 ] I believe this dependency originated from the need for `HaveLargeInstructionLimit`. As we now require a 5.4-equivalent kernel there is no more need to point out this requirement. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

[ upstream commit d2c4e0f ] This is redundant, we now require a 5.4-equivalent kernel. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

[ upstream commit f5f831f ] If KPR is false, setting these to true does not make any sense. Thus, remove this to avoid confusion. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

[ upstream commit 5dca7ad ] hostPort.enabled=true under KPR off is not supported given the former relies on the latter's infrastructure. Remove this from the docs to avoid confusion. Before 1a4e7d4 ("docs: Adjust KPR GSG to --kpr=boolean changes") this sentence was meant in the context of KPR=partial. Fixes: 1a4e7d4 ("docs: Adjust KPR GSG to --kpr=boolean changes") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

[ upstream commit 808b4cf ] Be consistent with the other features that are described on this page, and provide a snippet with the needed cilium config. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

[ upstream commit 3fd2281 ] Service events are not used when CNPs and CCNPs are disabled. However, we currently subscribe to the service cache, and enqueue all events in a local cache, in turn causing a memory leak in that case given that the events would never be dequeued. Let's get this fixed. Fixes: 253a05b ("policy: Add config for enabling Cilium NetworkPolicy") Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

[ upstream commit 34e296e ] Talos Linux's 'Forwarding kube-dns to Host DNS' functionality doesn't work together with Cilium's BPF host routing. Signed-off-by: Philip Schmid <phisch@cisco.com> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

[ upstream commit 3fe873c ] The conformance external workloads workflow recently started failing consistently on the v1.15 branch with the following error: > msg="Failed to install iptables rules" error="cannot install static proxy rules: unable to run 'ip6tables -t mangle -A CILIUM_PRE_mangle -m socket --transparent -m mark ! --mark 0x00000e00/0x00000f00 -m mark ! --mark 0x00000800/0x00000f00 -m comment --comment cilium: any->pod redirect proxied traffic to host proxy -j MARK --set-mark 0x00000200' iptables command: exit status 1 stderr=\"Warning: Extension MARK revision 0 not supported, missing kernel module?\\nip6tables: No chain/target/match by that name.\\n\"" subsys=iptables This seems to be the consequence of a regression backported by Canonical into linux-image-5.15.0-127-generic. Let's circumvent this problem by upgrading the selected image family, which was now largely out of date. Related: https://bugs.launchpad.net/ubuntu/+source/linux-meta-oracle-5.15/+bug/2091960 Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

[ upstream commit c6f84b2 ] Relates: 9991bfc Signed-off-by: Tam Mach <tam.mach@cilium.io> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

[ upstream commit 4638384 ] When the images are built from stable branches we should also publish floating images from those branches. Fixes: 5ce3112 (".github: simplify build-images-ci workflow") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

[ upstream commit dc5037b ] Documented the known limitation of Cilium's eBPF host routing when relying on netfilter hooks at the same time. Signed-off-by: Philip Schmid <phisch@cisco.com> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

[ upstream commit 900cb76 ] This limitation can be removed after #35143. Relates: #35143 Relates: #24318 Signed-off-by: Tam Mach <tam.mach@cilium.io> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

[ upstream commit 684a62e ] The new version of the multicluster plugins needs coredns 1.11.4, so this is updating the corresponding MCS-API docs on how to recompile CoreDNS to reflect that. Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

[ upstream commit 757a56c ] Fixes: cdecbcb ("build: Filter out --jobserver-auth from MAKEFLAGS") Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

[ upstream commit fceab4f ] The labels configuration for the agent is quite important to how Cilium respects the labels of workloads in order to properly enforce network policy. Through recent review a few ambiguities were pointed out to me, so this commit attempts to clarify those points. Thanks to Filip Nikolic and André Martins for the detailed discussion that motivated this wording change. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

[ upstream commit e53c222 ] Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

[ upstream commit c835ca8 ] The check-ipsec-leak.bt script reports errors for plain-text packets (non-ESP only, this is not used for Wireguard) captured from bridge when: 1. the packets have pod IP as both src and dst 2. or the packets have either src or dst IP inside pod CIDR, plus the packets are from proxy ($trace_override) Let's focus on 2. In the normal case of pod->world via L7, proxy emits packets like pod->world, but masquerading (nf/bpf) snats the packets to node->world, therefore the script never sees pod IP on bridge, no errors are reported. In the case of egress gateway via L7, the packet is first forwarded to the gateway node after L7 policy check with the pod->world notation, and the gateway node later forwards the request using to gateway->world. Here the script sees that 1st packet with src pod IP && from_proxy, triggering the (false) alarm. This PR is to make sure that these type of packets won't be tracked as from_proxy ($trace_override). We currently avoid this "false" alarm by running egress-gateway connectivity tests separately from all the others and with no leak-detection script running. As we're going to introduce leak detection in more and more CI tests hopefully, let's add this check to skip tracking unwanted traffic. Signed-off-by: Simone Magnani <simone.magnani@isovalent.com> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

[ upstream commit a6ea748 ] In the previous commit, we introduced to the bpftrace the capability to skip tracking traffic with destination address outside pod CIDRs. At this point, in conformance-ipsec-e2e tests we do not need to split tests anymore (egress-gateway vs all the others), previously modified in #35323, "workflows/ipsec: Cover egress gateway", ea6bdac. Signed-off-by: Simone Magnani <simone.magnani@isovalent.com> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

[ upstream commit e48dda2 ] Cleans up a few things: - Documents the ability to select CCG by labels - Mentions that hubble flows are now labeled - Links from main policy page. Signed-off-by: Casey Callendrello <cdc@isovalent.com> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

[ upstream commit 45d638a ] This commit pass down the targetPort in a MCS-API derived Service from the local Service. We only need targetPort for local Endpoints so we do not need to globally sync this as ports are synced from endpoints on remote clusters that already account for targetPort. Thus we just take the ports merged globally from the ServiceImport and add the context of the targetPorts if we need to on the local cluster. This also clarifies the EndpointSlice sync on target port as the targetPort field is now available into the slim Service copy done by Cilium be we shouldn't actually use it as targetPort may be a string and we do not have this info from remote clusters. Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

[ upstream commit 6d90e39 ] Do not explicitly call RunMetadataResolver in case the pod namespace label has not been retrieved yet (e.g., the pod informer has not yet received the creation event for the pod), given that the same operation is always guaranteed to be started immediately afterwards as well. Moreover, this duplicate call can eventually cause the endpoint regeneration state machine to go out of sync, leaving the endpoint stuck in "waiting-to-regenerate" state forever. Specifically, the above problem occurs due to the following sequence of events: 1. A new pod gets created and scheduled onto a node, in turn eventually leading to an endpoint creation API request. 2. When the request is processed, the pod informer has not yet received the pod creation event, hence causing the pod retrieval to fail. 3. At this point, the set of identity labels only contains the init one (because we couldn't retrieve the pod labels), hence triggering the first call to ep.RunMetadataResolver. In turn, this starts a controller in the background (not blocking, as the second parameter is false), which attempts to retrieve the labels again, sets the identity (using the init one in case of retrieval failure) and triggers the endpoint regeneration. The endpoint first transitions into waiting-to-regenerate state); eventually, the regeneration starts to be processed, and the endpoint moves into regenerating state. 4. ep.RunMetadataResolver is called a second time; this step would normally be blocking (the second parameter is true), but in case the labels are already correct due to the previous call to the same function, no regeneration would be required, eventually returning regenTriggered = false; 5. Given that regenTriggered is false, the regeneration is triggered again immediately afterwards [1], transitioning the endpoint into waiting-to-regenerate state again. It is important to note that in this case we propagate the endpoint creation context as part of the regeneration event, rather than using the endpoint context. This appears correct, but ends up being problematic in this specific case. 6. In background, the initial regeneration triggered in 3. completes correctly. The endpoint is attempted to be transitioned into ready state, but that's skipped (correctly) because another regeneration has been scheduled subsequently (in 5.). At the same time, WaitForFirstRegeneration [2] unblocks (given that the regeneration completed), letting the endpoint creation API request to terminate. This, in turn, causes the associated context to be canceled. 7. At this point, there's still the endpoint regeneration triggered in 5. that needs to be executed; however, depending on the timing, the handling may be aborted in [3], given that the context has just been canceled; while this would normally indicate that the endpoint is terminating, hence not causing any problems, in this case it leaves the endpoint stuck into the waiting-to-regenerate state, from which it will never recover (as all subsequent regeneration triggers would be treated as duplicates). Dropping the duplicate call to ep.RunMetadataResolver prevents this problem, as it ensures that we don't unnecessarily trigger a second regeneration associated with the API context. The fallback regeneration should actually never required when operating in Kubernetes mode, because ep.RunMetadataResolver will always cause a new identity to be assigned to the endpoint, in turn triggering regeneration. [1]: https://github.com/cilium/cilium/blob/a9bcb0cb3858749f584c48cf5922f060c5f5871e/daemon/cmd/endpoint.go#L569-L582 [2]: https://github.com/isovalent/cilium/blob/a9bcb0cb3858749f584c48cf5922f060c5f5871e/daemon/cmd/endpoint.go#L573-L575 [3]: https://github.com/cilium/cilium/blob/a9bcb0cb3858749f584c48cf5922f060c5f5871e/pkg/endpoint/events.go#L64-L71 Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

MrFreezeex

Thanks 🙏! all my changes looks good!

rastislavs · 2025-01-21T13:04:49Z

/test

smagnani96

Many thanks, my commits looks good 👍

giorio94

My commits look good, thanks!

nathanjsweet · 2025-01-21T14:42:50Z

Thanks!

pippolo84

Looks good for my PR, thanks! 💯

rectified95

Looks good for my change

gentoo-root and others added 30 commits January 21, 2025 12:55

k8s: Bump k8s/kubectl to v0.32.0

69dd512

[ upstream commit 04bd9a1 ] This was missed in #36534 Signed-off-by: Tam Mach <tam.mach@cilium.io> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

docs: tuning: remove redundant statement on routing mode

460d95e

[ upstream commit ab10262 ] When *any* routing mode is supported, it makes no sense to list this as requirement. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

docs: masquerading: remove mention of kernel 4.19 requirement

ce1f673

[ upstream commit d2c4e0f ] This is redundant, we now require a 5.4-equivalent kernel. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

gha: Bump k8s version to v1.32.0

c980a13

[ upstream commit c6f84b2 ] Relates: 9991bfc Signed-off-by: Tam Mach <tam.mach@cilium.io> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

ingress: Remove multiple network device limitation

b7e8a51

[ upstream commit 900cb76 ] This limitation can be removed after #35143. Relates: #35143 Relates: #24318 Signed-off-by: Tam Mach <tam.mach@cilium.io> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

build: Remove debug leftover from Makefile

79928cd

[ upstream commit 757a56c ] Fixes: cdecbcb ("build: Filter out --jobserver-auth from MAKEFLAGS") Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

clustermesh: fix MCS-API service export cache not properly deleted

52e6664

[ upstream commit e53c222 ] Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr> Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>

squeed and others added 3 commits January 21, 2025 13:48

rastislavs force-pushed the pr/v1.17-backport-2025-01-21-12-55 branch from 56dd121 to 5cbdc10 Compare January 21, 2025 12:53

rastislavs temporarily deployed to release-base-images January 21, 2025 12:53 — with GitHub Actions Inactive

aanm approved these changes Jan 21, 2025

View reviewed changes

MrFreezeex approved these changes Jan 21, 2025

View reviewed changes

gentoo-root approved these changes Jan 21, 2025

View reviewed changes

rastislavs temporarily deployed to release-base-images January 21, 2025 13:01 — with GitHub Actions Inactive

rastislavs marked this pull request as ready for review January 21, 2025 13:05

rastislavs requested review from a team as code owners January 21, 2025 13:05

smagnani96 approved these changes Jan 21, 2025

View reviewed changes

giorio94 approved these changes Jan 21, 2025

View reviewed changes

pchaigno approved these changes Jan 21, 2025

View reviewed changes

nathanjsweet approved these changes Jan 21, 2025

View reviewed changes

pippolo84 approved these changes Jan 21, 2025

View reviewed changes

joestringer approved these changes Jan 21, 2025

View reviewed changes

rastislavs temporarily deployed to release-base-images January 21, 2025 16:41 — with GitHub Actions Inactive

rectified95 approved these changes Jan 21, 2025

View reviewed changes

michi-covalent added this pull request to the merge queue Jan 22, 2025

Merged via the queue into v1.17 with commit 55a45cb Jan 22, 2025
310 checks passed

michi-covalent deleted the pr/v1.17-backport-2025-01-21-12-55 branch January 22, 2025 06:09

cilium-release-bot bot mentioned this pull request Jan 24, 2025

Prepare for release v1.17.0-rc.2 #37256

Merged

pippolo84 mentioned this pull request Jun 26, 2025

v1.17 Backports 2025-06-26 #40222

Merged

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

v1.17 Backports 2025-01-21 #37126

v1.17 Backports 2025-01-21 #37126

Uh oh!

rastislavs commented Jan 21, 2025 •

edited by joestringer

Loading

Uh oh!

MrFreezeex left a comment

Uh oh!

rastislavs commented Jan 21, 2025

Uh oh!

smagnani96 left a comment

Uh oh!

giorio94 left a comment

Uh oh!

nathanjsweet commented Jan 21, 2025

Uh oh!

pippolo84 left a comment

Uh oh!

rectified95 left a comment

Uh oh!

Uh oh!

Uh oh!

v1.17 Backports 2025-01-21 #37126

v1.17 Backports 2025-01-21 #37126

Uh oh!

Conversation

rastislavs commented Jan 21, 2025 • edited by joestringer Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MrFreezeex left a comment

Choose a reason for hiding this comment

Uh oh!

rastislavs commented Jan 21, 2025

Uh oh!

smagnani96 left a comment

Choose a reason for hiding this comment

Uh oh!

giorio94 left a comment

Choose a reason for hiding this comment

Uh oh!

nathanjsweet commented Jan 21, 2025

Uh oh!

pippolo84 left a comment

Choose a reason for hiding this comment

Uh oh!

rectified95 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

rastislavs commented Jan 21, 2025 •

edited by joestringer

Loading