v1.10 backports 2021-10-04 #17531

glibsm · 2021-10-04T21:30:59Z

docs: clarify language on libceph and kernel 5.8 in kubeproxy-free GSG #16969 -- docs: clarify language on libceph and kernel 5.8 in kubeproxy-free GSG (@bluikko)
daemon: Make L2 neighbor discovery configurable. #16974 -- daemon: Make L2 neighbor discovery configurable. (@bjhaid)
proxy: Expose cachedSelectorREEntry type #17341 -- proxy: Expose cachedSelectorREEntry type (@nebril)
test: bump coredns version to 1.7.0 #17489 -- test: bump coredns version to 1.7.0 (@aanm)
Add neighbor discovery behavior docs to kubeproxy-free. #17469 -- Add neighbor discovery behavior docs to kubeproxy-free. (@bjhaid)
datapath: Add a new option to skip socket lb when in pod ns #17154 -- datapath: Add a new option to skip socket lb when in pod ns (@brb)
daemon, ipam, option: Introduce ability to bypass IP availability error #17492 -- daemon, ipam, option: Introduce ability to bypass IP availability error (@christarazi)

Once this PR is merged, you can update the PR labels via:

$ for pr in 16969 16974 17341 17489 17469 17154 17492; do contrib/backporting/set-labels.py $pr done 1.10; done

or with

$ make add-label branch=v1.10 issues=16969,16974,17341,17489,17469,17154,17492

[ upstream commit a8b3480 ] It was not clear if kernel v5.8 has problem with libceph or if 5.8 fixes the problem. Redo the sentence based on feedback to make it more clear and easy to read. Signed-off-by: Ville Ojamo <bluikko@users.noreply.github.com> Signed-off-by: Glib Smaga <code@gsmaga.com>

[ upstream commit cee08cd ] This allows users who do not want cilium populating neighbor table with mac addresses of neighbors cilium might have discovered via its discovery process to opt out of cilium's neighbor discovery mechanisms. Signed-off-by: Ayodele Abejide <abejideayodele@gmail.com> Signed-off-by: Glib Smaga <code@gsmaga.com>

[ upstream commit 5936a5a ] This type will be useful for serializing cache entries for communication between agent and external components Signed-off-by: Maciej Kwiek <maciej@isovalent.com> Signed-off-by: Glib Smaga <code@gsmaga.com>

[ upstream commit fcc345e ] Signed-off-by: Maciej Kwiek <maciej@isovalent.com> Signed-off-by: Glib Smaga <code@gsmaga.com>

[ upstream commit f6f2406 ] coredns < 1.7.0 has a bug that makes the services resolution to become out-of-sync with the last state from Kubernetes in case coredns suffers from a disconnection with kube-apiserver [1]. This bug is fixed on all versions equal and above 1.7.0. [2] In our CI this affects all Kubernetes jobs 1.18 and below and can result in flaky tests that have the result in the following similar logs: ``` service IP retrieved from DNS (10.101.253.144) does not match the IP for the service stored in Kubernetes (10.108.15.225) ``` [1] coredns/coredns#3587 [2] coredns/coredns#3924 Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com>

[ upstream commit 0410c92 ] xref: #16974 Signed-off-by: Ayodele Abejide <abejideayodele@gmail.com> Signed-off-by: Glib Smaga <code@gsmaga.com>

[ upstream commit f7303af ] This is for compatibility with Istio in kube-proxy free mode. Currently, even though Istio would still get all traffic within pod namespace, but the original service VIP is lost during socket lb, causing it to miss all Istio routing chains and therefore bypassing all Istio functionalities. This adds a new option to bypass socket lb in pod namespace. When enabled, service resolution for connection from pod namespaces will be handled in bpf_lxc at veth. For host-namespaced pods, socket lb kept as is. Signed-off-by: Weilong Cui <cuiwl@google.com> Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Glib Smaga <code@gsmaga.com>

[ upstream commit c88e06f ] This will be used in the subsequent commit to check for a specific error when allocating an IP. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Glib Smaga <code@gsmaga.com>

[ upstream commit baa9e9a ] In CRD-backed IPAM modes such as ENI mode, some IPs were recently removed from the IPAM pool [1]. In user environments where Cilium was running a version prior to [1], it is possible for endpoints to be assigned "unavailable" IPs. Upon upgrade (to a version which includes [1]), endpoint restoration will fail with [2]. In order to workaround this failure and not disrupt the upgrade, this change introduces a hidden flag (`--bypass-ip-availability-upon-restore`) which will inform Cilium to continue on if the restored endpoint's IP is not available for reallocation, bypassing the specific error "IP is not available". Other errors will not be bypassed, in order to reduce the scope of this stop-gap solution. With the flag set, restored endpoints which had "unavailable" IPs will keep them. Any new endpoints / pods will be assigned fresh, valid IPs from the pool. This flag is only meant to be enabled with CRD-backed IPAM modes such as ENI mode. The reason is because of the change described in [1], where the primary ENI IP was removed from the IPAM pool. In any other mode that this flag is enabled in, the user is warned that the flag is not intended for other modes and will have no effect. This patch is intended to be reverted in the future, as this stop-gap solution will no longer be required, as users of Cilium don't upgrade from versions prior to [1]. I propose that we revert this in the following release that this patch makes it in (N+1). How was this tested? 1) Deployed a Cilium version that doesn't include [1] on EKS cluster 2) Created a Deployment object which I scaled to max out the ENI IPs, such that at least one pod is assigned an "unavailable" IP 3) Upgraded Cilium to a version which does include [1] and observe [2] failures 4) Reset cluster back to state from (2) 5) Upgrade Cilium to the version that contains this commit 6) Observe log msgs from this commit and endpoint restoration succeeding 7) Scale Deployment to 0 and back up, to restart all pods 8) Observe that they all get fresh IPs and none of the "unavailable" IPs [1]: #15453 [2]: ``` { "time":"2021-09-20T16:57:00.400086481Z", "level":"WARN", "origin":"cilium.io/agent", "message":"Unable to restore endpoint, ignoring", "params":{ "endpointID":"992", "error":"Failed to re-allocate IP of endpoint: unable to reallocate 10.0.133.193 IPv4 address: IP 10.0.133.193 is not available", "k8sPodName":"default/pod-1", "subsys":"daemon" } } ``` Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Glib Smaga <code@gsmaga.com>

aanm

LGTM for my commit

christarazi

Thanks

brb

My changes LGTM, thanks.

glibsm · 2021-10-06T15:01:57Z

/test-backport-1.10

Job 'Cilium-PR-K8s-1.19-kernel-5.4' failed and has not been observed before, so may be related to your PR:

Click to show.

Test Name

K8sDatapathConfig Encapsulation Check connectivity with sockops and VXLAN encapsulation

Failure Output

FAIL: Error creating resource /home/jenkins/workspace/Cilium-PR-K8s-1.19-kernel-5.4/src/github.com/cilium/cilium/test/k8sT/manifests/l3-policy-demo.yaml: Cannot retrieve cilium pod cilium-p6w8t policy revision: cannot get revision from json: could not parse JSON from command "kubectl exec -n kube-system cilium-p6w8t -- cilium policy get -o json"

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-1.19-kernel-5.4 so I can create a new GitHub issue to track it.

Job 'Cilium-PR-Runtime-4.9' failed and has not been observed before, so may be related to your PR:

Click to show.

Test Name

RuntimePrivilegedUnitTests Run Tests

Failure Output

FAIL: Failed to run privileged unit tests

If it is a flake, comment /mlh new-flake Cilium-PR-Runtime-4.9 so I can create a new GitHub issue to track it.

christarazi · 2021-10-06T17:27:12Z

@glibsm It looks like test-runtime-4.9 hit a known issue but was only fixed in master, so it needs backport to v1.10. You could probably just mark the PR for needs-backport, but not actually include in this one.

And test-1.19-5.4 hit known flake: #17069

glibsm · 2021-10-07T00:02:43Z

Marking ready based on previous comment #17531 (comment)

bluikko and others added 9 commits October 4, 2021 18:38

dnsproxy: unit test GetSelectorRegexMap

9f2ec74

[ upstream commit fcc345e ] Signed-off-by: Maciej Kwiek <maciej@isovalent.com> Signed-off-by: Glib Smaga <code@gsmaga.com>

Add neighbor discovery behavior docs to kubeproxy-free.

8bcfa93

[ upstream commit 0410c92 ] xref: #16974 Signed-off-by: Ayodele Abejide <abejideayodele@gmail.com> Signed-off-by: Glib Smaga <code@gsmaga.com>

ipam: Convert IP not available error into sentinel error

c931aa2

[ upstream commit c88e06f ] This will be used in the subsequent commit to check for a specific error when allocating an IP. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Glib Smaga <code@gsmaga.com>

glibsm requested a review from a team as a code owner October 4, 2021 21:30

glibsm requested review from brb, christarazi, nebril and aanm October 4, 2021 21:31

maintainer-s-little-helper bot added backport/1.10 kind/backports This PR provides functionality previously merged into master. labels Oct 4, 2021

maintainer-s-little-helper bot assigned aanm, brb, christarazi and nebril and unassigned christarazi Oct 4, 2021

aanm approved these changes Oct 4, 2021

View reviewed changes

maintainer-s-little-helper bot unassigned aanm Oct 4, 2021

christarazi approved these changes Oct 4, 2021

View reviewed changes

maintainer-s-little-helper bot unassigned christarazi Oct 4, 2021

brb approved these changes Oct 5, 2021

View reviewed changes

maintainer-s-little-helper bot unassigned brb Oct 5, 2021

nebril approved these changes Oct 6, 2021

View reviewed changes

maintainer-s-little-helper bot unassigned nebril Oct 6, 2021

glibsm added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Oct 7, 2021

errordeveloper merged commit b4c9323 into v1.10 Oct 7, 2021

errordeveloper deleted the pr/v1.10-backport-2021-10-04 branch October 7, 2021 15:35

joestringer mentioned this pull request Oct 13, 2021

Prepare for release v1.10.5 #17606

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

v1.10 backports 2021-10-04 #17531

v1.10 backports 2021-10-04 #17531

Uh oh!

glibsm commented Oct 4, 2021

Uh oh!

aanm left a comment

Uh oh!

christarazi left a comment

Uh oh!

brb left a comment

Uh oh!

glibsm commented Oct 6, 2021 •

edited by maintainer-s-little-helper bot

Loading

Test Name

Failure Output

Test Name

Failure Output

Uh oh!

christarazi commented Oct 6, 2021 •

edited

Loading

Uh oh!

glibsm commented Oct 7, 2021

Uh oh!

Uh oh!

v1.10 backports 2021-10-04 #17531

v1.10 backports 2021-10-04 #17531

Uh oh!

Conversation

glibsm commented Oct 4, 2021

Uh oh!

aanm left a comment

Choose a reason for hiding this comment

Uh oh!

christarazi left a comment

Choose a reason for hiding this comment

Uh oh!

brb left a comment

Choose a reason for hiding this comment

Uh oh!

glibsm commented Oct 6, 2021 • edited by maintainer-s-little-helper bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test Name

Failure Output

Test Name

Failure Output

Uh oh!

christarazi commented Oct 6, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

glibsm commented Oct 7, 2021

Uh oh!

Uh oh!

glibsm commented Oct 6, 2021 •

edited by maintainer-s-little-helper bot

Loading

christarazi commented Oct 6, 2021 •

edited

Loading