v1.10 backports 2021-10-04 #17531
Conversation
[ upstream commit a8b3480 ]

It was not clear whether kernel v5.8 has the problem with libceph or whether 5.8 fixes it. Reword the sentence, based on feedback, so it is clearer and easier to read.

Signed-off-by: Ville Ojamo <bluikko@users.noreply.github.com>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit cee08cd ]

This lets users who do not want Cilium populating the neighbor table with the MAC addresses of neighbors discovered via its discovery process opt out of Cilium's neighbor discovery mechanisms.

Signed-off-by: Ayodele Abejide <abejideayodele@gmail.com>
Signed-off-by: Glib Smaga <code@gsmaga.com>
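A minimal sketch of how such an opt-out can gate neighbor-table updates; the `Config` field and everything else below are hypothetical names for illustration, not taken from the commit:

```go
package main

import "fmt"

// Config stands in for the agent configuration; EnableL2NeighDiscovery is a
// hypothetical field name for the opt-out this commit adds.
type Config struct {
	EnableL2NeighDiscovery bool
}

// Neighbor pairs a node IP with the MAC address discovered for it.
type Neighbor struct {
	IP  string
	MAC string
}

// insertNeighbor only touches the kernel neighbor table when discovery is
// enabled, so opted-out users never get Cilium-managed entries.
func insertNeighbor(cfg Config, n Neighbor) {
	if !cfg.EnableL2NeighDiscovery {
		return // opt-out: leave the neighbor table alone
	}
	fmt.Printf("would install neighbor entry %s -> %s via netlink\n", n.IP, n.MAC)
}

func main() {
	insertNeighbor(Config{EnableL2NeighDiscovery: false}, Neighbor{IP: "10.0.0.2", MAC: "aa:bb:cc:dd:ee:ff"})
}
```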
[ upstream commit 5936a5a ]

This type will be useful for serializing cache entries for communication between the agent and external components.

Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
Signed-off-by: Glib Smaga <code@gsmaga.com>
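As an illustration of what such a serializable cache entry can look like (the type and field names below are invented, not the ones the commit adds), a JSON-tagged Go struct is enough for agent/external-component exchange:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// IPCacheEntry is a hypothetical serializable cache entry; the commit's
// actual type and fields may differ.
type IPCacheEntry struct {
	CIDR     string `json:"cidr"`
	Identity uint32 `json:"identity"`
	HostIP   string `json:"hostIP,omitempty"`
}

func main() {
	e := IPCacheEntry{CIDR: "10.0.1.0/24", Identity: 12345, HostIP: "192.168.1.10"}
	b, err := json.Marshal(e)
	if err != nil {
		panic(err)
	}
	// This is the kind of wire format the agent could hand to an external component.
	fmt.Println(string(b))
}
```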
[ upstream commit fcc345e ]

Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit f6f2406 ]

coredns < 1.7.0 has a bug that causes service resolution to fall out of sync with the latest state from Kubernetes if coredns is disconnected from kube-apiserver [1]. The bug is fixed in all versions 1.7.0 and above [2]. In our CI this affects all Kubernetes jobs for 1.18 and below and can result in flaky tests with logs similar to:

```
service IP retrieved from DNS (10.101.253.144) does not match the IP for the service stored in Kubernetes (10.108.15.225)
```

[1] coredns/coredns#3587
[2] coredns/coredns#3924

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit f7303af ]

This is for compatibility with Istio in kube-proxy-free mode. Currently, even though Istio still receives all traffic within the pod namespace, the original service VIP is lost during socket LB, so the traffic misses all Istio routing chains and bypasses all Istio functionality. This adds a new option to bypass socket LB in the pod namespace. When enabled, service resolution for connections from pod namespaces is handled in bpf_lxc at the veth. For host-namespaced pods, socket LB is kept as is. See the sketch below for how the option might be surfaced.

Signed-off-by: Weilong Cui <cuiwl@google.com>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Glib Smaga <code@gsmaga.com>
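A sketch of how such an option could be exposed as an agent flag; the flag name below is an assumption about the option's spelling, not confirmed by the commit message, so check the agent documentation for the real name:

```go
package main

import (
	"flag"
	"fmt"
)

func main() {
	// Assumed flag name for the new option; the real agent flag may differ.
	hostNSOnly := flag.Bool("bpf-lb-sock-hostns-only", false,
		"restrict socket LB to host-namespace sockets; pod traffic keeps the service VIP and is resolved in bpf_lxc at the veth")
	flag.Parse()
	fmt.Println("socket LB limited to host namespace:", *hostNSOnly)
}
```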
[ upstream commit c88e06f ]

This will be used in the subsequent commit to check for a specific error when allocating an IP.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Glib Smaga <code@gsmaga.com>
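The usual Go pattern for making a specific error checkable is a sentinel wrapped with %w and matched with errors.Is; a minimal sketch under that assumption (the in-tree error and function names are not shown in this commit message, so the ones below are illustrative):

```go
package main

import (
	"errors"
	"fmt"
)

// ErrIPNotAvailable is a hypothetical name for the sentinel error; the commit
// may define it differently.
var ErrIPNotAvailable = errors.New("IP is not available")

// allocateIP wraps the sentinel so callers can match it regardless of the
// context added around the message.
func allocateIP(ip string, available map[string]bool) error {
	if !available[ip] {
		return fmt.Errorf("unable to reallocate %s IPv4 address: %w", ip, ErrIPNotAvailable)
	}
	delete(available, ip)
	return nil
}

func main() {
	err := allocateIP("10.0.133.193", map[string]bool{})
	fmt.Println(errors.Is(err, ErrIPNotAvailable)) // true: the specific error is detectable
}
```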
[ upstream commit baa9e9a ]

In CRD-backed IPAM modes such as ENI mode, some IPs were recently removed from the IPAM pool [1]. In user environments where Cilium was running a version prior to [1], it is possible for endpoints to be assigned "unavailable" IPs. Upon upgrade to a version which includes [1], endpoint restoration will fail with [2].

To work around this failure without disrupting the upgrade, this change introduces a hidden flag (`--bypass-ip-availability-upon-restore`) which tells Cilium to continue if a restored endpoint's IP is not available for reallocation, bypassing the specific error "IP is not available". Other errors are not bypassed, in order to limit the scope of this stop-gap solution. With the flag set, restored endpoints which had "unavailable" IPs keep them; any new endpoints / pods are assigned fresh, valid IPs from the pool.

This flag is only meant to be enabled with CRD-backed IPAM modes such as ENI mode, because of the change described in [1], where the primary ENI IP was removed from the IPAM pool. If the flag is enabled in any other mode, the user is warned that it is not intended for other modes and will have no effect.

This patch is intended to be reverted in the future, as the stop-gap will no longer be required once users of Cilium no longer upgrade from versions prior to [1]. I propose that we revert it in the release after the one this patch lands in (N+1).

How was this tested?

1) Deployed a Cilium version that doesn't include [1] on an EKS cluster
2) Created a Deployment object which I scaled to max out the ENI IPs, such that at least one pod is assigned an "unavailable" IP
3) Upgraded Cilium to a version which does include [1] and observed [2] failures
4) Reset the cluster back to the state from (2)
5) Upgraded Cilium to the version that contains this commit
6) Observed the log msgs from this commit and endpoint restoration succeeding
7) Scaled the Deployment to 0 and back up, to restart all pods
8) Observed that they all got fresh IPs and none of the "unavailable" IPs

[1]: #15453
[2]:
```
{
  "time":"2021-09-20T16:57:00.400086481Z",
  "level":"WARN",
  "origin":"cilium.io/agent",
  "message":"Unable to restore endpoint, ignoring",
  "params":{
    "endpointID":"992",
    "error":"Failed to re-allocate IP of endpoint: unable to reallocate 10.0.133.193 IPv4 address: IP 10.0.133.193 is not available",
    "k8sPodName":"default/pod-1",
    "subsys":"daemon"
  }
}
```

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Glib Smaga <code@gsmaga.com>
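Building on the sentinel-error sketch above, the restore-time bypass can be pictured like this; only `--bypass-ip-availability-upon-restore` comes from the commit, every other name here is illustrative:

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// ErrIPNotAvailable mirrors the hypothetical sentinel from the previous sketch.
var ErrIPNotAvailable = errors.New("IP is not available")

// restoreEndpointIP swallows only the "IP is not available" error, and only
// when the bypass flag is set; all other errors still fail the restore.
func restoreEndpointIP(podName string, bypass bool, reallocErr error) error {
	if reallocErr != nil && bypass && errors.Is(reallocErr, ErrIPNotAvailable) {
		log.Printf("bypassing IP availability for restored endpoint %s: %v", podName, reallocErr)
		return nil // endpoint keeps its old IP; new pods get fresh IPs from the pool
	}
	return reallocErr
}

func main() {
	err := fmt.Errorf("unable to reallocate 10.0.133.193 IPv4 address: %w", ErrIPNotAvailable)
	// With --bypass-ip-availability-upon-restore set, restoration continues.
	fmt.Println(restoreEndpointIP("default/pod-1", true, err)) // <nil>
}
```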
LGTM for my commit
Thanks
My changes LGTM, thanks.
/test-backport-1.10

Job 'Cilium-PR-K8s-1.19-kernel-5.4' failed and has not been observed before, so may be related to your PR.

Job 'Cilium-PR-Runtime-4.9' failed and has not been observed before, so may be related to your PR.
Marking ready based on previous comment #17531 (comment)
Once this PR is merged, you can update the PR labels via:
or with