Conversation

pchaigno
Member

Once this PR is merged, you can update the PR labels via:

$ for pr in 12478 12515 12206 12482 12510 12525 12345 12504 12553 12572 12573 12537 12587 12595 12597; do contrib/backporting/set-labels.py $pr done 1.8; done

pchaigno and others added 30 commits July 21, 2020 08:43
[ upstream commit 53227cd ]

For KV store tests, we execute cilium-agent as a standalone binary, in
which case logs are printed to stdout. For the Consul KV store, however,
we never append stdout to system logs, so logs for that test are not
preserved.

The Etcd test was fixed in 8864c72, but the Consul test was apparently
overlooked.

Fixes: f0741a6 ("Tests: Ginkgo test framework")
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit e558100 ]

As for most tests, at the end of RuntimeKVStoreTest, we validate the logs
don't contain any worrisome messages with:

    vm.ValidateNoErrorsOnLogs(CurrentGinkgoTestDescription().Duration)

CurrentGinkgoTestDescription().Duration includes the execution of all
ginkgo.BeforeEach [1]. In RuntimeKVStoreTest, one of our
ginkgo.BeforeEach stops the Cilium systemd service (because we run
cilium-agent as a standalone binary in the test itself). Stopping Cilium
can result in worrisome messages in the logs, e.g., if the compilation of
BPF programs is terminated abruptly. This in turn makes the tests fail
once in a while.

To fix this, we can replace CurrentGinkgoTestDescription().Duration with
our own "time counter" that doesn't include any of the
ginkgo.BeforeEach executions.
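
A minimal sketch of the idea, with illustrative names (not the actual
test code): start the counter inside the test body, after all
ginkgo.BeforeEach blocks have run, and validate the logs against that
duration only.

```
It("checks KVStore operation", func() {
	// The counter starts here, inside the test body, so it excludes
	// everything done in ginkgo.BeforeEach.
	testStart := time.Now()

	// ... run cilium-agent as a standalone binary and perform checks ...

	// Validate only the logs emitted during the test itself.
	vm.ValidateNoErrorsOnLogs(time.Since(testStart))
})
```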

To validate this fix, I ran the whole RuntimeKVStoreTest with this
change 60 times locally and 60 times in the CI (#12419). The tests
passed all 120 times. Before applying the fix, the Consul test would
fail ~1/30 times, both locally and in CI.

1 - https://github.com/onsi/ginkgo/blob/9c254cb251dc962dc20ca91d0279c870095cfcf9/internal/spec/spec.go#L132-L134
Fixes: #11895
Fixes: 5185789 ("Test: Checks for deadlocks panics in logs per each test.")
Related: #12419
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit b0610a4 ]

After shuffling, there is a small probability that the order of the
elements is unchanged, so a single comparison can fail spuriously.
The test should therefore rely on the result of multiple shuffles.
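
For example, a retry loop like the following (a sketch with
hypothetical helper names) makes the check robust:

```
// A single shuffle may, with small probability, return the elements
// in their original order, so retry a few times before asserting.
changed := false
for i := 0; i < 10 && !changed; i++ {
	shuffled := shuffle(original) // hypothetical helper under test
	changed = !reflect.DeepEqual(original, shuffled)
}
c.Assert(changed, check.Equals, true)
```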

Signed-off-by: Jianlin Lv <Jianlin.Lv@arm.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 308d67f ]

This commit replaces "svc master" with "svc frontend" and "slave" with
"backend slot".

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 73fc6e8 ]

This allows policies such as FromCIDR + ToPorts on ingress.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit c98ed06 ]

This is useful for monitoring specific endpoints.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit f90f969 ]

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 7bb4134 ]

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 453876f ]

This helper is useful for situations where you want to preserve the
previous Cilium configuration when redeploying. It abstracts away the
need to manually create a copy of the new merged configuration before
calling RedeployCilium.

Example usage:

```
daemonCfg := map[string]string{ ... }

(...)

RedeployCiliumWithMerge(..., daemonCfg, map[string]string{
  "flag-previously-set-in-daemonCfg": "1",
  "new-flag-also":                    "abc",
})
```

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 94eb491 ]

This test validates that the new ingress FromCIDR+ToPorts policy
(CIDR-dependent L4) allows ingress traffic from an allowed IP address
range to specific ports.

The setup of this test requires a third node (k8s3) which doesn't have
Cilium installed on it by making use of the NO_CILIUM_ON_NODE
environment variable of the test suite.

The test validates the policy by checking that ingress traffic is:
  1) Allowed before any policies
  2) Disallowed by applying a default deny ingress policy
  3) Allowed again after applying "cnp-ingress-from-cidr-to-ports"

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit e5b1764 ]

Master currently disallows users from modifying the configuration of
endpoints with reserved labels, except for the reserved:init label. This
restriction prevents, e.g., the use of debug mode for the host
endpoint.

This commit removes that restriction for some of the configuration
options:

Config                    | Default  | Can modify
--------------------------|----------|------------
Conntrack                 | Enabled  |
ConntrackAccounting       | Enabled  |
ConntrackLocal            | Disabled |
Debug                     | Disabled |     x
DebugLB                   | Disabled |     x
DropNotification          | Enabled  |     x
MonitorAggregationLevel   | None     |     x
NAT46                     | Disabled |
PolicyAuditMode           | Disabled |     x
PolicyVerdictNotification | Enabled  |     x
TraceNotification         | Enabled  |     x

To summarize, for endpoints with reserved labels, only the audit mode
and the configuration options that decide which logs are sent to the
Cilium monitor can be changed.

Fixes: #12037
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit c56ee44 ]

This reverts commit 7cfe6f2.

Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit ab38d73 ]

- ToCIDR + ToPorts has been supported since #3835 (Cilium v1.1)
- Combining any of the L3 selectors together in a single rule doesn't
  make sense.

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit ffa49f7 ]

The host firewall functions will be called from the nodeport code in the
next commit. A few minor changes were also made to the moved code to
please checkpatch.

Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit a141eeb ]

In handle_ipv{4,6}, we implement several operations:

  1. Handle NodePort services
  2. (Handle ICMPv6 messages)
  3. Swap MAC addresses
  4. Enforce host policies

However, NodePort services do not process ICMPv6 messages, so we can
handle those first. In addition, it is useless to swap MAC addresses if
the packet is going to be dropped by host policies next. This commit
therefore reorders these operations as follows.

  1. (Handle ICMPv6 messages)
  2. Handle NodePort services
  3. Enforce host policies
  4. Swap MAC addresses

Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 56a80e7 ]

We want the host firewall to see ingress-to-host packets (1) after the
DNAT but before the SNAT on the way to the remote backend, and (2) after
the reverse SNAT but before the reverse DNAT for replies from the remote
backend. This way, we always see packets with the backend pod's IP and the
client's IP.

Packets on their way to a remote backend will be redirected by
nodeport_lb{4,6} to the egress of the native device (towards the remote
node). Egress host policies will be bypassed for these packets, as they
should be, because the packets won't carry the mark indicating they come
from the host namespace. See the subsequent commit in this series, where
we introduce the check on the mark in to-netdev.

See #12011 for details and discussion on this change.

Fixes: 88bf291 ("bpf: Enforce host policies for IPv4")
Fixes: 489dbef ("bpf: Enforce host policies for IPv6")
Related: #12011 (comment)
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit d20d905 ]

When the host firewall and vxlan are enabled, we need to send traffic from
pods to remote nodes through the tunnel to preserve the pods' security
IDs. If we don't and masquerading is enabled, those packets will be
SNATed and we will lose the source security ID.

Traffic from pods is automatically sent through the tunnel when the
tunnel_endpoint value in the ipcache is set. Thus, this commit ensures
that value is set to the node's IP for all remote nodes.

Before:

    $ sudo cilium bpf ipcache get 192.168.33.11
    192.168.33.11 maps to identity 6 0 0.0.0.0
    $ sudo cilium bpf ipcache get 192.168.33.12
    192.168.33.12 maps to identity 1 0 0.0.0.0

After:

    $ sudo cilium bpf ipcache get 192.168.33.11
    192.168.33.11 maps to identity 6 0 192.168.33.11
    $ sudo cilium bpf ipcache get 192.168.33.12
    192.168.33.12 maps to identity 1 0 0.0.0.0

I tested this change with the dev. VMs, vxlan and the host firewall
enabled, and a host-level L4 policy loaded. Traffic from a pod on the k8s1
was successfully sent through the tunnel to k8s2 and rejected by host
policies at k8s2. Connections allowed by policies took the same path and
were successfully established.

Since the host firewall is enabled in all Jenkins CI jobs, passing tests
should also ensure this change does not break connectivity in other
scenarios.

When kube-proxy is enabled, this change makes the host firewall
incompatible with externalTrafficPolicy=Local services and portmap
chaining. These incompatibilities will require additional fixes.

Fixes: #11507
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 576028d ]

When traffic from a pod is destined to the local host, on egress from the
container, it is passed to the stack and doesn't go through the host
device (e.g., cilium_host). This results in a host firewall bypass on
ingress.

To fix this, we redirect traffic egressing pods to the host device when
the host firewall is enabled and the destination ID is that of the host.

Fixes: #11507
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit fc3a7f8 ]

The copy of the source IPv6 address into the tuple used for conntrack
lookup was dropped during a previous rebase. This bug can cause
connections to fail when egress host policies are enforced in IPv6.

Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 8b62a2b ]

For ICMPv6 messages we allow, we need to bypass the host firewall. We
cannot simply pass these messages up the stack because they are not
destined to the host.

I rebased 5244b68 not long before it was merged and likely introduced
the bug at that point. I validated the rebase with an ingress host
policy instead of an egress one, and missed the regression. This time I
validated with an extensive ingress+egress L3L4 host policy included at
the end of this series.

Fixes: 5244b68 ("bpf: Hande icmpv6 in host firewall")
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit cfd8fbf ]

Packets egressing on the native device (i.e., to-netdev in bpf_host)
may be SNATed and therefore have the source IP address of the host
device. We need to take that into account when resolving the srcID to
avoid matching host policies on packets that egress from pods.

To that end, we rely on the same MARK_MAGIC_HOST mark (marking packets
as they egress the host namespace) as for packets egressing the host
namespace on the host device.

Fixes: 88bf291 ("bpf: Enforce host policies for IPv4")
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit f55ec90 ]

This policy can be used to test the host firewall. It has been tested on
the development VMs with kube-proxy-replacement=strict,
enable-remote-node-identity, and the devices configured. It attempts to
define the most fine-grained possible policies (L3+L4) on both ingress and
egress such that the connectivity-check works.

The L3-only rules are there to ensure ICMP echo messages between nodes
and with the health endpoint are not dropped. The connectivity-check
still works if these rules are removed.

This policy shouldn't be used outside of the dev. VM context as (1) it
is likely more extensive than necessary and (2) some adjustments would
be necessary to prevent breakages.

Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 0d8dba7 ]

Add a mutex to protect against the race where c.Assert is invoked
before ciliumNodesToK8s is deleted.
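
A minimal sketch of the pattern with illustrative names (the actual
code paths differ):

```
var ciliumNodesToK8sMu sync.Mutex

// Deletion path: hold the lock while removing the entry.
ciliumNodesToK8sMu.Lock()
delete(ciliumNodesToK8s, nodeName)
ciliumNodesToK8sMu.Unlock()

// Test path: take the same lock before asserting on the map, so the
// assertion cannot race with the deletion.
ciliumNodesToK8sMu.Lock()
c.Assert(len(ciliumNodesToK8s), check.Equals, 0)
ciliumNodesToK8sMu.Unlock()
```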

Signed-off-by: Jianlin Lv <Jianlin.Lv@arm.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit bfac77f ]

Because a context with a timeout is created prior to blocking on the
session channel to wait for it to end, the timeout of the context will
likely already have expired by the time the kvstore operation is
performed. The operation is thus cancelled immediately, requiring a
subsequent controller call. This prolongs the time until the session is
renewed and causes unnecessary controller failures which can be mistaken
for etcd issues.

Also pass the etcd client context to the controller to bind the
lifecycle of a controller run to it.

This fixes errors like these:
```
  kvstore-etcd-lock-session-renew   20s ago        10s ago      1       unable to renew etcd lock session: context deadline exceeded
```
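
A sketch of the reordering in Go, with illustrative names (the actual
renewal goes through the etcd concurrency API):

```
// Block on the session channel first...
select {
case <-session.Done():
case <-clientCtx.Done():
	return clientCtx.Err()
}
// ...then derive the timeout immediately before the kvstore operation
// it is meant to bound, so the deadline covers the renewal itself.
ctx, cancel := context.WithTimeout(clientCtx, renewTimeout)
defer cancel()
return renewLockSession(ctx) // hypothetical stand-in for the renewal
```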

Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit ba9031b ]

The controller functions could potentially block forever while waiting
on channels to close. Bind the wait on the channel and the check of the
etcd version to the controller and etcd client contexts so that these
operations get cancelled.
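
In Go, binding such a wait to the contexts boils down to a select
(sketch, names illustrative):

```
// Abort the wait when the controller or etcd client context is
// cancelled, instead of blocking on the channel forever.
select {
case <-ch:
case <-ctrlCtx.Done():
	return ctrlCtx.Err()
case <-clientCtx.Done():
	return clientCtx.Err()
}
```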

Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit a6641a3 ]

Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 9c02dde ]

This commit adds new options to the Hubble-Relay binary so that the
per-request sort buffer length and drain timeout can be configured.

Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 878049b ]

Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 9dbf00f ]

When using the Operator IPAM, the node labels are not properly
initialized simply because the labels are not copied from K8s nodes to
CiliumNode objects and vice-versa. This commit fixes it.

Fixes: a12c2ee ("k8s,node: Reuse retrieveNodeInformation to retrieve node labels")
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit eae5e41 ]

First version of the host firewall GSG. I still want to extend it with a
more extensive troubleshooting section than we have in Network Policy >
Host Policies (or I'll add a section in Troubleshooting for the host
firewall). I may also extend the GSG later to include some examples of
policies on the hostns <-> pods path.

Fixes: #12278
Signed-off-by: Paul Chaignon <paul@cilium.io>
pchaigno and others added 7 commits July 21, 2020 08:47
[ upstream commit 2aae988 ]

Relying on node labels to apply the host policy in the host firewall
getting started guide allows us to include fully working commands
throughout the guide.

Without this, commands cannot include the 'kubectl exec' part because we
don't know on which node and Cilium pod the user is executing them.

Signed-off-by: Paul Chaignon <paul@cilium.io>
…ncelled

[ upstream commit 19ad311 ]

Connected() is being used for two use cases:

1. To wait for a particular client to be connected and reach quorum.
2. To wait for the overall agent to connect to a kvstore and reach quorum.

The first use case is tied to a particular kvstore client. The second use case
is not.

The current implementation will block Connected() forever until
isConnectedAndHasQuorum() returns success. If the context is cancelled or if
the client is closed, this never happens as an error will always be returned.

Change etcd's Connected() implementation to check both the caller's
context and the etcd client context, and to return an error on the
channel when either is cancelled. This allows the Watch() call, which is
made against a particular client, to return properly when that condition
is met. It fixes a Watch() call never returning when the etcd client is
closed before connectivity was ever achieved.

To account for the second use case, introduce an overall Connected() function
which will block closing the channel until kvstore connectivity has been
achieved.
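
A hedged sketch of the per-client behaviour (the signature of
isConnectedAndHasQuorum and the polling interval are assumptions, not
the actual implementation):

```
func (e *etcdClient) Connected(ctx context.Context) <-chan error {
	out := make(chan error, 1)
	go func() {
		defer close(out)
		for !e.isConnectedAndHasQuorum() {
			select {
			case <-ctx.Done():
				out <- ctx.Err() // caller's context cancelled
				return
			case <-e.client.Ctx().Done():
				out <- errors.New("etcd client closed")
				return
			case <-time.After(100 * time.Millisecond):
				// poll again
			}
		}
	}()
	return out
}
```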

Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 148b681 ]

When Watch() is called on an etcd client that is closing, or its context
has been cancelled, the Get() call will fail. The error handler then
retries the loop forever, with no chance of success, since the context
is already cancelled or the client has been closed. The lack of a sleep
in the retry path further puts massive stress on the etcd rate limiter
and thus reduces the probability that other etcd operations succeed in
time.

Fix Watch() to break out of the loop if the user or etcd client context has
been cancelled.
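
Sketched in Go (illustrative names), the error path of the watch loop
becomes:

```
for {
	res, err := e.client.Get(ctx, key)
	if err != nil {
		select {
		case <-ctx.Done():
			return // user context cancelled, retrying cannot succeed
		case <-e.client.Ctx().Done():
			return // etcd client closed, retrying cannot succeed
		default:
			continue // transient error, retry
		}
	}
	// ... process res and start watching from its revision ...
	_ = res
}
```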

Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 26f8cd7 ]

Don't rely on the timeout if the client is being cancelled. If the
client is closing, a lock can never succeed again.

Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit f42c7d4 ]

Don't rely on the timeout of the user context while waiting for the
initial connect if the client is closing.

Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit a2e2fed ]

If we can get a local copy of the CEP we no longer need to keep getting
it from Kubernetes.

Fixes: 469475b ("watchers/endpointsynchronizer: do not delete CEP on initialization")
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 0cb6c1f ]

This comes as a follow-up to
#12482.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
@pchaigno pchaigno requested review from a team as code owners July 21, 2020 06:51
@pchaigno pchaigno added backport/1.8 kind/backports This PR provides functionality previously merged into master. labels Jul 21, 2020
@pchaigno
Member Author

test-backport-1.8

Member

@brb brb left a comment


LGTM for my changes.

Member

@aanm aanm left a comment


LGTM for my commit

@pchaigno
Member Author

Runtime-4.9 test hit known flake #12042 on privileged unit tests. I've successfully run the privileged unit tests locally, so I think this is ready to merge given it already has several reviews.

@pchaigno pchaigno added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jul 21, 2020
@rolinh rolinh merged commit d6ed26e into v1.8 Jul 21, 2020
@rolinh rolinh deleted the pr/v1.8-backport-2020-07-21 branch July 21, 2020 14:11