Conversation

pchaigno
Member

Once this PR is merged, you can update the PR labels via:

$ for pr in 12478 12515 12206 12482 12510 12525 12345 12504 12553 12572 12573 12537 12587 12595 12597; do contrib/backporting/set-labels.py $pr done 1.8; done

pchaigno and others added 30 commits July 21, 2020 08:43
[ upstream commit 53227cd ]

For KV store tests, we execute cilium-agent as a standalone binary, in
which case logs are printed to stdout. For the Consul KV store, however,
we never append stdout to system logs, so logs for that test are not
preserved.

The Etcd test was fixed in 8864c72, but the Consul test was apparently
overlooked.

Fixes: f0741a6 ("Tests: Ginkgo test framework")
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit e558100 ]

As for most tests, at the end of RuntimeKVStoreTest, we validate the logs
don't contain any worrisome messages with:

    vm.ValidateNoErrorsOnLogs(CurrentGinkgoTestDescription().Duration)

CurrentGinkgoTestDescription().Duration includes the execution of all
ginkgo.BeforeEach [1]. In RuntimeKVStoreTest, one of our
ginkgo.BeforeEach stops the Cilium systemd service (because we run
cilium-agent as a standalone binary in the test itself). Stopping Cilium
can result in worrisome messages in the logs, e.g., if the compilation of
BPF programs is terminated abruptly. This in turn makes the tests fail
once in a while.

To fix this, we can replace CurrentGinkgoTestDescription().Duration with
our own "time counter" that doesn't include any of the
ginkgo.BeforeEach executions.
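
A minimal sketch of the idea, with illustrative names (not the actual
test code): start the counter inside the test body, after all
ginkgo.BeforeEach blocks have run, and validate the logs against that
duration only.

```
It("checks KVStore operation", func() {
	// The counter starts here, inside the test body, so it excludes
	// everything done in ginkgo.BeforeEach.
	testStart := time.Now()

	// ... run cilium-agent as a standalone binary and perform checks ...

	// Validate only the logs emitted during the test itself.
	vm.ValidateNoErrorsOnLogs(time.Since(testStart))
})
```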

To validate this fix, I ran the whole RuntimeKVStoreTest with this
change 60 times locally and 60 times in the CI (#12419). The tests
passed all 120 times. Before applying the fix, the Consul test would
fail ~1/30 times, both locally and in CI.

1 - https://github.com/onsi/ginkgo/blob/9c254cb251dc962dc20ca91d0279c870095cfcf9/internal/spec/spec.go#L132-L134
Fixes: #11895
Fixes: 5185789 ("Test: Checks for deadlocks panics in logs per each test.")
Related: #12419
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit b0610a4 ]

After shuffling, there is a small probability that the order of the
elements is unchanged, so a single comparison can fail spuriously.
The test should therefore rely on the result of multiple shuffles.
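
For example, a retry loop like the following (a sketch with
hypothetical helper names) makes the check robust:

```
// A single shuffle may, with small probability, return the elements
// in their original order, so retry a few times before asserting.
changed := false
for i := 0; i < 10 && !changed; i++ {
	shuffled := shuffle(original) // hypothetical helper under test
	changed = !reflect.DeepEqual(original, shuffled)
}
c.Assert(changed, check.Equals, true)
```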

Signed-off-by: Jianlin Lv <Jianlin.Lv@arm.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 308d67f ]

This commit replaces "svc master" with "svc frontend" and "slave" with
"backend slot".

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 73fc6e8 ]

This allows policies such as FromCIDR + ToPorts on ingress.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit c98ed06 ]

This is useful for monitoring specific endpoints.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit f90f969 ]

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 7bb4134 ]

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 453876f ]

This helper is useful for situations where you want to preserve the
previous Cilium configuration when redeploying. It abstracts away the
need to manually create a copy of the new merged configuration before
calling RedeployCilium.

Example usage:

```
daemonCfg := map[string]string{ ... }

(...)

RedeployCiliumWithMerge(..., daemonCfg, map[string]string{
  "flag-previously-set-in-daemonCfg": "1",
  "new-flag-also":                    "abc",
})
```

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 94eb491 ]

This test validates that the new ingress FromCIDR+ToPorts policy
(CIDR-dependent L4) allows ingress traffic from an allowed IP address
range to specific ports.

The setup of this test requires a third node (k8s3) which doesn't have
Cilium installed on it by making use of the NO_CILIUM_ON_NODE
environment variable of the test suite.

The test validates the policy by checking that ingress traffic is:
  1) Allowed before any policies
  2) Disallowed by applying a default deny ingress policy
  3) Allowed again after applying "cnp-ingress-from-cidr-to-ports"

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit e5b1764 ]

Master currently disallows users from modifying the configuration of
endpoints with reserved labels, except for the reserved:init label. This
restriction prevents, e.g., the use of debug mode for the host
endpoint.

This commit removes that restriction for some of the configuration
options:

Config                    | Default  | Can modify
--------------------------|----------|------------
Conntrack                 | Enabled  |
ConntrackAccounting       | Enabled  |
ConntrackLocal            | Disabled |
Debug                     | Disabled |     x
DebugLB                   | Disabled |     x
DropNotification          | Enabled  |     x
MonitorAggregationLevel   | None     |     x
NAT46                     | Disabled |
PolicyAuditMode           | Disabled |     x
PolicyVerdictNotification | Enabled  |     x
TraceNotification         | Enabled  |     x

To summarize, for endpoints with reserved labels, only the audit mode
and the configuration options that decide which logs are sent to the
Cilium monitor can be changed.

Fixes: #12037
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit c56ee44 ]

This reverts commit 7cfe6f2.

Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit ab38d73 ]

- ToCIDR + ToPorts has been supported since #3835 (Cilium v1.1)
- Combining any of the L3 selectors together in a single rule doesn't
  make sense.

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit ffa49f7 ]

The host firewall functions will be called from the nodeport code in the
next commit. A few minor changes were also made to the moved code to
please checkpatch.

Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit a141eeb ]

In handle_ipv{4,6}, we implement several operations:

  1. Handle NodePort services
  2. (Handle ICMPv6 messages)
  3. Swap MAC addresses
  4. Enforce host policies

However, NodePort services do not process ICMPv6 messages, so we can
handle those first. In addition, it is useless to swap MAC addresses if
the packet is going to be dropped by host policies next. This commit
therefore reorders these operations as follows.

  1. (Handle ICMPv6 messages)
  2. Handle NodePort services
  3. Enforce host policies
  4. Swap MAC addresses

Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 56a80e7 ]

We want the host firewall to see ingress-to-host packets (1) after the
DNAT but before the SNAT on the way to the remote backend, and (2) after
the reverse SNAT but before the reverse DNAT for replies from the remote
backend. This way, we always see packets with the backend pod's IP and the
client's IP.

Packets on their way to a remote backend will be redirected by
nodeport_lb{4,6} to the egress of the native device (towards the remote
node). Egress host policies will be bypassed for these packets, as they
should be, because the packets won't carry the mark indicating they come
from the host namespace. See the subsequent commit in this series, where
we introduce the check on the mark in to-netdev.

See #12011 for details and discussion on this change.

Fixes: 88bf291 ("bpf: Enforce host policies for IPv4")
Fixes: 489dbef ("bpf: Enforce host policies for IPv6")
Related: #12011 (comment)
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit d20d905 ]

When the host firewall and vxlan are enabled, we need to send traffic from
pods to remote nodes through the tunnel to preserve the pods' security
IDs. If we don't and masquerading is enabled, those packets will be
SNATed and we will lose the source security ID.

Traffic from pods is automatically sent through the tunnel when the
tunnel_endpoint value in the ipcache is set. Thus, this commit ensures
that value is set to the node's IP for all remote nodes.

Before:

    $ sudo cilium bpf ipcache get 192.168.33.11
    192.168.33.11 maps to identity 6 0 0.0.0.0
    $ sudo cilium bpf ipcache get 192.168.33.12
    192.168.33.12 maps to identity 1 0 0.0.0.0

After:

    $ sudo cilium bpf ipcache get 192.168.33.11
    192.168.33.11 maps to identity 6 0 192.168.33.11
    $ sudo cilium bpf ipcache get 192.168.33.12
    192.168.33.12 maps to identity 1 0 0.0.0.0

I tested this change with the dev. VMs, vxlan and the host firewall
enabled, and a host-level L4 policy loaded. Traffic from a pod on the k8s1
was successfully sent through the tunnel to k8s2 and rejected by host
policies at k8s2. Connections allowed by policies took the same path and
were successfully established.

Since the host firewall is enabled in all Jenkins CI jobs, passing tests
should also ensure this change does not break connectivity in other
scenarios.

When kube-proxy is enabled, this change makes the host firewall
incompatible with externalTrafficPolicy=Local services and portmap
chaining. These incompatibilities will require additional fixes.

Fixes: #11507
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 576028d ]

When traffic from a pod is destined to the local host, on egress from the
container, it is passed to the stack and doesn't go through the host
device (e.g., cilium_host). This results in a host firewall bypass on
ingress.

To fix this, we redirect traffic egressing pods to the host device when
the host firewall is enabled and the destination ID is that of the host.

Fixes: #11507
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit fc3a7f8 ]

The copy of the source IPv6 address into the tuple used for conntrack
lookup was dropped during a previous rebase. This bug can cause
connections to fail when egress host policies are enforced in IPv6.

Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 8b62a2b ]

For ICMPv6 messages we allow, we need to bypass the host firewall. We
cannot simply pass these messages up the stack because they are not
destined to the host.

I rebased 5244b68 not long before it was merged and likely introduced
the bug at that point. I validated the rebase with an ingress host
policy instead of an egress one, and missed the regression. This time I
validated with an extensive ingress+egress L3L4 host policy included at
the end of this series.

Fixes: 5244b68 ("bpf: Hande icmpv6 in host firewall")
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit cfd8fbf ]

Packets egressing on the native device (i.e., to-netdev in bpf_host)
may be SNATed and therefore have the source IP address of the host
device. We need to take that into account when resolving the srcID to
avoid matching host policies on packets that egress from pods.

To that end, we rely on the same MARK_MAGIC_HOST mark (marking packets
as they egress the host namespace) as for packets egressing the host
namespace on the host device.

Fixes: 88bf291 ("bpf: Enforce host policies for IPv4")
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit f55ec90 ]

This policy can be used to test the host firewall. It has been tested on
the development VMs with kube-proxy-replacement=strict,
enable-remote-node-identity, and the devices configured. It attempts to
define the most fine-grained possible policies (L3+L4) on both ingress and
egress such that the connectivity-check works.

The L3-only rules are there to ensure ICMP echo messages between nodes
and with the health endpoint are not dropped. The connectivity-check
still works if these rules are removed.

This policy shouldn't be used outside of the dev. VM context as (1) it
is likely more extensive than necessary and (2) some adjustments would
be necessary to prevent breakages.

Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 0d8dba7 ]

Add a mutex to protect against the race where c.Assert is invoked
before ciliumNodesToK8s is deleted.
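
A minimal sketch of the pattern with illustrative names (the actual
code paths differ):

```
var ciliumNodesToK8sMu sync.Mutex

// Deletion path: hold the lock while removing the entry.
ciliumNodesToK8sMu.Lock()
delete(ciliumNodesToK8s, nodeName)
ciliumNodesToK8sMu.Unlock()

// Test path: take the same lock before asserting on the map, so the
// assertion cannot race with the deletion.
ciliumNodesToK8sMu.Lock()
c.Assert(len(ciliumNodesToK8s), check.Equals, 0)
ciliumNodesToK8sMu.Unlock()
```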

Signed-off-by: Jianlin Lv <Jianlin.Lv@arm.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit bfac77f ]

Because a context with a timeout is created prior to blocking on the
session channel to wait for it to end, the timeout of the context will
likely already have expired by the time the kvstore operation is
performed. The operation is thus cancelled immediately, requiring a
subsequent controller call. This prolongs the time until the session is
renewed and causes unnecessary controller failures which can be mistaken
for etcd issues.

Also pass the etcd client context to the controller to bind the
lifecycle of a controller run to it.

This fixes errors like these:
```
  kvstore-etcd-lock-session-renew   20s ago        10s ago      1       unable to renew etcd lock session: context deadline exceeded
```
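
A sketch of the reordering in Go, with illustrative names (the actual
renewal goes through the etcd concurrency API):

```
// Block on the session channel first...
select {
case <-session.Done():
case <-clientCtx.Done():
	return clientCtx.Err()
}
// ...then derive the timeout immediately before the kvstore operation
// it is meant to bound, so the deadline covers the renewal itself.
ctx, cancel := context.WithTimeout(clientCtx, renewTimeout)
defer cancel()
return renewLockSession(ctx) // hypothetical stand-in for the renewal
```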

Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit ba9031b ]

The controller functions could potentially block forever while waiting
on channels to close. Bind the wait on the channel and the check of the
etcd version to the controller and etcd client contexts so that these
operations get cancelled.
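
In Go, binding such a wait to the contexts boils down to a select
(sketch, names illustrative):

```
// Abort the wait when the controller or etcd client context is
// cancelled, instead of blocking on the channel forever.
select {
case <-ch:
case <-ctrlCtx.Done():
	return ctrlCtx.Err()
case <-clientCtx.Done():
	return clientCtx.Err()
}
```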

Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit a6641a3 ]

Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 9c02dde ]

This commit adds new options to the Hubble-Relay binary so that the
per-request sort buffer length and drain timeout can be configured.

Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 878049b ]

Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 9dbf00f ]

When using the Operator IPAM, the node labels are not properly
initialized simply because the labels are not copied from K8s nodes to
CiliumNode objects and vice-versa. This commit fixes it.

Fixes: a12c2ee ("k8s,node: Reuse retrieveNodeInformation to retrieve node labels")
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit eae5e41 ]

First version of the host firewall GSG. I still want to extend it with a
more extensive troubleshooting section than we have in Network Policy >
Host Policies (or I'll add a section in Troubleshooting for the host
firewall). I may also extend the GSG later to include some examples of
policies on the hostns <-> pods path.

Fixes: #12278
Signed-off-by: Paul Chaignon <paul@cilium.io>
pchaigno and others added 7 commits July 21, 2020 08:47
[ upstream commit 2aae988 ]

Relying on node labels to apply the host policy in the host firewall
getting started guide allows us to include fully working commands
throughout the guide.

Without this, commands cannot include the 'kubectl exec' part because we
don't know on which node and Cilium pod the user is executing them.

Signed-off-by: Paul Chaignon <paul@cilium.io>
…ncelled

[ upstream commit 19ad311 ]

Connected() is being used for two use cases:

1. To wait for a particular client to be connected and reach quorum.
2. To wait for the overall agent to connect to a kvstore and reach quorum.

The first use case is tied to a particular kvstore client. The second use case
is not.

The current implementation will block Connected() forever until
isConnectedAndHasQuorum() returns success. If the context is cancelled or if
the client is closed, this never happens as an error will always be returned.

Change etcd's Connected() implementation to check both the caller's
context and the etcd client context, and to return an error on the
channel when either is cancelled. This allows the Watch() call, which is
made against a particular client, to return properly when that condition
is met. It fixes a Watch() call never returning when the etcd client is
closed before connectivity was ever achieved.

To account for the second use case, introduce an overall Connected() function
which will block closing the channel until kvstore connectivity has been
achieved.
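
A hedged sketch of the per-client behaviour (the signature of
isConnectedAndHasQuorum and the polling interval are assumptions, not
the actual implementation):

```
func (e *etcdClient) Connected(ctx context.Context) <-chan error {
	out := make(chan error, 1)
	go func() {
		defer close(out)
		for !e.isConnectedAndHasQuorum() {
			select {
			case <-ctx.Done():
				out <- ctx.Err() // caller's context cancelled
				return
			case <-e.client.Ctx().Done():
				out <- errors.New("etcd client closed")
				return
			case <-time.After(100 * time.Millisecond):
				// poll again
			}
		}
	}()
	return out
}
```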

Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 148b681 ]

When Watch() is called on an etcd client that is closing, or its context
has been cancelled, the Get() call will fail. The error handler then
retries the loop forever, with no chance of success, since the context
is already cancelled or the client has been closed. The lack of a sleep
in the retry path further puts massive stress on the etcd rate limiter
and thus reduces the probability that other etcd operations succeed in
time.

Fix Watch() to break out of the loop if the user or etcd client context has
been cancelled.
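
Sketched in Go (illustrative names), the error path of the watch loop
becomes:

```
for {
	res, err := e.client.Get(ctx, key)
	if err != nil {
		select {
		case <-ctx.Done():
			return // user context cancelled, retrying cannot succeed
		case <-e.client.Ctx().Done():
			return // etcd client closed, retrying cannot succeed
		default:
			continue // transient error, retry
		}
	}
	// ... process res and start watching from its revision ...
	_ = res
}
```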

Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 26f8cd7 ]

Don't rely on the timeout if the client is being cancelled. If the
client is closing, a lock can never succeed again.

Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit f42c7d4 ]

Don't rely on the timeout of the user context while waiting for the
initial connect if the client is closing.

Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit a2e2fed ]

If we can get a local copy of the CEP we no longer need to keep getting
it from Kubernetes.

Fixes: 469475b ("watchers/endpointsynchronizer: do not delete CEP on initialization")
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 0cb6c1f ]

This comes as a follow-up to
#12482.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
@pchaigno pchaigno requested review from a team as code owners July 21, 2020 06:51
@pchaigno pchaigno added backport/1.8 kind/backports This PR provides functionality previously merged into master. labels Jul 21, 2020
@pchaigno
Member Author

test-backport-1.8

Member

@brb brb left a comment


LGTM for my changes.

Member

@aanm aanm left a comment


LGTM for my commit

@pchaigno
Member Author

Runtime-4.9 test hit known flake #12042 on privileged unit tests. I've successfully run the privileged unit tests locally, so I think this is ready to merge given it already has several reviews.

@pchaigno pchaigno added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jul 21, 2020
@rolinh rolinh merged commit d6ed26e into v1.8 Jul 21, 2020
@rolinh rolinh deleted the pr/v1.8-backport-2020-07-21 branch July 21, 2020 14:11