Skip to content

Conversation

tklauser
Copy link
Member

@tklauser tklauser commented Jun 3, 2020

Once this PR is merged, you can update the PR labels via:

$ for pr in 11661 11771 11758 11775 11779 11773 11714 11786 11694 11760 11804 11759 11598 11776 11798 11715 11762 11812 11822 11816 11815 11818 11784 11817 11796 11839 11835 11823 11806 11831 11643 11814 11840 11615 11811 11819 11833 11834; do contrib/backporting/set-labels.py $pr done 1.8; done

dctrwatson and others added 30 commits June 3, 2020 13:22
[ upstream commit 5e5c626 ]

Signed-off-by: John Watson <johnw@planetscale.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit f719893 ]

The test was failing due resulting values being greater then expected,
however it is evident that `minAllocate` is a minimum value and should
be treated as such.

Fixes: #11560

Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 6819f11 ]

A following commit will require to wait on the K8sAPIGroupPodV1Core
cache

Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 05163ed ]

Due to a performance optimization, validateEndpoint() was changed to use
the pod store instad of contacting the apiserver directly. This required
to make the validation async to wait until the pod cache is available.

This change however caused the endpoint to be exposed and regenerated
before the endpoint was fully validated. This in turn caused the
validation to occur while the regeneration was being performed. When the
validation then failed, the endpoint was deleted while the endpoint was
being regenerated, causing the endpoint to fail with a "JoinEP" message
in the logs.

This commit requires the endpoint to be validated before exposure and
initial regeneration.

Fixes: #11645
Fixes: #11076
Fixes: #10856
Fixes: #10667
Fixes: #10278
Fixes: 9975bba ("pkg/endpoint: fetch pod and namespace labels from local stores")

Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit b5ed91d ]

The code erroneously used `log` which is a global variable, that's
unprotected by a mutex. The correct usage is to use `scopedLog` which is
passed into ResyncInterfacesAndIPs(), which already has been locked by
the caller of ResyncInterfacesAndIPs(). The caller is n.recalculate().

Fixes: #11785
Fixes: 3dfd638 ("ipam: Move iterator logic into generic InstanceMap")
Fixes: 9779b09 ("ipam: Support for Azure IPAM")

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 5dc904a ]

... which uses jiffies if possible and otherwise falls back to ktime. See
also commit cd7921f ("bpf: implement time source switch and use it in
CT") for more information wrt ktime vs jiffies.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit e838397 ]

For more info see commit cd7921f ("bpf: implement time source switch
and use it in CT") for more information wrt ktime vs jiffies. Given the CT
maps have been switched, we also should do it for NAT to allow for potential
created time correlation of entries.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit a7c6c2b ]

If there is no entry in the affinity map, then it's pointless to fetch
bpf_mono_now() and prep the match key. Only go for it when really needed.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit a90e313 ]

The only thing that lb4_update_affinity_by_netns() is doing for existing
entries is updating the last_used timestamp, nothing else. So lets not
hit the BPF map update spinlock from all CPUs. Instead use a READ_ONCE()/
WRITE_ONCE() scheme and update the timestamp at the time we do the lookup
where we have the val already.

Only call the lb4_update_affinity_by_netns() update when the
backend_from_affinity is false which indicates that a backend was just
newly selected.

The lb4_update_affinity_by_netns() is addressed separately to ease review.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 0532f88 ]

Using the kernel's fib lookup is too expensive and not strictly needed
for the case of having a hairpinned load-balancer as in XDP. Given we
already have a faster NODEPORT_NEIGH{4,6} cache that we populate for
inbound connections, just look up the backend mac from there. Avoiding
the fib lookup in XDP bumps performance by ~1.5Mpps in my environment.
We can only make this assumption for the hairpin case where the LB is
exposed to a single public dev. We can extend nodeport_lb_hairpin()
later on also for the tc case. The assumption on why this is okay is
because backend pods have src IP validation in bpf_lx via checking on
is_valid_lxc_src_ip{,v4}() which prevents from spoofing (src IP/MAC)
in general.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 387364d ]

Do the same conversion as done in the DSR case also for SNAT NodePort
which similarly helps to improve the performance by avoiding the fib
lookup in the kernel. Improving by similar numbers as with DSR case.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 531ff9d ]

Given we reuse neigh BPF tables for client and backends, it doesn't make
sense to push the HW addresss down only once upon k8s node events like
nodeUpdate() and nodeDelete() where we ARP probe and update the fib. Since
this is an LRU cache they can get evicted. Therefore update once we did
a successful fib lookup so we don't need to perform another one next time
and have the guarantee that these continue to be cached once evicted.
From control plane, we need to retire old entries however, so deletes are
pushed from there.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 24d566e ]

Enforce retiring of neigh cache entries either on update or deletion of
new nodes. This will enforce the datapath to re-cache the neigh entry
from the fib. This is needed in order to not leave stale entries around.
Only do this when we have acceleration enabled right now since this is
equivalent to what the datapath does as well. We might later extend this
for tc as well.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 892316d ]

So far eth_store_{s,d}addr() had to fall back to byte-by-byte copies of
src/dst mac addresses from stack into XDP buffer since the case of
fib_lookup is special in that fib_params.{s,d}mac would cause unaligned
access on the BPF stack access when used with our optimized memcpy. For
the neigh table cases we do not have this issue and can therefore reduce
the number of insns needed by using word and half-word wide copies.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit f61bece ]

Small enough that it shouldn't matter much, but add a note here wrt
backend selection. Slighly clean up some of the other comments.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit e43d47e ]

We only ever need to track neigh entries of clients in case of SNAT
when the reply comes back to us or in the case where its local to the
node. For DSR and Hyrbid (DSR for TCP) we don't need to track them
here since the node only ever sees to original requests which improves
scalability for new client mappings as tracking can be avoided altogether.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit b7be1c0 ]

The BPF neigh{4,6} maps are decoupled from SNAT NodePort mode these
days and are used for i) request IP/Mac mappings for remote backends,
ii) for local backends and iii) backend to Mac mappings. Make its size
configurable in order to allow for more flexibility. Default to the
same as NAT map size.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit a24ba16 ]

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit e8a7de6 ]

This commit is to fix the recent changes after rebased. Related PR
is #11685.

Basically, there was a check if Node is nil after calling mutex lock,
which will cause nil pointer dereference. I just refactor the code
to make sure underlying node is not nil before obtaining mutex lock.

Relates to 5259ff7

Signed-off-by: Tam Mach <sayboras@yahoo.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 3a9a353 ]

Before this commit, host policy enforcement was reported as enabled as
soon as policies were loaded for the host, even if the host firewall was
disabled:

    ENDPOINT   POLICY (ingress)   POLICY (egress)   IDENTITY   LABELS (source:key[=value])                       IPv6                 IPv4          STATUS
               ENFORCEMENT        ENFORCEMENT
    318        Enabled            Enabled           1          reserved:host                                                                        ready
    3423       Disabled           Disabled          4          reserved:health                                   f00d::a0f:0:0:7ba4   10.16.0.148   ready

With this commit, enforcement will remain as disabled as long as the
host firewall is disabled:

    ENDPOINT   POLICY (ingress)   POLICY (egress)   IDENTITY   LABELS (source:key[=value])                       IPv6                 IPv4          STATUS
               ENFORCEMENT        ENFORCEMENT
    318        Disabled           Disabled          1          reserved:host                                                                        ready
    3423       Disabled           Disabled          4          reserved:health                                   f00d::a0f:0:0:7ba4   10.16.0.148   ready

Fixes: f9c205d ("pkg/policy: Host network policies")
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 1256a13 ]

Attach bpf_host to the second host device (i.e., cilium_net) from
Golang, as part of the host endpoint's datapath loading, instead of
doing it from init.sh.

Doing so has the following benefits:
- We compile one less time because we only need to patch the object file
  on the Go side.
- It simplifies the weird dance around includes in bpf_host.c.
- It moves more code out of init.sh.

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit a4d86b2 ]

Test is flaky, let's disable it

Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 630df50 ]

Due to jenkins bug: https://issues.jenkins-ci.org/browse/JENKINS-51454
pipeline was failing after first ci boot timeout.

This change changes timeout mechanism to `timeout` cli call instead

Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit d961b9d ]

Due to HostPort not being enabled by default, do not perform the check
by default. Require the "connectivity-check-hostport.yaml" to be
deployed.

Fixes: #11563

Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 7d26df1 ]

This commit is an attempt to add retry logic to Helm operations in the
Kubernetes test suite.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 1729be2 ]

Open() writes to m.fd and thus must hold m.lock while doing so. Require
m.lock to be held for writing. This also means that we can remove
m.openLock. This simplifies the overall fix as the question of order
when acquiring m.lock and m.openLock does not come up.

Fixes: #10990

Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit b28262d ]

Fixes: a24ba16 ("docs: add scalability guide")
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit e57ae1c ]

For the case where neither the kernel config can be determined nor derived
through probing, we currently emit a warning to the agent log. To avoid
confusion downgrade to just an info message. It's harmless since this just
means we're using ktime in our datapath instead of jiffies. Given it's
useful for debugging, dump the err to the info message as well.

Reported-by: Rene Zbinden <rene.zbinden@postfinance.ch>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 601852b ]

On the first initialization, the iptables rules may not exist. Remove
the silently and warn on any attempt to remove rules once they are known
to exist.

Also, don't remove iptables rules on additional calls to Reinitialize()
unless iptables is enabled.

Fixes: #11809

Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 6fe8d09 ]

- ensure `global.identityAllocationMode=kvstore` is set; the default
  is `crd` which is not what etcd test was meant to be testing
- it looks like there is still a deadlock occuring when default `crd`
  modes is used

Fixes: #11690

Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
jedsalazar and others added 10 commits June 3, 2020 13:28
[ upstream commit 3245687 ]

Given that the connectivity testing is referenced in the troubleshooting docs, it's worth deploying them to an empty namespace, `cilium-test`. h/t @joe

Also included an initial effort to inlcude a table describing the coverage that each testing type provides.

Fixes: #11581

Signed-off-by: Jed Salazar <jed@isovalent.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 3da5bcf ]

Fixes: #11581

Signed-off-by: Jed Salazar <jed@isovalent.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit f00dc2a ]

We have been running into cpu frequency issues in our ci, this change
will allow us to check whether the build was affected by the problem.

Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit d4a968e ]

This way you can specify it on the commandline to generate docs in the
way that RTD would by passing the environment variable, eg:

  $ READTHEDOCS_VERSION="v1.8.0-rc2" make render-docs

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 6bc5c74 ]

As mentioned in #10991, successDuration was not safely access from
eventqueue

Add RWLock to make this struct thread-safe
Using pointer for spanstat to avoid copylocks issue highlighted by govet

Closes #10991

Signed-off-by: Tam Mach <sayboras@yahoo.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 0b49599 ]

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit c4553a8 ]

The forward chain rules have been depended on the local delivery
interface which depending on the setting of enable-endpoint-routes is
either `cilium_host` or `lxc+`. This is sufficient for all regular
traffic. For proxy redirection traffic, all traffic still passes through
cilium_host regardless of the value of enable-endpoint-routes.

Eample of existing rules:
```
 -A CILIUM_FORWARD -o lxc+ -m comment --comment "cilium: any->cluster on lxc+ forward accept" -j ACCEPT
 -A CILIUM_FORWARD -i lxc+ -m comment --comment "cilium: cluster->any on lxc+ forward accept (nodeport)" -j ACCEPT
 -A CILIUM_FORWARD -i lxc+ -m comment --comment "cilium: cluster->any on lxc+ forward accept" -j ACCEPT
```

This problem was masked because Kubernetes would install these
wide-reaching rules:
```
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack pod source rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack pod destination rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
```

However, more recent versions of Kubernetes would install more
fine-grained rules when the PodCIDR is known too the host:
```
 -A KUBE-FORWARD -s 10.10.0.0/16 -m comment --comment "kubernetes forwarding conntrack pod source rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
```

This will still mask the problem if Cilium uses the PodCIDR for IP
allocation. However, in case Cilium does not use the announced PodCIDR
then these rules would no longer allow the proxy redirection traffic
causes proxy redirection to break.

Fixes: #11235
Fixes: #11807

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
…n GKE

[ upstream commit 7e0469d ]

Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit fbd7653 ]

GetK8sMetadata was supposed to be used for internal usage of the
package, making this function public requires protection against
parallel access by its internal mutex.

Fixes: 4699840 ("ipcache: Export GetK8sMetadata")
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 40b6f58 ]

The node name in the pkg/node is set at `init()` time so there is not
point on setting this package variable all over the code.

Fixes: 3533fd3 ("pkg/node: give priority to nodeName from K8S_NODE_NAME env variable")
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
@tklauser tklauser requested a review from a team as a code owner June 3, 2020 11:29
@tklauser tklauser added backport/1.8 kind/backports This PR provides functionality previously merged into master. labels Jun 3, 2020
@tklauser
Copy link
Member Author

tklauser commented Jun 3, 2020

never-tell-me-the-odds

Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for my changes. Thanks @tklauser!

@tklauser
Copy link
Member Author

tklauser commented Jun 3, 2020

test-backport-1.8

@errordeveloper
Copy link
Contributor

My commit looks good!

@tklauser
Copy link
Member Author

tklauser commented Jun 4, 2020

test-backport-1.8

Copy link
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for my changes

Copy link
Member

@brb brb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see my commits for #11678 and #11739. Were their changes overwritten by other commits, so they were dropped?

@tklauser
Copy link
Member Author

tklauser commented Jun 4, 2020

I don't see my commits for #11678 and #11739. Were their changes overwritten by other commits, so they were dropped?

They were already backported before (see commits 4fa26a4 and 1b95d35 which are in branch v1.8) but weren't marked as backport-done/1.8 thus the script picked them up for the PR description.

@tklauser
Copy link
Member Author

tklauser commented Jun 4, 2020

Not sure why Cilium-Ginkgo-Tests is timing out, see slack thread here: https://cilium.slack.com/archives/C7PE7V806/p1591198945064700

Otherwise this should be good to merge.

@nebril
Copy link
Member

nebril commented Jun 4, 2020

#11858 wasn't marked for backport, this is what caused Cilium-Ginkgo-Test to fail. These tests are run in separated pipelines too and they passed, marking this as ready-to-merge

@nebril nebril added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 4, 2020
@aanm aanm merged commit b28e0b8 into v1.8 Jun 4, 2020
@aanm aanm deleted the pr/v1.8-backport-2020-06-03 branch June 4, 2020 12:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/backports This PR provides functionality previously merged into master. ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.