Skip to content

Conversation

jrajahalme
Copy link
Member

@jrajahalme jrajahalme commented Oct 30, 2024

Backporting to stable releases as deleting endpoint without taking the locks could lead to redirects being deleted while regeneration is waiting for an ACK from Envoy for a Network Policy update. Depending on how events line up it is possible that the Envoy listener gets deleted before it progresses to send the ACK, which would leave the regeneration to wait until the 5.5 minute timeout and then spam the logs with regeneration failure messages.

Once this PR is merged, a GitHub action will update the labels of these PRs:

 33700

rgo3 and others added 6 commits October 30, 2024 09:39
[ upstream commit da05633 ]

Currently on shutdown we don't remove the health endpoint from the
endpoint manager but do remove the corresponding netlink interface.
Note that validateEndpoint() special cases the health endpoint and
does remove the old state directory if it existed before. But
independent of that it would be better to just streamline cleanup
via cleanupHealthEndpoint() for all cases.

Related: cilium#32689
Signed-off-by: Robin Gögge <r.goegge@isovalent.com>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit 09633b8 ]

We log whenever we start the health endpoint. Also add a log entry
whenever we attempt to clean up.

Also add a small comment in the cleanup routine for device removal.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit f7f455d ]

Do not open code it in the health code.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit a90b980 ]

Otherwise we might run risk of hitting the timeout too early. Just reping
right away upon success.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit 60e5080 ]

For the health endpoint the agent needs to remove devices since its not
managed by CNI. Add a comment explaining this.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
…nish

[ upstream commit f59fe4e ]

Endpoint deletions can race with pending builds as detailed in cilium#32689.
This is due to the fact that the Delete() call does not wait for them
to finish. Instead, it tears down the endpoint, downs the device, removes
tc(x), etc. This is not correct since the inflight from regenerate() can
later still attempt to attach something to the given device. This in itself
is not problematic as the CNI will just remove the device and the kernel
removes all objects along with it. However, this can result in sporadic
agent error logs that a given lxc* device has already disappeared when
the agent tries to attach, but Delete() resp the CNI Del finished. We've
seen this in particular in case of lxc_health. The health endpoint is not
managed by the CNI plugin, but the agent instead. What appears to happen
is that when health endpoint is cleaned, it's removed from the ep manager,
and subsequently the device is removed.. however any ongoing regenerate()
later tries to attach to a non-existing device. Given the Delete() updates
the endpoint state, we also need to wrap this with the build mutex lock.
This is to wait for any pending regenerate() to finish before setting the
new ep state, and any new build requests will bail out early as they see
that the endpoint is disconnecting. Kudos to Tom Hadlaw and Robin Goegge!

Fixes: cilium#32689
Co-developed-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
@jrajahalme jrajahalme added the kind/backports This PR provides functionality previously merged into master. label Oct 30, 2024
@jrajahalme jrajahalme requested a review from a team as a code owner October 30, 2024 08:43
@maintainer-s-little-helper maintainer-s-little-helper bot added the backport/1.15 This PR represents a backport for Cilium 1.15.x of a PR that was merged to main. label Oct 30, 2024
[ upstream commit 2dad63c ]

The endpoints buildMutex is commented as intending to be
held while doing endpoint build as well as delete.

However, this lock is not actually held during (*Endpoint).Delete().
As a result, there is a race condition that can occur where an endpoint
in the process of doing `e.owner.GetCompilationLock().RLock()` may be
deleted while waiting for this lock.

This is made more likely as the write lock on the compilation lock can
be held for several minutes at times.

When the regenerateBPF finally does acquire the lock, it may find that
it's underlying interface has already been deleted.  Specifically, in
the case of the health endpoints lxc_health interface, this is removed
after some ping timeout and endpoint-manager is called to delete the
endpoint.  However if the endpoint waiting for the compilation lock
at the same time it will still attempt to regenerate the endpoint once
the rlock becomes available, leading to errors such as:

```
msg="Error while reloading endpoint BPF program" ciliumEndpointName=/
containerID= containerInterface= datapathPolicyRevision=0
desiredPolicyRevision=1 endpointID=62 error="retrieving device
lxc_health: Link not found" identity=4 ipv4=10.244.0.18
ipv6="fd00:10:244::9647" k8sPodName=/ subsys=endpoint
```

This adds the build mutex locking to (*Endpoint).Delete(), such that
the endpoint cannot deleted while it is being regenerated.

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
@jrajahalme jrajahalme force-pushed the endpoint-leave-locking-1.15 branch from 671e3fb to 71f3700 Compare October 30, 2024 08:51
@jrajahalme
Copy link
Member Author

/test-backport-1.15

@jrajahalme jrajahalme enabled auto-merge (rebase) October 30, 2024 21:05
@jrajahalme
Copy link
Member Author

/ci-clustermesh

@jrajahalme jrajahalme merged commit 6491917 into cilium:v1.15 Oct 31, 2024
59 checks passed
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Oct 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.15 This PR represents a backport for Cilium 1.15.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants