
Conversation

@joestringer (Member) commented Jul 29, 2022

A user has reported an issue where FQDN garbage collection races with DNS proxy handling for IPs (typically with S3), such that traffic towards a destination allowed by ToFQDNs policy is occasionally denied. In this scenario, the ipcache does not contain the target IP for the connection, even though the IP has a corresponding identity, and that identity is plumbed into the policy packages and even into the BPF policymaps.

The first commit in this PR closes this race condition to resolve the underlying cause. The second commit improves the resiliency of the code by ensuring that any time a DNS request triggers policy updates, we attempt to push the corresponding identities into the ipcache again, in case they were temporarily removed by other code in the meantime. Finally, the third commit adds some metrics to help debug issues in this area of the codebase.
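To make the resiliency step concrete, here is a minimal toy sketch in Go; the type and helper names are made up for illustration and are not the actual Cilium code:

```go
package main

import (
	"fmt"
	"sync"
)

// Toy model of the ipcache/ToFQDNs interaction described above; the names and
// structure are illustrative only, not the actual Cilium types.
type toyIPCache struct {
	mu       sync.Mutex
	identity map[string]int // IP prefix -> numeric identity
}

func (c *toyIPCache) upsert(prefix string, id int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.identity[prefix] = id
}

func (c *toyIPCache) remove(prefix string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.identity, prefix)
}

// handleDNSResponse models the ToFQDNs path: an identity is allocated first,
// policy is regenerated, and only then is the IP -> identity mapping pushed
// into the ipcache. The resiliency change corresponds to doing this final
// upsert unconditionally, so that a concurrent removal is repaired.
func handleDNSResponse(c *toyIPCache, prefix string, id int) {
	// ... allocate identity, update selectors, regenerate policy ...
	c.upsert(prefix, id) // unconditional re-upsert
}

func main() {
	c := &toyIPCache{identity: map[string]int{"203.0.113.0/24": 42}}
	c.remove("203.0.113.0/24")                 // simulate a concurrent GC removal
	handleDNSResponse(c, "203.0.113.0/24", 42) // DNS-triggered update repairs it
	fmt.Println(c.identity)                    // map[203.0.113.0/24:42]
}
```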

For more details, see the individual commits.

  • ipcache: Fix race in identity/ipcache release
  • fqdn: Upsert all identities to ipcache
  • ipcache: Add metrics for upsert/delete/recover
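For the metrics commit, purely as an illustration of the kind of counters meant by upsert/delete/recover, a sketch using prometheus/client_golang follows; the metric name and labels here are hypothetical, not the exact ones added by this PR:

```go
package ipcachemetrics

import "github.com/prometheus/client_golang/prometheus"

// Hypothetical metric: the name and labels are illustrative only, not the
// exact metric added by this PR.
var Events = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Namespace: "cilium",
		Subsystem: "ipcache",
		Name:      "events_total",
		Help:      "ipcache identity events by type (upsert, delete, recover).",
	},
	[]string{"type"},
)

func init() {
	prometheus.MustRegister(Events)
}

// At an upsert site, for example: Events.WithLabelValues("upsert").Inc()
```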
Release note: Fix bug where traffic sent outside the cluster via ToFQDNs policy would be denied despite a policy that allows it

@joestringer joestringer added the release-note/bug This PR fixes an issue in a previous release of Cilium. label Jul 29, 2022
@maintainer-s-little-helper maintainer-s-little-helper bot added dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. and removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Jul 29, 2022
@joestringer joestringer force-pushed the submit/fix-fqdn-race-condition branch from 9a750d3 to 40b55df Compare July 29, 2022 21:48
@joestringer (Member, Author) commented:

/test

@joestringer joestringer force-pushed the submit/fix-fqdn-race-condition branch from 40b55df to b92ce4a Compare July 30, 2022 00:56
@joestringer joestringer marked this pull request as ready for review July 30, 2022 00:56
@joestringer joestringer requested review from a team as code owners July 30, 2022 00:56
@joestringer (Member, Author) commented Jul 30, 2022

/test

Job 'Cilium-PR-K8s-1.23-kernel-4.19' failed:

Test Name: K8sUpdates Tests upgrade and downgrade from a Cilium stable image to master

Failure Output: FAIL: terminating containers are not deleted after timeout

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.23-kernel-4.19 so I can create one.

Job 'Cilium-PR-K8s-1.16-kernel-4.9' failed:

Test Name: K8sVerifier Runs the kernel verifier against Cilium's BPF datapath

Failure Output: FAIL: Failed to load BPF program bpf_lxc with datapath configuration:

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.16-kernel-4.9 so I can create one.

@christarazi (Member) left a comment:

Makes sense to me, thanks for the helpful commit msgs!

@joestringer (Member, Author) commented Jul 31, 2022

CI Triage:

@joestringer (Member, Author) commented Jul 31, 2022

/mlh new-flake Cilium-PR-K8s-1.23-kernel-4.19

👍 created #20723

@joestringer (Member, Author) commented:

/mlh new-flake Cilium-PR-K8s-1.16-kernel-4.9

@qmonnet (Member) left a comment:

Looks good to me 👍

@sayboras (Member) left a comment:

Small comment as per below :)

@tommyp1ckles (Contributor) left a comment:

just some nitpicking on metric labels, otherwise lgtm!

@jrajahalme (Member) left a comment:

Observed an asymmetry of the new critical sections, but it seems inconsequential.

@@ -72,6 +73,7 @@ func (ipc *IPCache) AllocateCIDRs(
			newlyAllocatedIdentities[prefixStr] = id
		}
	}
	ipc.Unlock()
A reviewer (Member) commented on the diff above:
This would be much easier to reason about if the forthcoming insertion to the ipcache was also covered by the lock, so that these new critical sections would be more symmetric. However, it appears that it would be safe if the releaseCIDRIdentities() critical section executes right after this lock is released, as in this case the identity would not be "released" and hence not removed from the ipcache.

@joestringer (Member, Author) commented Aug 3, 2022
Oh yeah, I meant to comment specifically about this. In the if upsert {...} case this seems reasonable. The problem is that in the situation I'm trying to fix, the ToFQDNs logic passes upsert=false here and then does the UpsertGeneratedIdentities() much later. I don't think it's viable to hold the lock for that entire period; we'd need to do some significant restructuring to really achieve this.

.. it appears that it would be safe if the releaseCIDRIdentities() critical section executes right after this lock is released, as in this case the identity would not be "released" and hence not removed from the ipcache.

That was pretty much my thinking. We'd either get:

  • A: Lock
  • A: ipc.IdentityAllocator.Release()
  • A: ipc.deleteLocked()
  • A: Unlock
  • B: Lock
  • B: ipc.allocate()
  • B: Unlock
  • B: Lock, Upsert, Unlock

or some variation of the following, potentially with the ipc.Upsert() from B also moved earlier, before A:

  • B: Lock
  • B: ipc.allocate()
  • B: Unlock
  • A: Lock
  • A: ipc.IdentityAllocator.Release()
  • A: (no ipc.deleteLocked() since the identity has a ref from B)
  • A: Unlock
  • B: Lock, Upsert, Unlock

For now, since I couldn't cover both cases, I just didn't bother to cover the UpsertGeneratedIdentities() itself with the lock here. I could cover that function with the lock too, but it might just encourage assumptions around the locking consistency on this side which don't hold, because the upsert is conditionally done later.
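As a rough illustration of why these interleavings stay safe, here is a toy refcounting model in Go (names and structure are hypothetical, not the real ipcache code): A's release only deletes the ipcache entry when the identity's reference count drops to zero, so a reference taken by B keeps the entry alive, and B's later unconditional upsert covers the other ordering.

```go
package main

import (
	"fmt"
	"sync"
)

// Toy refcounting model of the interleavings above; illustrative only.
type toyCache struct {
	mu     sync.Mutex
	refcnt map[string]int // identity reference counts, keyed by prefix
	cache  map[string]int // the "ipcache": prefix -> identity
}

// allocate corresponds to B's first critical section: take a reference on the
// identity under the lock; the ipcache upsert happens later, outside the lock.
func (c *toyCache) allocate(prefix string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.refcnt[prefix]++
}

// releaseAndDelete corresponds to A's critical section: drop a reference and,
// only if the count reaches zero, remove the ipcache entry within the same
// critical section (the race closed by the first commit).
func (c *toyCache) releaseAndDelete(prefix string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.refcnt[prefix]--
	if c.refcnt[prefix] <= 0 {
		delete(c.refcnt, prefix)
		delete(c.cache, prefix)
	}
}

// upsert corresponds to B's later "Lock, Upsert, Unlock" step.
func (c *toyCache) upsert(prefix string, id int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.cache[prefix] = id
}

func main() {
	// Initial state: the identity for 192.0.2.0/24 has one reference (about to
	// be released by the GC path, "A") and a corresponding ipcache entry.
	c := &toyCache{
		refcnt: map[string]int{"192.0.2.0/24": 1},
		cache:  map[string]int{"192.0.2.0/24": 100},
	}

	// Second interleaving from the list above:
	c.allocate("192.0.2.0/24")         // B: Lock, allocate (refcount 1 -> 2), Unlock
	c.releaseAndDelete("192.0.2.0/24") // A: Lock, release (2 -> 1, so no delete), Unlock
	c.upsert("192.0.2.0/24", 100)      // B: Lock, Upsert, Unlock

	fmt.Println(c.cache) // map[192.0.2.0/24:100] -- the entry survives
}
```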

@joestringer (Member, Author) commented Aug 3, 2022
^^ Maybe, longer term, the real solution is to have the dnsproxy subsystem go to the identity/ipcache subsystem and state "IP X needs an FQDN identity"; the ipcache can then call out to the FQDN policy pieces, ensure that is all populated, and only then populate the ipcache itself. Then NotifyOnDNSMsg() wouldn't need to worry about this detail; it would just need a waitgroup or callback to ensure the ipcache is plumbed before releasing the DNS responses. 🤔
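Sketching that idea as hypothetical Go interfaces (nothing below exists in Cilium; it only illustrates the proposed division of responsibilities):

```go
// Hypothetical sketch of the proposed flow; the interfaces and names are
// invented purely to illustrate the comment above.
package fqdnplumbing

import (
	"context"
	"net/netip"
)

// FQDNPolicyPlumber is what the ipcache subsystem would call out to, so that
// selectors and policymaps learn about the identity before the ipcache itself
// is populated.
type FQDNPolicyPlumber interface {
	EnsurePolicyForIP(ctx context.Context, ip netip.Addr) (identity uint32, err error)
}

// IdentityEnsurer is what the DNS proxy (e.g. from NotifyOnDNSMsg) would use
// instead of driving policy updates and ipcache upserts itself. The returned
// wait function blocks until the IP -> identity mapping is fully plumbed, so
// the proxy can hold the DNS response until then.
type IdentityEnsurer interface {
	EnsureFQDNIdentity(ctx context.Context, ip netip.Addr) (wait func(context.Context) error, err error)
}
```

In that split, the DNS proxy only waits on the returned function before releasing the DNS response, and all ordering between identity allocation, policy plumbing, and ipcache population stays inside one subsystem.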

A reviewer (Member) replied:
^ That sounds like the ideal design we want to strive for.

@joestringer (Member, Author) replied:
I added the above explanation also to the commit message.

@joestringer joestringer force-pushed the submit/fix-fqdn-race-condition branch from b92ce4a to cebe666 Compare August 5, 2022 21:38
@joestringer joestringer merged commit e6ad743 into cilium:master Aug 9, 2022
@joestringer joestringer deleted the submit/fix-fqdn-race-condition branch August 9, 2022 03:51
@christarazi christarazi self-assigned this Aug 9, 2022
@tklauser tklauser added backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. and removed backport-pending/1.12 labels Aug 11, 2022
@christarazi christarazi added area/proxy Impacts proxy components, including DNS, Kafka, Envoy and/or XDS servers. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. area/agent Cilium agent related. needs-backport/1.10 labels Oct 3, 2022
@sayboras sayboras added backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. and removed backport-pending/1.11 labels Oct 11, 2022