Skip to content

Conversation

gandro
Copy link
Member

@gandro gandro commented May 29, 2024

This is an implementation of the following CFP. Please refer to the CFP for details on the approach.

It is highly recommended to consult the CFP, as it contains all the high-level design decisions and mechanism found in these commits. The commit messages themselves only consider low-level details. The main commit is rather large, as I was unable to split it into smaller chunks that would the test suite by themselves. However, it contains a commit message explaining each code chunk in the diff.

Follow-up work not intended to be part of this PR:

  • Instead of collecting all matching selectors and every one as a label, extract the least generic available selector (e.g. matchName: cilium.io instead of matchPattern: *) for a given domain and use that one, as it should reduce identity churn. This requires us to teach the selector cache about selector hierarchy.
  • Pre-allocating an identity for a registered selector, rather than waiting for the first IP to be observed fqdn: Pre-allocate ToFQDN selector identities #33068 (comment)
  • Update to documentation regarding upgrade impact
  • Improve upgrade tests to also cover ToFQDN policies
  • Merge [v1.15] fqdn: Forward-compatibility with Cilium 1.16 FQDN identities #32872 needed for dropless upgrades
Improved performance for DNS lookups (up to 5x reduction in tail latency) when using ToFQDN policies. To avoid drops during upgrades in clusters with ToFQDN policies, it is highly recommended to run Cilium v1.15.6 or newer before upgrading to Cilium v1.16 

@gandro gandro added kind/performance There is a performance impact of this. release-note/major This PR introduces major new functionality to Cilium. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. upgrade-impact This PR has potential upgrade or downgrade impact. area/fqdn Affects the FQDN policies feature labels May 29, 2024
@gandro gandro force-pushed the pr/gandro/dns-identities branch 2 times, most recently from 60abc6e to 396287b Compare May 29, 2024 16:45
@gandro
Copy link
Member Author

gandro commented May 29, 2024

/test

@squeed
Copy link
Contributor

squeed commented May 30, 2024

Super clean, very impressive. Nice job! Excited for tests and (fingers crossed) performance results.

@gandro gandro force-pushed the pr/gandro/dns-identities branch 3 times, most recently from c842f90 to 33330ac Compare June 4, 2024 16:33
@gandro gandro requested a review from squeed June 4, 2024 16:37
@gandro gandro force-pushed the pr/gandro/dns-identities branch from 33330ac to 7400e34 Compare June 4, 2024 16:39
@gandro gandro marked this pull request as ready for review June 4, 2024 17:08
@gandro gandro requested review from a team as code owners June 4, 2024 17:08
@gandro gandro requested review from nebril and learnitall June 4, 2024 17:08
@gandro
Copy link
Member Author

gandro commented Jun 4, 2024

/test

@gandro gandro force-pushed the pr/gandro/dns-identities branch from 7400e34 to 73c7ca1 Compare June 5, 2024 08:16
@gandro gandro force-pushed the pr/gandro/dns-identities branch from 73c7ca1 to 88b8823 Compare June 5, 2024 08:37
gandro added 6 commits June 10, 2024 14:22
This commit modifies the IPCache restoration to restore all local
identity entries, not just CIDR identities.

This is required because FQDN labels are derived from ToFQDN selectors,
which are only available during endpoint regeneration. To ensure that
identities of prefixes in IPCache don't change during initial
regeneration, we provide the expected `fqdn` labels before regeneration.
The real labels are added during regeneration, therefore the restored
ones can be safely removed in `releaseRestoredIdentities`.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This contians no functional changes and is a drive-by cleanup.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
A subsequent commit will change prefix labels upserted by the name
manager to use `fqdn`-labels instead of `cidr`-labels. Because a CIDR
identity currently always also have the world label, we want to mirror
that logic for identities with an `fqdn` label, as such IPs allowed
by a ToFQDN policy remains selectable by a `reserved:world` selector.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This introduces a wait for the initial IPCache revision after K8s caches
have synced. This ensures that all prefix labels are injected and
available in the new IPCache before restoration starts.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This commit adds logic to derive identity labels for `(name, IP)` pairs
from selectors. The basic idea is that any ToFQDN selector matching the
qname of the DNS lookup is added to a label to each IP returned by that
DNS lookup.

The functions added here will be used in a subsequent commit.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This commit implements `CFP-28427: ToFQDN policy performance
improvements`. It is highly recommended to consult the CFP, as it
contains all the high-level design decisions and mechanism found in this
commit. The rest of this commit message therefore only explains the
"what" and "where", and not the "why".

Before this commit, there was circular interaction between the
`SelectorCache` and `NameManager`: `SelectorCache` would tell
`NameManager` about new `ToFQDN` selectors, and `NameManager` would in
turn inform `SelectorCache` about the IPs selected by that `ToFQDN`
selector. This commit simplifies this logic by removing the backlink
from the `NameManager` to the `SelectorCache`. IPs are instead now
labelled with the selector as an `fqdn` identity label in IPCache, thus
not requiring any direct changes to the `SelectorCache` when a new IP is
discovered that shares the identity with an old IP. If there is identity
allocation needed for an observed IP, the `SelectorCache` is still
updated, but only via `IPCache`, and no longer directly from
`NameManager`.

I recommend first looking at the changes to `SelectorCache` in
`pkg/policy`. Note the following changes:

1. The `identityNotifier` interface (implemented by `NameManger`) is
   simplified: We no longer care about IPs selected by a FQDN selector,
   and we no longer need to care about potential deadlocks, as there are no
   calls back from `NameManager` to `SelectorCache` in the invoked
   functions (the indirect backlink from `NameManager` to `SelectorCache`
   via `IPCache` happens in `NameManager.UpdateGenerateDNS`
   - but this function is called by the DNS proxy whenever it observes a
    new DNS lookup and thus is called without the selector cache lock
   held.
2. `UpdateFQDNSelector` (previously invoked by `NameManager`) is removed
   - `SelectorCache` no longer directly needs to know the IPs matched by
   a selector.
3. The `fqdnSelector` type is simplified: Instead of containing the list
   of CIDR identities (one for each selected IP) and checking for the
   CIDR identity in `matches`, we now can simply treat the FQDN selector as
   a label and thus check if the requested identity has the FQDN selector
   label.
4. All the unit test logic around managing the selected IPs is removed,
   as all the responsibility for updating IPs now lies in `NameManager`.

For the `NameManager` in `pkg/fqdn`, the changes are as follows:

1.  Minor changes to for the query functions in `DNSCache`: Instead of
    just listing or checking the existence of an IP, we now want to know
    about `(name, IP)` pairs (needed later for updating `IPCache`).
2.  Similarly, where before we only cared about the mapping between an
    `FQDNSelector` and the selected IPs, we now want to know what
    `(name, IP)` pairs are matched by a particular selector.
    Thus `mapSelectorsToIPsLocked` is replaced with
    `mapSelectorsToNamesLocked` and the unit tests are updated as well.
3.  `RegisterFQDNSelector` now checks if the new selector needs to be
    added to any known `(name, IP)` pairs as an `fqdn` label, and
    `UnregisterFQDNSelector` potentially removes `fqdn` labels from IPs.
4.  `UpdateGenerateDNS` (invoked for DNS lookups) determines the labels
    of any newly discovered IP and now directly spawns the go routine
    to wait for the new `(IP, identity)` pair to be injected into
    `IPCache`. Previously, this waiting was done as part of the call to
    `UpdateSelectors`, previously implemented in `daemon/cmd/fqdn.go`
    (and now removed).
5.  `ForceGenerateDNS` is removed. It was previously called by the
    `NameManager` GC to remove IPs from the `SelectorCache`, but since
    the `SelectorCache` no longer knows about IPs, the function is
    obsolete (note that `IPCache` removals are still performed upon GC)
6.  Changes in `CompleteBootstrap` to deal with the upgrade logic
    when upgrading from Cilium v1.15. See bullet point 9 below for
    details.
7.  `updateDNSIPs` (called from `UpdateGenerateDNS`, i.e. upon new DNS
    lookups) now determines the labels for every newly observed IP based
    on the available FQDN selectors, and no longer upserts CIDR
    identites. Note that we only update the labels matching the looked up
    `dnsName`. If an IP happens to also map to a different domain name and
    uses a different set of selectors for the alternative name, those labels
    in IPCache are unaffected by the call to `updateMetadata`, as every call
    to IPCache uses the DNS name as the resource owner.
8.  The `ipcacheResource`, `updateMetadata`, and `maybeRemoveMetadata`
    contain the calls to `IPCache` to update labels for a given `(name,
    IP)` pair. There are two main differences to before: Instead of
    upserting or removing CIDR prefixes, we now add labels. And instead of
    having one update per prefix, we now have one update per `(name, IP)`
    pair, meaning a single prefix (aka "IP") might have multiple IPCache
    resource owners in the `NameManager` (i.e. one for each `name` mapping
    to that IP).
9.  `RestoreCache` and `CompleteBootstrap` contain the logic to
    initialize `IPCache` when upgrading from Cilium v1.15. This requires
    the previous Cilium instance to have checkpointed the known `ToFQDN`
    selectors, which are read in during upgrade and used to derive and inject the
    `IPCache` labels we expect to have once endpoint regeneration has
    finished. After endpoint regeneration, those restored labels are then
    removed, leaving the real labels in place.
    In contrast to all other `IPCache` updates (where each update to an
    IP is "owned" by the DNS name mapping to that IP, and we rely on
    `IPCache` to merge those labels), the resource owner here is static.
    This is, because they are all added at once (in `RestoreCache`)
    and removed at once (in `CompleteBootstrap`), and no per-name
    tracking is required.
10. Various changes to unit tests. The old unit tests tested the
    interaction between `NameManager` and `SelectorCache`, where as the
    new unit tests now test the interaction between `NameManager` and
    `IPCache`.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
@gandro gandro force-pushed the pr/gandro/dns-identities branch from 38fd5e4 to 6f892a7 Compare June 10, 2024 12:22
@gandro
Copy link
Member Author

gandro commented Jun 10, 2024

/test

Edit: CI is pretty unstable due to quay throwing 502 Bad Gateway

@gandro
Copy link
Member Author

gandro commented Jun 10, 2024

ci-ipsec-upgrade fails with Error: Can't find 'action.yml', 'action.yaml' or 'Dockerfile' under '/home/runner/work/cilium/cilium/.github/actions/conn-disrupt-test'. Did you forget to run actions/checkout before running your local action? which is very odd, even though it seems to have passed the actual tests. Likely due to the quay outage. Let me restart the whole test

@gandro
Copy link
Member Author

gandro commented Jun 10, 2024

/ci-ipsec-upgrade

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 10, 2024
@gandro gandro added this pull request to the merge queue Jun 10, 2024
Merged via the queue into cilium:main with commit 719eb4f Jun 10, 2024
@gandro gandro deleted the pr/gandro/dns-identities branch June 10, 2024 16:14
bimmlerd added a commit to bimmlerd/cilium that referenced this pull request Nov 29, 2024
FQDN max ips per host is a circuit-breaker which, in the past, prevented
cilium's policy engine from becoming overwhelmed by the number of
identities allocated due to IPs learned via toFQDN policies. Thanks to
recent performance optimization work[1], the default value for this
doesn't represent a sensible value anymore, since it is no longer the
case that each IP learned via DNS lookup becomes its own identity.

[1] cilium#32769

Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
bimmlerd added a commit to bimmlerd/cilium that referenced this pull request Nov 29, 2024
FQDN max ips per host is a circuit-breaker which, in the past, prevented
cilium's policy engine from becoming overwhelmed by the number of
identities allocated due to IPs learned via toFQDN policies. Thanks to
recent performance optimization work[1], the default value for this
doesn't represent a sensible value anymore, since it is no longer the
case that each IP learned via DNS lookup becomes its own identity.

[1] cilium#32769

Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
bimmlerd added a commit to bimmlerd/cilium that referenced this pull request Dec 2, 2024
FQDN max ips per host is a circuit-breaker which, in the past, prevented
cilium's policy engine from becoming overwhelmed by the number of
identities allocated due to IPs learned via toFQDN policies. Thanks to
recent performance optimization work[1], the default value for this
doesn't represent a sensible value anymore, since it is no longer the
case that each IP learned via DNS lookup becomes its own identity.

[1] cilium#32769

Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
bimmlerd added a commit to bimmlerd/cilium that referenced this pull request Dec 3, 2024
FQDN max ips per host is a circuit-breaker which, in the past, prevented
cilium's policy engine from becoming overwhelmed by the number of
identities allocated due to IPs learned via toFQDN policies. Thanks to
recent performance optimization work[1], the default value for this
doesn't represent a sensible value anymore, since it is no longer the
case that each IP learned via DNS lookup becomes its own identity.

[1] cilium#32769

Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
bimmlerd added a commit to bimmlerd/cilium that referenced this pull request Dec 5, 2024
FQDN max ips per host is a circuit-breaker which, in the past, prevented
cilium's policy engine from becoming overwhelmed by the number of
identities allocated due to IPs learned via toFQDN policies. Thanks to
recent performance optimization work[1], the default value for this
doesn't represent a sensible value anymore, since it is no longer the
case that each IP learned via DNS lookup becomes its own identity.

[1] cilium#32769

Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
github-merge-queue bot pushed a commit that referenced this pull request Dec 5, 2024
FQDN max ips per host is a circuit-breaker which, in the past, prevented
cilium's policy engine from becoming overwhelmed by the number of
identities allocated due to IPs learned via toFQDN policies. Thanks to
recent performance optimization work[1], the default value for this
doesn't represent a sensible value anymore, since it is no longer the
case that each IP learned via DNS lookup becomes its own identity.

[1] #32769

Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/fqdn Affects the FQDN policies feature kind/performance There is a performance impact of this. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/major This PR introduces major new functionality to Cilium. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. upgrade-impact This PR has potential upgrade or downgrade impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants