
Conversation

squeed
Contributor

@squeed squeed commented Feb 12, 2024

This proposes a new label source: dns. This would then tag prefixes with the ToFQDN selector(s) that match this prefix. E.g. 192.0.2.42/32 -> dns:*.cilium.io

Post-merge follow-up items:

  • Consider a flag limiting the number of fqdn: identities in the local allocator
  • Add alerts for excessive local identities
  • Chunk ipcache updates.

This CFP proposes 3 key changes. Together, they preserve correctness while radically restructuring how prefixes are selected.

1: Labels on prefixes propagate downwards. That is to say, if prefix `192.0.0.0/22` has label `foo:bar`, then prefix `192.0.2.0/24` also has label `foo:bar`, unless overridden. The IPCache aggregates labels accordingly.
2: Only selected CIDR labels are inserted into the ipcache. If a selector references CIDR `192.0.2.0/24`, only the prefix -> label mapping `192.0.2.0/24 -> cidr:192.0.2.0/24` is added to the IPCache.
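To illustrate change 1, here is a minimal sketch in plain Go (standard library only; the `Labels` type and `labelsFor` helper are hypothetical, not the CFP's actual implementation) of how a prefix could inherit labels from encompassing prefixes unless it overrides them:

```go
package main

import (
	"fmt"
	"net/netip"
)

// Labels is a hypothetical key -> value label set, e.g. "foo" -> "bar".
type Labels map[string]string

// labelsFor computes the effective labels for target: its own labels take
// precedence, and any label key not set locally is inherited from every
// broader (encompassing) prefix in the metadata map.
func labelsFor(target netip.Prefix, metadata map[netip.Prefix]Labels) Labels {
	out := Labels{}
	for k, v := range metadata[target] {
		out[k] = v // the prefix's own labels win
	}
	for parent, lbls := range metadata {
		// Only strictly broader prefixes that contain the target qualify.
		if parent.Bits() >= target.Bits() || !parent.Contains(target.Addr()) {
			continue
		}
		for k, v := range lbls {
			if _, ok := out[k]; !ok {
				out[k] = v // inherit, unless overridden locally
			}
		}
	}
	return out
}

func main() {
	metadata := map[netip.Prefix]Labels{
		netip.MustParsePrefix("192.0.0.0/22"): {"foo": "bar"},
		netip.MustParsePrefix("192.0.2.0/24"): {"cidr": "192.0.2.0/24"},
	}
	// 192.0.2.0/24 keeps its own cidr label and inherits foo:bar from the /22.
	fmt.Println(labelsFor(netip.MustParsePrefix("192.0.2.0/24"), metadata))
}
```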
Member

sidenote: To implement CIDR policy, the Identity must be allocated after the CIDR policy is processed, and that Identity must be (a) inserted into the bpf ipcache and (b) evaluated against the policies applied to local endpoints and inserted into the relevant policymaps. This process is unrelated to the FQDNs functionality described in this CFP, but already occurs and must still occur to correctly implement the CIDR policy feature.

Contributor Author

Does ordering matter? I don't think so; I'm pretty sure the SelectorCache does the right thing. What if a CIDR policy selects a /32 that has already been allocated by FQDN policy? It should work fine.

Member

@joestringer joestringer Feb 15, 2024

There are a few permutations to consider (a sample, not comprehensive):

  • No policy exists, create a new CIDR policy for this IP
  • Presuppose a broader CIDR policy, then a new more specific CIDR policy is created
  • Presuppose a more specific CIDR policy, then a new broader CIDR policy is created
  • Presuppose an FQDN policy, then a CIDR policy with the same IP is created

I think that each of these cases more or less boils down to the same set of operations that need to occur in the correct order to prevent connection disruption due to (temporary) packet loss:

  1. Cilium processes the CIDR policy
  2. Cilium plumbs the policy down into the PolicyRepository, triggers Endpoint regeneration, leading to generation of the SelectorPolicy and generally preparing the SelectorCache so that when there is an identity, we can figure out how to plumb the policymaps correctly.
  3. Somewhere along the way Cilium needs to inform the IPCache about the IP<->label mapping
  4. The IPCache will react to the label mapping and allocate an Identity for the IP.
    i. If this happens before (2), then this will plumb the ipcache bpf map without updating the policymaps, and may lead to packet loss. Newly arriving traffic will hit the datapath, hit the ipcache lookup, resolve to the new identity, then subsequently hit the policy lookup, and the policymap is not prepared for this new Identity, which leads to drops.
      • Given that this happens first, once (2) eventually does occur, it will automatically calculate the new policy against the latest set of Identities, including the one just allocated here in step (4), and the policymaps will converge to be "eventually consistent" and allow the traffic again. However, by that point we've already disrupted the connection.
    ii. If (2) happens before (4), then the ipcache will ensure correct ordering by first notifying the selectorcache about the newly allocated Identity, waiting for the corresponding policymaps to be plumbed, and then subsequently updating the bpf ipcache. There's no disruption.

Looking through these steps, if there is another policy already in place, such as a broader CIDR policy or an FQDN policy, then step (2) is already in place, so we can guarantee case (2.ii) and all is fine.

The tricky cases are when there is no specific policy for this IP yet, when the newly created policy is broader than the existing one, or, for extra spice, when there are overlapping properties (consider for instance a policy that is broader on L3 but more specific on L4 with L7 aspects or auth). Let's say, for example, there is a /32 policy already, and we then create a new /24 policy on port 80 with L7 filtering. Given the dynamic Identity creation proposed in this PR, there will already be Identity X for the /32 (no /24 label). Once everything is processed, there must be two identities: a new Identity X′ (/24 and /32 labels) and Y (/24 label). The SelectorPolicy must be recalculated to account for the distinct subsets of traffic {/32 and port 80, /32 and not port 80, not /32 and port 80}, and of course the new bpf ipcache and policymap content must be calculated based on the new SelectorPolicy and Identities.

I think we actually don't provide this level of ordering guarantee when calculating CIDR policies today; we only guarantee that policy calculation is triggered before informing the ipcache, but don't guarantee that it completes before informing the ipcache:

https://github.com/cilium/cilium/blob/12e3ae9936bd82924a995cbf2a2eb6284d6db0cb/daemon/cmd/policy.go#L464-L471
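As a minimal, self-contained sketch of the safe ordering described above (the helper functions are hypothetical stand-ins for the SelectorCache/policymap and bpf ipcache plumbing, not Cilium's actual API): the policymaps must know about a newly allocated identity before the bpf ipcache maps the IP to it, otherwise the drop window described above opens up.

```go
package main

import (
	"fmt"
	"net/netip"
)

type identity uint32

var nextID identity = 1000 // illustrative counter, not Cilium's allocator

// Hypothetical stand-ins for the real subsystems.
func allocateLocalIdentity(lbls []string) identity { nextID++; return nextID }
func plumbPolicyMapsAndWait(id identity) error     { return nil } // SelectorCache + endpoint policymaps
func updateBPFIPCache(p netip.Prefix, id identity) error {
	fmt.Println("ipcache:", p, "->", id)
	return nil
}

// upsertPrefix publishes an IP->identity mapping without a drop window by
// ordering the steps: allocate, plumb policy, and only then the datapath map.
func upsertPrefix(p netip.Prefix, lbls []string) error {
	id := allocateLocalIdentity(lbls) // step (4): identity for the label set
	if err := plumbPolicyMapsAndWait(id); err != nil {
		return err // step (2) must complete before the datapath can see id
	}
	return updateBPFIPCache(p, id) // only now can the datapath resolve this IP to id
}

func main() {
	_ = upsertPrefix(netip.MustParsePrefix("192.0.2.0/24"), []string{"cidr:192.0.2.0/24"})
}
```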

Contributor Author

For existing CIDR selectors, if a given IP changes identities due to a label being added, that change can be rolled out without dropping traffic. This is because the ipcache is very careful about the order in which it interleaves operations.

For new selectors, traffic may be dropped. This is unrelated to the proposed FQDN refactor. I actually have a branch that fixes this, but it's not yet ready for merging.


This update could be calculated with a bitwise LPM trie. Alternatively, scanning the entire ipcache to determine affected prefixes may turn out to be more efficient in practice.
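As a rough sketch of the full-scan alternative (plain Go stdlib; the function name is assumed, not Cilium's actual code), finding the narrower prefixes affected by an update to a parent prefix could look like this; a host prefix can be skipped entirely since it cannot contain anything else:

```go
package main

import (
	"fmt"
	"net/netip"
)

// affectedChildren returns every known prefix that is strictly narrower than
// parent and contained by it, i.e. the prefixes whose labels may need to be
// recomputed after parent's metadata changes.
func affectedChildren(parent netip.Prefix, known []netip.Prefix) []netip.Prefix {
	if parent.IsSingleIP() {
		return nil // a /32 or /128 cannot contain any other prefix
	}
	var out []netip.Prefix
	for _, p := range known {
		if p.Bits() > parent.Bits() && parent.Contains(p.Addr()) {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	known := []netip.Prefix{
		netip.MustParsePrefix("1.1.1.1/32"),
		netip.MustParsePrefix("1.0.0.0/8"),
		netip.MustParsePrefix("192.0.2.0/24"),
	}
	// Prints only 1.1.1.1/32: the sole known prefix contained by 1.0.0.0/8.
	fmt.Println(affectedChildren(netip.MustParsePrefix("1.0.0.0/8"), known))
}
```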


Member

Not sure if you're collecting more lifecycle operations here but another case would be when a new name comes in and gets associated with the same IP. Similar Identity reallocation concern to above.

@gandro gandro self-assigned this Apr 9, 2024
@squeed
Contributor Author

squeed commented Apr 9, 2024

I just had another idea: identity pre-allocation.

One of the biggest causes of FQDN tail latency is updating Envoy with new identities for a given selector. What if we pre-allocate an identity for every selector? Then we don't need to update Envoy in most cases! (This is because marking an IP as having an existing identity is just an ipcache update, not a policy update). The only time an incremental policy update would be required is if an IP was newly selected by two selectors.

This would be a huge improvement in tail latency.

@gandro
Member

gandro commented Apr 10, 2024

> I just had another idea: identity pre-allocation.
>
> One of the biggest causes of FQDN tail latency is updating Envoy with new identities for a given selector. What if we pre-allocate an identity for every selector? Then we don't need to update Envoy in most cases! (This is because marking an IP as having an existing identity is just an ipcache update, not a policy update). The only time an incremental policy update would be required is if an IP was newly selected by two selectors.
>
> This would be a huge improvement in tail latency.

I thought about pre-allocation as well. It really depends on how much overlap there is between selectors. If you have a matchPattern: * selector, it will be part of every identity. So if you just pre-allocate an identity for dns:cilium.io, that identity itself will never be used. Instead, all observed IPs will have two labels: dns:cilium.io,dns:*

Edit: And it's not just overlapping selectors, it's also overlapping IPs: the IPCache entry for 1.1.1.1 might have dns:one.one.one.one,dns:* from a lookup to one.one.one.one, but it can also be returned for a lookup to cf-dns-test.cilium.io and thus might also gain a dns:*.cilium.io label. It really depends on how much overlap there is for pre-allocation to actually pay off.

@squeed
Contributor Author

squeed commented Apr 15, 2024

@gandro

> I thought about pre-allocation as well. It really depends on how much overlap there is between selectors.

Indeed, and we may wish to be clever when there is a wildcard selector. Pre-allocation, however, doesn't have any relevant downsides (one identity per FQDN selector, so on the order of 100), and it optimizes the policy-update case when there is no overlap. That would be a huge win for tail latency; anything we can do to avoid a policymap + Envoy update is an improvement.

gandro added a commit to gandro/cilium that referenced this pull request May 16, 2024
This commit introduces a new feature in the IPCache metadata subsystem:
The ability for prefixes with local identities to inherit CIDR labels
from encompassing prefixes. This is needed to implement
cilium/design-cfps#17 where we want to introduce a new way to manage
identities for IPs allowlisted by DNS lookups. Without going into too
much detail, the key point of that CFP is that IPs returned by DNS
lookups will be associated with a `fqdn:` label.

Those IPs returned by a DNS lookup, however, might also be selected by
a CIDR prefix, and thus might also need a `cidr:` label, but only if a
selecting CIDR policy is present.

To give an example: If we perform a DNS lookup on `one.one.one.one` on a
pod with DNS redirects enabled, we may observe IP `1.1.1.1` and thus
associate that IP with a `fqdn:` label. If, at the same time, we also
have a separate CIDR policy allowing traffic to `1.0.0.0/8`, then that
IP also needs the `cidr:1.0.0.0/8` label, so that it is selected by both
the FQDN and the CIDR policy.

To implement this, the IPCache metadata subsystem is slightly augmented:

1. Whenever metadata for a prefix is upserted or removed, we no longer
   enqueue only the prefix itself, but also all potentially affected
   prefixes or IPs contained by it. This allows `InjectLabels` to then
   down-propagate the changes to any child prefixes.
2. In `resolveIdentity` (called by `InjectLabels`), we check if there
   are any parent prefixes with a CIDR label that we want to inherit.

At the moment, only local identities without a `reserved:` label may
inherit a CIDR label from a parent prefix. This currently therefore only
applies to identities containing `fqdn:` labels. Going forward, this
infrastructure might however also be useful to implement non-K8s label
sources.

A note on the implementation: To find affected child prefixes, we
currently perform a linear scan over the metadata map. While this is an
`O(n)` operation and finding CIDR parents in a trie could be done in
`O(log n)`, at the moment we expect `n` to be in the hundreds to low
thousands at most. For most real-world cases, we therefore expect the
current implementation to perform just as well as, if not better than,
a trie. Also note that the linear scan is skipped if the prefix is a
single IP and thus cannot contain any other prefixes besides itself.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro added a commit to gandro/cilium that referenced this pull request May 16, 2024
gandro added a commit to gandro/cilium that referenced this pull request May 16, 2024
gandro added a commit to gandro/cilium that referenced this pull request May 21, 2024
gandro added a commit to gandro/cilium that referenced this pull request May 27, 2024
gandro added a commit to gandro/cilium that referenced this pull request May 28, 2024
gandro added a commit to cilium/cilium that referenced this pull request May 29, 2024
@squeed squeed force-pushed the fqdn-selector-label branch from a97ba8c to 27b0978 May 29, 2024 14:21
@squeed
Contributor Author

squeed commented May 29, 2024

To all reviewers: this has been updated based on initial prototype implementation. It is now ready for final review.

@squeed squeed marked this pull request as ready for review May 29, 2024 14:28
@squeed
Contributor Author

squeed commented May 29, 2024

We've not been able to come up with a circuit breaker, since the scope for identity allocation is already so small. The existing limitation on the maximum number of names per endpoint is probably sufficient.

@squeed
Contributor Author

squeed commented Jun 3, 2024

@joestringer I think this CFP is probably ready to merge :-). If you'd like to take a final pass, feel free.

Member

@joestringer joestringer left a comment

Overall this looks pretty comprehensive, I like the way that various cases were called out like the combinations of potentially overlapping label sets, as well as upgrade/downgrade consequences. I dropped a few notes and discussion points below, but I agree this is good enough to merge (and implement) 👍


There is currently a circuit-breaker, `--tofqdns-endpoint-max-ip-per-hostname`, which is intended to prevent names such as S3 from causing identity churn. In this proposal, large names such as S3 should not allocate more than one or two identities, making this less critical.

However, this is still a security-sensitive area, since we are allocating resources based on external, untrusted packets. We will need a sane circuit-breaker to ensure we do not allocate a large number of identities based on malicious input.
Member

I think it's fair to say that trust in the DNS server is also quite critical in general with the ToFQDNs feature: if the path to the DNS server or the DNS server itself is somehow compromised, or if the user allows DNS requests to arbitrary destinations and uses these as input to the FQDN engine, then each of these situations could cause adverse behavior.

Then there's the 'selector combinations' side of things which is also subject to the network policies configured by the k8s namespace users.

Contributor Author

Thinking very hard about this, I don't think this proposal significantly changes the security scope or attack surface of the DNS proxy. If an FQDN selector selects an attacker-controlled domain, then the attacker can insert any IPs they want into policy (waves hand).

Perhaps this is worth a better writeup (@ferozsalam 😸 ).


However, this is still a security-sensitive area, since we are allocating resources based on external, untrusted packets. We will need a sane circuit-breaker to ensure we do not allocate a large number of identities based on malicious input.

The maximum number of identities allocated by ToFQDN selectors is the number of unique selector combinations seen for given IPs. The theoretical maximum is, therefore, $2^N-1$[^1]. 10 distinct selectors would be OK (1,023 combinations). More than 16 selectors could *theoretically* exceed identity space.
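Spelling out that bound (a short derivation implied by the text above, not new analysis): each IP ends up with an identity determined by the set of ToFQDN selectors matching it, i.e. a non-empty subset of the $N$ selectors, so the worst-case count of distinct identities is

```latex
% Number of possible non-empty selector combinations, hence maximum identities:
\sum_{k=1}^{N} \binom{N}{k} = 2^{N} - 1,
\qquad 2^{10} - 1 = 1023, \qquad 2^{16} - 1 = 65535.
```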
Member

Do you think there's something here we could export as a metric to at least measure the risk / likelihood of running into an issue with this?

Contributor Author

We do have a metric for the total number of local identities. We can easily add an alert for agents with a large number of local identities. This would be a very good start.

Member

I see, that is probably good enough. I was wondering about this "multiple selectors selecting the same IP" aspect since that can end up generating double the number of identities for each new selector selecting the IP. But this is probably getting too much into the weeds for metrics?

Contributor Author

From an end-user perspective, the interesting thing is memory usage and policy map size. It doesn't matter what leads to those identities, just that they're there -- 10,000 fqdn selectors vs. a pathological combination of 16 of them, that's all the same to the admin.

AIUI we already have alerts for overfull policy maps (I hope!). An alert for excessive local identities would be a good addition; I'll look into adding this.

Member

I will add an optional follow-up item to think about whether there are metrics we can add. Having some metric that indicates the user chose selectors that risk combinatorial explosion would be nice to have, as it would be a more targeted signal than "policy map is full" or "allocator is out of identities".

sayboras pushed a commit to cilium/cilium that referenced this pull request Jun 4, 2024
@gandro gandro force-pushed the fqdn-selector-label branch from 306720c to e097c6d June 5, 2024 08:27
Signed-off-by: Casey Callendrello <cdc@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
@squeed squeed force-pushed the fqdn-selector-label branch from e097c6d to d88e4eb June 5, 2024 16:03
@squeed squeed merged commit fe6dd4f into cilium:main Jun 6, 2024
@squeed squeed deleted the fqdn-selector-label branch July 8, 2024 09:30
@cccsss01

Still having problems w/ v1.16.0

@joestringer
Member

@cccsss01 could you please file a full issue on https://github.com/cilium/cilium ?
