Select ToFQDN prefixes via a special label #17
Conversation
cilium/CFP-28427-fqdn-selectors.md
This CFP proposes three key changes. Together, they preserve correctness while radically restructuring how prefixes are selected.
1: Labels on prefixes propagate downwards. That is to say, if prefix `192.0.0.0/22` has label `foo:bar`, then prefix `192.0.2.0/24` also has label `foo:bar`, unless overridden. The IPCache aggregates labels accordingly.
2: Only selected CIDR labels are inserted into the IPCache. If a selector references CIDR `192.0.2.0/24`, only the prefix -> label mapping `192.0.2.0/24 -> cidr:192.0.2.0/24` is added to the IPCache.
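To make changes (1) and (2) concrete, here is a minimal Go sketch (using the standard `net/netip` package; the map and helper names are illustrative, not the Cilium IPCache API) of how a more specific prefix would aggregate the `cidr:` labels of the selected prefixes that contain it:

```go
package main

import (
	"fmt"
	"net/netip"
)

// selectedCIDRs models only the prefixes actually referenced by a selector;
// per change (2), these are the only prefix -> cidr: label mappings fed into
// the IPCache in this sketch.
var selectedCIDRs = map[netip.Prefix]string{
	netip.MustParsePrefix("192.0.0.0/22"): "cidr:192.0.0.0/22",
}

// labelsFor aggregates labels for a prefix by inheriting the labels of every
// selected ancestor prefix, per change (1): labels propagate downwards.
func labelsFor(p netip.Prefix) []string {
	var lbls []string
	for parent, lbl := range selectedCIDRs {
		if parent.Bits() <= p.Bits() && parent.Contains(p.Addr()) {
			lbls = append(lbls, lbl)
		}
	}
	return lbls
}

func main() {
	// 192.0.2.0/24 lies inside 192.0.0.0/22, so it inherits cidr:192.0.0.0/22.
	fmt.Println(labelsFor(netip.MustParsePrefix("192.0.2.0/24")))
}
```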
sidenote: To implement CIDR policy, the Identity must be allocated after the CIDR policy is processed, and that Identity must be (a) inserted into the bpf ipcache and (b) evaluated against the policies applied to local endpoints and inserted into the relevant policymaps. This process is unrelated to the FQDN functionality described in this CFP, but it already occurs and must still occur to correctly implement the CIDR policy feature.
Does ordering matter? I don't think so; I'm pretty sure the SelectorCache does the right thing. What if a CIDR policy selects a /32 that has already been allocated by FQDN policy? It should work fine.
There are a few permutations to consider (a sample, not comprehensive):
- No policy exists, create a new CIDR policy for this IP
- Presuppose a broader CIDR policy, then a new more specific CIDR policy is created
- Presuppose a more specific CIDR policy, then a new broader CIDR policy is created
- Presuppose an FQDN policy, then a CIDR policy with the same IP is created
I think that each of these cases more or less boils down to the same set of operations that need to occur in the correct order to prevent connection disruption due to (temporary) packet loss:
1. Cilium processes the CIDR policy.
2. Cilium plumbs the policy down into the PolicyRepository and triggers Endpoint regeneration, leading to generation of the SelectorPolicy and generally preparing the SelectorCache so that, once there is an identity, we can figure out how to plumb the policymaps correctly.
3. Somewhere along the way, Cilium needs to inform the IPCache about the IP <-> label mapping.
4. The IPCache will react to the label mapping and allocate an Identity for the IP.
   - If (4) happens before (2), the ipcache bpf map is plumbed without updating the policymaps, which may lead to packet loss: newly arriving traffic hits the datapath, hits the ipcache lookup, resolves to the new identity, then subsequently hits the policy lookup, and the policymap is not prepared for this new Identity, which leads to drops.
   - In that case, once (2) eventually does occur, it will automatically calculate the new policy against the latest set of Identities, including the one just allocated in step (4), and the policymaps will converge to be "eventually consistent" and allow the traffic again. However, by that point we've already disrupted the connection.
   - If (2) happens before (4), the ipcache will ensure correct ordering by first notifying the SelectorCache about the newly allocated Identity, waiting for the corresponding policymaps to be plumbed, and only then updating the bpf ipcache. There's no disruption.
Looking through these steps, if there is another policy already in place, such as a broader CIDR policy or an FQDN policy, then step (2) has already happened, so the "(2) happens before (4)" ordering is guaranteed and all is fine. The tricky cases are when there is no specific policy for this IP yet, when the newly created policy is broader than the existing one, or, for extra spice, when there are overlapping properties (consider for instance a policy that is broader on L3 but more specific on L4, with L7 aspects or auth).

Say, for example, there is a /32 policy already, and then we create a new /24 policy on port 80 with L7 filtering. Given the dynamic Identity creation proposed in this PR, there will already be Identity X for the /32 (no /24 label). Once everything is processed there must be two identities: a new Identity X′ (/24 and /32 labels) and Y (/24 label only). The SelectorPolicy must be recalculated to account for the distinct subsets of traffic {/32 and port 80, /32 and not port 80, not /32 and port 80}, and of course the new bpf ipcache and policymap content must be calculated based on the new SelectorPolicy and Identities.

I think we actually don't provide this level of ordering guarantee when calculating CIDR policies today; we only guarantee that policy calculation is triggered before informing the ipcache, but we don't guarantee that it completes before informing the ipcache.
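As a minimal sketch of the safe ordering discussed here (hypothetical interface and function names, not Cilium's actual plumbing), the key invariant is that the policymaps learn about a new Identity before the bpf ipcache starts resolving traffic to it:

```go
package main

import "fmt"

// Hypothetical interface standing in for the policymap and bpf ipcache
// plumbing; the names are illustrative, not Cilium's actual API.
type datapath interface {
	UpdatePolicyMaps(id uint32) error         // plumb the new Identity into endpoint policymaps
	UpsertIPCache(ip string, id uint32) error // write the bpf ipcache entry for the IP
}

// safeRollout sketches the non-disruptive ordering: the policymaps learn about
// the new Identity first (step 2 completing), and only then does the bpf
// ipcache start resolving the IP to that Identity (step 4's datapath effect).
// The reverse order opens a window where packets resolve to an Identity the
// policymaps do not yet know about, and are dropped.
func safeRollout(dp datapath, ip string, newID uint32) error {
	if err := dp.UpdatePolicyMaps(newID); err != nil {
		return err
	}
	return dp.UpsertIPCache(ip, newID)
}

type fakeDP struct{}

func (fakeDP) UpdatePolicyMaps(id uint32) error         { fmt.Println("policymaps ready for", id); return nil }
func (fakeDP) UpsertIPCache(ip string, id uint32) error { fmt.Println("ipcache:", ip, "->", id); return nil }

func main() { _ = safeRollout(fakeDP{}, "192.0.2.42", 12345) }
```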
For existing CIDR selectors, if a given IP changes identities due to a label being added, that change can be rolled out without dropping traffic. This is because the ipcache is very careful about the order in which it interleaves operations.
For new selectors, traffic may be dropped. This is unrelated to the proposed FQDN refactor. I actually have a branch that fixes this, but it's not yet ready for merging.
This update may be calculated if we use a bitwise LPM trie. Alternatively, scanning the entire ipcache to determine affected prefixes may be more efficient.
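To make the two options concrete, here is a minimal Go sketch (assumed data structures, not the Cilium implementation) of the bitwise approach for one direction of the containment query: probing for the selected ancestors of a prefix by masking it down to each possible length, which is the same walk a bitwise LPM trie would perform. The scanning alternative would instead iterate every known prefix and test containment directly.

```go
package main

import (
	"fmt"
	"net/netip"
)

// selected is the set of prefixes currently referenced by some CIDR selector.
var selected = map[netip.Prefix]bool{
	netip.MustParsePrefix("192.0.0.0/22"): true,
	netip.MustParsePrefix("10.0.0.0/8"):   true,
}

// ancestorsOf finds every selected prefix that contains p by masking p down to
// each possible prefix length and probing the map: at most 32 (or 128 for
// IPv6) lookups, regardless of how many prefixes are selected.
func ancestorsOf(p netip.Prefix) []netip.Prefix {
	var out []netip.Prefix
	for bits := 0; bits <= p.Bits(); bits++ {
		cand, err := p.Addr().Prefix(bits) // p's address truncated to 'bits' leading bits
		if err != nil {
			continue
		}
		if selected[cand] {
			out = append(out, cand)
		}
	}
	return out
}

func main() {
	fmt.Println(ancestorsOf(netip.MustParsePrefix("192.0.2.42/32"))) // -> [192.0.0.0/22]
}
```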
Not sure if you're collecting more lifecycle operations here, but another case would be when a new name comes in and gets associated with the same IP. Similar Identity reallocation concern to the above.
I just had another idea: identity pre-allocation. One of the biggest causes of FQDN tail latency is updating Envoy with new identities for a given selector. What if we pre-allocate an identity for every selector? Then we don't need to update Envoy in most cases! (This is because marking an IP as having an existing identity is just an ipcache update, not a policy update.) The only time an incremental policy update would be required is if an IP was newly selected by two selectors. This would be a huge improvement in tail latency.
I thought about pre-allocation as well. It really depends on how much overlap there is between selectors. If you have a […] Edit: And it's not just overlapping selectors, it's also overlapping IPs: the IPCache entry for […]
Indeed, and we may wish to be clever when there is a wildcard selector. Pre-allocation, however, doesn't have any relevant downsides (1 identity per FQDN selector, so O(N) = 100), and it optimizes the policy update case when there is no overlap, which would be a huge win for tail latency; anything we can do to prevent a policymap + Envoy update is an improvement.
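A minimal sketch of the pre-allocation idea (hypothetical names and types, not Cilium's identity allocator): one identity is reserved per selector up front, so an IP matched by a single selector needs only an ipcache write, and a fresh identity (and thus a policy/Envoy update) is only needed when an IP is matched by a new combination of selectors.

```go
package main

import "fmt"

// registry is an illustrative stand-in for an identity allocator with
// per-selector pre-allocation.
type registry struct {
	next        uint32
	perSelector map[string]uint32 // selector -> pre-allocated identity
	perCombo    map[string]uint32 // canonical selector-set key -> identity
}

func newRegistry(selectors []string) *registry {
	r := &registry{next: 1000, perSelector: map[string]uint32{}, perCombo: map[string]uint32{}}
	for _, s := range selectors {
		r.perSelector[s] = r.next // pre-allocate at selector creation time
		r.next++
	}
	return r
}

// identityFor returns the identity for an IP matched by the given selectors.
// The boolean reports whether a new identity had to be allocated, which is the
// only case that still requires an incremental policy/Envoy update.
func (r *registry) identityFor(matched []string) (uint32, bool) {
	if len(matched) == 1 {
		return r.perSelector[matched[0]], false // fast path: just an ipcache update
	}
	key := fmt.Sprint(matched) // a real implementation would use a canonical ordering
	if id, ok := r.perCombo[key]; ok {
		return id, false
	}
	id := r.next
	r.next++
	r.perCombo[key] = id
	return id, true
}

func main() {
	r := newRegistry([]string{"fqdn:*.cilium.io", "fqdn:one.one.one.one"})
	fmt.Println(r.identityFor([]string{"fqdn:*.cilium.io"}))                         // pre-allocated, no policy update
	fmt.Println(r.identityFor([]string{"fqdn:*.cilium.io", "fqdn:one.one.one.one"})) // overlap: new identity
}
```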
This commit introduces a new feature in the IPCache metadata subsystem: the ability for prefixes with local identities to inherit CIDR labels from encompassing prefixes. This is needed to implement cilium/design-cfps#17, where we want to introduce a new way to manage identities for IPs allowlisted by DNS lookups. Without going into too much detail, the key point of that CFP is that IPs returned by DNS lookups will be associated with a `fqdn:` label. Those IPs might however also be selected by a CIDR prefix, and thus might also need a `cidr:` label if (but only if) a selecting CIDR policy is present.

To give an example: if we perform a DNS lookup on `one.one.one.one` on a pod with DNS redirects enabled, we may observe IP `1.1.1.1` and thus associate that IP with a `fqdn:` label. If at the same time we also have a separate CIDR policy allowing traffic to `1.0.0.0/8`, then that IP also needs the `cidr:1.0.0.0/8` label, so that it is selected by both the FQDN and the CIDR policy.

To implement this, the IPCache metadata subsystem is slightly augmented:

1. Whenever metadata for a prefix is upserted or removed, we no longer enqueue only the prefix itself, but also all potentially affected prefixes or IPs contained by it. This allows `InjectLabels` to then down-propagate the changes to any child prefixes.
2. In `resolveIdentity` (called by `InjectLabels`), we check if there are any parent prefixes with a CIDR label that we want to inherit. At the moment, only local identities without a `reserved:` label may inherit a CIDR label from a parent prefix. This currently therefore only applies to identities containing `fqdn:` labels. Going forward, this infrastructure might however also be useful to implement non-K8s label sources.

A note on the implementation: to find affected child prefixes, we currently perform a linear scan over the metadata map. While this is an `O(n)` operation and finding CIDR parents in a trie could be done in `O(log n)`, we expect `n` to be in the hundreds to low thousands at most. For most real-world cases, we therefore expect the current implementation to perform just as well as, if not better than, a trie. Also note that the linear scan is skipped if the prefix is a single IP and thus cannot contain any other prefixes besides itself.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
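A minimal Go sketch of the linear scan the commit message describes (illustrative names, not the actual IPCache code): when metadata for a broader prefix changes, all more specific prefixes in the metadata map are collected for re-injection, and the scan is skipped for single-IP prefixes.

```go
package main

import (
	"fmt"
	"net/netip"
)

// metadata stands in for the IPCache metadata map keyed by prefix; the values
// are omitted since only the keys matter for this sketch.
var metadata = map[netip.Prefix]struct{}{
	netip.MustParsePrefix("1.1.1.1/32"):   {},
	netip.MustParsePrefix("1.0.0.0/8"):    {},
	netip.MustParsePrefix("192.0.2.0/24"): {},
}

// affectedChildren collects every strictly more specific prefix contained in
// parent so that label changes can be propagated downwards. The O(n) scan is
// skipped entirely when the parent is a single IP, which cannot contain any
// other prefix besides itself.
func affectedChildren(parent netip.Prefix) []netip.Prefix {
	if parent.IsSingleIP() {
		return nil // nothing to propagate to
	}
	var children []netip.Prefix
	for p := range metadata { // linear scan over the metadata map
		if p != parent && parent.Bits() < p.Bits() && parent.Contains(p.Addr()) {
			children = append(children, p)
		}
	}
	return children
}

func main() {
	// Updating labels on 1.0.0.0/8 affects the FQDN-derived entry 1.1.1.1/32.
	fmt.Println(affectedChildren(netip.MustParsePrefix("1.0.0.0/8")))
}
```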
To all reviewers: this has been updated based on the initial prototype implementation. It is now ready for final review.
We've not been able to come up with a circuit breaker, since the scope for identity allocation is already so small. The existing limitation on the maximum number of names per endpoint is probably sufficient.
@joestringer I think this CFP is probably ready to merge :-). If you'd like to take a final pass, feel free.
Overall this looks pretty comprehensive, I like the way that various cases were called out like the combinations of potentially overlapping label sets, as well as upgrade/downgrade consequences. I dropped a few notes and discussion points below, but I agree this is good enough to merge (and implement) 👍
There is currently a circuit-breaker, `--tofqdns-endpoint-max-ip-per-hostname`, which is intended to prevent names such as S3 from causing identity churn. In this proposal, large names such as S3 should not allocate more than one or two identities, making this less critical.

However, this is still a security-sensitive area, since we are allocating resources based on external, untrusted packets. We will need a sane circuit-breaker to ensure we do not allocate a large number of identities based on malicious input.
I think it's fair to say that trust in the DNS server is also quite critical in general with the ToFQDNs feature: if the path to the DNS server or the DNS server itself is somehow compromised, or if the user allows DNS requests to arbitrary destinations and uses those as input to the FQDN engine, then each of these situations could cause adverse behavior.
Then there's the 'selector combinations' side of things which is also subject to the network policies configured by the k8s namespace users.
Thinking very hard about this, I don't think this proposal significantly changes the security scope or attack surface of the DNS proxy. If an FQDN selector selects an attacker-controlled domain, then the attacker can insert any IPs they want into policy (waves hand).
Perhaps this is worth a better writeup (@ferozsalam 😸 ).
The maximum number of identities allocated by ToFQDN selectors is the number of unique selector combinations seen for given IPs. The theoretical maximum is, therefore, $2^N-1$[^1]. 10 distinct selectors would be OK (1023). More than 16 selectors could *theoretically* exceed the identity space.
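For reference (a simple counting argument, not text from the CFP itself), the bound follows from the fact that each IP can match any non-empty subset of the $N$ selectors:

$$\text{max identities} \;=\; \sum_{k=1}^{N}\binom{N}{k} \;=\; 2^{N}-1, \qquad 2^{10}-1 = 1023, \qquad 2^{16}-1 = 65535.$$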
Do you think there's something here we could export as a metric to at least measure the risk / likelihood of running into an issue with this?
We do have a metric for the total number of local identities. We can easily add an alert for agents with a large number of local identities. This would be a very good start.
I see, that is probably good enough. I was wondering about this "multiple selectors selecting the same IP" aspect, since that can end up doubling the number of identities for each new selector selecting the IP. But this is probably getting too much into the weeds for metrics?
From an end-user perspective, the interesting thing is memory usage and policy map size. It doesn't matter what leads to those identities, just that they're there -- 10,000 fqdn selectors vs. a pathological combination of 16 of them, that's all the same to the admin.
AIUI we already have alerts for overfull policy maps (I hope!). An alert for excessive local identities would be a good addition to this; I'll look into adding it.
I will add an optional follow-up item to think about whether there are metrics we can add. Having some metric indicating that the user chose selectors that risk combinatorial explosion would be nice to have, as it would be a more targeted signal than "policy map is full" or "allocator is out of identities".
Still having problems w/ v1.16.0
@cccsss01 could you please file a full issue on https://github.com/cilium/cilium?
This proposes a new label source: `dns`. This would then tag prefixes with the ToFQDN selector(s) that match the prefix, e.g. `192.0.2.42/32 -> dns:*.cilium.io`.
Post-merge follow-up items:
- `fqdn:` identities in the local allocator