-
Notifications
You must be signed in to change notification settings - Fork 3.4k
ipcache: Fix orphaned ipcache entries when mixing Upsert and Inject #33120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
When a prefix is initially created by the synchronous Upsert() API, it is flagged as such so that InjectLabels() knows it is shared. However, this flag is not removed if the legacy caller releases all references to this prefix. Thus, the timeline 1. AllocateCIDRs("1.1.1.1/32") 2. UpsertPrefixes("1.1.1.1/32") 3. ReleaseCIDRIdentities("1.1.1.1/32") 4. RemovePrefixes("1.1.1.1/32") leaves us with the prefix still in the ipcache, but the identity fully released. This leads to traffic drops, as the identity is unknown to the policy system and thus not present in the BPF policymaps. The fix is to forcibly remove the prefix if the identity reference reaches zero and the prefix is not in the metadata layer. Signed-off-by: Casey Callendrello <cdc@isovalent.com>
gandro
reviewed
Jun 13, 2024
gandro
approved these changes
Jun 13, 2024
/test |
1 task
1 task
This was referenced Jul 10, 2024
3 tasks
gandro
added a commit
that referenced
this pull request
Sep 5, 2024
The internal use of IPCache has two APIs: - An older synchronous API (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the caller to balance allocate and release calls. - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which delegates the reference counting responsibility to the ipcache itself. Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities, which historically has led to issues. See #27327 or #33120 for example. Cilium v1.16 and newer do not have these issues anymore, but the codepaths still exist. To ensure consistency across branches, this commit will be applied to all branches. The v1.14 version of this commit contains additional unit tests to assert the new behavior. This commit fixes two particular issues with the asynchronous API when an entry has been created by the legacy API. Issue 1: Legacy API leaks mixed-API IPCache entry ------------------------------------------------- This issue can happen if an entry is first created via the legacy API, then overwritten by the metadata API. Consider the following sequence of events for a prefix `p`: 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p` with source `generated`. The identity has now a refcount of one. 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity refcount to two and overwrites the IPCache entry for prefix `p` with source `custom-resource` 3. `RemovePrefixes(p, "custom-resource")`: This decreases the identity refcount down to one again, but retains the existing IPCache entry with source `custom-resource` (it does not delete the entry if the identity refcount is non-zero). 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it reach zero. It attempts to remove the IPCache entry for source `generated`. But because `generated` has lower precedence than `custom-resource`, the deletion fails and the IPCache entry is leaked. Issue 2: Metadata API causes early release of mixed-API entry identities ------------------------------------------------------------------------ Assuming all entries are managed by the metadata API, `InjectLabels` attempts to balance the identity reference count for managed identities by perform an identity allocation call for any to-be-upserted identity, followed by an allocation release call for any to-be-overwritten identity. However, if the existing entry was created by the legacy API, then `InjectLabels` must not release the old identity (as this is the responsibility of the legacy API owner). PR #27327 added such logic which will skip identity release if two conditions are satisfied: 1. The old entry was not created via the metadata API 2. The old entry is about to be replaced by a new one (e.g. it has a new source). However, because `RemovePrefixes` does not restore the legacy source as describe in Issue 1 above, the second condition is insufficient. Consider the following sequence of events 1. `AllocateCIDRs(p)`: refcount=1, source=generated 2. `UpsertPrefixes(p, "custom-resource"): refcount=2, source=custom-resource 3. `RemovePrefixes(p, "custom-resource"): refcount=1, source=custom-resource (!) 4. `UpsertPrefixes(p, "custom-resource"): refcount=1 (because condition 2 is not hit), source=custom-resource 5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it should have been kept alive due to it having associated metadata from step 3), while the IPCache entry is leaked (because the source of the entry is `custom-resource`, see Issue 1). Proposed Solution ----------------- Both issues stem from the fact that our tracking of mixed API entries via `createdFromMetadata` is insufficient. A boolean cannot distinguish between the three states we actually have: 1. The entry is exclusively owned by the legacy API 2. The entry is exclusively owned by the metadata API 3. The entry is is shared both APIs These three states are kept track by two private fields in the IPCache entry: - A `tainted` field which is set to true if the entry was ever touched by the legacy API. - A `overwrittenLegacySource` field which tracks the original source of the legacy entry before it was overwritten by a metadata API source If neither fields are set, the entry is assumed to be exclusively owned by the metadata API. If only `tainted` is set, then it is assumed to be exclusively owned by the legacy API. If both fields are set, then it is assumed to be shared by both APIs. Note that once an entry is tainted, it will never become untainted until deleted. This simplifies the code, as the metadata API already has the ability to clean up unmanaged entries. The state diagram and actions between the state can then be described as follows: ```mermaid stateDiagram-v2 both: Shared Ownership metadata: Owned by Metadata API legacy : Owned by Legacy API released: Released [*] --> metadata: UpsertPrefixes() [*] --> legacy: AllocateCIDRs() legacy --> both: UpsertPrefixes() - bump refcount, use metadata source both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0 both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 ``` Issue 1 is fixed thanks to the transition `both --> legacy` which now restores the legacy source in in the call to `RemovePrefixes`. Issue 2 is fixed thanks to the fact that we can now detect the transition `both --> legacy --> both`, which was previously impossible. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro
added a commit
to gandro/cilium
that referenced
this pull request
Sep 5, 2024
The internal use of IPCache has two APIs: - An older synchronous API (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the caller to balance allocate and release calls. - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which delegates the reference counting responsibility to the ipcache itself. Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities, which historically has led to issues. See cilium#27327 or cilium#33120 for example. Cilium v1.16 and newer do not have these issues anymore, but the codepaths still exist. To ensure consistency across branches, this commit will be applied to all branches. The v1.14 version of this commit contains additional unit tests to assert the new behavior. This commit fixes two particular issues with the asynchronous API when an entry has been created by the legacy API. Issue 1: Legacy API leaks mixed-API IPCache entry ------------------------------------------------- This issue can happen if an entry is first created via the legacy API, then overwritten by the metadata API. Consider the following sequence of events for a prefix `p`: 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p` with source `generated`. The identity has now a refcount of one. 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity refcount to two and overwrites the IPCache entry for prefix `p` with source `custom-resource` 3. `RemovePrefixes(p, "custom-resource")`: This decreases the identity refcount down to one again, but retains the existing IPCache entry with source `custom-resource` (it does not delete the entry if the identity refcount is non-zero). 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it reach zero. It attempts to remove the IPCache entry for source `generated`. But because `generated` has lower precedence than `custom-resource`, the deletion fails and the IPCache entry is leaked. Issue 2: Metadata API causes early release of mixed-API entry identities ------------------------------------------------------------------------ Assuming all entries are managed by the metadata API, `InjectLabels` attempts to balance the identity reference count for managed identities by perform an identity allocation call for any to-be-upserted identity, followed by an allocation release call for any to-be-overwritten identity. However, if the existing entry was created by the legacy API, then `InjectLabels` must not release the old identity (as this is the responsibility of the legacy API owner). PR cilium#27327 added such logic which will skip identity release if two conditions are satisfied: 1. The old entry was not created via the metadata API 2. The old entry is about to be replaced by a new one (e.g. it has a new source). However, because `RemovePrefixes` does not restore the legacy source as describe in Issue 1 above, the second condition is insufficient. Consider the following sequence of events 1. `AllocateCIDRs(p)`: refcount=1, source=generated 2. `UpsertPrefixes(p, "custom-resource")`: refcount=2, source=custom-resource 3. `RemovePrefixes(p, "custom-resource")`: refcount=1, source=custom-resource (!) 4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because condition 2 is not hit), source=custom-resource 5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it should have been kept alive due to it having associated metadata from step 3), while the IPCache entry is leaked (because the source of the entry is `custom-resource`, see Issue 1). Proposed Solution ----------------- Both issues stem from the fact that our tracking of mixed API entries via `createdFromMetadata` is insufficient. A boolean cannot distinguish between the three states we actually have: 1. The entry is exclusively owned by the legacy API 2. The entry is exclusively owned by the metadata API 3. The entry is is shared both APIs These three states are kept track by two private fields in the IPCache entry: - A `tainted` field which is set to true if the entry was ever touched by the legacy API. - A `overwrittenLegacySource` field which tracks the original source of the legacy entry before it was overwritten by a metadata API source If neither fields are set, the entry is assumed to be exclusively owned by the metadata API. If only `tainted` is set, then it is assumed to be exclusively owned by the legacy API. If both fields are set, then it is assumed to be shared by both APIs. Note that once an entry is tainted, it will never become untainted until deleted. This simplifies the code, as the metadata API already has the ability to clean up unmanaged entries. The state diagram and actions between the state can then be described as follows: ```mermaid stateDiagram-v2 both: Shared Ownership metadata: Owned by Metadata API legacy : Owned by Legacy API released: Released [*] --> metadata: UpsertPrefixes() [*] --> legacy: AllocateCIDRs() legacy --> both: UpsertPrefixes() - bump refcount, use metadata source both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0 both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 ``` Issue 1 is fixed thanks to the transition `both --> legacy` which now restores the legacy source in the call to `RemovePrefixes`. Issue 2 is fixed thanks to the fact that we can now detect the transition `both --> legacy --> both`, which was previously impossible. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro
added a commit
to gandro/cilium
that referenced
this pull request
Sep 5, 2024
The internal use of IPCache has two APIs: - An older synchronous API (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the caller to balance allocate and release calls. - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which delegates the reference counting responsibility to the ipcache itself. Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities, which historically has led to issues. See cilium#27327 or cilium#33120 for example. Cilium v1.16 and newer do not have these issues anymore, but the codepaths still exist. To ensure consistency across branches, this commit will be applied to all branches. The v1.14 version of this commit contains additional unit tests to assert the new behavior. This commit fixes two particular issues with the asynchronous API when an entry has been created by the legacy API. Issue 1: Legacy API leaks mixed-API IPCache entry ------------------------------------------------- This issue can happen if an entry is first created via the legacy API, then overwritten by the metadata API. Consider the following sequence of events for a prefix `p`: 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p` with source `generated`. The identity has now a refcount of one. 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity refcount to two and overwrites the IPCache entry for prefix `p` with source `custom-resource` 3. `RemovePrefixes(p, "custom-resource")`: This decreases the identity refcount down to one again, but retains the existing IPCache entry with source `custom-resource` (it does not delete the entry if the identity refcount is non-zero). 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it reach zero. It attempts to remove the IPCache entry for source `generated`. But because `generated` has lower precedence than `custom-resource`, the deletion fails and the IPCache entry is leaked. Issue 2: Metadata API causes early release of mixed-API entry identities ------------------------------------------------------------------------ Assuming all entries are managed by the metadata API, `InjectLabels` attempts to balance the identity reference count for managed identities by perform an identity allocation call for any to-be-upserted identity, followed by an allocation release call for any to-be-overwritten identity. However, if the existing entry was created by the legacy API, then `InjectLabels` must not release the old identity (as this is the responsibility of the legacy API owner). PR cilium#27327 added such logic which will skip identity release if two conditions are satisfied: 1. The old entry was not created via the metadata API 2. The old entry is about to be replaced by a new one (e.g. it has a new source). However, because `RemovePrefixes` does not restore the legacy source as describe in Issue 1 above, the second condition is insufficient. Consider the following sequence of events 1. `AllocateCIDRs(p)`: refcount=1, source=generated 2. `UpsertPrefixes(p, "custom-resource")`: refcount=2, source=custom-resource 3. `RemovePrefixes(p, "custom-resource")`: refcount=1, source=custom-resource (!) 4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because condition 2 is not hit), source=custom-resource 5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it should have been kept alive due to it having associated metadata from step 3), while the IPCache entry is leaked (because the source of the entry is `custom-resource`, see Issue 1). Proposed Solution ----------------- Both issues stem from the fact that our tracking of mixed API entries via `createdFromMetadata` is insufficient. A boolean cannot distinguish between the three states we actually have: 1. The entry is exclusively owned by the legacy API 2. The entry is exclusively owned by the metadata API 3. The entry is shared both APIs These three states are kept track by two private fields in the IPCache entry: - A `tainted` field which is set to true if the entry was ever touched by the legacy API. - A `overwrittenLegacySource` field which tracks the original source of the legacy entry before it was overwritten by a metadata API source If neither fields are set, the entry is assumed to be exclusively owned by the metadata API. If only `tainted` is set, then it is assumed to be exclusively owned by the legacy API. If both fields are set, then it is assumed to be shared by both APIs. Note that once an entry is tainted, it will never become untainted until deleted. This simplifies the code, as the metadata API already has the ability to clean up unmanaged entries. The state diagram and actions between the state can then be described as follows: ```mermaid stateDiagram-v2 both: Shared Ownership metadata: Owned by Metadata API legacy : Owned by Legacy API released: Released [*] --> metadata: UpsertPrefixes() [*] --> legacy: AllocateCIDRs() legacy --> both: UpsertPrefixes() - bump refcount, use metadata source both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0 both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 ``` Issue 1 is fixed thanks to the transition `both --> legacy` which now restores the legacy source in the call to `RemovePrefixes`. Issue 2 is fixed thanks to the fact that we can now detect the transition `both --> legacy --> both`, which was previously impossible. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro
added a commit
to gandro/cilium
that referenced
this pull request
Sep 9, 2024
The internal use of IPCache has two APIs: - An older synchronous API (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the caller to balance allocate and release calls. - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which delegates the reference counting responsibility to the ipcache itself. Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities, which historically has led to issues. See cilium#27327 or cilium#33120 for example. Cilium v1.16 and newer do not have these issues anymore, but the codepaths still exist. To ensure consistency across branches, this commit will be applied to all branches. The v1.14 version of this commit contains additional unit tests to assert the new behavior. This commit fixes two particular issues with the asynchronous API when an entry has been created by the legacy API. Issue 1: Legacy API leaks mixed-API IPCache entry ------------------------------------------------- This issue can happen if an entry is first created via the legacy API, then overwritten by the metadata API. Consider the following sequence of events for a prefix `p`: 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p` with source `generated`. The identity has now a refcount of one. 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity refcount to two and overwrites the IPCache entry for prefix `p` with source `custom-resource` 3. `RemovePrefixes(p, "custom-resource")`: This decreases the identity refcount down to one again, but retains the existing IPCache entry with source `custom-resource` (it does not delete the entry if the identity refcount is non-zero). 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it reach zero. It attempts to remove the IPCache entry for source `generated`. But because `generated` has lower precedence than `custom-resource`, the deletion fails and the IPCache entry is leaked. Issue 2: Metadata API causes early release of mixed-API entry identities ------------------------------------------------------------------------ Assuming all entries are managed by the metadata API, `InjectLabels` attempts to balance the identity reference count for managed identities by perform an identity allocation call for any to-be-upserted identity, followed by an allocation release call for any to-be-overwritten identity. However, if the existing entry was created by the legacy API, then `InjectLabels` must not release the old identity (as this is the responsibility of the legacy API owner). PR cilium#27327 added such logic which will skip identity release if two conditions are satisfied: 1. The old entry was not created via the metadata API 2. The old entry is about to be replaced by a new one (e.g. it has a new source). However, because `RemovePrefixes` does not restore the legacy source as describe in Issue 1 above, the second condition is insufficient. Consider the following sequence of events 1. `AllocateCIDRs(p)`: refcount=1, source=generated 2. `UpsertPrefixes(p, "custom-resource")`: refcount=2, source=custom-resource 3. `RemovePrefixes(p, "custom-resource")`: refcount=1, source=custom-resource (!) 4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because condition 2 is not hit), source=custom-resource 5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it should have been kept alive due to it having associated metadata from step 3), while the IPCache entry is leaked (because the source of the entry is `custom-resource`, see Issue 1). Proposed Solution ----------------- Both issues stem from the fact that our tracking of mixed API entries via `createdFromMetadata` is insufficient. A boolean cannot distinguish between the three states we actually have: 1. The entry is exclusively owned by the legacy API 2. The entry is exclusively owned by the metadata API 3. The entry is shared both APIs These three states are kept track by two private fields in the IPCache entry: - A ``modifiedByLegacyAPI field which is set to true if the entry was ever touched by the legacy API. - A `overwrittenLegacySource` field which tracks the original source of the legacy entry before it was overwritten by a metadata API source If neither fields are set, the entry is assumed to be exclusively owned by the metadata API. If only `modifiedByLegacyAPI` is set, then it is assumed to be exclusively owned by the legacy API. If both fields are set, then it is assumed to be shared by both APIs. Note that once an entry has modifiedByLegacyAPI=true, it will never become false until deleted. This simplifies the code, as the metadata API already has the ability to clean up unmanaged entries. The state diagram and actions between the state can then be described as follows: ```mermaid stateDiagram-v2 both: Shared Ownership metadata: Owned by Metadata API legacy : Owned by Legacy API released: Released [*] --> metadata: UpsertPrefixes() [*] --> legacy: AllocateCIDRs() legacy --> both: UpsertPrefixes() - bump refcount, use metadata source both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0 both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 ``` Issue 1 is fixed thanks to the transition `both --> legacy` which now restores the legacy source in the call to `RemovePrefixes`. Issue 2 is fixed thanks to the fact that we can now detect the transition `both --> legacy --> both`, which was previously impossible. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro
added a commit
to gandro/cilium
that referenced
this pull request
Sep 10, 2024
The internal use of IPCache has two APIs: - An older synchronous API (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the caller to balance allocate and release calls. - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which delegates the reference counting responsibility to the ipcache itself. Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities, which historically has led to issues. See cilium#27327 or cilium#33120 for example. Cilium v1.16 and newer do not have these issues anymore, but the codepaths still exist. To ensure consistency across branches, this commit will be applied to all branches. The v1.14 version of this commit contains additional unit tests to assert the new behavior. This commit fixes two particular issues with the asynchronous API when an entry has been created by the legacy API. Issue 1: Legacy API leaks mixed-API IPCache entry ------------------------------------------------- This issue can happen if an entry is first created via the legacy API, then overwritten by the metadata API. Consider the following sequence of events for a prefix `p`: 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p` with source `generated`. The identity has now a refcount of one. 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity refcount to two and overwrites the IPCache entry for prefix `p` with source `custom-resource` 3. `RemovePrefixes(p, "custom-resource")`: This decreases the identity refcount down to one again, but retains the existing IPCache entry with source `custom-resource` (it does not delete the entry if the identity refcount is non-zero). 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it reach zero. It attempts to remove the IPCache entry for source `generated`. But because `generated` has lower precedence than `custom-resource`, the deletion fails and the IPCache entry is leaked. Issue 2: Metadata API causes early release of mixed-API entry identities ------------------------------------------------------------------------ Assuming all entries are managed by the metadata API, `InjectLabels` attempts to balance the identity reference count for managed identities by perform an identity allocation call for any to-be-upserted identity, followed by an allocation release call for any to-be-overwritten identity. However, if the existing entry was created by the legacy API, then `InjectLabels` must not release the old identity (as this is the responsibility of the legacy API owner). PR cilium#27327 added such logic which will skip identity release if two conditions are satisfied: 1. The old entry was not created via the metadata API 2. The old entry is about to be replaced by a new one (e.g. it has a new source). However, because `RemovePrefixes` does not restore the legacy source as describe in Issue 1 above, the second condition is insufficient. Consider the following sequence of events 1. `AllocateCIDRs(p)`: refcount=1, source=generated 2. `UpsertPrefixes(p, "custom-resource")`: refcount=2, source=custom-resource 3. `RemovePrefixes(p, "custom-resource")`: refcount=1, source=custom-resource (!) 4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because condition 2 is not hit), source=custom-resource 5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it should have been kept alive due to it having associated metadata from step 3), while the IPCache entry is leaked (because the source of the entry is `custom-resource`, see Issue 1). Proposed Solution ----------------- Both issues stem from the fact that our tracking of mixed API entries via `createdFromMetadata` is insufficient. A boolean cannot distinguish between the three states we actually have: 1. The entry is exclusively owned by the legacy API 2. The entry is exclusively owned by the metadata API 3. The entry is shared both APIs These three states are kept track by two private fields in the IPCache entry: - A ``modifiedByLegacyAPI field which is set to true if the entry was ever touched by the legacy API. - A `overwrittenLegacySource` field which tracks the original source of the legacy entry before it was overwritten by a metadata API source If neither fields are set, the entry is assumed to be exclusively owned by the metadata API. If only `modifiedByLegacyAPI` is set, then it is assumed to be exclusively owned by the legacy API. If both fields are set, then it is assumed to be shared by both APIs. Note that once an entry has modifiedByLegacyAPI=true, it will never become false until deleted. This simplifies the code, as the metadata API already has the ability to clean up unmanaged entries. The state diagram and actions between the state can then be described as follows: ```mermaid stateDiagram-v2 both: Shared Ownership metadata: Owned by Metadata API legacy : Owned by Legacy API released: Released [*] --> metadata: UpsertPrefixes() [*] --> legacy: AllocateCIDRs() legacy --> both: UpsertPrefixes() - bump refcount, use metadata source both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0 both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 ``` Issue 1 is fixed thanks to the transition `both --> legacy` which now restores the legacy source in the call to `RemovePrefixes`. Issue 2 is fixed thanks to the fact that we can now detect the transition `both --> legacy --> both`, which was previously impossible. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro
added a commit
to gandro/cilium
that referenced
this pull request
Sep 16, 2024
The internal use of IPCache has two APIs: - An older synchronous API (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the caller to balance allocate and release calls. - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which delegates the reference counting responsibility to the ipcache itself. Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities, which historically has led to issues. See cilium#27327 or cilium#33120 for example. Cilium v1.16 and newer do not have these issues anymore, but the codepaths still exist. To ensure consistency across branches, this commit will be applied to all branches. This commit fixes two particular issues with the asynchronous API when an entry has been created by the legacy API. Issue 1: Legacy API leaks mixed-API IPCache entry ------------------------------------------------- This issue can happen if an entry is first created via the legacy API, then overwritten by the metadata API. Consider the following sequence of events for a prefix `p`: 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p` with source `generated`. The identity has now a refcount of one. 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity refcount to two and overwrites the IPCache entry for prefix `p` with source `custom-resource` 3. `RemovePrefixes(p, "custom-resource")`: This decreases the identity refcount down to one again, but retains the existing IPCache entry with source `custom-resource` (it does not delete the entry if the identity refcount is non-zero). 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it reach zero. It attempts to remove the IPCache entry for source `generated`. But because `generated` has lower precedence than `custom-resource`, the deletion fails and the IPCache entry is leaked. Issue 2: Metadata API causes early release of mixed-API entry identities ------------------------------------------------------------------------ Assuming all entries are managed by the metadata API, `InjectLabels` attempts to balance the identity reference count for managed identities by perform an identity allocation call for any to-be-upserted identity, followed by an allocation release call for any to-be-overwritten identity. However, if the existing entry was created by the legacy API, then `InjectLabels` must not release the old identity (as this is the responsibility of the legacy API owner). PR cilium#27327 added such logic which will skip identity release if two conditions are satisfied: 1. The old entry was not created via the metadata API 2. The old entry is about to be replaced by a new one (e.g. it has a new source). However, because `RemovePrefixes` does not restore the legacy source as describe in Issue 1 above, the second condition is insufficient. Consider the following sequence of events 1. `AllocateCIDRs(p)`: refcount=1, source=generated 2. `UpsertPrefixes(p, "custom-resource")`: refcount=2, source=custom-resource 3. `RemovePrefixes(p, "custom-resource")`: refcount=1, source=custom-resource (!) 4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because condition 2 is not hit), source=custom-resource 5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it should have been kept alive due to it having associated metadata from step 3), while the IPCache entry is leaked (because the source of the entry is `custom-resource`, see Issue 1). Proposed Solution ----------------- Both issues stem from the fact that our tracking of mixed API entries via `createdFromMetadata` is insufficient. A boolean cannot distinguish between the three states we actually have: 1. The entry is exclusively owned by the legacy API 2. The entry is exclusively owned by the metadata API 3. The entry is shared both APIs These three states are kept track by two private fields in the IPCache entry: - A ``modifiedByLegacyAPI field which is set to true if the entry was ever touched by the legacy API. - A `overwrittenLegacySource` field which tracks the original source of the legacy entry before it was overwritten by a metadata API source If neither fields are set, the entry is assumed to be exclusively owned by the metadata API. If only `modifiedByLegacyAPI` is set, then it is assumed to be exclusively owned by the legacy API. If both fields are set, then it is assumed to be shared by both APIs. Note that once an entry has modifiedByLegacyAPI=true, it will never become false until deleted. This simplifies the code, as the metadata API already has the ability to clean up unmanaged entries. The state diagram and actions between the state can then be described as follows: ```mermaid stateDiagram-v2 both: Shared Ownership metadata: Owned by Metadata API legacy : Owned by Legacy API released: Released [*] --> metadata: UpsertPrefixes() [*] --> legacy: AllocateCIDRs() legacy --> both: UpsertPrefixes() - bump refcount, use metadata source both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0 both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 ``` Issue 1 is fixed thanks to the transition `both --> legacy` which now restores the legacy source in the call to `RemovePrefixes`. Issue 2 is fixed thanks to the fact that we can now detect the transition `both --> legacy --> both`, which was previously impossible. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro
added a commit
to gandro/cilium
that referenced
this pull request
Sep 16, 2024
The internal use of IPCache has two APIs: - An older synchronous API (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the caller to balance allocate and release calls. - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which delegates the reference counting responsibility to the ipcache itself. Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities, which historically has led to issues. See cilium#27327 or cilium#33120 for example. Cilium v1.16 and newer do not have these issues anymore, but the codepaths still exist. To ensure consistency across branches, this commit will be applied to all branches. This commit fixes two particular issues with the asynchronous API when an entry has been created by the legacy API. Issue 1: Legacy API leaks mixed-API IPCache entry ------------------------------------------------- This issue can happen if an entry is first created via the legacy API, then overwritten by the metadata API. Consider the following sequence of events for a prefix `p`: 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p` with source `generated`. The identity has now a refcount of one. 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity refcount to two and overwrites the IPCache entry for prefix `p` with source `custom-resource` 3. `RemovePrefixes(p, "custom-resource")`: This decreases the identity refcount down to one again, but retains the existing IPCache entry with source `custom-resource` (it does not delete the entry if the identity refcount is non-zero). 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it reach zero. It attempts to remove the IPCache entry for source `generated`. But because `generated` has lower precedence than `custom-resource`, the deletion fails and the IPCache entry is leaked. Issue 2: Metadata API causes early release of mixed-API entry identities ------------------------------------------------------------------------ Assuming all entries are managed by the metadata API, `InjectLabels` attempts to balance the identity reference count for managed identities by perform an identity allocation call for any to-be-upserted identity, followed by an allocation release call for any to-be-overwritten identity. However, if the existing entry was created by the legacy API, then `InjectLabels` must not release the old identity (as this is the responsibility of the legacy API owner). PR cilium#27327 added such logic which will skip identity release if two conditions are satisfied: 1. The old entry was not created via the metadata API 2. The old entry is about to be replaced by a new one (e.g. it has a new source). However, because `RemovePrefixes` does not restore the legacy source as describe in Issue 1 above, the second condition is insufficient. Consider the following sequence of events 1. `AllocateCIDRs(p)`: refcount=1, source=generated 2. `UpsertPrefixes(p, "custom-resource")`: refcount=2, source=custom-resource 3. `RemovePrefixes(p, "custom-resource")`: refcount=1, source=custom-resource (!) 4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because condition 2 is not hit), source=custom-resource 5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it should have been kept alive due to it having associated metadata from step 3), while the IPCache entry is leaked (because the source of the entry is `custom-resource`, see Issue 1). Proposed Solution ----------------- Both issues stem from the fact that our tracking of mixed API entries via `createdFromMetadata` is insufficient. A boolean cannot distinguish between the three states we actually have: 1. The entry is exclusively owned by the legacy API 2. The entry is exclusively owned by the metadata API 3. The entry is shared both APIs These three states are kept track by two private fields in the IPCache entry: - A ``modifiedByLegacyAPI field which is set to true if the entry was ever touched by the legacy API. - A `overwrittenLegacySource` field which tracks the original source of the legacy entry before it was overwritten by a metadata API source If neither fields are set, the entry is assumed to be exclusively owned by the metadata API. If only `modifiedByLegacyAPI` is set, then it is assumed to be exclusively owned by the legacy API. If both fields are set, then it is assumed to be shared by both APIs. Note that once an entry has modifiedByLegacyAPI=true, it will never become false until deleted. This simplifies the code, as the metadata API already has the ability to clean up unmanaged entries. The state diagram and actions between the state can then be described as follows: ```mermaid stateDiagram-v2 both: Shared Ownership metadata: Owned by Metadata API legacy : Owned by Legacy API released: Released [*] --> metadata: UpsertPrefixes() [*] --> legacy: AllocateCIDRs() legacy --> both: UpsertPrefixes() - bump refcount, use metadata source both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0 both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 ``` Issue 1 is fixed thanks to the transition `both --> legacy` which now restores the legacy source in the call to `RemovePrefixes`. Issue 2 is fixed thanks to the fact that we can now detect the transition `both --> legacy --> both`, which was previously impossible. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro
added a commit
that referenced
this pull request
Sep 16, 2024
The internal use of IPCache has two APIs: - An older synchronous API (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the caller to balance allocate and release calls. - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which delegates the reference counting responsibility to the ipcache itself. Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities, which historically has led to issues. See #27327 or #33120 for example. Cilium v1.16 and newer do not have these issues anymore, but the codepaths still exist. To ensure consistency across branches, this commit will be applied to all branches. This commit fixes two particular issues with the asynchronous API when an entry has been created by the legacy API. Issue 1: Legacy API leaks mixed-API IPCache entry ------------------------------------------------- This issue can happen if an entry is first created via the legacy API, then overwritten by the metadata API. Consider the following sequence of events for a prefix `p`: 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p` with source `generated`. The identity has now a refcount of one. 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity refcount to two and overwrites the IPCache entry for prefix `p` with source `custom-resource` 3. `RemovePrefixes(p, "custom-resource")`: This decreases the identity refcount down to one again, but retains the existing IPCache entry with source `custom-resource` (it does not delete the entry if the identity refcount is non-zero). 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it reach zero. It attempts to remove the IPCache entry for source `generated`. But because `generated` has lower precedence than `custom-resource`, the deletion fails and the IPCache entry is leaked. Issue 2: Metadata API causes early release of mixed-API entry identities ------------------------------------------------------------------------ Assuming all entries are managed by the metadata API, `InjectLabels` attempts to balance the identity reference count for managed identities by perform an identity allocation call for any to-be-upserted identity, followed by an allocation release call for any to-be-overwritten identity. However, if the existing entry was created by the legacy API, then `InjectLabels` must not release the old identity (as this is the responsibility of the legacy API owner). PR #27327 added such logic which will skip identity release if two conditions are satisfied: 1. The old entry was not created via the metadata API 2. The old entry is about to be replaced by a new one (e.g. it has a new source). However, because `RemovePrefixes` does not restore the legacy source as describe in Issue 1 above, the second condition is insufficient. Consider the following sequence of events 1. `AllocateCIDRs(p)`: refcount=1, source=generated 2. `UpsertPrefixes(p, "custom-resource")`: refcount=2, source=custom-resource 3. `RemovePrefixes(p, "custom-resource")`: refcount=1, source=custom-resource (!) 4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because condition 2 is not hit), source=custom-resource 5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it should have been kept alive due to it having associated metadata from step 3), while the IPCache entry is leaked (because the source of the entry is `custom-resource`, see Issue 1). Proposed Solution ----------------- Both issues stem from the fact that our tracking of mixed API entries via `createdFromMetadata` is insufficient. A boolean cannot distinguish between the three states we actually have: 1. The entry is exclusively owned by the legacy API 2. The entry is exclusively owned by the metadata API 3. The entry is shared both APIs These three states are kept track by two private fields in the IPCache entry: - A ``modifiedByLegacyAPI field which is set to true if the entry was ever touched by the legacy API. - A `overwrittenLegacySource` field which tracks the original source of the legacy entry before it was overwritten by a metadata API source If neither fields are set, the entry is assumed to be exclusively owned by the metadata API. If only `modifiedByLegacyAPI` is set, then it is assumed to be exclusively owned by the legacy API. If both fields are set, then it is assumed to be shared by both APIs. Note that once an entry has modifiedByLegacyAPI=true, it will never become false until deleted. This simplifies the code, as the metadata API already has the ability to clean up unmanaged entries. The state diagram and actions between the state can then be described as follows: ```mermaid stateDiagram-v2 both: Shared Ownership metadata: Owned by Metadata API legacy : Owned by Legacy API released: Released [*] --> metadata: UpsertPrefixes() [*] --> legacy: AllocateCIDRs() legacy --> both: UpsertPrefixes() - bump refcount, use metadata source both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0 both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 ``` Issue 1 is fixed thanks to the transition `both --> legacy` which now restores the legacy source in the call to `RemovePrefixes`. Issue 2 is fixed thanks to the fact that we can now detect the transition `both --> legacy --> both`, which was previously impossible. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro
added a commit
to gandro/cilium
that referenced
this pull request
Sep 16, 2024
The internal use of IPCache has two APIs: - An older synchronous API (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the caller to balance allocate and release calls. - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which delegates the reference counting responsibility to the ipcache itself. Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities, which historically has led to issues. See cilium#27327 or cilium#33120 for example. Cilium v1.16 and newer do not have these issues anymore, but the codepaths still exist. To ensure consistency across branches, this commit will be applied to all branches. This commit fixes two particular issues with the asynchronous API when an entry has been created by the legacy API. Issue 1: Legacy API leaks mixed-API IPCache entry ------------------------------------------------- This issue can happen if an entry is first created via the legacy API, then overwritten by the metadata API. Consider the following sequence of events for a prefix `p`: 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p` with source `generated`. The identity has now a refcount of one. 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity refcount to two and overwrites the IPCache entry for prefix `p` with source `custom-resource` 3. `RemovePrefixes(p, "custom-resource")`: This decreases the identity refcount down to one again, but retains the existing IPCache entry with source `custom-resource` (it does not delete the entry if the identity refcount is non-zero). 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it reach zero. It attempts to remove the IPCache entry for source `generated`. But because `generated` has lower precedence than `custom-resource`, the deletion fails and the IPCache entry is leaked. Issue 2: Metadata API causes early release of mixed-API entry identities ------------------------------------------------------------------------ Assuming all entries are managed by the metadata API, `InjectLabels` attempts to balance the identity reference count for managed identities by perform an identity allocation call for any to-be-upserted identity, followed by an allocation release call for any to-be-overwritten identity. However, if the existing entry was created by the legacy API, then `InjectLabels` must not release the old identity (as this is the responsibility of the legacy API owner). PR cilium#27327 added such logic which will skip identity release if two conditions are satisfied: 1. The old entry was not created via the metadata API 2. The old entry is about to be replaced by a new one (e.g. it has a new source). However, because `RemovePrefixes` does not restore the legacy source as describe in Issue 1 above, the second condition is insufficient. Consider the following sequence of events 1. `AllocateCIDRs(p)`: refcount=1, source=generated 2. `UpsertPrefixes(p, "custom-resource")`: refcount=2, source=custom-resource 3. `RemovePrefixes(p, "custom-resource")`: refcount=1, source=custom-resource (!) 4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because condition 2 is not hit), source=custom-resource 5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it should have been kept alive due to it having associated metadata from step 3), while the IPCache entry is leaked (because the source of the entry is `custom-resource`, see Issue 1). Proposed Solution ----------------- Both issues stem from the fact that our tracking of mixed API entries via `createdFromMetadata` is insufficient. A boolean cannot distinguish between the three states we actually have: 1. The entry is exclusively owned by the legacy API 2. The entry is exclusively owned by the metadata API 3. The entry is shared both APIs These three states are kept track by two private fields in the IPCache entry: - A ``modifiedByLegacyAPI field which is set to true if the entry was ever touched by the legacy API. - A `overwrittenLegacySource` field which tracks the original source of the legacy entry before it was overwritten by a metadata API source If neither fields are set, the entry is assumed to be exclusively owned by the metadata API. If only `modifiedByLegacyAPI` is set, then it is assumed to be exclusively owned by the legacy API. If both fields are set, then it is assumed to be shared by both APIs. Note that once an entry has modifiedByLegacyAPI=true, it will never become false until deleted. This simplifies the code, as the metadata API already has the ability to clean up unmanaged entries. The state diagram and actions between the state can then be described as follows: ```mermaid stateDiagram-v2 both: Shared Ownership metadata: Owned by Metadata API legacy : Owned by Legacy API released: Released [*] --> metadata: UpsertPrefixes() [*] --> legacy: AllocateCIDRs() legacy --> both: UpsertPrefixes() - bump refcount, use metadata source both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0 both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 ``` Issue 1 is fixed thanks to the transition `both --> legacy` which now restores the legacy source in the call to `RemovePrefixes`. Issue 2 is fixed thanks to the fact that we can now detect the transition `both --> legacy --> both`, which was previously impossible. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro
added a commit
that referenced
this pull request
Sep 17, 2024
The internal use of IPCache has two APIs: - An older synchronous API (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the caller to balance allocate and release calls. - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which delegates the reference counting responsibility to the ipcache itself. Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities, which historically has led to issues. See #27327 or #33120 for example. Cilium v1.16 and newer do not have these issues anymore, but the codepaths still exist. To ensure consistency across branches, this commit will be applied to all branches. This commit fixes two particular issues with the asynchronous API when an entry has been created by the legacy API. Issue 1: Legacy API leaks mixed-API IPCache entry ------------------------------------------------- This issue can happen if an entry is first created via the legacy API, then overwritten by the metadata API. Consider the following sequence of events for a prefix `p`: 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p` with source `generated`. The identity has now a refcount of one. 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity refcount to two and overwrites the IPCache entry for prefix `p` with source `custom-resource` 3. `RemovePrefixes(p, "custom-resource")`: This decreases the identity refcount down to one again, but retains the existing IPCache entry with source `custom-resource` (it does not delete the entry if the identity refcount is non-zero). 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it reach zero. It attempts to remove the IPCache entry for source `generated`. But because `generated` has lower precedence than `custom-resource`, the deletion fails and the IPCache entry is leaked. Issue 2: Metadata API causes early release of mixed-API entry identities ------------------------------------------------------------------------ Assuming all entries are managed by the metadata API, `InjectLabels` attempts to balance the identity reference count for managed identities by perform an identity allocation call for any to-be-upserted identity, followed by an allocation release call for any to-be-overwritten identity. However, if the existing entry was created by the legacy API, then `InjectLabels` must not release the old identity (as this is the responsibility of the legacy API owner). PR #27327 added such logic which will skip identity release if two conditions are satisfied: 1. The old entry was not created via the metadata API 2. The old entry is about to be replaced by a new one (e.g. it has a new source). However, because `RemovePrefixes` does not restore the legacy source as describe in Issue 1 above, the second condition is insufficient. Consider the following sequence of events 1. `AllocateCIDRs(p)`: refcount=1, source=generated 2. `UpsertPrefixes(p, "custom-resource")`: refcount=2, source=custom-resource 3. `RemovePrefixes(p, "custom-resource")`: refcount=1, source=custom-resource (!) 4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because condition 2 is not hit), source=custom-resource 5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it should have been kept alive due to it having associated metadata from step 3), while the IPCache entry is leaked (because the source of the entry is `custom-resource`, see Issue 1). Proposed Solution ----------------- Both issues stem from the fact that our tracking of mixed API entries via `createdFromMetadata` is insufficient. A boolean cannot distinguish between the three states we actually have: 1. The entry is exclusively owned by the legacy API 2. The entry is exclusively owned by the metadata API 3. The entry is shared both APIs These three states are kept track by two private fields in the IPCache entry: - A ``modifiedByLegacyAPI field which is set to true if the entry was ever touched by the legacy API. - A `overwrittenLegacySource` field which tracks the original source of the legacy entry before it was overwritten by a metadata API source If neither fields are set, the entry is assumed to be exclusively owned by the metadata API. If only `modifiedByLegacyAPI` is set, then it is assumed to be exclusively owned by the legacy API. If both fields are set, then it is assumed to be shared by both APIs. Note that once an entry has modifiedByLegacyAPI=true, it will never become false until deleted. This simplifies the code, as the metadata API already has the ability to clean up unmanaged entries. The state diagram and actions between the state can then be described as follows: ```mermaid stateDiagram-v2 both: Shared Ownership metadata: Owned by Metadata API legacy : Owned by Legacy API released: Released [*] --> metadata: UpsertPrefixes() [*] --> legacy: AllocateCIDRs() legacy --> both: UpsertPrefixes() - bump refcount, use metadata source both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0 both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 ``` Issue 1 is fixed thanks to the transition `both --> legacy` which now restores the legacy source in the call to `RemovePrefixes`. Issue 2 is fixed thanks to the fact that we can now detect the transition `both --> legacy --> both`, which was previously impossible. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
github-merge-queue bot
pushed a commit
that referenced
this pull request
Sep 17, 2024
The internal use of IPCache has two APIs: - An older synchronous API (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the caller to balance allocate and release calls. - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which delegates the reference counting responsibility to the ipcache itself. Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities, which historically has led to issues. See #27327 or #33120 for example. Cilium v1.16 and newer do not have these issues anymore, but the codepaths still exist. To ensure consistency across branches, this commit will be applied to all branches. This commit fixes two particular issues with the asynchronous API when an entry has been created by the legacy API. Issue 1: Legacy API leaks mixed-API IPCache entry ------------------------------------------------- This issue can happen if an entry is first created via the legacy API, then overwritten by the metadata API. Consider the following sequence of events for a prefix `p`: 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p` with source `generated`. The identity has now a refcount of one. 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity refcount to two and overwrites the IPCache entry for prefix `p` with source `custom-resource` 3. `RemovePrefixes(p, "custom-resource")`: This decreases the identity refcount down to one again, but retains the existing IPCache entry with source `custom-resource` (it does not delete the entry if the identity refcount is non-zero). 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it reach zero. It attempts to remove the IPCache entry for source `generated`. But because `generated` has lower precedence than `custom-resource`, the deletion fails and the IPCache entry is leaked. Issue 2: Metadata API causes early release of mixed-API entry identities ------------------------------------------------------------------------ Assuming all entries are managed by the metadata API, `InjectLabels` attempts to balance the identity reference count for managed identities by perform an identity allocation call for any to-be-upserted identity, followed by an allocation release call for any to-be-overwritten identity. However, if the existing entry was created by the legacy API, then `InjectLabels` must not release the old identity (as this is the responsibility of the legacy API owner). PR #27327 added such logic which will skip identity release if two conditions are satisfied: 1. The old entry was not created via the metadata API 2. The old entry is about to be replaced by a new one (e.g. it has a new source). However, because `RemovePrefixes` does not restore the legacy source as describe in Issue 1 above, the second condition is insufficient. Consider the following sequence of events 1. `AllocateCIDRs(p)`: refcount=1, source=generated 2. `UpsertPrefixes(p, "custom-resource")`: refcount=2, source=custom-resource 3. `RemovePrefixes(p, "custom-resource")`: refcount=1, source=custom-resource (!) 4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because condition 2 is not hit), source=custom-resource 5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it should have been kept alive due to it having associated metadata from step 3), while the IPCache entry is leaked (because the source of the entry is `custom-resource`, see Issue 1). Proposed Solution ----------------- Both issues stem from the fact that our tracking of mixed API entries via `createdFromMetadata` is insufficient. A boolean cannot distinguish between the three states we actually have: 1. The entry is exclusively owned by the legacy API 2. The entry is exclusively owned by the metadata API 3. The entry is shared both APIs These three states are kept track by two private fields in the IPCache entry: - A ``modifiedByLegacyAPI field which is set to true if the entry was ever touched by the legacy API. - A `overwrittenLegacySource` field which tracks the original source of the legacy entry before it was overwritten by a metadata API source If neither fields are set, the entry is assumed to be exclusively owned by the metadata API. If only `modifiedByLegacyAPI` is set, then it is assumed to be exclusively owned by the legacy API. If both fields are set, then it is assumed to be shared by both APIs. Note that once an entry has modifiedByLegacyAPI=true, it will never become false until deleted. This simplifies the code, as the metadata API already has the ability to clean up unmanaged entries. The state diagram and actions between the state can then be described as follows: ```mermaid stateDiagram-v2 both: Shared Ownership metadata: Owned by Metadata API legacy : Owned by Legacy API released: Released [*] --> metadata: UpsertPrefixes() [*] --> legacy: AllocateCIDRs() legacy --> both: UpsertPrefixes() - bump refcount, use metadata source both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0 both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 ``` Issue 1 is fixed thanks to the transition `both --> legacy` which now restores the legacy source in the call to `RemovePrefixes`. Issue 2 is fixed thanks to the fact that we can now detect the transition `both --> legacy --> both`, which was previously impossible. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro
added a commit
that referenced
this pull request
Sep 17, 2024
[ upstream commit a67667b ] The internal use of IPCache has two APIs: - An older synchronous API (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the caller to balance allocate and release calls. - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which delegates the reference counting responsibility to the ipcache itself. Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities, which historically has led to issues. See #27327 or #33120 for example. Cilium v1.16 and newer do not have these issues anymore, but the codepaths still exist. To ensure consistency across branches, this commit will be applied to all branches. This commit fixes two particular issues with the asynchronous API when an entry has been created by the legacy API. Issue 1: Legacy API leaks mixed-API IPCache entry ------------------------------------------------- This issue can happen if an entry is first created via the legacy API, then overwritten by the metadata API. Consider the following sequence of events for a prefix `p`: 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p` with source `generated`. The identity has now a refcount of one. 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity refcount to two and overwrites the IPCache entry for prefix `p` with source `custom-resource` 3. `RemovePrefixes(p, "custom-resource")`: This decreases the identity refcount down to one again, but retains the existing IPCache entry with source `custom-resource` (it does not delete the entry if the identity refcount is non-zero). 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it reach zero. It attempts to remove the IPCache entry for source `generated`. But because `generated` has lower precedence than `custom-resource`, the deletion fails and the IPCache entry is leaked. Issue 2: Metadata API causes early release of mixed-API entry identities ------------------------------------------------------------------------ Assuming all entries are managed by the metadata API, `InjectLabels` attempts to balance the identity reference count for managed identities by perform an identity allocation call for any to-be-upserted identity, followed by an allocation release call for any to-be-overwritten identity. However, if the existing entry was created by the legacy API, then `InjectLabels` must not release the old identity (as this is the responsibility of the legacy API owner). PR #27327 added such logic which will skip identity release if two conditions are satisfied: 1. The old entry was not created via the metadata API 2. The old entry is about to be replaced by a new one (e.g. it has a new source). However, because `RemovePrefixes` does not restore the legacy source as describe in Issue 1 above, the second condition is insufficient. Consider the following sequence of events 1. `AllocateCIDRs(p)`: refcount=1, source=generated 2. `UpsertPrefixes(p, "custom-resource")`: refcount=2, source=custom-resource 3. `RemovePrefixes(p, "custom-resource")`: refcount=1, source=custom-resource (!) 4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because condition 2 is not hit), source=custom-resource 5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it should have been kept alive due to it having associated metadata from step 3), while the IPCache entry is leaked (because the source of the entry is `custom-resource`, see Issue 1). Proposed Solution ----------------- Both issues stem from the fact that our tracking of mixed API entries via `createdFromMetadata` is insufficient. A boolean cannot distinguish between the three states we actually have: 1. The entry is exclusively owned by the legacy API 2. The entry is exclusively owned by the metadata API 3. The entry is shared both APIs These three states are kept track by two private fields in the IPCache entry: - A ``modifiedByLegacyAPI field which is set to true if the entry was ever touched by the legacy API. - A `overwrittenLegacySource` field which tracks the original source of the legacy entry before it was overwritten by a metadata API source If neither fields are set, the entry is assumed to be exclusively owned by the metadata API. If only `modifiedByLegacyAPI` is set, then it is assumed to be exclusively owned by the legacy API. If both fields are set, then it is assumed to be shared by both APIs. Note that once an entry has modifiedByLegacyAPI=true, it will never become false until deleted. This simplifies the code, as the metadata API already has the ability to clean up unmanaged entries. The state diagram and actions between the state can then be described as follows: ```mermaid stateDiagram-v2 both: Shared Ownership metadata: Owned by Metadata API legacy : Owned by Legacy API released: Released [*] --> metadata: UpsertPrefixes() [*] --> legacy: AllocateCIDRs() legacy --> both: UpsertPrefixes() - bump refcount, use metadata source both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0 both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 ``` Issue 1 is fixed thanks to the transition `both --> legacy` which now restores the legacy source in the call to `RemovePrefixes`. Issue 2 is fixed thanks to the fact that we can now detect the transition `both --> legacy --> both`, which was previously impossible. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro
added a commit
that referenced
this pull request
Sep 17, 2024
[ upstream commit a67667b ] The internal use of IPCache has two APIs: - An older synchronous API (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the caller to balance allocate and release calls. - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which delegates the reference counting responsibility to the ipcache itself. Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities, which historically has led to issues. See #27327 or #33120 for example. Cilium v1.16 and newer do not have these issues anymore, but the codepaths still exist. To ensure consistency across branches, this commit will be applied to all branches. This commit fixes two particular issues with the asynchronous API when an entry has been created by the legacy API. Issue 1: Legacy API leaks mixed-API IPCache entry ------------------------------------------------- This issue can happen if an entry is first created via the legacy API, then overwritten by the metadata API. Consider the following sequence of events for a prefix `p`: 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p` with source `generated`. The identity has now a refcount of one. 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity refcount to two and overwrites the IPCache entry for prefix `p` with source `custom-resource` 3. `RemovePrefixes(p, "custom-resource")`: This decreases the identity refcount down to one again, but retains the existing IPCache entry with source `custom-resource` (it does not delete the entry if the identity refcount is non-zero). 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it reach zero. It attempts to remove the IPCache entry for source `generated`. But because `generated` has lower precedence than `custom-resource`, the deletion fails and the IPCache entry is leaked. Issue 2: Metadata API causes early release of mixed-API entry identities ------------------------------------------------------------------------ Assuming all entries are managed by the metadata API, `InjectLabels` attempts to balance the identity reference count for managed identities by perform an identity allocation call for any to-be-upserted identity, followed by an allocation release call for any to-be-overwritten identity. However, if the existing entry was created by the legacy API, then `InjectLabels` must not release the old identity (as this is the responsibility of the legacy API owner). PR #27327 added such logic which will skip identity release if two conditions are satisfied: 1. The old entry was not created via the metadata API 2. The old entry is about to be replaced by a new one (e.g. it has a new source). However, because `RemovePrefixes` does not restore the legacy source as describe in Issue 1 above, the second condition is insufficient. Consider the following sequence of events 1. `AllocateCIDRs(p)`: refcount=1, source=generated 2. `UpsertPrefixes(p, "custom-resource")`: refcount=2, source=custom-resource 3. `RemovePrefixes(p, "custom-resource")`: refcount=1, source=custom-resource (!) 4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because condition 2 is not hit), source=custom-resource 5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it should have been kept alive due to it having associated metadata from step 3), while the IPCache entry is leaked (because the source of the entry is `custom-resource`, see Issue 1). Proposed Solution ----------------- Both issues stem from the fact that our tracking of mixed API entries via `createdFromMetadata` is insufficient. A boolean cannot distinguish between the three states we actually have: 1. The entry is exclusively owned by the legacy API 2. The entry is exclusively owned by the metadata API 3. The entry is shared both APIs These three states are kept track by two private fields in the IPCache entry: - A ``modifiedByLegacyAPI field which is set to true if the entry was ever touched by the legacy API. - A `overwrittenLegacySource` field which tracks the original source of the legacy entry before it was overwritten by a metadata API source If neither fields are set, the entry is assumed to be exclusively owned by the metadata API. If only `modifiedByLegacyAPI` is set, then it is assumed to be exclusively owned by the legacy API. If both fields are set, then it is assumed to be shared by both APIs. Note that once an entry has modifiedByLegacyAPI=true, it will never become false until deleted. This simplifies the code, as the metadata API already has the ability to clean up unmanaged entries. The state diagram and actions between the state can then be described as follows: ```mermaid stateDiagram-v2 both: Shared Ownership metadata: Owned by Metadata API legacy : Owned by Legacy API released: Released [*] --> metadata: UpsertPrefixes() [*] --> legacy: AllocateCIDRs() legacy --> both: UpsertPrefixes() - bump refcount, use metadata source both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0 both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 ``` Issue 1 is fixed thanks to the transition `both --> legacy` which now restores the legacy source in the call to `RemovePrefixes`. Issue 2 is fixed thanks to the fact that we can now detect the transition `both --> legacy --> both`, which was previously impossible. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro
added a commit
to gandro/cilium
that referenced
this pull request
Sep 17, 2024
[ upstream commit a67667b ] [ backporter notes: This contains additional changes to cidr.go to make it compile.The second half of the commit description contains the commit message for that change (which is a separate commit in v1.14) ] The internal use of IPCache has two APIs: - An older synchronous API (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the caller to balance allocate and release calls. - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which delegates the reference counting responsibility to the ipcache itself. Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities, which historically has led to issues. See cilium#27327 or cilium#33120 for example. Cilium v1.16 and newer do not have these issues anymore, but the codepaths still exist. To ensure consistency across branches, this commit will be applied to all branches. This commit fixes two particular issues with the asynchronous API when an entry has been created by the legacy API. Issue 1: Legacy API leaks mixed-API IPCache entry ------------------------------------------------- This issue can happen if an entry is first created via the legacy API, then overwritten by the metadata API. Consider the following sequence of events for a prefix `p`: 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p` with source `generated`. The identity has now a refcount of one. 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity refcount to two and overwrites the IPCache entry for prefix `p` with source `custom-resource` 3. `RemovePrefixes(p, "custom-resource")`: This decreases the identity refcount down to one again, but retains the existing IPCache entry with source `custom-resource` (it does not delete the entry if the identity refcount is non-zero). 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it reach zero. It attempts to remove the IPCache entry for source `generated`. But because `generated` has lower precedence than `custom-resource`, the deletion fails and the IPCache entry is leaked. Issue 2: Metadata API causes early release of mixed-API entry identities ------------------------------------------------------------------------ Assuming all entries are managed by the metadata API, `InjectLabels` attempts to balance the identity reference count for managed identities by perform an identity allocation call for any to-be-upserted identity, followed by an allocation release call for any to-be-overwritten identity. However, if the existing entry was created by the legacy API, then `InjectLabels` must not release the old identity (as this is the responsibility of the legacy API owner). PR cilium#27327 added such logic which will skip identity release if two conditions are satisfied: 1. The old entry was not created via the metadata API 2. The old entry is about to be replaced by a new one (e.g. it has a new source). However, because `RemovePrefixes` does not restore the legacy source as describe in Issue 1 above, the second condition is insufficient. Consider the following sequence of events 1. `AllocateCIDRs(p)`: refcount=1, source=generated 2. `UpsertPrefixes(p, "custom-resource")`: refcount=2, source=custom-resource 3. `RemovePrefixes(p, "custom-resource")`: refcount=1, source=custom-resource (!) 4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because condition 2 is not hit), source=custom-resource 5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it should have been kept alive due to it having associated metadata from step 3), while the IPCache entry is leaked (because the source of the entry is `custom-resource`, see Issue 1). Proposed Solution ----------------- Both issues stem from the fact that our tracking of mixed API entries via `createdFromMetadata` is insufficient. A boolean cannot distinguish between the three states we actually have: 1. The entry is exclusively owned by the legacy API 2. The entry is exclusively owned by the metadata API 3. The entry is shared both APIs These three states are kept track by two private fields in the IPCache entry: - A ``modifiedByLegacyAPI field which is set to true if the entry was ever touched by the legacy API. - A `overwrittenLegacySource` field which tracks the original source of the legacy entry before it was overwritten by a metadata API source If neither fields are set, the entry is assumed to be exclusively owned by the metadata API. If only `modifiedByLegacyAPI` is set, then it is assumed to be exclusively owned by the legacy API. If both fields are set, then it is assumed to be shared by both APIs. Note that once an entry has modifiedByLegacyAPI=true, it will never become false until deleted. This simplifies the code, as the metadata API already has the ability to clean up unmanaged entries. The state diagram and actions between the state can then be described as follows: ```mermaid stateDiagram-v2 both: Shared Ownership metadata: Owned by Metadata API legacy : Owned by Legacy API released: Released [*] --> metadata: UpsertPrefixes() [*] --> legacy: AllocateCIDRs() legacy --> both: UpsertPrefixes() - bump refcount, use metadata source both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0 both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 ``` Issue 1 is fixed thanks to the transition `both --> legacy` which now restores the legacy source in the call to `RemovePrefixes`. Issue 2 is fixed thanks to the fact that we can now detect the transition `both --> legacy --> both`, which was previously impossible. --- ipcache: Mark generated identities as co-owned by legacy API While a previous commit made sure the metadata API correctly co-owns IPCache entry if there were previously exclusively owned by the legacy API, this commit now ensures that the `UpsertGeneratedIdentities` function also updates ownership correctly. This is needeed, because we skip the `Upsert` for `usedIdentities` (which is fine, the entry does already exist), but we still need to mark the existing entry as co-owned by both APIs. This ensures that we can restore the legacy API source if the metadata owner is disassociated. This fixes a bug that can be reproduced with the following sequence of events: 1. Create a netpol allowing ToFQDN one.one.one.one 2. Create a netpol allowing ToCIDR 1.1.1.1/32 3. Perform a DNS lookup to one.one.one.one, observe 1.1.1.1/32 and 1.0.0.1/32 getting added to IPCache 4. Delete the ToCIDR netpol 5. Observe the IPCache entry for 1.1.1.1/32 vanish, while 1.0.0.1/32 remains The bug would occur because step 1 creates a higher-precedence metadata entry, where as step 3 will attempt to also create lower-precedence legacy API owners. This commit ensures that the entry 1.1.1.1/32 is marked as co-owned and thus is not deleted in step 4. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
nebril
pushed a commit
that referenced
this pull request
Sep 18, 2024
[ upstream commit a67667b ] The internal use of IPCache has two APIs: - An older synchronous API (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the caller to balance allocate and release calls. - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which delegates the reference counting responsibility to the ipcache itself. Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities, which historically has led to issues. See #27327 or #33120 for example. Cilium v1.16 and newer do not have these issues anymore, but the codepaths still exist. To ensure consistency across branches, this commit will be applied to all branches. This commit fixes two particular issues with the asynchronous API when an entry has been created by the legacy API. Issue 1: Legacy API leaks mixed-API IPCache entry ------------------------------------------------- This issue can happen if an entry is first created via the legacy API, then overwritten by the metadata API. Consider the following sequence of events for a prefix `p`: 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p` with source `generated`. The identity has now a refcount of one. 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity refcount to two and overwrites the IPCache entry for prefix `p` with source `custom-resource` 3. `RemovePrefixes(p, "custom-resource")`: This decreases the identity refcount down to one again, but retains the existing IPCache entry with source `custom-resource` (it does not delete the entry if the identity refcount is non-zero). 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it reach zero. It attempts to remove the IPCache entry for source `generated`. But because `generated` has lower precedence than `custom-resource`, the deletion fails and the IPCache entry is leaked. Issue 2: Metadata API causes early release of mixed-API entry identities ------------------------------------------------------------------------ Assuming all entries are managed by the metadata API, `InjectLabels` attempts to balance the identity reference count for managed identities by perform an identity allocation call for any to-be-upserted identity, followed by an allocation release call for any to-be-overwritten identity. However, if the existing entry was created by the legacy API, then `InjectLabels` must not release the old identity (as this is the responsibility of the legacy API owner). PR #27327 added such logic which will skip identity release if two conditions are satisfied: 1. The old entry was not created via the metadata API 2. The old entry is about to be replaced by a new one (e.g. it has a new source). However, because `RemovePrefixes` does not restore the legacy source as describe in Issue 1 above, the second condition is insufficient. Consider the following sequence of events 1. `AllocateCIDRs(p)`: refcount=1, source=generated 2. `UpsertPrefixes(p, "custom-resource")`: refcount=2, source=custom-resource 3. `RemovePrefixes(p, "custom-resource")`: refcount=1, source=custom-resource (!) 4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because condition 2 is not hit), source=custom-resource 5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it should have been kept alive due to it having associated metadata from step 3), while the IPCache entry is leaked (because the source of the entry is `custom-resource`, see Issue 1). Proposed Solution ----------------- Both issues stem from the fact that our tracking of mixed API entries via `createdFromMetadata` is insufficient. A boolean cannot distinguish between the three states we actually have: 1. The entry is exclusively owned by the legacy API 2. The entry is exclusively owned by the metadata API 3. The entry is shared both APIs These three states are kept track by two private fields in the IPCache entry: - A ``modifiedByLegacyAPI field which is set to true if the entry was ever touched by the legacy API. - A `overwrittenLegacySource` field which tracks the original source of the legacy entry before it was overwritten by a metadata API source If neither fields are set, the entry is assumed to be exclusively owned by the metadata API. If only `modifiedByLegacyAPI` is set, then it is assumed to be exclusively owned by the legacy API. If both fields are set, then it is assumed to be shared by both APIs. Note that once an entry has modifiedByLegacyAPI=true, it will never become false until deleted. This simplifies the code, as the metadata API already has the ability to clean up unmanaged entries. The state diagram and actions between the state can then be described as follows: ```mermaid stateDiagram-v2 both: Shared Ownership metadata: Owned by Metadata API legacy : Owned by Legacy API released: Released [*] --> metadata: UpsertPrefixes() [*] --> legacy: AllocateCIDRs() legacy --> both: UpsertPrefixes() - bump refcount, use metadata source both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0 both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 ``` Issue 1 is fixed thanks to the transition `both --> legacy` which now restores the legacy source in the call to `RemovePrefixes`. Issue 2 is fixed thanks to the fact that we can now detect the transition `both --> legacy --> both`, which was previously impossible. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
nebril
pushed a commit
that referenced
this pull request
Sep 18, 2024
[ upstream commit a67667b ] [ backporter notes: This contains additional changes to cidr.go to make it compile.The second half of the commit description contains the commit message for that change (which is a separate commit in v1.14) ] The internal use of IPCache has two APIs: - An older synchronous API (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the caller to balance allocate and release calls. - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which delegates the reference counting responsibility to the ipcache itself. Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities, which historically has led to issues. See #27327 or #33120 for example. Cilium v1.16 and newer do not have these issues anymore, but the codepaths still exist. To ensure consistency across branches, this commit will be applied to all branches. This commit fixes two particular issues with the asynchronous API when an entry has been created by the legacy API. Issue 1: Legacy API leaks mixed-API IPCache entry ------------------------------------------------- This issue can happen if an entry is first created via the legacy API, then overwritten by the metadata API. Consider the following sequence of events for a prefix `p`: 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p` with source `generated`. The identity has now a refcount of one. 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity refcount to two and overwrites the IPCache entry for prefix `p` with source `custom-resource` 3. `RemovePrefixes(p, "custom-resource")`: This decreases the identity refcount down to one again, but retains the existing IPCache entry with source `custom-resource` (it does not delete the entry if the identity refcount is non-zero). 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it reach zero. It attempts to remove the IPCache entry for source `generated`. But because `generated` has lower precedence than `custom-resource`, the deletion fails and the IPCache entry is leaked. Issue 2: Metadata API causes early release of mixed-API entry identities ------------------------------------------------------------------------ Assuming all entries are managed by the metadata API, `InjectLabels` attempts to balance the identity reference count for managed identities by perform an identity allocation call for any to-be-upserted identity, followed by an allocation release call for any to-be-overwritten identity. However, if the existing entry was created by the legacy API, then `InjectLabels` must not release the old identity (as this is the responsibility of the legacy API owner). PR #27327 added such logic which will skip identity release if two conditions are satisfied: 1. The old entry was not created via the metadata API 2. The old entry is about to be replaced by a new one (e.g. it has a new source). However, because `RemovePrefixes` does not restore the legacy source as describe in Issue 1 above, the second condition is insufficient. Consider the following sequence of events 1. `AllocateCIDRs(p)`: refcount=1, source=generated 2. `UpsertPrefixes(p, "custom-resource")`: refcount=2, source=custom-resource 3. `RemovePrefixes(p, "custom-resource")`: refcount=1, source=custom-resource (!) 4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because condition 2 is not hit), source=custom-resource 5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it should have been kept alive due to it having associated metadata from step 3), while the IPCache entry is leaked (because the source of the entry is `custom-resource`, see Issue 1). Proposed Solution ----------------- Both issues stem from the fact that our tracking of mixed API entries via `createdFromMetadata` is insufficient. A boolean cannot distinguish between the three states we actually have: 1. The entry is exclusively owned by the legacy API 2. The entry is exclusively owned by the metadata API 3. The entry is shared both APIs These three states are kept track by two private fields in the IPCache entry: - A ``modifiedByLegacyAPI field which is set to true if the entry was ever touched by the legacy API. - A `overwrittenLegacySource` field which tracks the original source of the legacy entry before it was overwritten by a metadata API source If neither fields are set, the entry is assumed to be exclusively owned by the metadata API. If only `modifiedByLegacyAPI` is set, then it is assumed to be exclusively owned by the legacy API. If both fields are set, then it is assumed to be shared by both APIs. Note that once an entry has modifiedByLegacyAPI=true, it will never become false until deleted. This simplifies the code, as the metadata API already has the ability to clean up unmanaged entries. The state diagram and actions between the state can then be described as follows: ```mermaid stateDiagram-v2 both: Shared Ownership metadata: Owned by Metadata API legacy : Owned by Legacy API released: Released [*] --> metadata: UpsertPrefixes() [*] --> legacy: AllocateCIDRs() legacy --> both: UpsertPrefixes() - bump refcount, use metadata source both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0 both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 ``` Issue 1 is fixed thanks to the transition `both --> legacy` which now restores the legacy source in the call to `RemovePrefixes`. Issue 2 is fixed thanks to the fact that we can now detect the transition `both --> legacy --> both`, which was previously impossible. --- ipcache: Mark generated identities as co-owned by legacy API While a previous commit made sure the metadata API correctly co-owns IPCache entry if there were previously exclusively owned by the legacy API, this commit now ensures that the `UpsertGeneratedIdentities` function also updates ownership correctly. This is needeed, because we skip the `Upsert` for `usedIdentities` (which is fine, the entry does already exist), but we still need to mark the existing entry as co-owned by both APIs. This ensures that we can restore the legacy API source if the metadata owner is disassociated. This fixes a bug that can be reproduced with the following sequence of events: 1. Create a netpol allowing ToFQDN one.one.one.one 2. Create a netpol allowing ToCIDR 1.1.1.1/32 3. Perform a DNS lookup to one.one.one.one, observe 1.1.1.1/32 and 1.0.0.1/32 getting added to IPCache 4. Delete the ToCIDR netpol 5. Observe the IPCache entry for 1.1.1.1/32 vanish, while 1.0.0.1/32 remains The bug would occur because step 1 creates a higher-precedence metadata entry, where as step 3 will attempt to also create lower-precedence legacy API owners. This commit ensures that the entry 1.1.1.1/32 is marked as co-owned and thus is not deleted in step 4. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
pippolo84
pushed a commit
that referenced
this pull request
Sep 25, 2024
[ upstream commit a67667b ] The internal use of IPCache has two APIs: - An older synchronous API (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the caller to balance allocate and release calls. - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which delegates the reference counting responsibility to the ipcache itself. Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities, which historically has led to issues. See #27327 or #33120 for example. Cilium v1.16 and newer do not have these issues anymore, but the codepaths still exist. To ensure consistency across branches, this commit will be applied to all branches. This commit fixes two particular issues with the asynchronous API when an entry has been created by the legacy API. Issue 1: Legacy API leaks mixed-API IPCache entry ------------------------------------------------- This issue can happen if an entry is first created via the legacy API, then overwritten by the metadata API. Consider the following sequence of events for a prefix `p`: 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p` with source `generated`. The identity has now a refcount of one. 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity refcount to two and overwrites the IPCache entry for prefix `p` with source `custom-resource` 3. `RemovePrefixes(p, "custom-resource")`: This decreases the identity refcount down to one again, but retains the existing IPCache entry with source `custom-resource` (it does not delete the entry if the identity refcount is non-zero). 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it reach zero. It attempts to remove the IPCache entry for source `generated`. But because `generated` has lower precedence than `custom-resource`, the deletion fails and the IPCache entry is leaked. Issue 2: Metadata API causes early release of mixed-API entry identities ------------------------------------------------------------------------ Assuming all entries are managed by the metadata API, `InjectLabels` attempts to balance the identity reference count for managed identities by perform an identity allocation call for any to-be-upserted identity, followed by an allocation release call for any to-be-overwritten identity. However, if the existing entry was created by the legacy API, then `InjectLabels` must not release the old identity (as this is the responsibility of the legacy API owner). PR #27327 added such logic which will skip identity release if two conditions are satisfied: 1. The old entry was not created via the metadata API 2. The old entry is about to be replaced by a new one (e.g. it has a new source). However, because `RemovePrefixes` does not restore the legacy source as describe in Issue 1 above, the second condition is insufficient. Consider the following sequence of events 1. `AllocateCIDRs(p)`: refcount=1, source=generated 2. `UpsertPrefixes(p, "custom-resource")`: refcount=2, source=custom-resource 3. `RemovePrefixes(p, "custom-resource")`: refcount=1, source=custom-resource (!) 4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because condition 2 is not hit), source=custom-resource 5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it should have been kept alive due to it having associated metadata from step 3), while the IPCache entry is leaked (because the source of the entry is `custom-resource`, see Issue 1). Proposed Solution ----------------- Both issues stem from the fact that our tracking of mixed API entries via `createdFromMetadata` is insufficient. A boolean cannot distinguish between the three states we actually have: 1. The entry is exclusively owned by the legacy API 2. The entry is exclusively owned by the metadata API 3. The entry is shared both APIs These three states are kept track by two private fields in the IPCache entry: - A ``modifiedByLegacyAPI field which is set to true if the entry was ever touched by the legacy API. - A `overwrittenLegacySource` field which tracks the original source of the legacy entry before it was overwritten by a metadata API source If neither fields are set, the entry is assumed to be exclusively owned by the metadata API. If only `modifiedByLegacyAPI` is set, then it is assumed to be exclusively owned by the legacy API. If both fields are set, then it is assumed to be shared by both APIs. Note that once an entry has modifiedByLegacyAPI=true, it will never become false until deleted. This simplifies the code, as the metadata API already has the ability to clean up unmanaged entries. The state diagram and actions between the state can then be described as follows: ```mermaid stateDiagram-v2 both: Shared Ownership metadata: Owned by Metadata API legacy : Owned by Legacy API released: Released [*] --> metadata: UpsertPrefixes() [*] --> legacy: AllocateCIDRs() legacy --> both: UpsertPrefixes() - bump refcount, use metadata source both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0 both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 ``` Issue 1 is fixed thanks to the transition `both --> legacy` which now restores the legacy source in the call to `RemovePrefixes`. Issue 2 is fixed thanks to the fact that we can now detect the transition `both --> legacy --> both`, which was previously impossible. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
pippolo84
pushed a commit
that referenced
this pull request
Sep 26, 2024
[ upstream commit a67667b ] The internal use of IPCache has two APIs: - An older synchronous API (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the caller to balance allocate and release calls. - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which delegates the reference counting responsibility to the ipcache itself. Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities, which historically has led to issues. See #27327 or #33120 for example. Cilium v1.16 and newer do not have these issues anymore, but the codepaths still exist. To ensure consistency across branches, this commit will be applied to all branches. This commit fixes two particular issues with the asynchronous API when an entry has been created by the legacy API. Issue 1: Legacy API leaks mixed-API IPCache entry ------------------------------------------------- This issue can happen if an entry is first created via the legacy API, then overwritten by the metadata API. Consider the following sequence of events for a prefix `p`: 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p` with source `generated`. The identity has now a refcount of one. 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity refcount to two and overwrites the IPCache entry for prefix `p` with source `custom-resource` 3. `RemovePrefixes(p, "custom-resource")`: This decreases the identity refcount down to one again, but retains the existing IPCache entry with source `custom-resource` (it does not delete the entry if the identity refcount is non-zero). 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it reach zero. It attempts to remove the IPCache entry for source `generated`. But because `generated` has lower precedence than `custom-resource`, the deletion fails and the IPCache entry is leaked. Issue 2: Metadata API causes early release of mixed-API entry identities ------------------------------------------------------------------------ Assuming all entries are managed by the metadata API, `InjectLabels` attempts to balance the identity reference count for managed identities by perform an identity allocation call for any to-be-upserted identity, followed by an allocation release call for any to-be-overwritten identity. However, if the existing entry was created by the legacy API, then `InjectLabels` must not release the old identity (as this is the responsibility of the legacy API owner). PR #27327 added such logic which will skip identity release if two conditions are satisfied: 1. The old entry was not created via the metadata API 2. The old entry is about to be replaced by a new one (e.g. it has a new source). However, because `RemovePrefixes` does not restore the legacy source as describe in Issue 1 above, the second condition is insufficient. Consider the following sequence of events 1. `AllocateCIDRs(p)`: refcount=1, source=generated 2. `UpsertPrefixes(p, "custom-resource")`: refcount=2, source=custom-resource 3. `RemovePrefixes(p, "custom-resource")`: refcount=1, source=custom-resource (!) 4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because condition 2 is not hit), source=custom-resource 5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it should have been kept alive due to it having associated metadata from step 3), while the IPCache entry is leaked (because the source of the entry is `custom-resource`, see Issue 1). Proposed Solution ----------------- Both issues stem from the fact that our tracking of mixed API entries via `createdFromMetadata` is insufficient. A boolean cannot distinguish between the three states we actually have: 1. The entry is exclusively owned by the legacy API 2. The entry is exclusively owned by the metadata API 3. The entry is shared both APIs These three states are kept track by two private fields in the IPCache entry: - A ``modifiedByLegacyAPI field which is set to true if the entry was ever touched by the legacy API. - A `overwrittenLegacySource` field which tracks the original source of the legacy entry before it was overwritten by a metadata API source If neither fields are set, the entry is assumed to be exclusively owned by the metadata API. If only `modifiedByLegacyAPI` is set, then it is assumed to be exclusively owned by the legacy API. If both fields are set, then it is assumed to be shared by both APIs. Note that once an entry has modifiedByLegacyAPI=true, it will never become false until deleted. This simplifies the code, as the metadata API already has the ability to clean up unmanaged entries. The state diagram and actions between the state can then be described as follows: ```mermaid stateDiagram-v2 both: Shared Ownership metadata: Owned by Metadata API legacy : Owned by Legacy API released: Released [*] --> metadata: UpsertPrefixes() [*] --> legacy: AllocateCIDRs() legacy --> both: UpsertPrefixes() - bump refcount, use metadata source both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0 both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 ``` Issue 1 is fixed thanks to the transition `both --> legacy` which now restores the legacy source in the call to `RemovePrefixes`. Issue 2 is fixed thanks to the fact that we can now detect the transition `both --> legacy --> both`, which was previously impossible. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
pippolo84
pushed a commit
that referenced
this pull request
Sep 27, 2024
[ upstream commit a67667b ] The internal use of IPCache has two APIs: - An older synchronous API (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the caller to balance allocate and release calls. - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which delegates the reference counting responsibility to the ipcache itself. Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities, which historically has led to issues. See #27327 or #33120 for example. Cilium v1.16 and newer do not have these issues anymore, but the codepaths still exist. To ensure consistency across branches, this commit will be applied to all branches. This commit fixes two particular issues with the asynchronous API when an entry has been created by the legacy API. Issue 1: Legacy API leaks mixed-API IPCache entry ------------------------------------------------- This issue can happen if an entry is first created via the legacy API, then overwritten by the metadata API. Consider the following sequence of events for a prefix `p`: 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p` with source `generated`. The identity has now a refcount of one. 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity refcount to two and overwrites the IPCache entry for prefix `p` with source `custom-resource` 3. `RemovePrefixes(p, "custom-resource")`: This decreases the identity refcount down to one again, but retains the existing IPCache entry with source `custom-resource` (it does not delete the entry if the identity refcount is non-zero). 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it reach zero. It attempts to remove the IPCache entry for source `generated`. But because `generated` has lower precedence than `custom-resource`, the deletion fails and the IPCache entry is leaked. Issue 2: Metadata API causes early release of mixed-API entry identities ------------------------------------------------------------------------ Assuming all entries are managed by the metadata API, `InjectLabels` attempts to balance the identity reference count for managed identities by perform an identity allocation call for any to-be-upserted identity, followed by an allocation release call for any to-be-overwritten identity. However, if the existing entry was created by the legacy API, then `InjectLabels` must not release the old identity (as this is the responsibility of the legacy API owner). PR #27327 added such logic which will skip identity release if two conditions are satisfied: 1. The old entry was not created via the metadata API 2. The old entry is about to be replaced by a new one (e.g. it has a new source). However, because `RemovePrefixes` does not restore the legacy source as describe in Issue 1 above, the second condition is insufficient. Consider the following sequence of events 1. `AllocateCIDRs(p)`: refcount=1, source=generated 2. `UpsertPrefixes(p, "custom-resource")`: refcount=2, source=custom-resource 3. `RemovePrefixes(p, "custom-resource")`: refcount=1, source=custom-resource (!) 4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because condition 2 is not hit), source=custom-resource 5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it should have been kept alive due to it having associated metadata from step 3), while the IPCache entry is leaked (because the source of the entry is `custom-resource`, see Issue 1). Proposed Solution ----------------- Both issues stem from the fact that our tracking of mixed API entries via `createdFromMetadata` is insufficient. A boolean cannot distinguish between the three states we actually have: 1. The entry is exclusively owned by the legacy API 2. The entry is exclusively owned by the metadata API 3. The entry is shared both APIs These three states are kept track by two private fields in the IPCache entry: - A ``modifiedByLegacyAPI field which is set to true if the entry was ever touched by the legacy API. - A `overwrittenLegacySource` field which tracks the original source of the legacy entry before it was overwritten by a metadata API source If neither fields are set, the entry is assumed to be exclusively owned by the metadata API. If only `modifiedByLegacyAPI` is set, then it is assumed to be exclusively owned by the legacy API. If both fields are set, then it is assumed to be shared by both APIs. Note that once an entry has modifiedByLegacyAPI=true, it will never become false until deleted. This simplifies the code, as the metadata API already has the ability to clean up unmanaged entries. The state diagram and actions between the state can then be described as follows: ```mermaid stateDiagram-v2 both: Shared Ownership metadata: Owned by Metadata API legacy : Owned by Legacy API released: Released [*] --> metadata: UpsertPrefixes() [*] --> legacy: AllocateCIDRs() legacy --> both: UpsertPrefixes() - bump refcount, use metadata source both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0 both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 ``` Issue 1 is fixed thanks to the transition `both --> legacy` which now restores the legacy source in the call to `RemovePrefixes`. Issue 2 is fixed thanks to the fact that we can now detect the transition `both --> legacy --> both`, which was previously impossible. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
github-merge-queue bot
pushed a commit
that referenced
this pull request
Oct 1, 2024
[ upstream commit a67667b ] The internal use of IPCache has two APIs: - An older synchronous API (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the caller to balance allocate and release calls. - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which delegates the reference counting responsibility to the ipcache itself. Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities, which historically has led to issues. See #27327 or #33120 for example. Cilium v1.16 and newer do not have these issues anymore, but the codepaths still exist. To ensure consistency across branches, this commit will be applied to all branches. This commit fixes two particular issues with the asynchronous API when an entry has been created by the legacy API. Issue 1: Legacy API leaks mixed-API IPCache entry ------------------------------------------------- This issue can happen if an entry is first created via the legacy API, then overwritten by the metadata API. Consider the following sequence of events for a prefix `p`: 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p` with source `generated`. The identity has now a refcount of one. 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity refcount to two and overwrites the IPCache entry for prefix `p` with source `custom-resource` 3. `RemovePrefixes(p, "custom-resource")`: This decreases the identity refcount down to one again, but retains the existing IPCache entry with source `custom-resource` (it does not delete the entry if the identity refcount is non-zero). 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it reach zero. It attempts to remove the IPCache entry for source `generated`. But because `generated` has lower precedence than `custom-resource`, the deletion fails and the IPCache entry is leaked. Issue 2: Metadata API causes early release of mixed-API entry identities ------------------------------------------------------------------------ Assuming all entries are managed by the metadata API, `InjectLabels` attempts to balance the identity reference count for managed identities by perform an identity allocation call for any to-be-upserted identity, followed by an allocation release call for any to-be-overwritten identity. However, if the existing entry was created by the legacy API, then `InjectLabels` must not release the old identity (as this is the responsibility of the legacy API owner). PR #27327 added such logic which will skip identity release if two conditions are satisfied: 1. The old entry was not created via the metadata API 2. The old entry is about to be replaced by a new one (e.g. it has a new source). However, because `RemovePrefixes` does not restore the legacy source as describe in Issue 1 above, the second condition is insufficient. Consider the following sequence of events 1. `AllocateCIDRs(p)`: refcount=1, source=generated 2. `UpsertPrefixes(p, "custom-resource")`: refcount=2, source=custom-resource 3. `RemovePrefixes(p, "custom-resource")`: refcount=1, source=custom-resource (!) 4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because condition 2 is not hit), source=custom-resource 5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it should have been kept alive due to it having associated metadata from step 3), while the IPCache entry is leaked (because the source of the entry is `custom-resource`, see Issue 1). Proposed Solution ----------------- Both issues stem from the fact that our tracking of mixed API entries via `createdFromMetadata` is insufficient. A boolean cannot distinguish between the three states we actually have: 1. The entry is exclusively owned by the legacy API 2. The entry is exclusively owned by the metadata API 3. The entry is shared both APIs These three states are kept track by two private fields in the IPCache entry: - A ``modifiedByLegacyAPI field which is set to true if the entry was ever touched by the legacy API. - A `overwrittenLegacySource` field which tracks the original source of the legacy entry before it was overwritten by a metadata API source If neither fields are set, the entry is assumed to be exclusively owned by the metadata API. If only `modifiedByLegacyAPI` is set, then it is assumed to be exclusively owned by the legacy API. If both fields are set, then it is assumed to be shared by both APIs. Note that once an entry has modifiedByLegacyAPI=true, it will never become false until deleted. This simplifies the code, as the metadata API already has the ability to clean up unmanaged entries. The state diagram and actions between the state can then be described as follows: ```mermaid stateDiagram-v2 both: Shared Ownership metadata: Owned by Metadata API legacy : Owned by Legacy API released: Released [*] --> metadata: UpsertPrefixes() [*] --> legacy: AllocateCIDRs() legacy --> both: UpsertPrefixes() - bump refcount, use metadata source both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0 both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0 ``` Issue 1 is fixed thanks to the transition `both --> legacy` which now restores the legacy source in the call to `RemovePrefixes`. Issue 2 is fixed thanks to the fact that we can now detect the transition `both --> legacy --> both`, which was previously impossible. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
affects/v1.13
This issue affects v1.13 branch
backport/author
The backport will be carried out by the author of the PR.
backport-done/1.14
The backport for Cilium 1.14.x for this PR is done.
backport-done/1.15
The backport for Cilium 1.15.x for this PR is done.
release-note/misc
This PR makes changes that have no direct user impact.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Note: v1.16 does not have this problem, though the code paths are still extant. So, I filed this PR against
main
rather than against v1.15 directly.When a prefix is initially created by the synchronous Upsert() API, it is flagged as such so that InjectLabels() knows it is shared. However, this flag is not removed if the legacy caller releases all references to this prefix.
Thus, the timeline
leaves us with the prefix still in the ipcache, but the identity fully released. This leads to traffic drops, as the identity is unknown to the policy system and thus not present in the BPF policymaps.
The fix is to forcibly remove the prefix if the identity reference reaches zero and the prefix is not in the metadata layer.