Skip to content

Conversation

squeed
Copy link
Contributor

@squeed squeed commented Jun 13, 2024

Note: v1.16 does not have this problem, though the code paths are still extant. So, I filed this PR against main rather than against v1.15 directly.

When a prefix is initially created by the synchronous Upsert() API, it is flagged as such so that InjectLabels() knows it is shared. However, this flag is not removed if the legacy caller releases all references to this prefix.

Thus, the timeline

  1. AllocateCIDRs("1.1.1.1/32") --> refcount == 1
  2. UpsertPrefixes("1.1.1.1/32") --> refcount == 2
  3. ReleaseCIDRIdentities("1.1.1.1/32") --> refcount == 1
  4. RemovePrefixes("1.1.1.1/32") --> refcount == 0

leaves us with the prefix still in the ipcache, but the identity fully released. This leads to traffic drops, as the identity is unknown to the policy system and thus not present in the BPF policymaps.

The fix is to forcibly remove the prefix if the identity reference reaches zero and the prefix is not in the metadata layer.

When a prefix is initially created by the synchronous Upsert() API, it
is flagged as such so that InjectLabels() knows it is shared. However,
this flag is not removed if the legacy caller releases all references to
this prefix.

Thus, the timeline

1. AllocateCIDRs("1.1.1.1/32")
2. UpsertPrefixes("1.1.1.1/32")
3. ReleaseCIDRIdentities("1.1.1.1/32")
4. RemovePrefixes("1.1.1.1/32")

leaves us with the prefix still in the ipcache, but the identity fully
released. This leads to traffic drops, as the identity is unknown to the
policy system and thus not present in the BPF policymaps.

The fix is to forcibly remove the prefix if the identity reference
reaches zero and the prefix is not in the metadata layer.

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
@squeed squeed requested a review from a team as a code owner June 13, 2024 11:47
@squeed squeed requested a review from doniacld June 13, 2024 11:47
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jun 13, 2024
@squeed squeed added the release-note/misc This PR makes changes that have no direct user impact. label Jun 13, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jun 13, 2024
@squeed squeed added backport/author The backport will be carried out by the author of the PR. needs-backport/1.13 labels Jun 13, 2024
@squeed
Copy link
Contributor Author

squeed commented Jun 13, 2024

/test

@squeed squeed added this pull request to the merge queue Jun 14, 2024
Merged via the queue into cilium:main with commit b30a3a9 Jun 14, 2024
@squeed squeed deleted the ipcache-orphaned-prefix branch June 14, 2024 10:03
@squeed squeed added backport-pending/1.15 The backport for Cilium 1.15.x for this PR is in progress. and removed needs-backport/1.15 labels Jun 14, 2024
@github-actions github-actions bot added backport-done/1.15 The backport for Cilium 1.15.x for this PR is done. and removed backport-pending/1.15 The backport for Cilium 1.15.x for this PR is in progress. labels Jun 19, 2024
@julianwiedmann julianwiedmann added backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. and removed needs-backport/1.14 labels Jun 20, 2024
@github-actions github-actions bot added backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. and removed backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. labels Jun 24, 2024
@julianwiedmann julianwiedmann added affects/v1.13 This issue affects v1.13 branch and removed needs-backport/1.13 labels Jul 25, 2024
gandro added a commit that referenced this pull request Sep 5, 2024
The internal use of IPCache has two APIs:

  - An older synchronous API
    (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the
    caller to balance allocate and release calls.
  - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which
    delegates the reference counting responsibility to the ipcache
    itself.

Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities,
which historically has led to issues. See #27327 or
#33120 for example. Cilium v1.16 and newer do not have
these issues anymore, but the codepaths still exist. To ensure
consistency across branches, this commit will be applied to all
branches. The v1.14 version of this commit contains additional unit
tests to assert the new behavior.

This commit fixes two particular issues with the asynchronous API when an
entry has been created by the legacy API.

Issue 1: Legacy API leaks mixed-API IPCache entry
-------------------------------------------------

This issue can happen if an entry is first created via the legacy API,
then overwritten by the metadata API. Consider the following sequence of
events for a prefix `p`:

 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p`
    with source `generated`. The identity has now a refcount of one.
 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity
    refcount to two and overwrites the IPCache entry for prefix `p` with
    source `custom-resource`
 3. `RemovePrefixes(p, "custom-resource")`: This decreases the
    identity refcount down to one again, but retains the existing
    IPCache entry with source `custom-resource` (it does not delete the
    entry if the identity refcount is non-zero).
 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it
    reach zero. It attempts to remove the IPCache entry for source
    `generated`. But because `generated` has lower precedence than
    `custom-resource`, the deletion fails and the IPCache entry is
    leaked.

Issue 2: Metadata API causes early release of mixed-API entry identities
------------------------------------------------------------------------

Assuming all entries are managed by the metadata API, `InjectLabels`
attempts to balance the identity reference count for managed identities
by perform an identity allocation call for any to-be-upserted identity,
followed by an allocation release call for any to-be-overwritten
identity.

However, if the existing entry was created by the legacy API, then
`InjectLabels` must not release the old identity (as this is the
responsibility of the legacy API owner). PR #27327 added
such logic which will skip identity release if two conditions are
satisfied:

  1. The old entry was not created via the metadata API
  2. The old entry is about to be replaced by a new one (e.g. it has a
     new source).

However, because `RemovePrefixes` does not restore the legacy source as
describe in Issue 1 above, the second condition is insufficient. Consider
the following sequence of events

  1. `AllocateCIDRs(p)`: refcount=1, source=generated
  2. `UpsertPrefixes(p, "custom-resource"): refcount=2,
     source=custom-resource
  3. `RemovePrefixes(p, "custom-resource"): refcount=1,
     source=custom-resource (!)
  4. `UpsertPrefixes(p, "custom-resource"): refcount=1 (because
     condition 2 is not hit), source=custom-resource
  5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it
     should have been kept alive due to it having associated metadata
     from step 3), while the IPCache entry is leaked (because the source of
     the entry is `custom-resource`, see Issue 1).

Proposed Solution
-----------------

Both issues stem from the fact that our tracking of mixed API entries
via `createdFromMetadata` is insufficient. A boolean cannot distinguish
between the three states we actually have:

 1. The entry is exclusively owned by the legacy API
 2. The entry is exclusively owned by the metadata API
 3. The entry is is shared both APIs

These three states are kept track by two private fields in the IPCache entry:

 - A `tainted` field which is set to true if the entry was ever touched by the legacy API.
 - A `overwrittenLegacySource` field which tracks the original source of
   the legacy entry before it was overwritten by a metadata API source

If neither fields are set, the entry is assumed to be exclusively owned
by the metadata API. If only `tainted` is set, then it is assumed to be
exclusively owned by the legacy API. If both fields are set, then it is
assumed to be shared by both APIs.

Note that once an entry is tainted, it will never become untainted until
deleted. This simplifies the code, as the metadata API already has the
ability to clean up unmanaged entries.

The state diagram and actions between the state can then be described as
follows:

```mermaid
stateDiagram-v2

    both: Shared Ownership
    metadata: Owned by Metadata API
    legacy : Owned by Legacy API
    released: Released

    [*] --> metadata: UpsertPrefixes()
    [*] --> legacy: AllocateCIDRs()
    legacy --> both: UpsertPrefixes() - bump refcount, use metadata source
    both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source
    metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source
    legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0
    both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
    metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
```

Issue 1 is fixed thanks to the transition `both --> legacy` which now
restores the legacy source in in the call to `RemovePrefixes`.

Issue 2 is fixed thanks to the fact that we can now detect the
transition `both --> legacy --> both`, which was previously impossible.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro added a commit to gandro/cilium that referenced this pull request Sep 5, 2024
The internal use of IPCache has two APIs:

  - An older synchronous API
    (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the
    caller to balance allocate and release calls.
  - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which
    delegates the reference counting responsibility to the ipcache
    itself.

Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities,
which historically has led to issues. See cilium#27327 or
cilium#33120 for example. Cilium v1.16 and newer do not have
these issues anymore, but the codepaths still exist. To ensure
consistency across branches, this commit will be applied to all
branches. The v1.14 version of this commit contains additional unit
tests to assert the new behavior.

This commit fixes two particular issues with the asynchronous API when an
entry has been created by the legacy API.

Issue 1: Legacy API leaks mixed-API IPCache entry
-------------------------------------------------

This issue can happen if an entry is first created via the legacy API,
then overwritten by the metadata API. Consider the following sequence of
events for a prefix `p`:

 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p`
    with source `generated`. The identity has now a refcount of one.
 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity
    refcount to two and overwrites the IPCache entry for prefix `p` with
    source `custom-resource`
 3. `RemovePrefixes(p, "custom-resource")`: This decreases the
    identity refcount down to one again, but retains the existing
    IPCache entry with source `custom-resource` (it does not delete the
    entry if the identity refcount is non-zero).
 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it
    reach zero. It attempts to remove the IPCache entry for source
    `generated`. But because `generated` has lower precedence than
    `custom-resource`, the deletion fails and the IPCache entry is
    leaked.

Issue 2: Metadata API causes early release of mixed-API entry identities
------------------------------------------------------------------------

Assuming all entries are managed by the metadata API, `InjectLabels`
attempts to balance the identity reference count for managed identities
by perform an identity allocation call for any to-be-upserted identity,
followed by an allocation release call for any to-be-overwritten
identity.

However, if the existing entry was created by the legacy API, then
`InjectLabels` must not release the old identity (as this is the
responsibility of the legacy API owner). PR cilium#27327 added
such logic which will skip identity release if two conditions are
satisfied:

  1. The old entry was not created via the metadata API
  2. The old entry is about to be replaced by a new one (e.g. it has a
     new source).

However, because `RemovePrefixes` does not restore the legacy source as
describe in Issue 1 above, the second condition is insufficient. Consider
the following sequence of events

  1. `AllocateCIDRs(p)`: refcount=1, source=generated
  2. `UpsertPrefixes(p, "custom-resource")`: refcount=2,
     source=custom-resource
  3. `RemovePrefixes(p, "custom-resource")`: refcount=1,
     source=custom-resource (!)
  4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because
     condition 2 is not hit), source=custom-resource
  5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it
     should have been kept alive due to it having associated metadata
     from step 3), while the IPCache entry is leaked (because the source of
     the entry is `custom-resource`, see Issue 1).

Proposed Solution
-----------------

Both issues stem from the fact that our tracking of mixed API entries
via `createdFromMetadata` is insufficient. A boolean cannot distinguish
between the three states we actually have:

 1. The entry is exclusively owned by the legacy API
 2. The entry is exclusively owned by the metadata API
 3. The entry is is shared both APIs

These three states are kept track by two private fields in the IPCache entry:

 - A `tainted` field which is set to true if the entry was ever touched by the legacy API.
 - A `overwrittenLegacySource` field which tracks the original source of
   the legacy entry before it was overwritten by a metadata API source

If neither fields are set, the entry is assumed to be exclusively owned
by the metadata API. If only `tainted` is set, then it is assumed to be
exclusively owned by the legacy API. If both fields are set, then it is
assumed to be shared by both APIs.

Note that once an entry is tainted, it will never become untainted until
deleted. This simplifies the code, as the metadata API already has the
ability to clean up unmanaged entries.

The state diagram and actions between the state can then be described as
follows:

```mermaid
stateDiagram-v2

    both: Shared Ownership
    metadata: Owned by Metadata API
    legacy : Owned by Legacy API
    released: Released

    [*] --> metadata: UpsertPrefixes()
    [*] --> legacy: AllocateCIDRs()
    legacy --> both: UpsertPrefixes() - bump refcount, use metadata source
    both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source
    metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source
    legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0
    both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
    metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
```

Issue 1 is fixed thanks to the transition `both --> legacy` which now
restores the legacy source in the call to `RemovePrefixes`.

Issue 2 is fixed thanks to the fact that we can now detect the
transition `both --> legacy --> both`, which was previously impossible.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro added a commit to gandro/cilium that referenced this pull request Sep 5, 2024
The internal use of IPCache has two APIs:

  - An older synchronous API
    (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the
    caller to balance allocate and release calls.
  - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which
    delegates the reference counting responsibility to the ipcache
    itself.

Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities,
which historically has led to issues. See cilium#27327 or
cilium#33120 for example. Cilium v1.16 and newer do not have
these issues anymore, but the codepaths still exist. To ensure
consistency across branches, this commit will be applied to all
branches. The v1.14 version of this commit contains additional unit
tests to assert the new behavior.

This commit fixes two particular issues with the asynchronous API when an
entry has been created by the legacy API.

Issue 1: Legacy API leaks mixed-API IPCache entry
-------------------------------------------------

This issue can happen if an entry is first created via the legacy API,
then overwritten by the metadata API. Consider the following sequence of
events for a prefix `p`:

 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p`
    with source `generated`. The identity has now a refcount of one.
 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity
    refcount to two and overwrites the IPCache entry for prefix `p` with
    source `custom-resource`
 3. `RemovePrefixes(p, "custom-resource")`: This decreases the
    identity refcount down to one again, but retains the existing
    IPCache entry with source `custom-resource` (it does not delete the
    entry if the identity refcount is non-zero).
 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it
    reach zero. It attempts to remove the IPCache entry for source
    `generated`. But because `generated` has lower precedence than
    `custom-resource`, the deletion fails and the IPCache entry is
    leaked.

Issue 2: Metadata API causes early release of mixed-API entry identities
------------------------------------------------------------------------

Assuming all entries are managed by the metadata API, `InjectLabels`
attempts to balance the identity reference count for managed identities
by perform an identity allocation call for any to-be-upserted identity,
followed by an allocation release call for any to-be-overwritten
identity.

However, if the existing entry was created by the legacy API, then
`InjectLabels` must not release the old identity (as this is the
responsibility of the legacy API owner). PR cilium#27327 added
such logic which will skip identity release if two conditions are
satisfied:

  1. The old entry was not created via the metadata API
  2. The old entry is about to be replaced by a new one (e.g. it has a
     new source).

However, because `RemovePrefixes` does not restore the legacy source as
describe in Issue 1 above, the second condition is insufficient. Consider
the following sequence of events

  1. `AllocateCIDRs(p)`: refcount=1, source=generated
  2. `UpsertPrefixes(p, "custom-resource")`: refcount=2,
     source=custom-resource
  3. `RemovePrefixes(p, "custom-resource")`: refcount=1,
     source=custom-resource (!)
  4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because
     condition 2 is not hit), source=custom-resource
  5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it
     should have been kept alive due to it having associated metadata
     from step 3), while the IPCache entry is leaked (because the source of
     the entry is `custom-resource`, see Issue 1).

Proposed Solution
-----------------

Both issues stem from the fact that our tracking of mixed API entries
via `createdFromMetadata` is insufficient. A boolean cannot distinguish
between the three states we actually have:

 1. The entry is exclusively owned by the legacy API
 2. The entry is exclusively owned by the metadata API
 3. The entry is shared both APIs

These three states are kept track by two private fields in the IPCache entry:

 - A `tainted` field which is set to true if the entry was ever touched by the legacy API.
 - A `overwrittenLegacySource` field which tracks the original source of
   the legacy entry before it was overwritten by a metadata API source

If neither fields are set, the entry is assumed to be exclusively owned
by the metadata API. If only `tainted` is set, then it is assumed to be
exclusively owned by the legacy API. If both fields are set, then it is
assumed to be shared by both APIs.

Note that once an entry is tainted, it will never become untainted until
deleted. This simplifies the code, as the metadata API already has the
ability to clean up unmanaged entries.

The state diagram and actions between the state can then be described as
follows:

```mermaid
stateDiagram-v2

    both: Shared Ownership
    metadata: Owned by Metadata API
    legacy : Owned by Legacy API
    released: Released

    [*] --> metadata: UpsertPrefixes()
    [*] --> legacy: AllocateCIDRs()
    legacy --> both: UpsertPrefixes() - bump refcount, use metadata source
    both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source
    metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source
    legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0
    both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
    metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
```

Issue 1 is fixed thanks to the transition `both --> legacy` which now
restores the legacy source in the call to `RemovePrefixes`.

Issue 2 is fixed thanks to the fact that we can now detect the
transition `both --> legacy --> both`, which was previously impossible.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro added a commit to gandro/cilium that referenced this pull request Sep 9, 2024
The internal use of IPCache has two APIs:

  - An older synchronous API
    (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the
    caller to balance allocate and release calls.
  - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which
    delegates the reference counting responsibility to the ipcache
    itself.

Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities,
which historically has led to issues. See cilium#27327 or
cilium#33120 for example. Cilium v1.16 and newer do not have
these issues anymore, but the codepaths still exist. To ensure
consistency across branches, this commit will be applied to all
branches. The v1.14 version of this commit contains additional unit
tests to assert the new behavior.

This commit fixes two particular issues with the asynchronous API when an
entry has been created by the legacy API.

Issue 1: Legacy API leaks mixed-API IPCache entry
-------------------------------------------------

This issue can happen if an entry is first created via the legacy API,
then overwritten by the metadata API. Consider the following sequence of
events for a prefix `p`:

 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p`
    with source `generated`. The identity has now a refcount of one.
 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity
    refcount to two and overwrites the IPCache entry for prefix `p` with
    source `custom-resource`
 3. `RemovePrefixes(p, "custom-resource")`: This decreases the
    identity refcount down to one again, but retains the existing
    IPCache entry with source `custom-resource` (it does not delete the
    entry if the identity refcount is non-zero).
 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it
    reach zero. It attempts to remove the IPCache entry for source
    `generated`. But because `generated` has lower precedence than
    `custom-resource`, the deletion fails and the IPCache entry is
    leaked.

Issue 2: Metadata API causes early release of mixed-API entry identities
------------------------------------------------------------------------

Assuming all entries are managed by the metadata API, `InjectLabels`
attempts to balance the identity reference count for managed identities
by perform an identity allocation call for any to-be-upserted identity,
followed by an allocation release call for any to-be-overwritten
identity.

However, if the existing entry was created by the legacy API, then
`InjectLabels` must not release the old identity (as this is the
responsibility of the legacy API owner). PR cilium#27327 added
such logic which will skip identity release if two conditions are
satisfied:

  1. The old entry was not created via the metadata API
  2. The old entry is about to be replaced by a new one (e.g. it has a
     new source).

However, because `RemovePrefixes` does not restore the legacy source as
describe in Issue 1 above, the second condition is insufficient. Consider
the following sequence of events

  1. `AllocateCIDRs(p)`: refcount=1, source=generated
  2. `UpsertPrefixes(p, "custom-resource")`: refcount=2,
     source=custom-resource
  3. `RemovePrefixes(p, "custom-resource")`: refcount=1,
     source=custom-resource (!)
  4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because
     condition 2 is not hit), source=custom-resource
  5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it
     should have been kept alive due to it having associated metadata
     from step 3), while the IPCache entry is leaked (because the source of
     the entry is `custom-resource`, see Issue 1).

Proposed Solution
-----------------

Both issues stem from the fact that our tracking of mixed API entries
via `createdFromMetadata` is insufficient. A boolean cannot distinguish
between the three states we actually have:

 1. The entry is exclusively owned by the legacy API
 2. The entry is exclusively owned by the metadata API
 3. The entry is shared both APIs

These three states are kept track by two private fields in the IPCache entry:

 - A ``modifiedByLegacyAPI field which is set to true if the entry was
   ever touched by the legacy API.
 - A `overwrittenLegacySource` field which tracks the original source of
   the legacy entry before it was overwritten by a metadata API source

If neither fields are set, the entry is assumed to be exclusively owned
by the metadata API. If only `modifiedByLegacyAPI` is set, then it is
assumed to be exclusively owned by the legacy API. If both fields are
set, then it is assumed to be shared by both APIs.

Note that once an entry has modifiedByLegacyAPI=true, it will never
become false until deleted. This simplifies the code, as the metadata
API already has the ability to clean up unmanaged entries.

The state diagram and actions between the state can then be described as
follows:

```mermaid
stateDiagram-v2

    both: Shared Ownership
    metadata: Owned by Metadata API
    legacy : Owned by Legacy API
    released: Released

    [*] --> metadata: UpsertPrefixes()
    [*] --> legacy: AllocateCIDRs()
    legacy --> both: UpsertPrefixes() - bump refcount, use metadata source
    both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source
    metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source
    legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0
    both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
    metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
```

Issue 1 is fixed thanks to the transition `both --> legacy` which now
restores the legacy source in the call to `RemovePrefixes`.

Issue 2 is fixed thanks to the fact that we can now detect the
transition `both --> legacy --> both`, which was previously impossible.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro added a commit to gandro/cilium that referenced this pull request Sep 10, 2024
The internal use of IPCache has two APIs:

  - An older synchronous API
    (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the
    caller to balance allocate and release calls.
  - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which
    delegates the reference counting responsibility to the ipcache
    itself.

Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities,
which historically has led to issues. See cilium#27327 or
cilium#33120 for example. Cilium v1.16 and newer do not have
these issues anymore, but the codepaths still exist. To ensure
consistency across branches, this commit will be applied to all
branches. The v1.14 version of this commit contains additional unit
tests to assert the new behavior.

This commit fixes two particular issues with the asynchronous API when an
entry has been created by the legacy API.

Issue 1: Legacy API leaks mixed-API IPCache entry
-------------------------------------------------

This issue can happen if an entry is first created via the legacy API,
then overwritten by the metadata API. Consider the following sequence of
events for a prefix `p`:

 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p`
    with source `generated`. The identity has now a refcount of one.
 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity
    refcount to two and overwrites the IPCache entry for prefix `p` with
    source `custom-resource`
 3. `RemovePrefixes(p, "custom-resource")`: This decreases the
    identity refcount down to one again, but retains the existing
    IPCache entry with source `custom-resource` (it does not delete the
    entry if the identity refcount is non-zero).
 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it
    reach zero. It attempts to remove the IPCache entry for source
    `generated`. But because `generated` has lower precedence than
    `custom-resource`, the deletion fails and the IPCache entry is
    leaked.

Issue 2: Metadata API causes early release of mixed-API entry identities
------------------------------------------------------------------------

Assuming all entries are managed by the metadata API, `InjectLabels`
attempts to balance the identity reference count for managed identities
by perform an identity allocation call for any to-be-upserted identity,
followed by an allocation release call for any to-be-overwritten
identity.

However, if the existing entry was created by the legacy API, then
`InjectLabels` must not release the old identity (as this is the
responsibility of the legacy API owner). PR cilium#27327 added
such logic which will skip identity release if two conditions are
satisfied:

  1. The old entry was not created via the metadata API
  2. The old entry is about to be replaced by a new one (e.g. it has a
     new source).

However, because `RemovePrefixes` does not restore the legacy source as
describe in Issue 1 above, the second condition is insufficient. Consider
the following sequence of events

  1. `AllocateCIDRs(p)`: refcount=1, source=generated
  2. `UpsertPrefixes(p, "custom-resource")`: refcount=2,
     source=custom-resource
  3. `RemovePrefixes(p, "custom-resource")`: refcount=1,
     source=custom-resource (!)
  4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because
     condition 2 is not hit), source=custom-resource
  5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it
     should have been kept alive due to it having associated metadata
     from step 3), while the IPCache entry is leaked (because the source of
     the entry is `custom-resource`, see Issue 1).

Proposed Solution
-----------------

Both issues stem from the fact that our tracking of mixed API entries
via `createdFromMetadata` is insufficient. A boolean cannot distinguish
between the three states we actually have:

 1. The entry is exclusively owned by the legacy API
 2. The entry is exclusively owned by the metadata API
 3. The entry is shared both APIs

These three states are kept track by two private fields in the IPCache entry:

 - A ``modifiedByLegacyAPI field which is set to true if the entry was
   ever touched by the legacy API.
 - A `overwrittenLegacySource` field which tracks the original source of
   the legacy entry before it was overwritten by a metadata API source

If neither fields are set, the entry is assumed to be exclusively owned
by the metadata API. If only `modifiedByLegacyAPI` is set, then it is
assumed to be exclusively owned by the legacy API. If both fields are
set, then it is assumed to be shared by both APIs.

Note that once an entry has modifiedByLegacyAPI=true, it will never
become false until deleted. This simplifies the code, as the metadata
API already has the ability to clean up unmanaged entries.

The state diagram and actions between the state can then be described as
follows:

```mermaid
stateDiagram-v2

    both: Shared Ownership
    metadata: Owned by Metadata API
    legacy : Owned by Legacy API
    released: Released

    [*] --> metadata: UpsertPrefixes()
    [*] --> legacy: AllocateCIDRs()
    legacy --> both: UpsertPrefixes() - bump refcount, use metadata source
    both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source
    metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source
    legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0
    both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
    metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
```

Issue 1 is fixed thanks to the transition `both --> legacy` which now
restores the legacy source in the call to `RemovePrefixes`.

Issue 2 is fixed thanks to the fact that we can now detect the
transition `both --> legacy --> both`, which was previously impossible.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro added a commit to gandro/cilium that referenced this pull request Sep 16, 2024
The internal use of IPCache has two APIs:

  - An older synchronous API
    (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the
    caller to balance allocate and release calls.
  - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which
    delegates the reference counting responsibility to the ipcache
    itself.

Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities,
which historically has led to issues. See cilium#27327 or
cilium#33120 for example. Cilium v1.16 and newer do not have
these issues anymore, but the codepaths still exist. To ensure
consistency across branches, this commit will be applied to all
branches.

This commit fixes two particular issues with the asynchronous API when an
entry has been created by the legacy API.

Issue 1: Legacy API leaks mixed-API IPCache entry
-------------------------------------------------

This issue can happen if an entry is first created via the legacy API,
then overwritten by the metadata API. Consider the following sequence of
events for a prefix `p`:

 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p`
    with source `generated`. The identity has now a refcount of one.
 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity
    refcount to two and overwrites the IPCache entry for prefix `p` with
    source `custom-resource`
 3. `RemovePrefixes(p, "custom-resource")`: This decreases the
    identity refcount down to one again, but retains the existing
    IPCache entry with source `custom-resource` (it does not delete the
    entry if the identity refcount is non-zero).
 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it
    reach zero. It attempts to remove the IPCache entry for source
    `generated`. But because `generated` has lower precedence than
    `custom-resource`, the deletion fails and the IPCache entry is
    leaked.

Issue 2: Metadata API causes early release of mixed-API entry identities
------------------------------------------------------------------------

Assuming all entries are managed by the metadata API, `InjectLabels`
attempts to balance the identity reference count for managed identities
by perform an identity allocation call for any to-be-upserted identity,
followed by an allocation release call for any to-be-overwritten
identity.

However, if the existing entry was created by the legacy API, then
`InjectLabels` must not release the old identity (as this is the
responsibility of the legacy API owner). PR cilium#27327 added
such logic which will skip identity release if two conditions are
satisfied:

  1. The old entry was not created via the metadata API
  2. The old entry is about to be replaced by a new one (e.g. it has a
     new source).

However, because `RemovePrefixes` does not restore the legacy source as
describe in Issue 1 above, the second condition is insufficient. Consider
the following sequence of events

  1. `AllocateCIDRs(p)`: refcount=1, source=generated
  2. `UpsertPrefixes(p, "custom-resource")`: refcount=2,
     source=custom-resource
  3. `RemovePrefixes(p, "custom-resource")`: refcount=1,
     source=custom-resource (!)
  4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because
     condition 2 is not hit), source=custom-resource
  5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it
     should have been kept alive due to it having associated metadata
     from step 3), while the IPCache entry is leaked (because the source of
     the entry is `custom-resource`, see Issue 1).

Proposed Solution
-----------------

Both issues stem from the fact that our tracking of mixed API entries
via `createdFromMetadata` is insufficient. A boolean cannot distinguish
between the three states we actually have:

 1. The entry is exclusively owned by the legacy API
 2. The entry is exclusively owned by the metadata API
 3. The entry is shared both APIs

These three states are kept track by two private fields in the IPCache entry:

 - A ``modifiedByLegacyAPI field which is set to true if the entry was
   ever touched by the legacy API.
 - A `overwrittenLegacySource` field which tracks the original source of
   the legacy entry before it was overwritten by a metadata API source

If neither fields are set, the entry is assumed to be exclusively owned
by the metadata API. If only `modifiedByLegacyAPI` is set, then it is
assumed to be exclusively owned by the legacy API. If both fields are
set, then it is assumed to be shared by both APIs.

Note that once an entry has modifiedByLegacyAPI=true, it will never
become false until deleted. This simplifies the code, as the metadata
API already has the ability to clean up unmanaged entries.

The state diagram and actions between the state can then be described as
follows:

```mermaid
stateDiagram-v2

    both: Shared Ownership
    metadata: Owned by Metadata API
    legacy : Owned by Legacy API
    released: Released

    [*] --> metadata: UpsertPrefixes()
    [*] --> legacy: AllocateCIDRs()
    legacy --> both: UpsertPrefixes() - bump refcount, use metadata source
    both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source
    metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source
    legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0
    both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
    metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
```

Issue 1 is fixed thanks to the transition `both --> legacy` which now
restores the legacy source in the call to `RemovePrefixes`.

Issue 2 is fixed thanks to the fact that we can now detect the
transition `both --> legacy --> both`, which was previously impossible.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro added a commit to gandro/cilium that referenced this pull request Sep 16, 2024
The internal use of IPCache has two APIs:

  - An older synchronous API
    (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the
    caller to balance allocate and release calls.
  - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which
    delegates the reference counting responsibility to the ipcache
    itself.

Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities,
which historically has led to issues. See cilium#27327 or
cilium#33120 for example. Cilium v1.16 and newer do not have
these issues anymore, but the codepaths still exist. To ensure
consistency across branches, this commit will be applied to all
branches.

This commit fixes two particular issues with the asynchronous API when an
entry has been created by the legacy API.

Issue 1: Legacy API leaks mixed-API IPCache entry
-------------------------------------------------

This issue can happen if an entry is first created via the legacy API,
then overwritten by the metadata API. Consider the following sequence of
events for a prefix `p`:

 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p`
    with source `generated`. The identity has now a refcount of one.
 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity
    refcount to two and overwrites the IPCache entry for prefix `p` with
    source `custom-resource`
 3. `RemovePrefixes(p, "custom-resource")`: This decreases the
    identity refcount down to one again, but retains the existing
    IPCache entry with source `custom-resource` (it does not delete the
    entry if the identity refcount is non-zero).
 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it
    reach zero. It attempts to remove the IPCache entry for source
    `generated`. But because `generated` has lower precedence than
    `custom-resource`, the deletion fails and the IPCache entry is
    leaked.

Issue 2: Metadata API causes early release of mixed-API entry identities
------------------------------------------------------------------------

Assuming all entries are managed by the metadata API, `InjectLabels`
attempts to balance the identity reference count for managed identities
by perform an identity allocation call for any to-be-upserted identity,
followed by an allocation release call for any to-be-overwritten
identity.

However, if the existing entry was created by the legacy API, then
`InjectLabels` must not release the old identity (as this is the
responsibility of the legacy API owner). PR cilium#27327 added
such logic which will skip identity release if two conditions are
satisfied:

  1. The old entry was not created via the metadata API
  2. The old entry is about to be replaced by a new one (e.g. it has a
     new source).

However, because `RemovePrefixes` does not restore the legacy source as
describe in Issue 1 above, the second condition is insufficient. Consider
the following sequence of events

  1. `AllocateCIDRs(p)`: refcount=1, source=generated
  2. `UpsertPrefixes(p, "custom-resource")`: refcount=2,
     source=custom-resource
  3. `RemovePrefixes(p, "custom-resource")`: refcount=1,
     source=custom-resource (!)
  4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because
     condition 2 is not hit), source=custom-resource
  5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it
     should have been kept alive due to it having associated metadata
     from step 3), while the IPCache entry is leaked (because the source of
     the entry is `custom-resource`, see Issue 1).

Proposed Solution
-----------------

Both issues stem from the fact that our tracking of mixed API entries
via `createdFromMetadata` is insufficient. A boolean cannot distinguish
between the three states we actually have:

 1. The entry is exclusively owned by the legacy API
 2. The entry is exclusively owned by the metadata API
 3. The entry is shared both APIs

These three states are kept track by two private fields in the IPCache entry:

 - A ``modifiedByLegacyAPI field which is set to true if the entry was
   ever touched by the legacy API.
 - A `overwrittenLegacySource` field which tracks the original source of
   the legacy entry before it was overwritten by a metadata API source

If neither fields are set, the entry is assumed to be exclusively owned
by the metadata API. If only `modifiedByLegacyAPI` is set, then it is
assumed to be exclusively owned by the legacy API. If both fields are
set, then it is assumed to be shared by both APIs.

Note that once an entry has modifiedByLegacyAPI=true, it will never
become false until deleted. This simplifies the code, as the metadata
API already has the ability to clean up unmanaged entries.

The state diagram and actions between the state can then be described as
follows:

```mermaid
stateDiagram-v2

    both: Shared Ownership
    metadata: Owned by Metadata API
    legacy : Owned by Legacy API
    released: Released

    [*] --> metadata: UpsertPrefixes()
    [*] --> legacy: AllocateCIDRs()
    legacy --> both: UpsertPrefixes() - bump refcount, use metadata source
    both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source
    metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source
    legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0
    both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
    metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
```

Issue 1 is fixed thanks to the transition `both --> legacy` which now
restores the legacy source in the call to `RemovePrefixes`.

Issue 2 is fixed thanks to the fact that we can now detect the
transition `both --> legacy --> both`, which was previously impossible.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro added a commit that referenced this pull request Sep 16, 2024
The internal use of IPCache has two APIs:

  - An older synchronous API
    (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the
    caller to balance allocate and release calls.
  - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which
    delegates the reference counting responsibility to the ipcache
    itself.

Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities,
which historically has led to issues. See #27327 or
#33120 for example. Cilium v1.16 and newer do not have
these issues anymore, but the codepaths still exist. To ensure
consistency across branches, this commit will be applied to all
branches.

This commit fixes two particular issues with the asynchronous API when an
entry has been created by the legacy API.

Issue 1: Legacy API leaks mixed-API IPCache entry
-------------------------------------------------

This issue can happen if an entry is first created via the legacy API,
then overwritten by the metadata API. Consider the following sequence of
events for a prefix `p`:

 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p`
    with source `generated`. The identity has now a refcount of one.
 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity
    refcount to two and overwrites the IPCache entry for prefix `p` with
    source `custom-resource`
 3. `RemovePrefixes(p, "custom-resource")`: This decreases the
    identity refcount down to one again, but retains the existing
    IPCache entry with source `custom-resource` (it does not delete the
    entry if the identity refcount is non-zero).
 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it
    reach zero. It attempts to remove the IPCache entry for source
    `generated`. But because `generated` has lower precedence than
    `custom-resource`, the deletion fails and the IPCache entry is
    leaked.

Issue 2: Metadata API causes early release of mixed-API entry identities
------------------------------------------------------------------------

Assuming all entries are managed by the metadata API, `InjectLabels`
attempts to balance the identity reference count for managed identities
by perform an identity allocation call for any to-be-upserted identity,
followed by an allocation release call for any to-be-overwritten
identity.

However, if the existing entry was created by the legacy API, then
`InjectLabels` must not release the old identity (as this is the
responsibility of the legacy API owner). PR #27327 added
such logic which will skip identity release if two conditions are
satisfied:

  1. The old entry was not created via the metadata API
  2. The old entry is about to be replaced by a new one (e.g. it has a
     new source).

However, because `RemovePrefixes` does not restore the legacy source as
describe in Issue 1 above, the second condition is insufficient. Consider
the following sequence of events

  1. `AllocateCIDRs(p)`: refcount=1, source=generated
  2. `UpsertPrefixes(p, "custom-resource")`: refcount=2,
     source=custom-resource
  3. `RemovePrefixes(p, "custom-resource")`: refcount=1,
     source=custom-resource (!)
  4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because
     condition 2 is not hit), source=custom-resource
  5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it
     should have been kept alive due to it having associated metadata
     from step 3), while the IPCache entry is leaked (because the source of
     the entry is `custom-resource`, see Issue 1).

Proposed Solution
-----------------

Both issues stem from the fact that our tracking of mixed API entries
via `createdFromMetadata` is insufficient. A boolean cannot distinguish
between the three states we actually have:

 1. The entry is exclusively owned by the legacy API
 2. The entry is exclusively owned by the metadata API
 3. The entry is shared both APIs

These three states are kept track by two private fields in the IPCache entry:

 - A ``modifiedByLegacyAPI field which is set to true if the entry was
   ever touched by the legacy API.
 - A `overwrittenLegacySource` field which tracks the original source of
   the legacy entry before it was overwritten by a metadata API source

If neither fields are set, the entry is assumed to be exclusively owned
by the metadata API. If only `modifiedByLegacyAPI` is set, then it is
assumed to be exclusively owned by the legacy API. If both fields are
set, then it is assumed to be shared by both APIs.

Note that once an entry has modifiedByLegacyAPI=true, it will never
become false until deleted. This simplifies the code, as the metadata
API already has the ability to clean up unmanaged entries.

The state diagram and actions between the state can then be described as
follows:

```mermaid
stateDiagram-v2

    both: Shared Ownership
    metadata: Owned by Metadata API
    legacy : Owned by Legacy API
    released: Released

    [*] --> metadata: UpsertPrefixes()
    [*] --> legacy: AllocateCIDRs()
    legacy --> both: UpsertPrefixes() - bump refcount, use metadata source
    both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source
    metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source
    legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0
    both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
    metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
```

Issue 1 is fixed thanks to the transition `both --> legacy` which now
restores the legacy source in the call to `RemovePrefixes`.

Issue 2 is fixed thanks to the fact that we can now detect the
transition `both --> legacy --> both`, which was previously impossible.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro added a commit to gandro/cilium that referenced this pull request Sep 16, 2024
The internal use of IPCache has two APIs:

  - An older synchronous API
    (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the
    caller to balance allocate and release calls.
  - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which
    delegates the reference counting responsibility to the ipcache
    itself.

Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities,
which historically has led to issues. See cilium#27327 or
cilium#33120 for example. Cilium v1.16 and newer do not have
these issues anymore, but the codepaths still exist. To ensure
consistency across branches, this commit will be applied to all
branches.

This commit fixes two particular issues with the asynchronous API when an
entry has been created by the legacy API.

Issue 1: Legacy API leaks mixed-API IPCache entry
-------------------------------------------------

This issue can happen if an entry is first created via the legacy API,
then overwritten by the metadata API. Consider the following sequence of
events for a prefix `p`:

 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p`
    with source `generated`. The identity has now a refcount of one.
 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity
    refcount to two and overwrites the IPCache entry for prefix `p` with
    source `custom-resource`
 3. `RemovePrefixes(p, "custom-resource")`: This decreases the
    identity refcount down to one again, but retains the existing
    IPCache entry with source `custom-resource` (it does not delete the
    entry if the identity refcount is non-zero).
 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it
    reach zero. It attempts to remove the IPCache entry for source
    `generated`. But because `generated` has lower precedence than
    `custom-resource`, the deletion fails and the IPCache entry is
    leaked.

Issue 2: Metadata API causes early release of mixed-API entry identities
------------------------------------------------------------------------

Assuming all entries are managed by the metadata API, `InjectLabels`
attempts to balance the identity reference count for managed identities
by perform an identity allocation call for any to-be-upserted identity,
followed by an allocation release call for any to-be-overwritten
identity.

However, if the existing entry was created by the legacy API, then
`InjectLabels` must not release the old identity (as this is the
responsibility of the legacy API owner). PR cilium#27327 added
such logic which will skip identity release if two conditions are
satisfied:

  1. The old entry was not created via the metadata API
  2. The old entry is about to be replaced by a new one (e.g. it has a
     new source).

However, because `RemovePrefixes` does not restore the legacy source as
describe in Issue 1 above, the second condition is insufficient. Consider
the following sequence of events

  1. `AllocateCIDRs(p)`: refcount=1, source=generated
  2. `UpsertPrefixes(p, "custom-resource")`: refcount=2,
     source=custom-resource
  3. `RemovePrefixes(p, "custom-resource")`: refcount=1,
     source=custom-resource (!)
  4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because
     condition 2 is not hit), source=custom-resource
  5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it
     should have been kept alive due to it having associated metadata
     from step 3), while the IPCache entry is leaked (because the source of
     the entry is `custom-resource`, see Issue 1).

Proposed Solution
-----------------

Both issues stem from the fact that our tracking of mixed API entries
via `createdFromMetadata` is insufficient. A boolean cannot distinguish
between the three states we actually have:

 1. The entry is exclusively owned by the legacy API
 2. The entry is exclusively owned by the metadata API
 3. The entry is shared both APIs

These three states are kept track by two private fields in the IPCache entry:

 - A ``modifiedByLegacyAPI field which is set to true if the entry was
   ever touched by the legacy API.
 - A `overwrittenLegacySource` field which tracks the original source of
   the legacy entry before it was overwritten by a metadata API source

If neither fields are set, the entry is assumed to be exclusively owned
by the metadata API. If only `modifiedByLegacyAPI` is set, then it is
assumed to be exclusively owned by the legacy API. If both fields are
set, then it is assumed to be shared by both APIs.

Note that once an entry has modifiedByLegacyAPI=true, it will never
become false until deleted. This simplifies the code, as the metadata
API already has the ability to clean up unmanaged entries.

The state diagram and actions between the state can then be described as
follows:

```mermaid
stateDiagram-v2

    both: Shared Ownership
    metadata: Owned by Metadata API
    legacy : Owned by Legacy API
    released: Released

    [*] --> metadata: UpsertPrefixes()
    [*] --> legacy: AllocateCIDRs()
    legacy --> both: UpsertPrefixes() - bump refcount, use metadata source
    both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source
    metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source
    legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0
    both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
    metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
```

Issue 1 is fixed thanks to the transition `both --> legacy` which now
restores the legacy source in the call to `RemovePrefixes`.

Issue 2 is fixed thanks to the fact that we can now detect the
transition `both --> legacy --> both`, which was previously impossible.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro added a commit that referenced this pull request Sep 17, 2024
The internal use of IPCache has two APIs:

  - An older synchronous API
    (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the
    caller to balance allocate and release calls.
  - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which
    delegates the reference counting responsibility to the ipcache
    itself.

Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities,
which historically has led to issues. See #27327 or
#33120 for example. Cilium v1.16 and newer do not have
these issues anymore, but the codepaths still exist. To ensure
consistency across branches, this commit will be applied to all
branches.

This commit fixes two particular issues with the asynchronous API when an
entry has been created by the legacy API.

Issue 1: Legacy API leaks mixed-API IPCache entry
-------------------------------------------------

This issue can happen if an entry is first created via the legacy API,
then overwritten by the metadata API. Consider the following sequence of
events for a prefix `p`:

 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p`
    with source `generated`. The identity has now a refcount of one.
 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity
    refcount to two and overwrites the IPCache entry for prefix `p` with
    source `custom-resource`
 3. `RemovePrefixes(p, "custom-resource")`: This decreases the
    identity refcount down to one again, but retains the existing
    IPCache entry with source `custom-resource` (it does not delete the
    entry if the identity refcount is non-zero).
 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it
    reach zero. It attempts to remove the IPCache entry for source
    `generated`. But because `generated` has lower precedence than
    `custom-resource`, the deletion fails and the IPCache entry is
    leaked.

Issue 2: Metadata API causes early release of mixed-API entry identities
------------------------------------------------------------------------

Assuming all entries are managed by the metadata API, `InjectLabels`
attempts to balance the identity reference count for managed identities
by perform an identity allocation call for any to-be-upserted identity,
followed by an allocation release call for any to-be-overwritten
identity.

However, if the existing entry was created by the legacy API, then
`InjectLabels` must not release the old identity (as this is the
responsibility of the legacy API owner). PR #27327 added
such logic which will skip identity release if two conditions are
satisfied:

  1. The old entry was not created via the metadata API
  2. The old entry is about to be replaced by a new one (e.g. it has a
     new source).

However, because `RemovePrefixes` does not restore the legacy source as
describe in Issue 1 above, the second condition is insufficient. Consider
the following sequence of events

  1. `AllocateCIDRs(p)`: refcount=1, source=generated
  2. `UpsertPrefixes(p, "custom-resource")`: refcount=2,
     source=custom-resource
  3. `RemovePrefixes(p, "custom-resource")`: refcount=1,
     source=custom-resource (!)
  4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because
     condition 2 is not hit), source=custom-resource
  5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it
     should have been kept alive due to it having associated metadata
     from step 3), while the IPCache entry is leaked (because the source of
     the entry is `custom-resource`, see Issue 1).

Proposed Solution
-----------------

Both issues stem from the fact that our tracking of mixed API entries
via `createdFromMetadata` is insufficient. A boolean cannot distinguish
between the three states we actually have:

 1. The entry is exclusively owned by the legacy API
 2. The entry is exclusively owned by the metadata API
 3. The entry is shared both APIs

These three states are kept track by two private fields in the IPCache entry:

 - A ``modifiedByLegacyAPI field which is set to true if the entry was
   ever touched by the legacy API.
 - A `overwrittenLegacySource` field which tracks the original source of
   the legacy entry before it was overwritten by a metadata API source

If neither fields are set, the entry is assumed to be exclusively owned
by the metadata API. If only `modifiedByLegacyAPI` is set, then it is
assumed to be exclusively owned by the legacy API. If both fields are
set, then it is assumed to be shared by both APIs.

Note that once an entry has modifiedByLegacyAPI=true, it will never
become false until deleted. This simplifies the code, as the metadata
API already has the ability to clean up unmanaged entries.

The state diagram and actions between the state can then be described as
follows:

```mermaid
stateDiagram-v2

    both: Shared Ownership
    metadata: Owned by Metadata API
    legacy : Owned by Legacy API
    released: Released

    [*] --> metadata: UpsertPrefixes()
    [*] --> legacy: AllocateCIDRs()
    legacy --> both: UpsertPrefixes() - bump refcount, use metadata source
    both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source
    metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source
    legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0
    both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
    metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
```

Issue 1 is fixed thanks to the transition `both --> legacy` which now
restores the legacy source in the call to `RemovePrefixes`.

Issue 2 is fixed thanks to the fact that we can now detect the
transition `both --> legacy --> both`, which was previously impossible.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
github-merge-queue bot pushed a commit that referenced this pull request Sep 17, 2024
The internal use of IPCache has two APIs:

  - An older synchronous API
    (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the
    caller to balance allocate and release calls.
  - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which
    delegates the reference counting responsibility to the ipcache
    itself.

Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities,
which historically has led to issues. See #27327 or
#33120 for example. Cilium v1.16 and newer do not have
these issues anymore, but the codepaths still exist. To ensure
consistency across branches, this commit will be applied to all
branches.

This commit fixes two particular issues with the asynchronous API when an
entry has been created by the legacy API.

Issue 1: Legacy API leaks mixed-API IPCache entry
-------------------------------------------------

This issue can happen if an entry is first created via the legacy API,
then overwritten by the metadata API. Consider the following sequence of
events for a prefix `p`:

 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p`
    with source `generated`. The identity has now a refcount of one.
 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity
    refcount to two and overwrites the IPCache entry for prefix `p` with
    source `custom-resource`
 3. `RemovePrefixes(p, "custom-resource")`: This decreases the
    identity refcount down to one again, but retains the existing
    IPCache entry with source `custom-resource` (it does not delete the
    entry if the identity refcount is non-zero).
 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it
    reach zero. It attempts to remove the IPCache entry for source
    `generated`. But because `generated` has lower precedence than
    `custom-resource`, the deletion fails and the IPCache entry is
    leaked.

Issue 2: Metadata API causes early release of mixed-API entry identities
------------------------------------------------------------------------

Assuming all entries are managed by the metadata API, `InjectLabels`
attempts to balance the identity reference count for managed identities
by perform an identity allocation call for any to-be-upserted identity,
followed by an allocation release call for any to-be-overwritten
identity.

However, if the existing entry was created by the legacy API, then
`InjectLabels` must not release the old identity (as this is the
responsibility of the legacy API owner). PR #27327 added
such logic which will skip identity release if two conditions are
satisfied:

  1. The old entry was not created via the metadata API
  2. The old entry is about to be replaced by a new one (e.g. it has a
     new source).

However, because `RemovePrefixes` does not restore the legacy source as
describe in Issue 1 above, the second condition is insufficient. Consider
the following sequence of events

  1. `AllocateCIDRs(p)`: refcount=1, source=generated
  2. `UpsertPrefixes(p, "custom-resource")`: refcount=2,
     source=custom-resource
  3. `RemovePrefixes(p, "custom-resource")`: refcount=1,
     source=custom-resource (!)
  4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because
     condition 2 is not hit), source=custom-resource
  5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it
     should have been kept alive due to it having associated metadata
     from step 3), while the IPCache entry is leaked (because the source of
     the entry is `custom-resource`, see Issue 1).

Proposed Solution
-----------------

Both issues stem from the fact that our tracking of mixed API entries
via `createdFromMetadata` is insufficient. A boolean cannot distinguish
between the three states we actually have:

 1. The entry is exclusively owned by the legacy API
 2. The entry is exclusively owned by the metadata API
 3. The entry is shared both APIs

These three states are kept track by two private fields in the IPCache entry:

 - A ``modifiedByLegacyAPI field which is set to true if the entry was
   ever touched by the legacy API.
 - A `overwrittenLegacySource` field which tracks the original source of
   the legacy entry before it was overwritten by a metadata API source

If neither fields are set, the entry is assumed to be exclusively owned
by the metadata API. If only `modifiedByLegacyAPI` is set, then it is
assumed to be exclusively owned by the legacy API. If both fields are
set, then it is assumed to be shared by both APIs.

Note that once an entry has modifiedByLegacyAPI=true, it will never
become false until deleted. This simplifies the code, as the metadata
API already has the ability to clean up unmanaged entries.

The state diagram and actions between the state can then be described as
follows:

```mermaid
stateDiagram-v2

    both: Shared Ownership
    metadata: Owned by Metadata API
    legacy : Owned by Legacy API
    released: Released

    [*] --> metadata: UpsertPrefixes()
    [*] --> legacy: AllocateCIDRs()
    legacy --> both: UpsertPrefixes() - bump refcount, use metadata source
    both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source
    metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source
    legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0
    both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
    metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
```

Issue 1 is fixed thanks to the transition `both --> legacy` which now
restores the legacy source in the call to `RemovePrefixes`.

Issue 2 is fixed thanks to the fact that we can now detect the
transition `both --> legacy --> both`, which was previously impossible.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro added a commit that referenced this pull request Sep 17, 2024
[ upstream commit a67667b ]

The internal use of IPCache has two APIs:

  - An older synchronous API
    (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the
    caller to balance allocate and release calls.
  - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which
    delegates the reference counting responsibility to the ipcache
    itself.

Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities,
which historically has led to issues. See #27327 or
#33120 for example. Cilium v1.16 and newer do not have
these issues anymore, but the codepaths still exist. To ensure
consistency across branches, this commit will be applied to all
branches.

This commit fixes two particular issues with the asynchronous API when an
entry has been created by the legacy API.

Issue 1: Legacy API leaks mixed-API IPCache entry
-------------------------------------------------

This issue can happen if an entry is first created via the legacy API,
then overwritten by the metadata API. Consider the following sequence of
events for a prefix `p`:

 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p`
    with source `generated`. The identity has now a refcount of one.
 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity
    refcount to two and overwrites the IPCache entry for prefix `p` with
    source `custom-resource`
 3. `RemovePrefixes(p, "custom-resource")`: This decreases the
    identity refcount down to one again, but retains the existing
    IPCache entry with source `custom-resource` (it does not delete the
    entry if the identity refcount is non-zero).
 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it
    reach zero. It attempts to remove the IPCache entry for source
    `generated`. But because `generated` has lower precedence than
    `custom-resource`, the deletion fails and the IPCache entry is
    leaked.

Issue 2: Metadata API causes early release of mixed-API entry identities
------------------------------------------------------------------------

Assuming all entries are managed by the metadata API, `InjectLabels`
attempts to balance the identity reference count for managed identities
by perform an identity allocation call for any to-be-upserted identity,
followed by an allocation release call for any to-be-overwritten
identity.

However, if the existing entry was created by the legacy API, then
`InjectLabels` must not release the old identity (as this is the
responsibility of the legacy API owner). PR #27327 added
such logic which will skip identity release if two conditions are
satisfied:

  1. The old entry was not created via the metadata API
  2. The old entry is about to be replaced by a new one (e.g. it has a
     new source).

However, because `RemovePrefixes` does not restore the legacy source as
describe in Issue 1 above, the second condition is insufficient. Consider
the following sequence of events

  1. `AllocateCIDRs(p)`: refcount=1, source=generated
  2. `UpsertPrefixes(p, "custom-resource")`: refcount=2,
     source=custom-resource
  3. `RemovePrefixes(p, "custom-resource")`: refcount=1,
     source=custom-resource (!)
  4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because
     condition 2 is not hit), source=custom-resource
  5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it
     should have been kept alive due to it having associated metadata
     from step 3), while the IPCache entry is leaked (because the source of
     the entry is `custom-resource`, see Issue 1).

Proposed Solution
-----------------

Both issues stem from the fact that our tracking of mixed API entries
via `createdFromMetadata` is insufficient. A boolean cannot distinguish
between the three states we actually have:

 1. The entry is exclusively owned by the legacy API
 2. The entry is exclusively owned by the metadata API
 3. The entry is shared both APIs

These three states are kept track by two private fields in the IPCache entry:

 - A ``modifiedByLegacyAPI field which is set to true if the entry was
   ever touched by the legacy API.
 - A `overwrittenLegacySource` field which tracks the original source of
   the legacy entry before it was overwritten by a metadata API source

If neither fields are set, the entry is assumed to be exclusively owned
by the metadata API. If only `modifiedByLegacyAPI` is set, then it is
assumed to be exclusively owned by the legacy API. If both fields are
set, then it is assumed to be shared by both APIs.

Note that once an entry has modifiedByLegacyAPI=true, it will never
become false until deleted. This simplifies the code, as the metadata
API already has the ability to clean up unmanaged entries.

The state diagram and actions between the state can then be described as
follows:

```mermaid
stateDiagram-v2

    both: Shared Ownership
    metadata: Owned by Metadata API
    legacy : Owned by Legacy API
    released: Released

    [*] --> metadata: UpsertPrefixes()
    [*] --> legacy: AllocateCIDRs()
    legacy --> both: UpsertPrefixes() - bump refcount, use metadata source
    both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source
    metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source
    legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0
    both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
    metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
```

Issue 1 is fixed thanks to the transition `both --> legacy` which now
restores the legacy source in the call to `RemovePrefixes`.

Issue 2 is fixed thanks to the fact that we can now detect the
transition `both --> legacy --> both`, which was previously impossible.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro added a commit that referenced this pull request Sep 17, 2024
[ upstream commit a67667b ]

The internal use of IPCache has two APIs:

  - An older synchronous API
    (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the
    caller to balance allocate and release calls.
  - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which
    delegates the reference counting responsibility to the ipcache
    itself.

Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities,
which historically has led to issues. See #27327 or
#33120 for example. Cilium v1.16 and newer do not have
these issues anymore, but the codepaths still exist. To ensure
consistency across branches, this commit will be applied to all
branches.

This commit fixes two particular issues with the asynchronous API when an
entry has been created by the legacy API.

Issue 1: Legacy API leaks mixed-API IPCache entry
-------------------------------------------------

This issue can happen if an entry is first created via the legacy API,
then overwritten by the metadata API. Consider the following sequence of
events for a prefix `p`:

 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p`
    with source `generated`. The identity has now a refcount of one.
 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity
    refcount to two and overwrites the IPCache entry for prefix `p` with
    source `custom-resource`
 3. `RemovePrefixes(p, "custom-resource")`: This decreases the
    identity refcount down to one again, but retains the existing
    IPCache entry with source `custom-resource` (it does not delete the
    entry if the identity refcount is non-zero).
 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it
    reach zero. It attempts to remove the IPCache entry for source
    `generated`. But because `generated` has lower precedence than
    `custom-resource`, the deletion fails and the IPCache entry is
    leaked.

Issue 2: Metadata API causes early release of mixed-API entry identities
------------------------------------------------------------------------

Assuming all entries are managed by the metadata API, `InjectLabels`
attempts to balance the identity reference count for managed identities
by perform an identity allocation call for any to-be-upserted identity,
followed by an allocation release call for any to-be-overwritten
identity.

However, if the existing entry was created by the legacy API, then
`InjectLabels` must not release the old identity (as this is the
responsibility of the legacy API owner). PR #27327 added
such logic which will skip identity release if two conditions are
satisfied:

  1. The old entry was not created via the metadata API
  2. The old entry is about to be replaced by a new one (e.g. it has a
     new source).

However, because `RemovePrefixes` does not restore the legacy source as
describe in Issue 1 above, the second condition is insufficient. Consider
the following sequence of events

  1. `AllocateCIDRs(p)`: refcount=1, source=generated
  2. `UpsertPrefixes(p, "custom-resource")`: refcount=2,
     source=custom-resource
  3. `RemovePrefixes(p, "custom-resource")`: refcount=1,
     source=custom-resource (!)
  4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because
     condition 2 is not hit), source=custom-resource
  5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it
     should have been kept alive due to it having associated metadata
     from step 3), while the IPCache entry is leaked (because the source of
     the entry is `custom-resource`, see Issue 1).

Proposed Solution
-----------------

Both issues stem from the fact that our tracking of mixed API entries
via `createdFromMetadata` is insufficient. A boolean cannot distinguish
between the three states we actually have:

 1. The entry is exclusively owned by the legacy API
 2. The entry is exclusively owned by the metadata API
 3. The entry is shared both APIs

These three states are kept track by two private fields in the IPCache entry:

 - A ``modifiedByLegacyAPI field which is set to true if the entry was
   ever touched by the legacy API.
 - A `overwrittenLegacySource` field which tracks the original source of
   the legacy entry before it was overwritten by a metadata API source

If neither fields are set, the entry is assumed to be exclusively owned
by the metadata API. If only `modifiedByLegacyAPI` is set, then it is
assumed to be exclusively owned by the legacy API. If both fields are
set, then it is assumed to be shared by both APIs.

Note that once an entry has modifiedByLegacyAPI=true, it will never
become false until deleted. This simplifies the code, as the metadata
API already has the ability to clean up unmanaged entries.

The state diagram and actions between the state can then be described as
follows:

```mermaid
stateDiagram-v2

    both: Shared Ownership
    metadata: Owned by Metadata API
    legacy : Owned by Legacy API
    released: Released

    [*] --> metadata: UpsertPrefixes()
    [*] --> legacy: AllocateCIDRs()
    legacy --> both: UpsertPrefixes() - bump refcount, use metadata source
    both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source
    metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source
    legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0
    both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
    metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
```

Issue 1 is fixed thanks to the transition `both --> legacy` which now
restores the legacy source in the call to `RemovePrefixes`.

Issue 2 is fixed thanks to the fact that we can now detect the
transition `both --> legacy --> both`, which was previously impossible.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro added a commit to gandro/cilium that referenced this pull request Sep 17, 2024
[ upstream commit a67667b ]
[ backporter notes: This contains additional changes to cidr.go to make
  it compile.The second half of the commit description contains the
  commit message for that change (which is a separate commit in v1.14) ]

The internal use of IPCache has two APIs:

  - An older synchronous API
    (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the
    caller to balance allocate and release calls.
  - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which
    delegates the reference counting responsibility to the ipcache
    itself.

Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities,
which historically has led to issues. See cilium#27327 or
cilium#33120 for example. Cilium v1.16 and newer do not have
these issues anymore, but the codepaths still exist. To ensure
consistency across branches, this commit will be applied to all
branches.

This commit fixes two particular issues with the asynchronous API when an
entry has been created by the legacy API.

Issue 1: Legacy API leaks mixed-API IPCache entry
-------------------------------------------------

This issue can happen if an entry is first created via the legacy API,
then overwritten by the metadata API. Consider the following sequence of
events for a prefix `p`:

 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p`
    with source `generated`. The identity has now a refcount of one.
 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity
    refcount to two and overwrites the IPCache entry for prefix `p` with
    source `custom-resource`
 3. `RemovePrefixes(p, "custom-resource")`: This decreases the
    identity refcount down to one again, but retains the existing
    IPCache entry with source `custom-resource` (it does not delete the
    entry if the identity refcount is non-zero).
 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it
    reach zero. It attempts to remove the IPCache entry for source
    `generated`. But because `generated` has lower precedence than
    `custom-resource`, the deletion fails and the IPCache entry is
    leaked.

Issue 2: Metadata API causes early release of mixed-API entry identities
------------------------------------------------------------------------

Assuming all entries are managed by the metadata API, `InjectLabels`
attempts to balance the identity reference count for managed identities
by perform an identity allocation call for any to-be-upserted identity,
followed by an allocation release call for any to-be-overwritten
identity.

However, if the existing entry was created by the legacy API, then
`InjectLabels` must not release the old identity (as this is the
responsibility of the legacy API owner). PR cilium#27327 added
such logic which will skip identity release if two conditions are
satisfied:

  1. The old entry was not created via the metadata API
  2. The old entry is about to be replaced by a new one (e.g. it has a
     new source).

However, because `RemovePrefixes` does not restore the legacy source as
describe in Issue 1 above, the second condition is insufficient. Consider
the following sequence of events

  1. `AllocateCIDRs(p)`: refcount=1, source=generated
  2. `UpsertPrefixes(p, "custom-resource")`: refcount=2,
     source=custom-resource
  3. `RemovePrefixes(p, "custom-resource")`: refcount=1,
     source=custom-resource (!)
  4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because
     condition 2 is not hit), source=custom-resource
  5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it
     should have been kept alive due to it having associated metadata
     from step 3), while the IPCache entry is leaked (because the source of
     the entry is `custom-resource`, see Issue 1).

Proposed Solution
-----------------

Both issues stem from the fact that our tracking of mixed API entries
via `createdFromMetadata` is insufficient. A boolean cannot distinguish
between the three states we actually have:

 1. The entry is exclusively owned by the legacy API
 2. The entry is exclusively owned by the metadata API
 3. The entry is shared both APIs

These three states are kept track by two private fields in the IPCache entry:

 - A ``modifiedByLegacyAPI field which is set to true if the entry was
   ever touched by the legacy API.
 - A `overwrittenLegacySource` field which tracks the original source of
   the legacy entry before it was overwritten by a metadata API source

If neither fields are set, the entry is assumed to be exclusively owned
by the metadata API. If only `modifiedByLegacyAPI` is set, then it is
assumed to be exclusively owned by the legacy API. If both fields are
set, then it is assumed to be shared by both APIs.

Note that once an entry has modifiedByLegacyAPI=true, it will never
become false until deleted. This simplifies the code, as the metadata
API already has the ability to clean up unmanaged entries.

The state diagram and actions between the state can then be described as
follows:

```mermaid
stateDiagram-v2

    both: Shared Ownership
    metadata: Owned by Metadata API
    legacy : Owned by Legacy API
    released: Released

    [*] --> metadata: UpsertPrefixes()
    [*] --> legacy: AllocateCIDRs()
    legacy --> both: UpsertPrefixes() - bump refcount, use metadata source
    both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source
    metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source
    legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0
    both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
    metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
```

Issue 1 is fixed thanks to the transition `both --> legacy` which now
restores the legacy source in the call to `RemovePrefixes`.

Issue 2 is fixed thanks to the fact that we can now detect the
transition `both --> legacy --> both`, which was previously impossible.

---

ipcache: Mark generated identities as co-owned by legacy API

While a previous commit made sure the metadata API correctly co-owns
IPCache entry if there were previously exclusively owned by the legacy
API, this commit now ensures that the `UpsertGeneratedIdentities`
function also updates ownership correctly. This is needeed, because we
skip the `Upsert` for `usedIdentities` (which is fine, the
entry does already exist), but we still need to mark the existing entry
as co-owned by both APIs. This ensures that we can restore the legacy
API source if the metadata owner is disassociated. This fixes a bug that
can be reproduced with the following sequence of events:

  1. Create a netpol allowing ToFQDN one.one.one.one
  2. Create a netpol allowing ToCIDR 1.1.1.1/32
  3. Perform a DNS lookup to one.one.one.one, observe 1.1.1.1/32 and
     1.0.0.1/32 getting added to IPCache
  4. Delete the ToCIDR netpol
  5. Observe the IPCache entry for 1.1.1.1/32 vanish, while 1.0.0.1/32
     remains

The bug would occur because step 1 creates a higher-precedence metadata
entry, where as step 3 will attempt to also create lower-precedence
legacy API owners. This commit ensures that the entry 1.1.1.1/32 is
marked as co-owned and thus is not deleted in step 4.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
nebril pushed a commit that referenced this pull request Sep 18, 2024
[ upstream commit a67667b ]

The internal use of IPCache has two APIs:

  - An older synchronous API
    (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the
    caller to balance allocate and release calls.
  - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which
    delegates the reference counting responsibility to the ipcache
    itself.

Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities,
which historically has led to issues. See #27327 or
#33120 for example. Cilium v1.16 and newer do not have
these issues anymore, but the codepaths still exist. To ensure
consistency across branches, this commit will be applied to all
branches.

This commit fixes two particular issues with the asynchronous API when an
entry has been created by the legacy API.

Issue 1: Legacy API leaks mixed-API IPCache entry
-------------------------------------------------

This issue can happen if an entry is first created via the legacy API,
then overwritten by the metadata API. Consider the following sequence of
events for a prefix `p`:

 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p`
    with source `generated`. The identity has now a refcount of one.
 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity
    refcount to two and overwrites the IPCache entry for prefix `p` with
    source `custom-resource`
 3. `RemovePrefixes(p, "custom-resource")`: This decreases the
    identity refcount down to one again, but retains the existing
    IPCache entry with source `custom-resource` (it does not delete the
    entry if the identity refcount is non-zero).
 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it
    reach zero. It attempts to remove the IPCache entry for source
    `generated`. But because `generated` has lower precedence than
    `custom-resource`, the deletion fails and the IPCache entry is
    leaked.

Issue 2: Metadata API causes early release of mixed-API entry identities
------------------------------------------------------------------------

Assuming all entries are managed by the metadata API, `InjectLabels`
attempts to balance the identity reference count for managed identities
by perform an identity allocation call for any to-be-upserted identity,
followed by an allocation release call for any to-be-overwritten
identity.

However, if the existing entry was created by the legacy API, then
`InjectLabels` must not release the old identity (as this is the
responsibility of the legacy API owner). PR #27327 added
such logic which will skip identity release if two conditions are
satisfied:

  1. The old entry was not created via the metadata API
  2. The old entry is about to be replaced by a new one (e.g. it has a
     new source).

However, because `RemovePrefixes` does not restore the legacy source as
describe in Issue 1 above, the second condition is insufficient. Consider
the following sequence of events

  1. `AllocateCIDRs(p)`: refcount=1, source=generated
  2. `UpsertPrefixes(p, "custom-resource")`: refcount=2,
     source=custom-resource
  3. `RemovePrefixes(p, "custom-resource")`: refcount=1,
     source=custom-resource (!)
  4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because
     condition 2 is not hit), source=custom-resource
  5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it
     should have been kept alive due to it having associated metadata
     from step 3), while the IPCache entry is leaked (because the source of
     the entry is `custom-resource`, see Issue 1).

Proposed Solution
-----------------

Both issues stem from the fact that our tracking of mixed API entries
via `createdFromMetadata` is insufficient. A boolean cannot distinguish
between the three states we actually have:

 1. The entry is exclusively owned by the legacy API
 2. The entry is exclusively owned by the metadata API
 3. The entry is shared both APIs

These three states are kept track by two private fields in the IPCache entry:

 - A ``modifiedByLegacyAPI field which is set to true if the entry was
   ever touched by the legacy API.
 - A `overwrittenLegacySource` field which tracks the original source of
   the legacy entry before it was overwritten by a metadata API source

If neither fields are set, the entry is assumed to be exclusively owned
by the metadata API. If only `modifiedByLegacyAPI` is set, then it is
assumed to be exclusively owned by the legacy API. If both fields are
set, then it is assumed to be shared by both APIs.

Note that once an entry has modifiedByLegacyAPI=true, it will never
become false until deleted. This simplifies the code, as the metadata
API already has the ability to clean up unmanaged entries.

The state diagram and actions between the state can then be described as
follows:

```mermaid
stateDiagram-v2

    both: Shared Ownership
    metadata: Owned by Metadata API
    legacy : Owned by Legacy API
    released: Released

    [*] --> metadata: UpsertPrefixes()
    [*] --> legacy: AllocateCIDRs()
    legacy --> both: UpsertPrefixes() - bump refcount, use metadata source
    both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source
    metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source
    legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0
    both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
    metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
```

Issue 1 is fixed thanks to the transition `both --> legacy` which now
restores the legacy source in the call to `RemovePrefixes`.

Issue 2 is fixed thanks to the fact that we can now detect the
transition `both --> legacy --> both`, which was previously impossible.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
nebril pushed a commit that referenced this pull request Sep 18, 2024
[ upstream commit a67667b ]
[ backporter notes: This contains additional changes to cidr.go to make
  it compile.The second half of the commit description contains the
  commit message for that change (which is a separate commit in v1.14) ]

The internal use of IPCache has two APIs:

  - An older synchronous API
    (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the
    caller to balance allocate and release calls.
  - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which
    delegates the reference counting responsibility to the ipcache
    itself.

Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities,
which historically has led to issues. See #27327 or
#33120 for example. Cilium v1.16 and newer do not have
these issues anymore, but the codepaths still exist. To ensure
consistency across branches, this commit will be applied to all
branches.

This commit fixes two particular issues with the asynchronous API when an
entry has been created by the legacy API.

Issue 1: Legacy API leaks mixed-API IPCache entry
-------------------------------------------------

This issue can happen if an entry is first created via the legacy API,
then overwritten by the metadata API. Consider the following sequence of
events for a prefix `p`:

 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p`
    with source `generated`. The identity has now a refcount of one.
 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity
    refcount to two and overwrites the IPCache entry for prefix `p` with
    source `custom-resource`
 3. `RemovePrefixes(p, "custom-resource")`: This decreases the
    identity refcount down to one again, but retains the existing
    IPCache entry with source `custom-resource` (it does not delete the
    entry if the identity refcount is non-zero).
 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it
    reach zero. It attempts to remove the IPCache entry for source
    `generated`. But because `generated` has lower precedence than
    `custom-resource`, the deletion fails and the IPCache entry is
    leaked.

Issue 2: Metadata API causes early release of mixed-API entry identities
------------------------------------------------------------------------

Assuming all entries are managed by the metadata API, `InjectLabels`
attempts to balance the identity reference count for managed identities
by perform an identity allocation call for any to-be-upserted identity,
followed by an allocation release call for any to-be-overwritten
identity.

However, if the existing entry was created by the legacy API, then
`InjectLabels` must not release the old identity (as this is the
responsibility of the legacy API owner). PR #27327 added
such logic which will skip identity release if two conditions are
satisfied:

  1. The old entry was not created via the metadata API
  2. The old entry is about to be replaced by a new one (e.g. it has a
     new source).

However, because `RemovePrefixes` does not restore the legacy source as
describe in Issue 1 above, the second condition is insufficient. Consider
the following sequence of events

  1. `AllocateCIDRs(p)`: refcount=1, source=generated
  2. `UpsertPrefixes(p, "custom-resource")`: refcount=2,
     source=custom-resource
  3. `RemovePrefixes(p, "custom-resource")`: refcount=1,
     source=custom-resource (!)
  4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because
     condition 2 is not hit), source=custom-resource
  5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it
     should have been kept alive due to it having associated metadata
     from step 3), while the IPCache entry is leaked (because the source of
     the entry is `custom-resource`, see Issue 1).

Proposed Solution
-----------------

Both issues stem from the fact that our tracking of mixed API entries
via `createdFromMetadata` is insufficient. A boolean cannot distinguish
between the three states we actually have:

 1. The entry is exclusively owned by the legacy API
 2. The entry is exclusively owned by the metadata API
 3. The entry is shared both APIs

These three states are kept track by two private fields in the IPCache entry:

 - A ``modifiedByLegacyAPI field which is set to true if the entry was
   ever touched by the legacy API.
 - A `overwrittenLegacySource` field which tracks the original source of
   the legacy entry before it was overwritten by a metadata API source

If neither fields are set, the entry is assumed to be exclusively owned
by the metadata API. If only `modifiedByLegacyAPI` is set, then it is
assumed to be exclusively owned by the legacy API. If both fields are
set, then it is assumed to be shared by both APIs.

Note that once an entry has modifiedByLegacyAPI=true, it will never
become false until deleted. This simplifies the code, as the metadata
API already has the ability to clean up unmanaged entries.

The state diagram and actions between the state can then be described as
follows:

```mermaid
stateDiagram-v2

    both: Shared Ownership
    metadata: Owned by Metadata API
    legacy : Owned by Legacy API
    released: Released

    [*] --> metadata: UpsertPrefixes()
    [*] --> legacy: AllocateCIDRs()
    legacy --> both: UpsertPrefixes() - bump refcount, use metadata source
    both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source
    metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source
    legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0
    both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
    metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
```

Issue 1 is fixed thanks to the transition `both --> legacy` which now
restores the legacy source in the call to `RemovePrefixes`.

Issue 2 is fixed thanks to the fact that we can now detect the
transition `both --> legacy --> both`, which was previously impossible.

---

ipcache: Mark generated identities as co-owned by legacy API

While a previous commit made sure the metadata API correctly co-owns
IPCache entry if there were previously exclusively owned by the legacy
API, this commit now ensures that the `UpsertGeneratedIdentities`
function also updates ownership correctly. This is needeed, because we
skip the `Upsert` for `usedIdentities` (which is fine, the
entry does already exist), but we still need to mark the existing entry
as co-owned by both APIs. This ensures that we can restore the legacy
API source if the metadata owner is disassociated. This fixes a bug that
can be reproduced with the following sequence of events:

  1. Create a netpol allowing ToFQDN one.one.one.one
  2. Create a netpol allowing ToCIDR 1.1.1.1/32
  3. Perform a DNS lookup to one.one.one.one, observe 1.1.1.1/32 and
     1.0.0.1/32 getting added to IPCache
  4. Delete the ToCIDR netpol
  5. Observe the IPCache entry for 1.1.1.1/32 vanish, while 1.0.0.1/32
     remains

The bug would occur because step 1 creates a higher-precedence metadata
entry, where as step 3 will attempt to also create lower-precedence
legacy API owners. This commit ensures that the entry 1.1.1.1/32 is
marked as co-owned and thus is not deleted in step 4.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
pippolo84 pushed a commit that referenced this pull request Sep 25, 2024
[ upstream commit a67667b ]

The internal use of IPCache has two APIs:

  - An older synchronous API
    (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the
    caller to balance allocate and release calls.
  - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which
    delegates the reference counting responsibility to the ipcache
    itself.

Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities,
which historically has led to issues. See #27327 or
#33120 for example. Cilium v1.16 and newer do not have
these issues anymore, but the codepaths still exist. To ensure
consistency across branches, this commit will be applied to all
branches.

This commit fixes two particular issues with the asynchronous API when an
entry has been created by the legacy API.

Issue 1: Legacy API leaks mixed-API IPCache entry
-------------------------------------------------

This issue can happen if an entry is first created via the legacy API,
then overwritten by the metadata API. Consider the following sequence of
events for a prefix `p`:

 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p`
    with source `generated`. The identity has now a refcount of one.
 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity
    refcount to two and overwrites the IPCache entry for prefix `p` with
    source `custom-resource`
 3. `RemovePrefixes(p, "custom-resource")`: This decreases the
    identity refcount down to one again, but retains the existing
    IPCache entry with source `custom-resource` (it does not delete the
    entry if the identity refcount is non-zero).
 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it
    reach zero. It attempts to remove the IPCache entry for source
    `generated`. But because `generated` has lower precedence than
    `custom-resource`, the deletion fails and the IPCache entry is
    leaked.

Issue 2: Metadata API causes early release of mixed-API entry identities
------------------------------------------------------------------------

Assuming all entries are managed by the metadata API, `InjectLabels`
attempts to balance the identity reference count for managed identities
by perform an identity allocation call for any to-be-upserted identity,
followed by an allocation release call for any to-be-overwritten
identity.

However, if the existing entry was created by the legacy API, then
`InjectLabels` must not release the old identity (as this is the
responsibility of the legacy API owner). PR #27327 added
such logic which will skip identity release if two conditions are
satisfied:

  1. The old entry was not created via the metadata API
  2. The old entry is about to be replaced by a new one (e.g. it has a
     new source).

However, because `RemovePrefixes` does not restore the legacy source as
describe in Issue 1 above, the second condition is insufficient. Consider
the following sequence of events

  1. `AllocateCIDRs(p)`: refcount=1, source=generated
  2. `UpsertPrefixes(p, "custom-resource")`: refcount=2,
     source=custom-resource
  3. `RemovePrefixes(p, "custom-resource")`: refcount=1,
     source=custom-resource (!)
  4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because
     condition 2 is not hit), source=custom-resource
  5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it
     should have been kept alive due to it having associated metadata
     from step 3), while the IPCache entry is leaked (because the source of
     the entry is `custom-resource`, see Issue 1).

Proposed Solution
-----------------

Both issues stem from the fact that our tracking of mixed API entries
via `createdFromMetadata` is insufficient. A boolean cannot distinguish
between the three states we actually have:

 1. The entry is exclusively owned by the legacy API
 2. The entry is exclusively owned by the metadata API
 3. The entry is shared both APIs

These three states are kept track by two private fields in the IPCache entry:

 - A ``modifiedByLegacyAPI field which is set to true if the entry was
   ever touched by the legacy API.
 - A `overwrittenLegacySource` field which tracks the original source of
   the legacy entry before it was overwritten by a metadata API source

If neither fields are set, the entry is assumed to be exclusively owned
by the metadata API. If only `modifiedByLegacyAPI` is set, then it is
assumed to be exclusively owned by the legacy API. If both fields are
set, then it is assumed to be shared by both APIs.

Note that once an entry has modifiedByLegacyAPI=true, it will never
become false until deleted. This simplifies the code, as the metadata
API already has the ability to clean up unmanaged entries.

The state diagram and actions between the state can then be described as
follows:

```mermaid
stateDiagram-v2

    both: Shared Ownership
    metadata: Owned by Metadata API
    legacy : Owned by Legacy API
    released: Released

    [*] --> metadata: UpsertPrefixes()
    [*] --> legacy: AllocateCIDRs()
    legacy --> both: UpsertPrefixes() - bump refcount, use metadata source
    both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source
    metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source
    legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0
    both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
    metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
```

Issue 1 is fixed thanks to the transition `both --> legacy` which now
restores the legacy source in the call to `RemovePrefixes`.

Issue 2 is fixed thanks to the fact that we can now detect the
transition `both --> legacy --> both`, which was previously impossible.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
pippolo84 pushed a commit that referenced this pull request Sep 26, 2024
[ upstream commit a67667b ]

The internal use of IPCache has two APIs:

  - An older synchronous API
    (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the
    caller to balance allocate and release calls.
  - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which
    delegates the reference counting responsibility to the ipcache
    itself.

Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities,
which historically has led to issues. See #27327 or
#33120 for example. Cilium v1.16 and newer do not have
these issues anymore, but the codepaths still exist. To ensure
consistency across branches, this commit will be applied to all
branches.

This commit fixes two particular issues with the asynchronous API when an
entry has been created by the legacy API.

Issue 1: Legacy API leaks mixed-API IPCache entry
-------------------------------------------------

This issue can happen if an entry is first created via the legacy API,
then overwritten by the metadata API. Consider the following sequence of
events for a prefix `p`:

 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p`
    with source `generated`. The identity has now a refcount of one.
 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity
    refcount to two and overwrites the IPCache entry for prefix `p` with
    source `custom-resource`
 3. `RemovePrefixes(p, "custom-resource")`: This decreases the
    identity refcount down to one again, but retains the existing
    IPCache entry with source `custom-resource` (it does not delete the
    entry if the identity refcount is non-zero).
 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it
    reach zero. It attempts to remove the IPCache entry for source
    `generated`. But because `generated` has lower precedence than
    `custom-resource`, the deletion fails and the IPCache entry is
    leaked.

Issue 2: Metadata API causes early release of mixed-API entry identities
------------------------------------------------------------------------

Assuming all entries are managed by the metadata API, `InjectLabels`
attempts to balance the identity reference count for managed identities
by perform an identity allocation call for any to-be-upserted identity,
followed by an allocation release call for any to-be-overwritten
identity.

However, if the existing entry was created by the legacy API, then
`InjectLabels` must not release the old identity (as this is the
responsibility of the legacy API owner). PR #27327 added
such logic which will skip identity release if two conditions are
satisfied:

  1. The old entry was not created via the metadata API
  2. The old entry is about to be replaced by a new one (e.g. it has a
     new source).

However, because `RemovePrefixes` does not restore the legacy source as
describe in Issue 1 above, the second condition is insufficient. Consider
the following sequence of events

  1. `AllocateCIDRs(p)`: refcount=1, source=generated
  2. `UpsertPrefixes(p, "custom-resource")`: refcount=2,
     source=custom-resource
  3. `RemovePrefixes(p, "custom-resource")`: refcount=1,
     source=custom-resource (!)
  4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because
     condition 2 is not hit), source=custom-resource
  5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it
     should have been kept alive due to it having associated metadata
     from step 3), while the IPCache entry is leaked (because the source of
     the entry is `custom-resource`, see Issue 1).

Proposed Solution
-----------------

Both issues stem from the fact that our tracking of mixed API entries
via `createdFromMetadata` is insufficient. A boolean cannot distinguish
between the three states we actually have:

 1. The entry is exclusively owned by the legacy API
 2. The entry is exclusively owned by the metadata API
 3. The entry is shared both APIs

These three states are kept track by two private fields in the IPCache entry:

 - A ``modifiedByLegacyAPI field which is set to true if the entry was
   ever touched by the legacy API.
 - A `overwrittenLegacySource` field which tracks the original source of
   the legacy entry before it was overwritten by a metadata API source

If neither fields are set, the entry is assumed to be exclusively owned
by the metadata API. If only `modifiedByLegacyAPI` is set, then it is
assumed to be exclusively owned by the legacy API. If both fields are
set, then it is assumed to be shared by both APIs.

Note that once an entry has modifiedByLegacyAPI=true, it will never
become false until deleted. This simplifies the code, as the metadata
API already has the ability to clean up unmanaged entries.

The state diagram and actions between the state can then be described as
follows:

```mermaid
stateDiagram-v2

    both: Shared Ownership
    metadata: Owned by Metadata API
    legacy : Owned by Legacy API
    released: Released

    [*] --> metadata: UpsertPrefixes()
    [*] --> legacy: AllocateCIDRs()
    legacy --> both: UpsertPrefixes() - bump refcount, use metadata source
    both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source
    metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source
    legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0
    both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
    metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
```

Issue 1 is fixed thanks to the transition `both --> legacy` which now
restores the legacy source in the call to `RemovePrefixes`.

Issue 2 is fixed thanks to the fact that we can now detect the
transition `both --> legacy --> both`, which was previously impossible.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
pippolo84 pushed a commit that referenced this pull request Sep 27, 2024
[ upstream commit a67667b ]

The internal use of IPCache has two APIs:

  - An older synchronous API
    (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the
    caller to balance allocate and release calls.
  - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which
    delegates the reference counting responsibility to the ipcache
    itself.

Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities,
which historically has led to issues. See #27327 or
#33120 for example. Cilium v1.16 and newer do not have
these issues anymore, but the codepaths still exist. To ensure
consistency across branches, this commit will be applied to all
branches.

This commit fixes two particular issues with the asynchronous API when an
entry has been created by the legacy API.

Issue 1: Legacy API leaks mixed-API IPCache entry
-------------------------------------------------

This issue can happen if an entry is first created via the legacy API,
then overwritten by the metadata API. Consider the following sequence of
events for a prefix `p`:

 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p`
    with source `generated`. The identity has now a refcount of one.
 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity
    refcount to two and overwrites the IPCache entry for prefix `p` with
    source `custom-resource`
 3. `RemovePrefixes(p, "custom-resource")`: This decreases the
    identity refcount down to one again, but retains the existing
    IPCache entry with source `custom-resource` (it does not delete the
    entry if the identity refcount is non-zero).
 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it
    reach zero. It attempts to remove the IPCache entry for source
    `generated`. But because `generated` has lower precedence than
    `custom-resource`, the deletion fails and the IPCache entry is
    leaked.

Issue 2: Metadata API causes early release of mixed-API entry identities
------------------------------------------------------------------------

Assuming all entries are managed by the metadata API, `InjectLabels`
attempts to balance the identity reference count for managed identities
by perform an identity allocation call for any to-be-upserted identity,
followed by an allocation release call for any to-be-overwritten
identity.

However, if the existing entry was created by the legacy API, then
`InjectLabels` must not release the old identity (as this is the
responsibility of the legacy API owner). PR #27327 added
such logic which will skip identity release if two conditions are
satisfied:

  1. The old entry was not created via the metadata API
  2. The old entry is about to be replaced by a new one (e.g. it has a
     new source).

However, because `RemovePrefixes` does not restore the legacy source as
describe in Issue 1 above, the second condition is insufficient. Consider
the following sequence of events

  1. `AllocateCIDRs(p)`: refcount=1, source=generated
  2. `UpsertPrefixes(p, "custom-resource")`: refcount=2,
     source=custom-resource
  3. `RemovePrefixes(p, "custom-resource")`: refcount=1,
     source=custom-resource (!)
  4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because
     condition 2 is not hit), source=custom-resource
  5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it
     should have been kept alive due to it having associated metadata
     from step 3), while the IPCache entry is leaked (because the source of
     the entry is `custom-resource`, see Issue 1).

Proposed Solution
-----------------

Both issues stem from the fact that our tracking of mixed API entries
via `createdFromMetadata` is insufficient. A boolean cannot distinguish
between the three states we actually have:

 1. The entry is exclusively owned by the legacy API
 2. The entry is exclusively owned by the metadata API
 3. The entry is shared both APIs

These three states are kept track by two private fields in the IPCache entry:

 - A ``modifiedByLegacyAPI field which is set to true if the entry was
   ever touched by the legacy API.
 - A `overwrittenLegacySource` field which tracks the original source of
   the legacy entry before it was overwritten by a metadata API source

If neither fields are set, the entry is assumed to be exclusively owned
by the metadata API. If only `modifiedByLegacyAPI` is set, then it is
assumed to be exclusively owned by the legacy API. If both fields are
set, then it is assumed to be shared by both APIs.

Note that once an entry has modifiedByLegacyAPI=true, it will never
become false until deleted. This simplifies the code, as the metadata
API already has the ability to clean up unmanaged entries.

The state diagram and actions between the state can then be described as
follows:

```mermaid
stateDiagram-v2

    both: Shared Ownership
    metadata: Owned by Metadata API
    legacy : Owned by Legacy API
    released: Released

    [*] --> metadata: UpsertPrefixes()
    [*] --> legacy: AllocateCIDRs()
    legacy --> both: UpsertPrefixes() - bump refcount, use metadata source
    both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source
    metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source
    legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0
    both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
    metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
```

Issue 1 is fixed thanks to the transition `both --> legacy` which now
restores the legacy source in the call to `RemovePrefixes`.

Issue 2 is fixed thanks to the fact that we can now detect the
transition `both --> legacy --> both`, which was previously impossible.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
github-merge-queue bot pushed a commit that referenced this pull request Oct 1, 2024
[ upstream commit a67667b ]

The internal use of IPCache has two APIs:

  - An older synchronous API
    (`AllocateCIDRs`/`ReleaseCIDRIdentitiesByCIDR`), which requires the
    caller to balance allocate and release calls.
  - A newer asynchronous API (`UpsertPrefixes`/`RemovePrefixes`), which
    delegates the reference counting responsibility to the ipcache
    itself.

Cilium v1.14 and v1.15 use a mix of the two APIs for CIDR identities,
which historically has led to issues. See #27327 or
#33120 for example. Cilium v1.16 and newer do not have
these issues anymore, but the codepaths still exist. To ensure
consistency across branches, this commit will be applied to all
branches.

This commit fixes two particular issues with the asynchronous API when an
entry has been created by the legacy API.

Issue 1: Legacy API leaks mixed-API IPCache entry
-------------------------------------------------

This issue can happen if an entry is first created via the legacy API,
then overwritten by the metadata API. Consider the following sequence of
events for a prefix `p`:

 1. `AllocateCIDRs(p)`: This inserts an IPCache entry for prefix `p`
    with source `generated`. The identity has now a refcount of one.
 2. `UpsertPrefixes(p, "custom-resource")`: This bumps the identity
    refcount to two and overwrites the IPCache entry for prefix `p` with
    source `custom-resource`
 3. `RemovePrefixes(p, "custom-resource")`: This decreases the
    identity refcount down to one again, but retains the existing
    IPCache entry with source `custom-resource` (it does not delete the
    entry if the identity refcount is non-zero).
 4. `ReleaseCIDRs(p)`: This decreases the identity refcount, making it
    reach zero. It attempts to remove the IPCache entry for source
    `generated`. But because `generated` has lower precedence than
    `custom-resource`, the deletion fails and the IPCache entry is
    leaked.

Issue 2: Metadata API causes early release of mixed-API entry identities
------------------------------------------------------------------------

Assuming all entries are managed by the metadata API, `InjectLabels`
attempts to balance the identity reference count for managed identities
by perform an identity allocation call for any to-be-upserted identity,
followed by an allocation release call for any to-be-overwritten
identity.

However, if the existing entry was created by the legacy API, then
`InjectLabels` must not release the old identity (as this is the
responsibility of the legacy API owner). PR #27327 added
such logic which will skip identity release if two conditions are
satisfied:

  1. The old entry was not created via the metadata API
  2. The old entry is about to be replaced by a new one (e.g. it has a
     new source).

However, because `RemovePrefixes` does not restore the legacy source as
describe in Issue 1 above, the second condition is insufficient. Consider
the following sequence of events

  1. `AllocateCIDRs(p)`: refcount=1, source=generated
  2. `UpsertPrefixes(p, "custom-resource")`: refcount=2,
     source=custom-resource
  3. `RemovePrefixes(p, "custom-resource")`: refcount=1,
     source=custom-resource (!)
  4. `UpsertPrefixes(p, "custom-resource")`: refcount=1 (because
     condition 2 is not hit), source=custom-resource
  5. `ReleaseCIDRs(p)`: refcount=0. The identity is released early (it
     should have been kept alive due to it having associated metadata
     from step 3), while the IPCache entry is leaked (because the source of
     the entry is `custom-resource`, see Issue 1).

Proposed Solution
-----------------

Both issues stem from the fact that our tracking of mixed API entries
via `createdFromMetadata` is insufficient. A boolean cannot distinguish
between the three states we actually have:

 1. The entry is exclusively owned by the legacy API
 2. The entry is exclusively owned by the metadata API
 3. The entry is shared both APIs

These three states are kept track by two private fields in the IPCache entry:

 - A ``modifiedByLegacyAPI field which is set to true if the entry was
   ever touched by the legacy API.
 - A `overwrittenLegacySource` field which tracks the original source of
   the legacy entry before it was overwritten by a metadata API source

If neither fields are set, the entry is assumed to be exclusively owned
by the metadata API. If only `modifiedByLegacyAPI` is set, then it is
assumed to be exclusively owned by the legacy API. If both fields are
set, then it is assumed to be shared by both APIs.

Note that once an entry has modifiedByLegacyAPI=true, it will never
become false until deleted. This simplifies the code, as the metadata
API already has the ability to clean up unmanaged entries.

The state diagram and actions between the state can then be described as
follows:

```mermaid
stateDiagram-v2

    both: Shared Ownership
    metadata: Owned by Metadata API
    legacy : Owned by Legacy API
    released: Released

    [*] --> metadata: UpsertPrefixes()
    [*] --> legacy: AllocateCIDRs()
    legacy --> both: UpsertPrefixes() - bump refcount, use metadata source
    both --> legacy: RemovePrefixes() - decrease refcount, restore legacy source
    metadata --> both: AllocateCIDR() - bump refcount, taint entry, use higher precedence source
    legacy --> released: ReleaseCIDRs() - decrease refcount, remove entry if refcount=0
    both --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
    metadata --> released: RemovePrefixes() - decrease refcount, remove entry if refcount=0
```

Issue 1 is fixed thanks to the transition `both --> legacy` which now
restores the legacy source in the call to `RemovePrefixes`.

Issue 2 is fixed thanks to the fact that we can now detect the
transition `both --> legacy --> both`, which was previously impossible.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects/v1.13 This issue affects v1.13 branch backport/author The backport will be carried out by the author of the PR. backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. backport-done/1.15 The backport for Cilium 1.15.x for this PR is done. release-note/misc This PR makes changes that have no direct user impact.
Projects
No open projects
Status: Backport done to v1.14
Status: Backport done to v1.15
Development

Successfully merging this pull request may close these issues.

3 participants