Optimistic locking of CNs causes failures in updates on high load #18366

@codablock

Description

As noted in #18352, making only 2 attempts (here and here) when updating CNs can easily fail when the apiserver is under high load.

The original assumption seems to be that this is fine, as regular background updates would fix the situation after some time. However, it leads to inconsistencies when the status update succeeds while the spec update fails. These inconsistencies in turn cause issues like the one described in #18352.

Also, even if both updates succeed, status and spec are inconsistent for a short time, because the two are not updated atomically. So this should be reconsidered as well: do the update in one step instead of splitting it into two.

Metadata

    Labels

    area/agent: Cilium agent related.
    area/k8s: Impacts the kubernetes API, or kubernetes -> cilium internals translation layers.
    area/operator: Impacts the cilium-operator component
    good-first-issue: Good starting point for new developers, which requires minimal understanding of Cilium.
    help-wanted: Please volunteer for this by adding yourself as an assignee!
    kind/enhancement: This would improve or streamline existing functionality.
