Conversation

tkna
Contributor

@tkna tkna commented May 27, 2024

This PR adds a requireK8sConnectivity flag (default: true) that excludes the K8s probe from the health probe result.
The flag can be set via a newly added HTTP header, require-k8s-connectivity, on the cilium /healthz API and the pod's /healthz endpoint.

Fixes: #31702
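To illustrate the intended semantics of the flag, here is a minimal sketch in Go. The names and structure are illustrative only, not Cilium's actual implementation: the K8s probe result is folded into overall health only when connectivity is required.

```go
package main

import "fmt"

// healthz sketches the aggregation this PR describes: the K8s probe
// only affects the overall health result when requireK8s is true.
// (Hypothetical helper; not Cilium's real health-check code.)
func healthz(k8sOK, agentOK, requireK8s bool) bool {
	if requireK8s && !k8sOK {
		return false
	}
	return agentOK
}

func main() {
	fmt.Println(healthz(false, true, true))  // k8s down, required → false
	fmt.Println(healthz(false, true, false)) // k8s down, not required → true
}
```

With the header set to disable the requirement, an apiserver outage alone no longer fails the probe, so the agent is not restarted.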

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 27, 2024
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label May 27, 2024
@tkna tkna force-pushed the exclude-k8s-probe branch from 9cfd13e to 595e536 Compare May 27, 2024 08:16
@tkna tkna changed the title Make cilium status independent from k8s status daemon: Make cilium status independent from k8s status May 29, 2024
@tkna tkna marked this pull request as ready for review May 29, 2024 02:10
@tkna tkna requested review from a team as code owners May 29, 2024 02:10
@tkna tkna requested review from ysksuzuki, ldelossa and learnitall May 29, 2024 02:10
@ldelossa ldelossa added the dont-merge/discussion A discussion is ongoing and should be resolved before merging, regardless of reviews & tests status. label May 29, 2024
@ldelossa
Contributor

@tkna I'm going to mark this for more discussion. While I see your issue, I'm not sure disabling liveness probes is ideal from Cilium's point of view.

I'm going to pull in some folks on your linked CFP for further discussion. Let's go over this in that issue.

@YutaroHayakawa
Member

I think this solution makes sense. We discussed this issue in the APAC community meeting and concluded that killing the whole Cilium agent on k8s disconnection is too aggressive. We might want to kill certain subsystems, but maybe not the entire agent.

For example, the real impact of k8s disconnection on the BGP Control Plane is that the user's configuration changes may not be propagated in a timely manner. BGP otherwise works without any problem. I'd assume that most of the time, users want to avoid connectivity loss rather than configuration delay.

There might be a subsystem for which this behavior is desirable, so opt-in might be a safer option. However, in my personal opinion, we could just remove k8s connectivity from the health criteria entirely and fix such subsystems to stop individually.

Let me CC @joestringer as a person who was involved in the original discussion and also as a core maintainer.

@aanm aanm requested a review from marseel May 31, 2024 10:38
Member

@joestringer joestringer left a comment

I've been thinking for a while that cilium-agent health probably shouldn't depend on kube-apiserver availability, so the idea here seems reasonable. I'm not sure there is a specific argument for keeping kube-apiserver connectivity as part of the base health check.

If we are concerned that we may need to retain the previous behavior, then it makes sense to keep it optional. I'm not sure whether we need this as part of the API; if we do, we should add a flag to the cilium-dbg status CLI to control this API parameter. Alternatively, we could just add a cilium-agent daemon flag to control this optional behaviour.

Another question I have: with this implementation, how can a user detect that Cilium is not connected to the kube-apiserver? Previously it would become obvious at some point because the cilium-agent would crash. I suppose there are metrics like cilium_event_ts for the most recent event from the control plane... but should we add another metric to make this more obvious?
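As a sketch of the kind of detection described above, assuming the existing cilium_event_ts metric (timestamp of the last event received from the control plane), a Prometheus alerting expression could watch for staleness. The threshold and aggregation labels here are illustrative assumptions, not a tested rule:

```promql
# Hypothetical: fire when an agent has received no control-plane event
# for 15 minutes (900 s). Label set depends on your scrape config.
time() - max by (pod) (cilium_event_ts) > 900
```

This would only approximate connectivity loss on quiet clusters, which is part of why a dedicated metric might still be worth adding.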

@marseel
Contributor

marseel commented Jun 5, 2024

I've been thinking about this PR too 😅

AFAIK the controller that updates connectivity to the apiserver is still running here:

Name: "kubernetes",

So I would at least expect cilium-dbg status to show whether connectivity to k8s works, but a metric for the general status of Cilium's connectivity to the control plane (k8s control plane / kvstore) would be nice too.

FYI, there is also a controller that is supposed to close all connections if the health check of the k8s apiserver fails:

c.controller.UpdateController("k8s-heartbeat",

so if this part works, there is also that fail-safe mechanism.

I'm not sure if there is a specific argument to keep kube-apiserver connectivity as part of the base health check.

There was a bug in the past where connectivity to the kube-apiserver could only be restored by restarting the agent: #20915
So while I also lean towards enabling it by default, we still definitely need an option to retain the previous behavior, just in case, for a transition period.

CC @hemanthmalla — we talked about this a little before; I'm guessing you might be interested in this too and probably have some thoughts as well. You might want something similar for the kvstore, so potentially we should make this a bit more generic.

@learnitall
Contributor

Isn't one of the assumptions users can make about the restart behavior of the cilium-agent that connectivity will continue to work even while the cilium-agent is killed or restarted? I'm confused by the comments here because they assume a coupling between cilium-agent availability and Pod network connectivity. Are we referring to "connectivity loss" as connectivity that is out of sync with the user-configured state in the k8s apiserver?

@rouke-broersma

Isn't one of the assumptions users can make about the restart behavior of the cilium-agent that connectivity will continue to work even while the cilium-agent is killed or restarted? I'm confused by the comments here because they assume a coupling between cilium-agent availability and Pod network connectivity. Are we referring to "connectivity loss" as connectivity that is out of sync with the user-configured state in the k8s apiserver?

The issue is BGP advertisement. If the cilium-agent is offline while the kube-apiserver is unavailable, Cilium also can't advertise routes to its BGP peers, so kube-apiserver availability effectively impacts load-balancer availability, for which there is no good reason.

@joestringer
Member

Isn't one of the assumptions users can make about the restart behavior of the cilium-agent that connectivity will continue to work even while the cilium-agent is killed or restarted?

The other aspect I'll point out is that the behavior this PR addresses means that if the kube-apiserver goes down, every cilium-agent instance in the cluster is forced to go down as well. Similar to how you're asserting that the dataplane should continue without the local control plane, I think it makes sense for the local control plane to continue without the central control plane.

@joestringer joestringer added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label Jul 18, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jul 18, 2024
@joestringer
Member

What's the next step on this PR? I see a few comments suggesting to change some things. @tkna are you still working on this? Do you have any questions for reviewers?

@tkna tkna force-pushed the exclude-k8s-probe branch from 595e536 to a742568 Compare July 18, 2024 05:27
@tkna tkna requested a review from a team as a code owner July 18, 2024 05:27
@tkna tkna requested a review from pchaigno July 18, 2024 05:27
@tkna
Contributor Author

tkna commented Jul 18, 2024

@joestringer @marseel @YutaroHayakawa @ysksuzuki @ldelossa @learnitall

I made some minor corrections.
As far as I know, the remaining discussion points are as follows:

  1. Whether --require-k8s-connectivity should be implemented as an API parameter or a daemon config.
    I have implemented it as an API parameter for now.
    I think an API parameter is more flexible and easier to apply and roll back, since it is configured via an HTTP header and can be changed per request.

  2. Whether --require-k8s-connectivity should be false or true by default.
    I have implemented it as default: false for now, but the discussion is still open.

So while I also lean towards enabling it by default too, we still definitely need an option to retain previous behavior, just in case, for a transition period.

    In my understanding, @marseel's opinion is that --require-k8s-connectivity should be true by default at this point.

  3. Whether we should generalize this mechanism (to exclude the kvstore probe etc.).
    I don't think generalization is necessary unless there is a concrete requirement.

  4. Whether we should introduce a new metric.
    As @marseel mentioned, connectivity to the kube-apiserver can be checked via cilium-dbg status and logs from the "k8s-heartbeat" mechanism, so I don't think a new metric is necessary, at least for this PR.
    (This can be another PR.)
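For operators, the header-based control maps naturally onto a Kubernetes probe definition. A minimal sketch follows; the port, host, and timing values are assumptions to verify against your Cilium version and deployment, not values taken from this PR:

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 9879          # assumed agent health port; verify for your deployment
    host: 127.0.0.1
    httpHeaders:
      - name: require-k8s-connectivity
        value: "false"  # keep the agent alive across kube-apiserver outages
  periodSeconds: 30
  failureThreshold: 10
```

Because the behavior is selected per request via the header, the same /healthz endpoint can serve probes with and without the k8s requirement simultaneously.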

@tkna tkna force-pushed the exclude-k8s-probe branch 2 times, most recently from 9ae8b67 to 44c8957 Compare July 18, 2024 07:33
@tkna tkna force-pushed the exclude-k8s-probe branch from e0fd664 to 52d8b96 Compare August 20, 2024 06:26
@ldelossa ldelossa added the area/agent Cilium agent related. label Aug 21, 2024
@tkna tkna force-pushed the exclude-k8s-probe branch from 52d8b96 to 8efc0c8 Compare August 22, 2024 05:29
@ovidiutirla
Contributor

/test

@tkna tkna requested a review from learnitall August 27, 2024 05:32
Contributor

@learnitall learnitall left a comment

LGTM, thanks!

@tkna
Contributor Author

tkna commented Sep 3, 2024

@ldelossa Is there anything else I should do about this PR?

@ysksuzuki ysksuzuki removed their request for review September 17, 2024 13:29
@joestringer joestringer removed the dont-merge/discussion A discussion is ongoing and should be resolved before merging, regardless of reviews & tests status. label Sep 20, 2024
Member

@joestringer joestringer left a comment

Sorry for the delay on this. I think it's almost ready to merge.

Paul was assigned for CLI but he has moved on to other areas, so I gave feedback from a @cilium/cli perspective.

@joestringer joestringer removed the request for review from pchaigno September 20, 2024 17:40
@tkna tkna force-pushed the exclude-k8s-probe branch from 8efc0c8 to ec24d5a Compare September 25, 2024 06:15
This PR adds a `requireK8sConnectivity` flag (default: true) that excludes the K8s probe from the health probe result.
The flag can be set via a newly added HTTP header, `require-k8s-connectivity`, on the cilium /healthz API and the pod's /healthz endpoint.

Signed-off-by: tkna <naoki-take@cybozu.co.jp>
@tkna tkna force-pushed the exclude-k8s-probe branch from ec24d5a to 2947e19 Compare September 25, 2024 06:27
@tkna tkna requested a review from joestringer September 25, 2024 09:01
@YutaroHayakawa
Member

/test

Member

@joestringer joestringer left a comment

Thanks!

@joestringer
Member

/ci-ginkgo

@joestringer
Member

/ci-runtime

@joestringer
Member

/ci-integration

@joestringer joestringer merged commit 72f481f into cilium:main Sep 26, 2024
65 checks passed
@joestringer
Member

Thanks for the contribution @tkna !

@tkna
Contributor Author

tkna commented Sep 27, 2024

@joestringer
Do you think this comment should be updated, since it is now out of date?
#32724 (comment)

withoutK8sConnectivity

@joestringer
Member

@tkna yes, that would be helpful. When this PR becomes part of a release, the release notes will point back to this PR. If a user is interested in the feature, they may open this PR and read the description.

@tkna
Contributor Author

tkna commented Sep 30, 2024

@joestringer
#32724 (comment)
I modified it to requireK8sConnectivity.

Labels
area/agent Cilium agent related.
backport-done/1.16 The backport for Cilium 1.16.x for this PR is done.
kind/community-contribution This was a contribution made by a community member.
release-note/minor This PR changes functionality that users may find relevant to operating Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CFP: Avoid killing a cilium-agent when all kube-apiservers are down
9 participants