Skip to content

Conversation

thejosephstevens
Copy link
Contributor

@thejosephstevens thejosephstevens commented Mar 29, 2022

Please ensure your pull request adheres to the following guidelines:

  • For first time contributors, read Submitting a pull request
  • All code is covered by unit and/or runtime tests where feasible.
  • All commits contain a well written commit description including a title,
    description and a Fixes: #XXX line if the commit addresses a particular
    GitHub issue.
  • All commits are signed off. See the section Developer’s Certificate of Origin
  • Provide a title or release-note blurb suitable for the release notes.
  • Thanks for contributing!

As described in this issue, if you scale a node pool down to zero, agent-not-ready taints can prevent the cluster autoscaler from scaling the node pool up from zero nodes because it thinks nothing will be able to run on the node (or at least the pending app pods won't be able to).

This PR adds the ability to pass a prefix that will be prepended to the agent-not-ready taint in order for it to be ignored by the cluster autoscaler when generating sanitized node templates in order to make scaling decisions. More specifically, this allows scaling node pools up from zero in GKE, where you can not customize the command line arguments of the cluster autoscaler, and where a multi-zone node pool is made up of multiple node pools under the hood, making it very easy to end up with a "node pool" with no nodes in it if you scale below 1 node per zone.

Fixes: #19241

Add config flag to add a prefix to AgentNotReadyNodeTaint value in order to enable the taint being ignored by cluster autoscaler.

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Mar 29, 2022
@thejosephstevens thejosephstevens marked this pull request as ready for review March 29, 2022 21:22
@thejosephstevens thejosephstevens requested a review from a team March 29, 2022 21:22
@thejosephstevens thejosephstevens requested a review from a team as a code owner March 29, 2022 21:22
@thejosephstevens thejosephstevens force-pushed the support-ca-ignore-taint-prefix branch from a3de006 to 4a02e27 Compare March 30, 2022 00:33
@sayboras sayboras added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label Mar 30, 2022
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Mar 30, 2022
@sayboras sayboras changed the title add support to override agent not ready taint to be ignored by the cl… add support to override not ready taint to be ignored by the CA Mar 30, 2022
Copy link
Member

@sayboras sayboras left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small questions as per below, but LGTM in general, thanks.

@@ -621,6 +622,14 @@ const (
// K8sNamespaceName is the name of the K8sNamespace option
K8sNamespaceName = "k8s-namespace"

// AgentNotReadyNodeTaintPrefixDefault hold the default value of
// AgentNotReadyNodeTaintPrefix
AgentNotReadyNodeTaintPrefixDefault = ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this default param used anywhere ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually meant to use that since it didn't look like viper.GetString will return a default empty-string value. I'm a bit confused though as I'm trying to figure out the behaviors. For example, k8s-namespace doesn't appear to have a default value set anywhere except in bugtool. I'll start work on an edit to use that default if the config value isn't set, but if you happen to know if there's a standard way to set those defaults I'd welcome a pointer.

@@ -2999,6 +3020,7 @@ func (c *DaemonConfig) Populate() {
c.HTTP403Message = viper.GetString(HTTP403Message)
c.DisableEnvoyVersionCheck = viper.GetBool(DisableEnvoyVersionCheck)
c.K8sNamespace = viper.GetString(K8sNamespaceName)
c.AgentNotReadyNodeTaintPrefix = viper.GetString(AgentNotReadyNodeTaintPrefixName)
Copy link
Member

@sayboras sayboras Mar 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as this flag is hidden, I reckon you need to use helm extraConfig param to set value, right ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's right from reading, but I'm not actually sure whether that's desirable behavior. I'm still working on a test of this on my own infra to verify, so I may make changes based on how that goes (was building images on top of the 1.11.2 tag to test with).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, appreciate if you can share your testing later on.

@sayboras sayboras added the area/k8s Impacts the kubernetes API, or kubernetes -> cilium internals translation layers. label Mar 30, 2022
@aanm
Copy link
Member

aanm commented Apr 11, 2022

let's wait until #19074 and one other PR is not open yet is merged first.

@aanm aanm self-assigned this Apr 11, 2022
@aanm aanm added the dont-merge/waiting-for-upstream Only merge once upstream library or kernel features have landed label Apr 11, 2022
@maintainer-s-little-helper

This comment was marked as resolved.

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Apr 26, 2022
@thejosephstevens thejosephstevens requested a review from a team as a code owner April 27, 2022 00:02
@thejosephstevens thejosephstevens requested a review from nebril April 27, 2022 00:02
@maintainer-s-little-helper

This comment was marked as resolved.

@sayboras sayboras requested a review from bmcustodio April 27, 2022 02:02
@aanm
Copy link
Member

aanm commented Apr 27, 2022

@thejosephstevens the PR #19074 was merged, can you rebase and squash this PR against master? Thank you

@aanm aanm removed the dont-merge/waiting-for-upstream Only merge once upstream library or kernel features have landed label Apr 27, 2022
Copy link
Contributor

@bmcustodio bmcustodio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @thejosephstevens! I have a few comments:

  1. As it currently stands, users will have to resort to extraConfig in Helm to specify agent-not-ready-taint-prefix, because a corresponding Helm value is currently not being added. It would be great if you could add this (I suggest agentNotReadyTaintPrefix so it matched the name of the option).
  2. Also, the choice being given to users seems to be between the old taint key (agent.cilium.io/not-ready) and <custom-prefix>/cilium-agent-not-ready. While this is fine and works, I wonder if we should instead give users full control over the taint key (i.e., insteadl of agent-not-ready-taint-prefix, we'd support something like agent-not-ready-taint-key). It seems to me the work is essentially done, so this shouldn't imply a big change.
  3. There is no documentation being added/updated as part of the PR. I suggest adding the Helm value and document it there, as that should be picked up by helm-docs, and at least making a mention to the new property and the reason for introducing it in this page.
  4. I also believe that we should update CI to exercise the new code path. This should amount to updating this line (and the corresponding ones in the -1.10 and -1.11 files), and setting the new option somewhere around here.

@thejosephstevens
Copy link
Contributor Author

Thanks for the PR @thejosephstevens! I have a few comments:

  1. As it currently stands, users will have to resort to extraConfig in Helm to specify agent-not-ready-taint-prefix, because a corresponding Helm value is currently not being added. It would be great if you could add this (I suggest agentNotReadyTaintPrefix so it matched the name of the option).
  2. Also, the choice being given to users seems to be between the old taint key (agent.cilium.io/not-ready) and <custom-prefix>/cilium-agent-not-ready. While this is fine and works, I wonder if we should instead give users full control over the taint key (i.e., insteadl of agent-not-ready-taint-prefix, we'd support something like agent-not-ready-taint-key). It seems to me the work is essentially done, so this shouldn't imply a big change.
  3. There is no documentation being added/updated as part of the PR. I suggest adding the Helm value and document it there, as that should be picked up by helm-docs, and at least making a mention to the new property and the reason for introducing it in this page.
  4. I also believe that we should update CI to exercise the new code path. This should amount to updating this line (and the corresponding ones in the -1.10 and -1.11 files), and setting the new option somewhere around here.

Thanks for the review @bmcustodio ! We're working through the suggestions (they look pretty straightforward - I expect we'll have something ready later today). Just a note, @XiaoxiHuang21 is taking over lead on this PR. Feel free to tag me, but you'll likely get quicker responses tagging her directly.

@thejosephstevens thejosephstevens requested review from a team as code owners April 27, 2022 21:52
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Jun 1, 2022
@XiaoxiHuang21 XiaoxiHuang21 force-pushed the support-ca-ignore-taint-prefix branch from d4213cd to ee59e5f Compare June 1, 2022 18:59
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Jun 1, 2022
@sayboras sayboras force-pushed the support-ca-ignore-taint-prefix branch 2 times, most recently from 682ed53 to f53e1c1 Compare June 2, 2022 03:50
@sayboras
Copy link
Member

sayboras commented Jun 2, 2022

I have forced pushed your commit here after resolving conflicts with master, as well as minor formatting changes to make CI happy. Thanks for your support.

@sayboras
Copy link
Member

sayboras commented Jun 2, 2022

/test

Job 'Cilium-PR-K8s-1.22-kernel-5.4' failed:

Click to show.

Test Name

K8sUpdates Tests upgrade and downgrade from a Cilium stable image to master

Failure Output

FAIL: Unable to download helm chart v1.11 from GitHub

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.22-kernel-5.4 so I can create one.

Job 'Cilium-PR-K8s-GKE' failed:

Click to show.

Test Name

K8sUpdates Tests upgrade and downgrade from a Cilium stable image to master

Failure Output

FAIL: Timed out after 33.828s.

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-GKE so I can create one.

Job 'Cilium-PR-K8s-1.16-kernel-4.9' failed:

Click to show.

Test Name

K8sUpdates Tests upgrade and downgrade from a Cilium stable image to master

Failure Output

FAIL: Timed out after 30.001s.

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.16-kernel-4.9 so I can create one.

@sayboras sayboras requested a review from christarazi June 2, 2022 04:18
@sayboras sayboras force-pushed the support-ca-ignore-taint-prefix branch from f53e1c1 to df893fe Compare June 2, 2022 09:37
@sayboras
Copy link
Member

sayboras commented Jun 2, 2022

/test

Job 'Cilium-PR-K8s-1.22-kernel-5.4' failed:

Click to show.

Test Name

K8sUpdates Tests upgrade and downgrade from a Cilium stable image to master

Failure Output

FAIL: Timed out after 30.001s.

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.22-kernel-5.4 so I can create one.

Job 'Cilium-PR-K8s-GKE' failed:

Click to show.

Test Name

K8sUpdates Tests upgrade and downgrade from a Cilium stable image to master

Failure Output

FAIL: Timed out after 30.000s.

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-GKE so I can create one.

@sayboras sayboras force-pushed the support-ca-ignore-taint-prefix branch from df893fe to 162316c Compare June 6, 2022 08:21
@sayboras
Copy link
Member

sayboras commented Jun 6, 2022

/test

Job 'Cilium-PR-K8s-1.22-kernel-5.4' failed:

Click to show.

Test Name

K8sUpdates Tests upgrade and downgrade from a Cilium stable image to master

Failure Output

FAIL: Timed out after 30.057s.

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.22-kernel-5.4 so I can create one.

If you scale a node pool down to zero nodes the Cluster Autoscaler won't
add new nodes because pending pods don't tolerate the cilium
agent-not-ready taint. If you're managing your own CA you can ignore
the cilium taint with a command line flag on the CA. However if you're
running on managed k8s (e.g. GKE), you may not have access to configure
the CA, and will need to use a custom taint prefix to be ignored by the
autoscaler when scaling up from zero nodes in a pool.

This commit adds config to add a prefix to the agent-not-ready taint so
it can be ignored by the cluster-autoscaler.

Fixes: cilium#19241

Co-authored-by: Xiaoxi Huang <Xiaoxi@ascend.io>
Signed-off-by: Joseph Stevens <thejosephstevens@gmail.com>
@sayboras sayboras force-pushed the support-ca-ignore-taint-prefix branch from 162316c to c2a455a Compare June 6, 2022 10:28
@sayboras
Copy link
Member

sayboras commented Jun 6, 2022

Forced pushed after rebase with master and fixed upgrade tests.

@sayboras
Copy link
Member

sayboras commented Jun 6, 2022

/test

@sayboras
Copy link
Member

sayboras commented Jun 6, 2022

/test-runtime

Job 'Cilium-PR-K8s-1.23-kernel-net-next' failed:

Click to show.

Test Name

K8sEgressGatewayTest tunnel vxlan with endpointRoutes enabled no egress gw policy connectivity works

Failure Output

FAIL: Expected command: kubectl exec -n kube-system log-gatherer-68qd2 -- curl --path-as-is -s -D /dev/stderr --fail --connect-timeout 5 --max-time 20 http://192.168.56.11:20080 -w "time-> DNS: '%{time_namelookup}(%{remote_ip})', Connect: '%{time_connect}',Transfer '%{time_starttransfer}', total '%{time_total}'" 

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.23-kernel-net-next so I can create one.

@sayboras
Copy link
Member

sayboras commented Jun 6, 2022

/test-1.23-net-next

@jibi jibi removed their request for review June 6, 2022 13:27
@sayboras
Copy link
Member

sayboras commented Jun 6, 2022

reviews are in, required CI tests are passed. I am marking this ready to merge, please let me know if you think otherwise.

@sayboras sayboras added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 6, 2022
@joamaki joamaki merged commit c1c9716 into cilium:master Jun 7, 2022
@thejosephstevens
Copy link
Contributor Author

Thanks so much for taking this across the line @sayboras !!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/k8s Impacts the kubernetes API, or kubernetes -> cilium internals translation layers. backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support custom node taint prefix to support scaling to zero nodes in GKE