set explicit liveness/readiness probe timeout for deny connectivity checks #10581
examples/kubernetes/connectivity-check.yaml includes a test that is expected to result in L3 denies if Cilium is operating correctly. It validates this with liveness/readiness probes that use bash to negate the exit code of curl: if curl exits with an "error", that is the correct result, so the bash command returns 0 and the readiness/liveness probe succeeds.
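The exit-code inversion can be sketched in shell, using `false` as a stand-in for a curl call that fails as expected when the connection is denied:

```shell
# 'false' stands in for a curl command that exits nonzero because the
# connection is (correctly) denied. The '!' inverts the exit status, so
# the probe command as a whole exits 0 and the probe passes.
if ! false; then
  echo "probe passes"
else
  echo "probe fails"
fi
```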
The curl command has an explicit 5-second timeout; however, the liveness and readiness probes have a default 1-second timeout. This means that if curl does not exit within 1 second, Kubernetes gives up on the readiness/liveness probe and declares it to have failed.
With this patch we explicitly set the readiness/liveness probe timeouts to 7 seconds, giving curl's own timeout timer (set to 5 seconds) time to trigger. The probe then runs long enough for the curl command to return a non-zero exit code, which, because of the bash negation, causes the probe to succeed.
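The resulting probe configuration would look roughly as follows; the target host and curl flags here are illustrative placeholders, not copied from the manifest:

```yaml
livenessProbe:
  exec:
    command:
    - /bin/bash
    - -c
    # hypothetical denied target; curl's nonzero exit is inverted to success
    - '! curl --max-time 5 some-denied-service'
  # must exceed curl's 5s --max-time so curl can time out on its own
  # before Kubernetes gives up on the probe
  timeoutSeconds: 7
```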
Note: it is not clear to me why the lack of this explicit timeout does not cause issues in all k8s environments; the failures seem to happen only in specific environments, but in those environments they happen reliably. This may be due to differences in DNS or other configuration. For example, EKS with Bottlerocket OS (https://github.com/weaveworks/eksctl/blob/master/examples/20-bottlerocket.yaml) shows this behavior.
Signed-off-by: Dan Wendlandt dan@covalent.io