
Conversation

@jrajahalme jrajahalme commented Apr 20, 2020

kubectl is only guaranteed to be compatible with a limited range of
kube-apiserver releases:

> kubectl is supported within one minor version (older or newer) of
> kube-apiserver.

So far we have been using the latest kubectl on the host (currently 1.18) to
control clusters running K8s 1.11 through 1.18. This no longer worked
with 'istioctl', which complained about "kubectl not being found in
$PATH". When istioctl was paired with the same version of kubectl as the
cluster, it started working again.

Downgrading kubectl on the test hosts may not be practical, but the CI
infra also supports running kubectl on the cluster's master node
(k8s1). This is selected via the value of the Ginkgo
'cilium.kubeconfig' option. When 'cilium.kubeconfig' is non-empty, it
is assumed to be a path to a valid kubeconfig for connecting the kubectl
on the host to the test cluster. When 'cilium.kubeconfig' is empty, the CI
Ginkgo helpers assume that kubectl should be run on
"k8s1". 'test/vagrant-ci-start.sh' expects the KUBECONFIG environment
variable to be set to the path of the file into which the kubeconfig
fetched from the test cluster should be stored. Modify
'test/vagrant-ci-start.sh' to accept an undefined KUBECONFIG as a signal
that kubectl needs to be run on the test cluster's master node (k8s1).
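
For illustration, a minimal sketch of the intended KUBECONFIG handling in 'test/vagrant-ci-start.sh' (hypothetical, not the actual script contents; the kubeconfig location on k8s1 and the vagrant machine name are assumptions):

    # Minimal sketch; assumes the kubeconfig lives at /home/vagrant/.kube/config on k8s1
    if [ -n "${KUBECONFIG}" ]; then
        # KUBECONFIG set: fetch the kubeconfig from the test cluster into that file
        # so kubectl can be run on the host.
        vagrant ssh "k8s1-${K8S_VERSION}" -c "sudo cat /home/vagrant/.kube/config" > "${KUBECONFIG}"
    else
        # KUBECONFIG undefined: do not fetch anything; the Ginkgo helpers will run
        # kubectl over SSH on the cluster's master node (k8s1).
        echo "KUBECONFIG not set; kubectl will be run on k8s1"
    fi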

Finally, remove both the setting of the KUBECONFIG environment variable
before calling 'vagrant-ci-start.sh' and the Ginkgo option
'cilium.kubeconfig' from 'ginkgo-kubernetes-all.Jenkinsfile', which is
used to run the CI K8s test suite on k8s versions from 1.11 to
1.17. This way we always use the kubectl installed as part of the
test cluster itself, on the master node. This solves the
compatibility problem with istioctl and should also help guard against
introducing kubectl syntax that is not compatible with
the target k8s version.
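
Schematically, the Jenkinsfile change amounts to something like the following (hypothetical before/after shell steps; the workspace path and the exact ginkgo invocation are assumptions, not the actual 'ginkgo-kubernetes-all.Jenkinsfile' contents):

    # Before (hypothetical): kubeconfig fetched to the host and handed to the tests
    export KUBECONFIG="${WORKSPACE}/vagrant.kubeconfig"
    ./test/vagrant-ci-start.sh
    ginkgo -- -cilium.kubeconfig="${KUBECONFIG}"

    # After (hypothetical): neither is set, so the helpers run kubectl on k8s1
    ./test/vagrant-ci-start.sh
    ginkgo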

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>

@jrajahalme jrajahalme added the wip and release-note/ci labels Apr 20, 2020
@jrajahalme jrajahalme requested review from a team as code owners April 20, 2020 16:38
@jrajahalme jrajahalme force-pushed the pr/jrajahalme/k8s-kubectl-not-found-debug branch 2 times, most recently from 061b97b to 720cb3d Compare April 20, 2020 16:46
coveralls commented Apr 20, 2020


Coverage increased (+0.01%) to 44.783% when pulling f755217 on pr/jrajahalme/k8s-kubectl-not-found-debug into 822a64e on master.

@aanm aanm marked this pull request as draft April 23, 2020 10:12
@aanm aanm removed the wip label Apr 23, 2020
aanm commented Apr 23, 2020

This PR has been marked as a draft PR since it had a WIP label. Please click "Ready for review" (below) once the PR is ready to be reviewed. CI will still run for draft PRs.

@jrajahalme (Member Author)

@aanm istioctl failing with K8s 1.14 and 1.15 may be due to the installed kubectl being version 1.18. When testing locally on Mac I had to downgrade kubectl to match the server version to make it work. Do you know how I could downgrade kubectl, or maybe we should do it universally for older k8s versions?

Ref:

kubectl
kubectl is supported within one minor version (older or newer) of kube-apiserver.

Example:

kube-apiserver is at 1.18
kubectl is supported at 1.19, 1.18, and 1.17
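
For reference, downgrading on a Linux host could look something like this (example only; the 1.14.10 version and install path are just illustrations, using the upstream kubernetes-release download bucket):

    # Example only: fetch a kubectl client that matches a 1.14 apiserver
    curl -sSLo kubectl \
      https://storage.googleapis.com/kubernetes-release/release/v1.14.10/bin/linux/amd64/kubectl
    chmod +x kubectl
    sudo mv kubectl /usr/local/bin/kubectl
    # Verify the client version now matches the server's minor version
    kubectl version --short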

aanm commented Apr 23, 2020

@aanm istioctl failing with K8s 1.14 and 1.15 may be due to the installed kubectl being version 1.18. When testing locally on Mac I had to downgrade kubectl to match the server version to make it work. Do you know how I could downgrade kubectl, or maybe we should do it universally for older k8s versions?

Ref:

kubectl
kubectl is supported within one minor version (older or newer) of kube-apiserver.
Example:
kube-apiserver is at 1.18
kubectl is supported at 1.19, 1.18, and 1.17

Downgrade kubectl where? in istioctl?

@jrajahalme (Member Author)

istioctl is complaining that it can't find kubectl in $PATH. My hunch is that the installed kubectl version (1.18) is found, but then fails due to some compatibility reason. I'm assuming istioctl is trying to invoke kubectl from the $PATH, but the installed kubectl version (at /usr/local/bin/kubectl) is version 1.18, even though the VM is booted with K8S_VERSION=1.1[45].

This is the output of the commands I added in this PR for debugging purposes:

time="2020-04-20T19:07:53Z" level=debug msg="running command: which kubectl"
cmd: "which kubectl" exitCode: 0 duration: 1.683695ms stdout:
/usr/local/bin/kubectl

time="2020-04-20T19:07:53Z" level=debug msg="running command: echo $PATH"
cmd: "echo $PATH" exitCode: 0 duration: 892.9µs stdout:
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

time="2020-04-20T19:07:53Z" level=debug msg="running command: `which kubectl` version"
cmd: "`which kubectl` version" exitCode: 0 duration: 51.389712ms stdout:
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-16T11:56:40Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.10", GitCommit:"575467a0eaf3ca1f20eb86215b3bde40a5ae617a", GitTreeState:"clean", BuildDate:"2019-12-11T12:32:32Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
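
For what it's worth, a hypothetical shell check along these lines could catch such a skew in CI; it assumes the long 'kubectl version' output format shown above:

    # Hypothetical skew check: parse the Minor fields from 'kubectl version'
    client_minor=$(kubectl version 2>/dev/null | awk -F'"' '/Client Version/ {print $4}')
    server_minor=$(kubectl version 2>/dev/null | awk -F'"' '/Server Version/ {print $4}')
    skew=$(( client_minor - server_minor ))
    if [ "${skew#-}" -gt 1 ]; then
        echo "kubectl client minor ${client_minor} is more than one minor version away from kube-apiserver minor ${server_minor}"
    fi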

@jrajahalme (Member Author)

@aanm Do you know why the kubectl reports client version 1.18 on a test VM with K8S_VERSION=1.14??

@jrajahalme (Member Author)

The VM start scripts show:

11:58:16      k8s1-1.14: Setting up kubectl (1.14.10-00) ...

But then, when running tests, kubectl version reports client version 1.18???

aanm commented Apr 23, 2020

The VM start scripts show:

11:58:16      k8s1-1.14: Setting up kubectl (1.14.10-00) ...

But then, when running tests, kubectl version reports client version 1.18???

IIRC @jrajahalme that is the version of the kubectl installed in the nodes (not VMs)

@jrajahalme (Member Author)

@aanm Exactly, istioctl is run on the host, just like any kubectl command run in the test?

@jrajahalme (Member Author)

So I conflated the VM k8s install with the host kubectl above, sorry.

@jrajahalme (Member Author)

This PR adds:

		res = kubectl.ExecShort("`which kubectl` version")

Which reports client version 1.18. It would make sense that this is coming from the host, but kubectl is initialized as:

		kubectl = helpers.CreateKubectl(helpers.K8s1VMName(), logger)

aanm commented Apr 23, 2020

This PR adds:

		res = kubectl.ExecShort("`which kubectl` version")

Which reports client version 1.18. It would make sense that this is coming from the host, but kubectl is initialized as:

		kubectl = helpers.CreateKubectl(helpers.K8s1VMName(), logger)

@jrajahalme and isn't that VM running k8s 1.18?

jrajahalme commented Apr 23, 2020

@aanm I don't know, is it different from the ones started under K8S_VERSION=1.14? If so, is there a way to start K8s1VM with K8s 1.14 instead?

@jrajahalme (Member Author)

And why would we configure k8s1 and k8s2 with different versions of Kubernetes?

@jrajahalme jrajahalme force-pushed the pr/jrajahalme/k8s-kubectl-not-found-debug branch from 720cb3d to 76d2ee0 Compare April 23, 2020 23:20
@jrajahalme (Member Author)

test-missed-k8s

@jrajahalme jrajahalme force-pushed the pr/jrajahalme/k8s-kubectl-not-found-debug branch 6 times, most recently from 6abed98 to 62d5c00 Compare April 24, 2020 01:34
@jrajahalme (Member Author)

test-missed-k8s

@jrajahalme jrajahalme changed the title from "CI: Istio debugging" to "test: Run kubectl in test VM for older K8s releases" Apr 24, 2020
Add the missing file suffix (.sh) to print-node-ip calls in
Jenkinsfile. This prevents unnecessary Cilium compilation and helps
speed up test runs.

Add OSX support to 'test/print-node-ip.sh'. Use simpler 'cut' instead
of 'awk' for Linux.

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
@jrajahalme jrajahalme force-pushed the pr/jrajahalme/k8s-kubectl-not-found-debug branch from cb2b100 to f755217 Compare April 24, 2020 06:04
@jrajahalme (Member Author)

test-me-please

@jrajahalme (Member Author)

never-tell-me-the-odds

@jrajahalme jrajahalme added the kind/bug/CI and priority/release-blocker labels Apr 24, 2020
@jrajahalme jrajahalme marked this pull request as ready for review April 24, 2020 06:07
@nebril nebril left a comment

Great job with finding the root cause of the issue!

While this change fixes the problem, I believe that going back to "the old way" of running kubectl via SSH isn't the way to go.

SSH-specific execution code is left in the test framework exclusively to support the Runtime tests. Historically, SSH to the vagrant VMs was unreliable and made debugging harder. As we retire the Runtime tests, we should be able to delete this code with minimal pain.

I would strongly prefer fixing this by adding multiple versions of the kubectl binary on the CI nodes and manipulating PATH. I can add downloading of all the needed kubectl binaries to the node provisioning scripts; after that, this change would just need to symlink to them appropriately and set the PATH.
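
A minimal sketch of what that could look like (hypothetical paths, versions, and variable names):

    # Provisioning (hypothetical): install one kubectl per supported minor version
    for v in 1.11.10 1.14.10 1.16.8 1.18.2; do
        curl -sSLo "/usr/local/bin/kubectl-${v%.*}" \
          "https://storage.googleapis.com/kubernetes-release/release/v${v}/bin/linux/amd64/kubectl"
        chmod +x "/usr/local/bin/kubectl-${v%.*}"
    done
    # At test time: point 'kubectl' at the binary matching the cluster's K8S_VERSION (e.g. "1.14")
    ln -sf "/usr/local/bin/kubectl-${K8S_VERSION}" /usr/local/bin/kubectl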

Also, thanks for fixing the print-node-ip bugs; I am not sure how these could have persisted for so long.

@jrajahalme (Member Author)

test-me-please

@jrajahalme (Member Author)

@nebril Running the K8s CI tests locally on a Mac host without Jenkins, started with test/vagrant-local-start.sh, depends on running the Ginkgo Execs via SSH. I'd like to keep this functionality if possible.

I'll revise this PR after you have made the old kubectl versions available. We probably should use an old version for the k8s 1.11 tests in the regular ginkgo k8s test suite as well.

@christarazi christarazi left a comment

Good find!

nebril commented Apr 27, 2020

After discussing offline, let's merge this as an immediate mitigation.
