jrfastab
Contributor

@jrfastab jrfastab commented Apr 22, 2020

Backport notes: numerous conflicts mostly in docs, testing and bpf (skb->ctx) datapath.

Pushed bpf.sha changes into commits to allow for bisecting.

Popped #10926 -- bpf: Preserve source identity for hairpin via stack (@tgraf) while we debug.

PR #10928 was split into a follow-up series; do not close it until #11239 is also merged.
#10928 -- datapath/iptables: Masquerade hairpin traffic that traversed the stack

$ for pr in 10902 10961 10918 11008 11021 11040 10984 11015 11057 11072 10378; do contrib/backporting/set-labels.py $pr done 1.7; done

@jrfastab jrfastab requested a review from a team as a code owner April 22, 2020 14:32
@jrfastab jrfastab added backport/1.7 kind/backports This PR provides functionality previously merged into master. labels Apr 22, 2020
@jrfastab
Contributor Author

test-me-please

@jrfastab jrfastab force-pushed the pr/v1.7-backport-2020-04-22 branch from d634254 to cc07443 Compare April 22, 2020 14:42
@jrfastab
Contributor Author

test-me-please

@jrfastab jrfastab force-pushed the pr/v1.7-backport-2020-04-22 branch from cc07443 to 32ae305 Compare April 22, 2020 15:05
@jrfastab
Contributor Author

test-me-please

@jrfastab jrfastab force-pushed the pr/v1.7-backport-2020-04-22 branch from 32ae305 to 4975d64 Compare April 22, 2020 15:27
@jrfastab
Contributor Author

test-me-please

@jrfastab jrfastab force-pushed the pr/v1.7-backport-2020-04-22 branch from 4975d64 to 10a2286 Compare April 22, 2020 15:51
@jrfastab
Contributor Author

test-me-please

@jrfastab jrfastab force-pushed the pr/v1.7-backport-2020-04-22 branch from 10a2286 to b7a86c7 Compare April 22, 2020 15:58
@jrfastab
Contributor Author

test-me-please

@joestringer
Member

Seems like one of the patches is relying on a code refactor:
https://jenkins.cilium.io/job/Cilium-PR-Ginkgo-Tests-Validated/19066/execution/node/55/log/

09:02:00  build github.com/cilium/cilium/daemon: cannot load github.com/cilium/cilium/pkg/node/types: open /go/src/github.com/cilium/cilium/pkg/node/types: no such file or directory
[2020-04-22T16:02:00.010Z] make[1]: *** [clean] Error 1
[2020-04-22T16:02:00.010Z] Makefile:29: recipe for target 'clean' failed
09:02:00  make[1]: Leaving directory '/go/src/github.com/cilium/cilium/daemon'
09:02:00  make: [clean-container] Error 2 (ignored)

If it helps, we can try to trim down the backports to just the release-blocker PRs right now as we would like to get a release out with the minimal set of changes that are immediately affecting users.

@jrfastab jrfastab force-pushed the pr/v1.7-backport-2020-04-22 branch from b7a86c7 to c774079 Compare April 22, 2020 20:55
@jrfastab
Contributor Author

test-me-please

@jrfastab
Contributor Author

@joestringer fixed up that specific error; let's see if we get any more.

@christarazi
Member

Looks like unit tests failed: https://travis-ci.com/github/cilium/cilium/builds/161515964

# github.com/cilium/cilium/test/k8sT
test/k8sT/Conformance.go:56:16: undefined: helpers.GetBadLogMessages
test/k8sT/Conformance.go:57:10: kubectl.ValidateListOfErrorsInLogs undefined (type *helpers.Kubectl has no field or method ValidateListOfErrorsInLogs)
make[1]: *** [govet] Error 2
make[1]: Leaving directory `/home/travis/gopath/src/github.com/cilium/cilium'
make: *** [unit-tests] Error 2

@jrfastab jrfastab force-pushed the pr/v1.7-backport-2020-04-22 branch from c774079 to f2057b6 Compare April 22, 2020 21:38
@jrfastab
Contributor Author

test-me-please

@jrfastab jrfastab force-pushed the pr/v1.7-backport-2020-04-22 branch from f2057b6 to e724f95 Compare April 22, 2020 21:48
@jrfastab
Contributor Author

test-me-please

@joestringer
Member

joestringer commented Apr 22, 2020

/tmp/go-build039424075/b001/operator.test flag redefined: log_dir
panic: /tmp/go-build039424075/b001/operator.test flag redefined: log_dir
goroutine 1 [running]:
flag.(*FlagSet).Var(0xc0000d8120, 0x2758ec0, 0x3bdddd0, 0x233f717, 0x7, 0x2398cda, 0x2f)
	/home/travis/.gimme/versions/go1.13.10.linux.amd64/src/flag/flag.go:848 +0x4ae
flag.(*FlagSet).StringVar(...)
	/home/travis/.gimme/versions/go1.13.10.linux.amd64/src/flag/flag.go:751
k8s.io/klog.InitFlags(0x0)
	/home/travis/gopath/src/github.com/cilium/cilium/vendor/k8s.io/klog/klog.go:420 +0x7f
github.com/cilium/cilium/operator.init.1()
	/home/travis/gopath/src/github.com/cilium/cilium/operator/main.go:179 +0x110d
FAIL	github.com/cilium/cilium/operator	0.044s
FAIL
make: *** [unit-tests] Error 1
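
For context on the trace above: the standard library `flag` package panics whenever the same flag name is defined twice on one `FlagSet`, which is what `klog.InitFlags` runs into here with `log_dir`. The following is a minimal sketch using only the standard library. The helper names (`newQuietFlagSet`, `defineLogDir`, `defineTwice`, `defineLogDirOnce`) are hypothetical, and the `Lookup` guard is just one possible way to avoid the panic, not necessarily the fix applied in this backport.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// newQuietFlagSet returns a private FlagSet so the sketch never touches the
// global flag.CommandLine. Output is discarded to keep the demo quiet.
func newQuietFlagSet(name string) *flag.FlagSet {
	fs := flag.NewFlagSet(name, flag.ContinueOnError)
	fs.SetOutput(io.Discard)
	return fs
}

// defineLogDir registers "log_dir" unconditionally, the way a library's
// init code might.
func defineLogDir(fs *flag.FlagSet) {
	fs.String("log_dir", "", "directory for log files")
}

// defineTwice registers the flag twice and returns the recovered panic
// message, or "" if no panic occurred.
func defineTwice(fs *flag.FlagSet) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	defineLogDir(fs)
	defineLogDir(fs) // second definition of the same name panics
	return ""
}

// defineLogDirOnce registers the flag only if it is not already defined.
// It returns true when it actually registered the flag.
func defineLogDirOnce(fs *flag.FlagSet) bool {
	if fs.Lookup("log_dir") != nil {
		return false
	}
	defineLogDir(fs)
	return true
}

func main() {
	fmt.Println(defineTwice(newQuietFlagSet("operator.test")))

	fs := newQuietFlagSet("operator.test")
	fmt.Println(defineLogDirOnce(fs), defineLogDirOnce(fs))
}
```

In a test binary the duplicate usually comes from two packages (or two vendored copies of one package) both registering on the global `flag.CommandLine`, so checking which imports call `InitFlags` is a reasonable first step.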

@jrfastab jrfastab force-pushed the pr/v1.7-backport-2020-04-22 branch from e724f95 to 0a7c567 Compare April 23, 2020 02:32
@jrfastab
Contributor Author

test-me-please

@jrfastab
Contributor Author

test-me-please

@jrfastab
Contributor Author

Direct routing + encryption failed. Let's retry and see if it's repeatable.

https://jenkins.cilium.io/job/Cilium-PR-Ginkgo-Tests-Validated/19080/

@jrfastab
Contributor Author

test-me-please

@jrajahalme
Member

Known CI flake #9902

@jrajahalme
Member

test-me-please

@jrajahalme
Member

test-missed-k8s

@jrajahalme
Member

Issued #11213 for a likely test flake

@jrajahalme
Member

Hit #10256

@jrajahalme
Member

Istio tests fail on older k8s releases until #11072 is also backported.

[ upstream commit 1958a4c ]

Add the missing file suffix (.sh) to print-node-ip calls in
Jenkinsfile. This prevents unnecessary Cilium compilation and helps
speed up test runs.

Add OSX support to 'test/print-node-ip.sh'. Use simpler 'cut' instead
of 'awk' for Linux.

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>

[ upstream commit 29f0b34 ]

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>

[ upstream commit 5cdde10 ]

kubectl is guaranteed to be compatible with only a limited number of
earlier releases:

> kubectl is supported within one minor version (older or newer) of
> kube-apiserver.

So far we have been using the latest kubectl in the host (now 1.18) to
control clusters from K8s 1.11 to 1.18. This did not work any more
with 'istioctl', which complained about "kubectl not being found in
$PATH". When pairing istioctl with the same version of kubectl as the
cluster this started working again.
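
The skew rule quoted above can be sketched as a small check. This is an illustration only: it assumes the client and server share the same major version and that the minor versions have already been parsed into integers. The function name `withinSkew` is hypothetical.

```go
package main

import "fmt"

// withinSkew reports whether a kubectl minor version is within one minor
// version (older or newer) of the kube-apiserver minor version.
// Assumption: both versions share the same major version.
func withinSkew(clientMinor, serverMinor int) bool {
	diff := clientMinor - serverMinor
	if diff < 0 {
		diff = -diff
	}
	return diff <= 1
}

func main() {
	// kubectl 1.18 on the host against a few of the tested cluster versions.
	for _, server := range []int{11, 17, 18} {
		fmt.Printf("kubectl 1.18 vs cluster 1.%d: supported=%v\n",
			server, withinSkew(18, server))
	}
}
```

By this rule, a 1.18 kubectl is outside the supported range for a 1.11 cluster.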

Downgrading kubectl in the test hosts may not be practical, but the CI
infra also supports running kubectl in the cluster's master node
(k8s1). This is triggered via the value of the Ginkgo
'cilium.kubeconfig' option. When 'cilium.kubeconfig' is non-empty, it
is assumed to be a path to a valid kubeconfig for connecting kubectl
in the host to the test cluster. When 'cilium.kubeconfig' is empty, CI
Ginkgo helpers assume that kubectl should be run on
"k8s1". 'test/vagrant-ci-start.sh' expects KUBECONFIG environment
variable to be set to the path to the file into which the kubeconfig
fetched from the test cluster should be stored. Modify
'test/vagrant-ci-start.sh' to accept undefined KUBECONFIG to signify
the need to run kubectl in the test cluster's master node (k8s1).
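
The selection rule described above can be summarized in a few lines. This is a sketch of the decision only, not the actual Ginkgo helper code: the type `kubectlTarget` and the function `selectKubectlTarget` are hypothetical names, and the real helpers may structure this differently.

```go
package main

import "fmt"

// kubectlTarget describes where kubectl should run (hypothetical type).
type kubectlTarget struct {
	OnHost     bool   // true: run kubectl on the test host
	Kubeconfig string // path passed to kubectl when OnHost is true
	Node       string // node to run kubectl on when OnHost is false
}

// selectKubectlTarget mirrors the rule from the commit message: a non-empty
// 'cilium.kubeconfig' value means kubectl runs on the host with that
// kubeconfig; an empty value means kubectl runs on the master node "k8s1".
func selectKubectlTarget(kubeconfig string) kubectlTarget {
	if kubeconfig != "" {
		return kubectlTarget{OnHost: true, Kubeconfig: kubeconfig}
	}
	return kubectlTarget{Node: "k8s1"}
}

func main() {
	// The kubeconfig path below is a made-up example value.
	fmt.Printf("%+v\n", selectKubectlTarget("/tmp/kubeconfig"))
	fmt.Printf("%+v\n", selectKubectlTarget(""))
}
```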

Finally, remove setting of both the KUBECONFIG environment variable
before calling 'vagrant-ci-start.sh' and the Ginkgo option
'cilium.kubeconfig' in 'ginkgo-kubernetes-all.Jenkinsfile' which is
used to run the CI K8s test suite on k8s versions from 1.11 to
1.17. This way we always use the kubectl installed as part of the
testing cluster itself, in the master node. This solves the
compatibility problem with istioctl and should help guard that we have
not introduced any kubectl syntax that would not be compatible with
the target k8s version.

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>

@jrajahalme
Member

test-me-please

@jrajahalme
Member

test-missed-k8s

@jrajahalme
Member

test-upstream-k8s

@jrajahalme
Member

Added backport of #11072

@joestringer
Member

Upstream test stalled downloading vagrant box:
https://jenkins.cilium.io/job/Cilium-PR-Kubernetes-Upstream/2065/execution/node/33/log/

@joestringer
Member

test-upstream-k8s

@joestringer
Member

joestringer commented Apr 29, 2020

Out of the four checks that are marked failing:

  1. Cilium-Ginkgo-GKE is known to fail due to [v1.7] CI: GKE target is broken #11204
  2. Cilium-Ginkgo-Test-k8s failed due to two new issues listed below.
  3. Cilium-PR-Ginkgo-Tests-K8s is the same as (2)
  4. K8s-1.17-Kernel-4.19 actually just timed out while downloading the VM images.

k8s issues hit:

K8sUpdates Tests upgrade and downgrade from a Cilium stable image to master:

/home/jenkins/workspace/Cilium-PR-Ginkgo-Tests-K8s/1.13-gopath/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:430
Unable to install helm repository
Expected command: helm repo add cilium https://helm.cilium.io 
To succeed, but it failed:
Exitcode: 1 
Stdout:
 	 
Stderr:
 	 Error: Couldn't load repositories file (/home/vagrant/.helm/repository/repositories.yaml).
	 You might need to run `helm init` (or `helm init --client-only` if tiller is already installed)
	 

/home/jenkins/workspace/Cilium-PR-Ginkgo-Tests-K8s/1.13-gopath/src/github.com/cilium/cilium/test/k8sT/Updates.go:121

Potentially related to the recently backported changes? This failed for both k8s runs.

EDIT: Seems similar to #10374

K8sFQDNTest Restart Cilium validate that FQDN is still working

/home/jenkins/workspace/Cilium-PR-Ginkgo-Tests-K8s/1.12-gopath/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:384
Test suite timed out after 1h38m0s
/home/jenkins/workspace/Cilium-PR-Ginkgo-Tests-K8s/1.12-gopath/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:539

Did the latest changes modify the timeouts?

@jrajahalme
Copy link
Member

@joestringer Need to backport #10378 as well to install Helm 3 instead of 2.x.y. I'll do it.

[ upstream commit 207a63f ]

Helm 3 is required so install it rather than Helm 2.

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>

[ upstream commit 19c3f5a ]

Running preflight daemonset can fail due to image pull error, if trying to pull the "cilium" image:

$ kubectl get pods --all-namespaces
kube-system   cilium-pre-flight-check-8d4qw     0/1     Init:ErrImagePull   0          56s
kube-system   cilium-pre-flight-check-p9945     0/1     Init:ErrImagePull   0          56s

$ kubectl describe ds cilium-pre-flight-check -n kube-system
Name:           cilium-pre-flight-check
Pod Template:
  Init Containers:
   clean-cilium-state:
    Image:      k8s1:5000/cilium/cilium:latest

Dev builds of the latest cilium images are named "cilium-dev":

$ docker image ls
REPOSITORY                               TAG                                        IMAGE ID            CREATED             SIZE
k8s1:5000/cilium/cilium-dev              latest                                     b8fc4648be0f        32 minutes ago      658MB

Fix this by using "cilium-dev" also for 'preflight.image' by default.

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>

@jrajahalme
Member

Timeout for test-missed-k8s is the same as in master:

        GINKGO_TIMEOUT="98m"
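
As a quick sanity check that this value matches the "timed out after 1h38m0s" message reported earlier, the timeout can be normalized with Go's duration parser. This sketch assumes the value is interpreted as a Go duration string; the helper name `normalizeTimeout` is hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// normalizeTimeout parses a Go-style duration string such as "98m" and
// returns it in the canonical form used by time.Duration.String.
func normalizeTimeout(s string) (string, error) {
	d, err := time.ParseDuration(s)
	if err != nil {
		return "", err
	}
	return d.String(), nil
}

func main() {
	out, err := normalizeTimeout("98m")
	if err != nil {
		fmt.Println("invalid timeout:", err)
		return
	}
	fmt.Println(out) // prints "1h38m0s"
}
```

So 98m and 1h38m0s are the same limit, consistent with the timeout being unchanged.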

@jrajahalme
Member

test-me-please

@jrajahalme
Member

test-missed-k8s

@jrajahalme
Member

test-upstream-k8s

@joestringer
Member

test-docs-please

@joestringer
Member

The only failures are known flakes #11204 and #11126. Neither is required to pass in order to merge backports.

Merging. 🎉

Thanks @jrfastab @jrajahalme !

@joestringer joestringer merged commit 9bb1770 into v1.7 Apr 29, 2020
@joestringer joestringer deleted the pr/v1.7-backport-2020-04-22 branch April 29, 2020 20:15