[Flaky Test] Test cleanup after E2E tests is flaky #12785

@plkokanov

Description

How to categorize this issue?

/area testing
/kind flake

Which test(s)/suite(s) are flaking:
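The cleanup that runs after all e2e tests have succeeded can time out and fail: the seed-down make target reports the Seed "local" as deleted, but then times out waiting for the condition on seeds/local. This seems to be independent of the test suite.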

CI links:
https://prow.gardener.cloud/view/gs/gardener-prow/pr-logs/pull/gardener_gardener/12759/pull-gardener-e2e-kind-ha-multi-node/1958445651062689792
https://prow.gardener.cloud/view/gs/gardener-prow/pr-logs/pull/gardener_gardener/12759/pull-gardener-e2e-kind-ipv6/1958465366808072192

Reason for failure:

Cleaning up...
 - secret "seed-local" deleted
 - gardenlet.seedmanagement.gardener.cloud "local" deleted
seed.core.gardener.cloud "local" deleted
error: timed out waiting for the condition on seeds/local
make[1]: *** [Makefile:418: seed-down] Error 1
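If this flake needs further investigation, it may help to capture why seeds/local does not go away before kind-down deletes the cluster. The following is a minimal sketch under stated assumptions: the helper name `dump_stuck_seed` is hypothetical and not part of gardener/gardener, and it assumes `KUBECONFIG` points at the cluster that serves the `seeds` resource (in the logs below, the virtual garden rather than the gardener-operator-local cluster). It only uses standard `kubectl get` calls with JSONPath to print the Seed's deletion timestamp, finalizers and status conditions.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helper (not part of gardener/gardener).
# Assumes KUBECONFIG points at the cluster serving the "seeds" resource.

dump_stuck_seed() {
  local seed="${1:-local}"

  # Empty unless deletion of the Seed has been requested.
  echo "deletionTimestamp: $(kubectl get seed "$seed" -o jsonpath='{.metadata.deletionTimestamp}')"

  # Any finalizer still listed here keeps the object from being removed.
  echo "finalizers: $(kubectl get seed "$seed" -o jsonpath='{.metadata.finalizers}')"

  # One line per status condition: type, status, reason, message (tab-separated).
  echo "conditions:"
  kubectl get seed "$seed" \
    -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\t"}{.message}{"\n"}{end}'
}
```

Usage sketch: run `dump_stuck_seed local` right after the seed-down timeout and before the kind cluster is deleted; a non-empty finalizer list or a condition with status False would be a reasonable first thing to look at. This is a diagnostic aid only and does not change the cleanup behavior.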
Example logs from pull-gardener-e2e-kind-ha-multi-node:
make[1]: Leaving directory '/home/prow/go/src/github.com/gardener/gardener'
+ export KUBECONFIG=/home/prow/go/src/github.com/gardener/gardener/example/gardener-local/kind/multi-zone/kubeconfig
+ KUBECONFIG=/home/prow/go/src/github.com/gardener/gardener/example/gardener-local/kind/multi-zone/kubeconfig
+ export_artifacts gardener-operator-local
+ cluster_name=gardener-operator-local
+ echo '> Exporting logs of kind cluster '\''gardener-operator-local'\'''
> Exporting logs of kind cluster 'gardener-operator-local'
+ kind export logs /logs/artifacts/gardener-operator-local --name gardener-operator-local
Exporting logs for cluster "gardener-operator-local" to:
/logs/artifacts/gardener-operator-local
+ echo '> Exporting events of kind cluster '\''gardener-operator-local'\'' > '\''/logs/artifacts/gardener-operator-local'\'''
> Exporting events of kind cluster 'gardener-operator-local' > '/logs/artifacts/gardener-operator-local'
+ export_events_for_cluster /logs/artifacts/gardener-operator-local
+ local dir=/logs/artifacts/gardener-operator-local/events
+ mkdir -p /logs/artifacts/gardener-operator-local/events
+ IFS=
+ read -r namespace
++ kubectl get ns -oname
++ cut -d/ -f2
+ kubectl -n default get event --sort-by=lastTimestamp
+ IFS=
+ read -r namespace
+ kubectl -n garden get event --sort-by=lastTimestamp
+ IFS=
+ read -r namespace
+ kubectl -n istio-system get event --sort-by=lastTimestamp
+ IFS=
+ read -r namespace
+ kubectl -n kube-node-lease get event --sort-by=lastTimestamp
+ IFS=
+ read -r namespace
+ kubectl -n kube-public get event --sort-by=lastTimestamp
+ IFS=
+ read -r namespace
+ kubectl -n kube-system get event --sort-by=lastTimestamp
+ IFS=
+ read -r namespace
+ kubectl -n local-path-storage get event --sort-by=lastTimestamp
+ IFS=
+ read -r namespace
+ kubectl -n registry get event --sort-by=lastTimestamp
+ IFS=
+ read -r namespace
+ kubectl -n runtime-extension-provider-local get event --sort-by=lastTimestamp
+ IFS=
+ read -r namespace
+ kubectl -n virtual-garden-istio-ingress get event --sort-by=lastTimestamp
+ IFS=
+ read -r namespace
+ export_resource_yamls_for seeds shoots bastions.operations.gardener.cloud etcds leases
+ mkdir -p /logs/artifacts
+ for resource_type in "$@"
+ echo '> Exporting Resource '\''seeds'\'' yaml > /logs/artifacts/seeds.yaml'
> Exporting Resource 'seeds' yaml > /logs/artifacts/seeds.yaml
+ echo -e '---\n# cluster name: '\''gardener-operator-local'\'''
+ kubectl get seeds -A -o yaml
error: the server doesn't have a resource type "seeds"
+ true
+ for resource_type in "$@"
+ echo '> Exporting Resource '\''shoots'\'' yaml > /logs/artifacts/shoots.yaml'
> Exporting Resource 'shoots' yaml > /logs/artifacts/shoots.yaml
+ echo -e '---\n# cluster name: '\''gardener-operator-local'\'''
+ kubectl get shoots -A -o yaml
error: the server doesn't have a resource type "shoots"
+ true
+ for resource_type in "$@"
+ echo '> Exporting Resource '\''bastions.operations.gardener.cloud'\'' yaml > /logs/artifacts/bastions.operations.gardener.cloud.yaml'
> Exporting Resource 'bastions.operations.gardener.cloud' yaml > /logs/artifacts/bastions.operations.gardener.cloud.yaml
+ echo -e '---\n# cluster name: '\''gardener-operator-local'\'''
+ kubectl get bastions.operations.gardener.cloud -A -o yaml
error: the server doesn't have a resource type "bastions"
+ true
+ for resource_type in "$@"
+ echo '> Exporting Resource '\''etcds'\'' yaml > /logs/artifacts/etcds.yaml'
> Exporting Resource 'etcds' yaml > /logs/artifacts/etcds.yaml
+ echo -e '---\n# cluster name: '\''gardener-operator-local'\'''
+ kubectl get etcds -A -o yaml
+ for resource_type in "$@"
+ echo '> Exporting Resource '\''leases'\'' yaml > /logs/artifacts/leases.yaml'
> Exporting Resource 'leases' yaml > /logs/artifacts/leases.yaml
+ echo -e '---\n# cluster name: '\''gardener-operator-local'\'''
+ kubectl get leases -A -o yaml
+ export_events_for_shoots
+ IFS=
+ read -r shoot
++ kubectl get shoot -A -o=custom-columns=namespace:metadata.namespace,name:metadata.name,id:status.technicalID --no-headers
error: the server doesn't have a resource type "shoot"
+ IFS=
+ read -r namespace
++ kubectl get ns -l gardener.cloud/role=shoot -oname
++ cut -d/ -f2
++ kubectl get ns -l export-artifacts=true -oname
++ cut -d/ -f2
+ echo '> Exporting /etc/hosts'
> Exporting /etc/hosts
+ cp /etc/hosts /logs/artifacts/gardener-operator-local/hosts
+ export_resource_yamls_for gardens extop
+ mkdir -p /logs/artifacts
+ for resource_type in "$@"
+ echo '> Exporting Resource '\''gardens'\'' yaml > /logs/artifacts/gardens.yaml'
> Exporting Resource 'gardens' yaml > /logs/artifacts/gardens.yaml
+ echo -e '---\n# cluster name: '\''gardener-operator-local'\'''
+ kubectl get gardens -A -o yaml
+ for resource_type in "$@"
+ echo '> Exporting Resource '\''extop'\'' yaml > /logs/artifacts/extop.yaml'
> Exporting Resource 'extop' yaml > /logs/artifacts/extop.yaml
+ echo -e '---\n# cluster name: '\''gardener-operator-local'\'''
+ kubectl get extop -A -o yaml
+ export KUBECONFIG=/home/prow/go/src/github.com/gardener/gardener/dev-setup/kubeconfigs/virtual-garden/kubeconfig
+ KUBECONFIG=/home/prow/go/src/github.com/gardener/gardener/dev-setup/kubeconfigs/virtual-garden/kubeconfig
+ export cluster_name=virtual-garden
+ cluster_name=virtual-garden
+ export_resource_yamls_for seeds shoots
+ mkdir -p /logs/artifacts
+ for resource_type in "$@"
+ echo '> Exporting Resource '\''seeds'\'' yaml > /logs/artifacts/seeds.yaml'
> Exporting Resource 'seeds' yaml > /logs/artifacts/seeds.yaml
+ echo -e '---\n# cluster name: '\''virtual-garden'\'''
+ kubectl get seeds -A -o yaml
+ for resource_type in "$@"
+ echo '> Exporting Resource '\''shoots'\'' yaml > /logs/artifacts/shoots.yaml'
> Exporting Resource 'shoots' yaml > /logs/artifacts/shoots.yaml
+ echo -e '---\n# cluster name: '\''virtual-garden'\'''
+ kubectl get shoots -A -o yaml
+ export_events_for_shoots
+ IFS=
+ read -r shoot
++ kubectl get shoot -A -o=custom-columns=namespace:metadata.namespace,name:metadata.name,id:status.technicalID --no-headers
+ make kind-multi-node-down
make[1]: Entering directory '/home/prow/go/src/github.com/gardener/gardener'
./hack/kind-down.sh \
	--cluster-name gardener-operator-local \
	--path-kubeconfig /home/prow/go/src/github.com/gardener/gardener/dev-setup/gardenlet/components/kubeconfigs/seed-local/kubeconfig
Deleting cluster "gardener-operator-local" ...
Deleted nodes: ["gardener-operator-local-control-plane" "gardener-operator-local-worker2" "gardener-operator-local-worker"]
# We need root privileges to clean the backup bucket directory, see https://github.com/gardener/gardener/issues/6752
docker run --rm --user root:root -v /home/prow/go/src/github.com/gardener/gardener/dev/local-backupbuckets:/dev/local-backupbuckets alpine rm -rf /dev/local-backupbuckets/garden-*
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
9824c27679d3: Pulling fs layer
9824c27679d3: Download complete
9824c27679d3: Pull complete
Digest: sha256:4bcff63911fcb4448bd4fdacec207030997caf25e9bea4045fa6c8c44de311d1
Status: Downloaded newer image for alpine:latest
make[1]: Leaving directory '/home/prow/go/src/github.com/gardener/gardener'
make: *** [Makefile:474: ci-e2e-kind-ha-multi-node] Error 2
wrapper.sh] [TEST] Test Command exit code: 2
wrapper.sh] [CLEANUP] Waiting 30 seconds for pods stopped with terminationGracePeriod:30
wrapper.sh] [CLEANUP] Cleaning up after Docker in Docker ...
72e29375b10c
e6cd3e0bbc57
b1d71d03b1bc
281c0a6a6e05
c348569b2b27
5edc1abf72dd
dd24ce24b6fb
wrapper.sh] [CLEANUP] Waiting for docker to stop for 30 seconds
Stopping Docker: dockerProgram process in pidfile '/var/run/docker-ssd.pid', 1 process(es), refused to die.
wrapper.sh] [CLEANUP] Done cleaning up after Docker in Docker.
================================================================================
wrapper.sh] Exiting 2
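Logs from pull-gardener-e2e-kind-ipv6 look somewhat different after the same error occurs: every subsequent kubectl call during the artifact export fails with "connection refused" against localhost:8080: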
Cleaning up...
 - secret "seed-local" deleted
 - gardenlet.seedmanagement.gardener.cloud "local" deleted
seed.core.gardener.cloud "local" deleted
error: timed out waiting for the condition on seeds/local
make[1]: *** [Makefile:418: seed-down] Error 1
make[1]: Leaving directory '/home/prow/go/src/github.com/gardener/gardener'
> Exporting logs of kind cluster 'gardener-operator-local'
Exporting logs for cluster "gardener-operator-local" to:
/logs/artifacts/gardener-operator-local
> Exporting events of kind cluster 'gardener-operator-local' > '/logs/artifacts/gardener-operator-local'
E0821 10:58:27.402665  969172 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
E0821 10:58:27.404506  969172 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
E0821 10:58:27.406127  969172 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
E0821 10:58:27.407692  969172 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
E0821 10:58:27.409277  969172 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
The connection to the server localhost:8080 was refused - did you specify the right host or port?
> Exporting Resource 'seeds' yaml > /logs/artifacts/seeds.yaml
E0821 10:58:27.452174  969187 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
E0821 10:58:27.453987  969187 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
E0821 10:58:27.455880  969187 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
E0821 10:58:27.457662  969187 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
E0821 10:58:27.459328  969187 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
The connection to the server localhost:8080 was refused - did you specify the right host or port?
> Exporting Resource 'shoots' yaml > /logs/artifacts/shoots.yaml
E0821 10:58:27.504707  969200 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
E0821 10:58:27.506327  969200 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
E0821 10:58:27.508136  969200 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
E0821 10:58:27.509666  969200 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
E0821 10:58:27.511181  969200 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
The connection to the server localhost:8080 was refused - did you specify the right host or port?
> Exporting Resource 'bastions.operations.gardener.cloud' yaml > /logs/artifacts/bastions.operations.gardener.cloud.yaml
E0821 10:58:27.552866  969216 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
E0821 10:58:27.554637  969216 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
E0821 10:58:27.556126  969216 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
E0821 10:58:27.557652  969216 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
E0821 10:58:27.559174  969216 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
E0821 10:58:27.560701  969216 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
E0821 10:58:27.562165  969216 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
The connection to the server localhost:8080 was refused - did you specify the right host or port?
> Exporting Resource 'etcds' yaml > /logs/artifacts/etcds.yaml
E0821 10:58:27.607676  969231 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
E0821 10:58:27.609450  969231 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
E0821 10:58:27.610989  969231 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
E0821 10:58:27.612534  969231 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
E0821 10:58:27.614150  969231 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
The connection to the server localhost:8080 was refused - did you specify the right host or port?
> Exporting Resource 'leases' yaml > /logs/artifacts/leases.yaml
E0821 10:58:27.656947  969244 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
E0821 10:58:27.659185  969244 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
E0821 10:58:27.660831  969244 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
E0821 10:58:27.662513  969244 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
E0821 10:58:27.664070  969244 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
The connection to the server localhost:8080 was refused - did you specify the right host or port?
E0821 10:58:27.705938  969259 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
E0821 10:58:27.707710  969259 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
E0821 10:58:27.709509  969259 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
E0821 10:58:27.711227  969259 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
E0821 10:58:27.712862  969259 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
The connection to the server localhost:8080 was refused - did you specify the right host or port?
E0821 10:58:27.756584  969275 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
E0821 10:58:27.758201  969275 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
E0821 10:58:27.759742  969275 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
E0821 10:58:27.761214  969275 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
E0821 10:58:27.762864  969275 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
The connection to the server localhost:8080 was refused - did you specify the right host or port?
> Exporting /etc/hosts
make[1]: Entering directory '/home/prow/go/src/github.com/gardener/gardener'
./hack/kind-down.sh \
	--cluster-name gardener-operator-local \
	--path-kubeconfig /home/prow/go/src/github.com/gardener/gardener/dev-setup/gardenlet/components/kubeconfigs/seed-local/kubeconfig
Deleting cluster "gardener-operator-local" ...
Deleted nodes: ["gardener-operator-local-control-plane"]
# We need root privileges to clean the backup bucket directory, see https://github.com/gardener/gardener/issues/6752
docker run --rm --user root:root -v /home/prow/go/src/github.com/gardener/gardener/dev/local-backupbuckets:/dev/local-backupbuckets alpine rm -rf /dev/local-backupbuckets/garden-*
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
9824c27679d3: Pulling fs layer
9824c27679d3: Verifying Checksum
9824c27679d3: Download complete
9824c27679d3: Pull complete
Digest: sha256:4bcff63911fcb4448bd4fdacec207030997caf25e9bea4045fa6c8c44de311d1
Status: Downloaded newer image for alpine:latest
make[1]: Leaving directory '/home/prow/go/src/github.com/gardener/gardener'
make: *** [Makefile:468: ci-e2e-kind] Error 2
wrapper.sh] [TEST] Test Command exit code: 2
wrapper.sh] [CLEANUP] Waiting 30 seconds for pods stopped with terminationGracePeriod:30
wrapper.sh] [CLEANUP] Cleaning up after Docker in Docker ...
ca9df255414f
2bd736668e25
5cdedf475542
21c124024833
e012c4e87fd0
8fd67cc6f8a3
d2435eb63158
wrapper.sh] [CLEANUP] Waiting for docker to stop for 30 seconds
Stopping Docker: dockerProgram process in pidfile '/var/run/docker-ssd.pid', 1 process(es), refused to die.
wrapper.sh] [CLEANUP] Done cleaning up after Docker in Docker.
================================================================================
wrapper.sh] Exiting 2

Anything else we need to know:
N/A

Metadata

Assignees

Labels

area/testing (Testing related), kind/flake (Tracking or fixing a flaky test)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
