Description
CI failure
```
/home/jenkins/workspace/Cilium-PR-K8s-1.12-net-next/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:514
cannot install connectivity-check
Expected command: kubectl apply --force=false -f /home/jenkins/workspace/Cilium-PR-K8s-1.12-net-next/src/github.com/cilium/cilium/examples/kubernetes/connectivity-check/connectivity-check-proxy.yaml
To succeed, but it failed:
Exitcode: 1
Err: exit status 1
Stdout:
deployment.apps/echo-c created
deployment.apps/echo-c-host created
deployment.apps/pod-to-a-intra-node-proxy-egress-policy created
deployment.apps/pod-to-a-multi-node-proxy-egress-policy created
deployment.apps/pod-to-c-intra-node-proxy-ingress-policy created
deployment.apps/pod-to-c-multi-node-proxy-ingress-policy created
deployment.apps/pod-to-c-intra-node-proxy-to-proxy-policy created
deployment.apps/pod-to-c-multi-node-proxy-to-proxy-policy created
service/echo-c created
service/echo-c-headless created
service/echo-c-host-headless created
ciliumnetworkpolicy.cilium.io/pod-to-a-intra-node-proxy-egress-policy created
ciliumnetworkpolicy.cilium.io/pod-to-a-multi-node-proxy-egress-policy created
ciliumnetworkpolicy.cilium.io/pod-to-c-intra-node-proxy-to-proxy-policy created
ciliumnetworkpolicy.cilium.io/pod-to-c-multi-node-proxy-to-proxy-policy created
ciliumnetworkpolicy.cilium.io/echo-c created
deployment.apps/echo-a created
service/echo-a created
deployment.apps/echo-b created
deployment.apps/echo-b-host created
service/echo-b-host-headless created
Stderr:
The Service "echo-b" is invalid: spec.ports[0].nodePort: Invalid value: 31313: provided port is already allocated

/home/jenkins/workspace/Cilium-PR-K8s-1.12-net-next/src/github.com/cilium/cilium/test/k8sT/Policies.go:1495
```
After investigating this, it turns out that Services created in the test suite can conflict with each other on their nodePort. In some cases we don't explicitly set the nodePort and leave it up to Kubernetes to allocate a port from the default 30000-32767 range. In other cases we do set the nodePort explicitly. In this test failure, a Service was created first and randomly allocated port 31313, and later we deployed a Service with its nodePort explicitly set to 31313, causing the conflict.
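For illustration, here is a minimal sketch of the two Service variants involved, written with the upstream k8s.io/api Go types rather than taken from the connectivity-check manifests; the object names are made up, and only echo-b's 31313 comes from the failure above.

```go
// Illustrative sketch only: not the actual connectivity-check specs.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Variant 1: nodePort left unset (0), so the API server allocates a
	// free port from the default 30000-32767 range (31313 in this run).
	randomPort := corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "earlier-test-svc"}, // hypothetical name
		Spec: corev1.ServiceSpec{
			Type:  corev1.ServiceTypeNodePort,
			Ports: []corev1.ServicePort{{Port: 80}},
		},
	}

	// Variant 2: nodePort pinned to 31313. If the Service above is still
	// around, the API server rejects this one with
	// "provided port is already allocated".
	explicitPort := corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "echo-b"},
		Spec: corev1.ServiceSpec{
			Type:  corev1.ServiceTypeNodePort,
			Ports: []corev1.ServicePort{{Port: 80, NodePort: 31313}},
		},
	}

	fmt.Println(randomPort.Name, explicitPort.Name)
}
```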
The reason the conflict occurred in the first place is a cleanup failure that we have no logs for: it happened in a different block and didn't trigger an error, and errors during cleanup are usually ignored. We only retrieve output when an error occurs in a specific block (It(), etc.).
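A minimal sketch of that ignore-errors-during-cleanup pattern, using a plain exec of kubectl rather than the test suite's actual helpers; logging the output even though the error is swallowed is what would have preserved the missing reason here.

```go
// Hypothetical sketch, not the suite's real cleanup helper: run the
// deletion, ignore the error for control flow so the suite keeps going,
// but always log the output so a failed cleanup remains visible.
package main

import (
	"log"
	"os/exec"
)

func deleteManifest(path string) {
	out, err := exec.Command("kubectl", "delete", "--ignore-not-found", "-f", path).CombinedOutput()
	if err != nil {
		log.Printf("cleanup of %s failed (ignored): %v\n%s", path, err, out)
	}
}

func main() {
	deleteManifest("examples/kubernetes/connectivity-check/connectivity-check-proxy.yaml")
}
```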
I posted on the #testing channel about different approaches and I will quote the post here, in case the Slack link expires.
I've come across a test failure caused by two different Services conflicting on their nodePort. From what I can tell, a NodePort Service was allocated port 31313 in the test suite, and then later in the test suite a totally different NodePort Service was created with 31313 set as its explicit nodePort. This obviously created a conflict and spit out a msg like: The Service "echo-b" is invalid: spec.ports[0].nodePort: Invalid value: 31313: provided port is already allocated
The first Service should have been deleted, but it clearly was not cleaned up, and there are no logs to explain why. We ignore errors when cleaning up to prevent the test suite from halting for "minor" reasons, but that does allow us to get into these situations, where we've potentially polluted the next test.

Specifically for Services, I see two approaches to prevent this in the future:
- Explicitly set the nodePort for every Service we create, so that we don't get bitten by random allocation between 30000 and 32767.
- Where we set the nodePort explicitly, choose a port outside the 30000-32767 range, to avoid conflicts with random allocation within that range.
Actually, I just realized that this happened in the same parent Context block, which has nested Contexts within it. Essentially it looks like:
```
Context A {
    BeforeAll() { Created Service with randomly allocated port 31313 }
    AfterAll()  { Cleaned up resources, but failed on Service above }
    ...
    Context B {
        BeforeAll() { Created Service with explicit port 31313; conflict }
        AfterAll()  { Fails to clean up because above Service wasn't created }
    }
}
```
Unfortunately, trying to dig into why the cleanup failed would be moot, because we don't retrieve output from successful runs, so we still wouldn't have the reason it failed to clean up in the first place. We only have Context B's output (actually only the It() block within it). So that leaves us with the two approaches above.
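As a rough, hypothetical sketch of the first approach from the quoted post (explicitly setting the nodePort for every Service), the suite could reserve ports in one place so duplicates are caught up front; the helper, the names, and every port except echo-b's 31313 are made up for illustration.

```go
// Hypothetical sketch of centrally reserved nodePorts; not existing
// test-suite code. Only echo-b's 31313 comes from the failure above.
package main

import "fmt"

var reservedNodePorts = map[string]int32{
	"echo-a": 31310, // illustrative
	"echo-b": 31313, // port from the failure above
	"echo-c": 31314, // illustrative
}

// nodePortFor returns the reserved port for a Service, forcing every
// NodePort Service to be registered instead of relying on random allocation.
func nodePortFor(svc string) (int32, error) {
	port, ok := reservedNodePorts[svc]
	if !ok {
		return 0, fmt.Errorf("no nodePort reserved for %q; add it to reservedNodePorts", svc)
	}
	return port, nil
}

func main() {
	port, err := nodePortFor("echo-b")
	if err != nil {
		panic(err)
	}
	fmt.Println("echo-b nodePort:", port)
}
```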
Found during #13053