
CI: K8sPolicyTest Multi-node policy test with L7 policy using connectivity-check to check datapath: provided port is already allocated #13071


Description

@christarazi

CI failure

/home/jenkins/workspace/Cilium-PR-K8s-1.12-net-next/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:514
cannot install connectivity-check
Expected command: kubectl apply --force=false -f /home/jenkins/workspace/Cilium-PR-K8s-1.12-net-next/src/github.com/cilium/cilium/examples/kubernetes/connectivity-check/connectivity-check-proxy.yaml 
To succeed, but it failed:
Exitcode: 1 
Err: exit status 1
Stdout:
 	 deployment.apps/echo-c created
	 deployment.apps/echo-c-host created
	 deployment.apps/pod-to-a-intra-node-proxy-egress-policy created
	 deployment.apps/pod-to-a-multi-node-proxy-egress-policy created
	 deployment.apps/pod-to-c-intra-node-proxy-ingress-policy created
	 deployment.apps/pod-to-c-multi-node-proxy-ingress-policy created
	 deployment.apps/pod-to-c-intra-node-proxy-to-proxy-policy created
	 deployment.apps/pod-to-c-multi-node-proxy-to-proxy-policy created
	 service/echo-c created
	 service/echo-c-headless created
	 service/echo-c-host-headless created
	 ciliumnetworkpolicy.cilium.io/pod-to-a-intra-node-proxy-egress-policy created
	 ciliumnetworkpolicy.cilium.io/pod-to-a-multi-node-proxy-egress-policy created
	 ciliumnetworkpolicy.cilium.io/pod-to-c-intra-node-proxy-to-proxy-policy created
	 ciliumnetworkpolicy.cilium.io/pod-to-c-multi-node-proxy-to-proxy-policy created
	 ciliumnetworkpolicy.cilium.io/echo-c created
	 deployment.apps/echo-a created
	 service/echo-a created
	 deployment.apps/echo-b created
	 deployment.apps/echo-b-host created
	 service/echo-b-host-headless created
	 
Stderr:
 	 The Service "echo-b" is invalid: spec.ports[0].nodePort: Invalid value: 31313: provided port is already allocated
	 

/home/jenkins/workspace/Cilium-PR-K8s-1.12-net-next/src/github.com/cilium/cilium/test/k8sT/Policies.go:1495

After investigating this, it turns out that Services created in the test suite can conflict with each other on their nodePort. In some cases we don't explicitly set the nodePort and leave it to K8s to allocate a port from the 30000-32767 range; in other cases we do set the nodePort explicitly. In this test failure, a Service created earlier was randomly allocated port 31313, and a Service deployed later explicitly set its nodePort to 31313, causing the conflict.
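To make the two allocation modes concrete, here is a minimal sketch using client-go's core/v1 types. This is not code from the Cilium tree; echo-b and port 31313 come from the error output above, while the other Service name is purely illustrative.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// First kind of Service: NodePort left at 0, so the apiserver allocates a
	// random port from its service-node-port-range (30000-32767 by default).
	// In the failing run, some earlier Service happened to receive 31313 this way.
	randomlyAllocated := corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "some-earlier-service"}, // illustrative name
		Spec: corev1.ServiceSpec{
			Type:  corev1.ServiceTypeNodePort,
			Ports: []corev1.ServicePort{{Port: 80}}, // NodePort == 0 => let K8s allocate
		},
	}

	// Second kind of Service: nodePort pinned to 31313, as echo-b does. If the
	// Service above was never cleaned up, creating this one is rejected by the
	// apiserver with "provided port is already allocated".
	pinned := corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "echo-b"},
		Spec: corev1.ServiceSpec{
			Type:  corev1.ServiceTypeNodePort,
			Ports: []corev1.ServicePort{{Port: 80, NodePort: 31313}},
		},
	}

	fmt.Println(randomlyAllocated.Spec.Ports[0].NodePort, pinned.Spec.Ports[0].NodePort)
}

If the first Service leaks, any later Service that pins nodePort 31313 fails with exactly the "provided port is already allocated" error seen in the Stderr above.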

The conflict occurred in the first place because of a failure to clean up, for which we have no logs: the cleanup happened in a different block and its error was ignored, as cleanup errors usually are. We only retrieve output when an error occurs in a specific block (It(), etc.).

I posted on the #testing channel about different approaches and I will quote the post here, in case the Slack link expires.

I've come across a test that failed because two different Services conflicted on their nodePort. From what I can tell, a NodePort Service was allocated port 31313 earlier in the test suite, and then later a totally different NodePort Service was created with 31313 set as its explicit nodePort. This obviously created a conflict and spat out a message like:

The Service "echo-b" is invalid: spec.ports[0].nodePort: Invalid value: 31313: provided port is already allocated

The first Service should have been deleted, but it clearly was not cleaned up, and there are no logs to explain why. We ignore errors when cleaning up to prevent the test suite from halting for "minor" reasons, but that does allow us to get into these situations, where we've potentially polluted the next test. Specifically for Services, I see two approaches to prevent this in the future (roughly sketched after the list below):

  1. Explicitly set the nodePort for every Service we create, so that we don't get bitten by the random allocation between 30000 - 32767.
  2. Where we set the nodePort explicitly, choose a port from outside the range (30000 - 32767), to avoid conflicts with random allocation within that range.
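A hypothetical sketch of what these could look like; only echo-b and 31313 come from this failure, all other names and numbers are invented for illustration.

package main

import "fmt"

// Approach 1: every nodePort the suite uses comes from one hard-coded table,
// so no Service relies on random allocation and two tests cannot silently
// claim the same port.
//
// Approach 2 would be the same table with ports chosen outside 30000-32767;
// note the apiserver rejects ports outside its --service-node-port-range, so
// that approach assumes the CI clusters configure a wider range.
var reservedNodePorts = map[string]int32{
	"echo-b":        31313,
	"test-nodeport": 31314,
	"test-lb":       31315,
}

func nodePortFor(service string) int32 {
	port, ok := reservedNodePorts[service]
	if !ok {
		panic(fmt.Sprintf("no reserved nodePort for %q; add it to the table", service))
	}
	return port
}

func main() {
	fmt.Println(nodePortFor("echo-b")) // 31313
}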

Actually, I just realized that this happened within the same parent Context block, which has nested Contexts within it. Essentially it looks like:

Context A {
  BeforeAll() { Create Service; randomly allocated nodePort 31313 }
  AfterAll()  { Clean up resources, but deletion of the Service above fails }
  ...
  Context B {
    BeforeAll() { Create Service with explicit nodePort 31313; conflict }
    AfterAll()  { Fails to clean up because the Service above was never created }
  }
}

Unfortunately, trying to dig into why the cleanup failed would be moot, because we don't retrieve output from passing blocks, so we still wouldn't have the reason it failed to clean up in the first place. We only have Context B's output (in fact, only the It() block within it). So that leaves us with the two approaches above.
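To illustrate the limitation being described, here is a minimal sketch of what retrieving cleanup output unconditionally could look like, written against stock ginkgo/v2 rather than the suite's own ginkgo-ext helpers (scopes.go); the kubectl invocation and manifest path are assumptions for the sketch, not the suite's actual cleanup code.

package connectivity_test

import (
	"fmt"
	"os/exec"
	"testing"

	. "github.com/onsi/ginkgo/v2"
)

func TestConnectivitySketch(t *testing.T) {
	RunSpecs(t, "connectivity cleanup sketch")
}

var _ = Describe("Multi-node policy test (sketch)", Ordered, func() {
	AfterAll(func() {
		// Delete the connectivity-check resources; path relative to the repo root.
		out, err := exec.Command("kubectl", "delete", "-f",
			"examples/kubernetes/connectivity-check/connectivity-check-proxy.yaml").CombinedOutput()
		// Write the cleanup output unconditionally instead of discarding it,
		// so a leaked Service (and its nodePort) shows up in the CI artifacts
		// even when every It() in this Context passed.
		GinkgoWriter.Write(out)
		if err != nil {
			// Still don't fail the suite over cleanup, but record why it failed.
			fmt.Fprintf(GinkgoWriter, "cleanup failed (ignored): %v\n", err)
		}
	})

	It("runs the connectivity checks", func() {
		// ... the actual datapath assertions would go here ...
	})
})

With something along these lines, the failed deletion of the first Service would at least be visible in the artifacts, even though every block in Context A passed.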

Found during #13053

Labels: area/CI, ci/flake, help-wanted, pinned
