Skip to content

Conversation

pchaigno
Copy link
Member

@pchaigno pchaigno commented May 3, 2021

Cilium's DNS proxy listens on a port allocated by the kernel. That port can however conflict with the port used by echo-c-host-container from the connectivity checks, leading to CI flakes when run regularly.

A kernel comment on the function allocating the port gives us a way out:

* if snum is zero it means select any available local port.
* We try to allocate an odd port (and leave even ports for connect())

So using an even-numbered port for echo-c-host-container in the connectivity checks should strongly reduce the likelihood of this flake happening again. Other hostns pods already use even-numbered ports.

Fixes: #15459

Cilium's DNS proxy listens on a port allocated by the kernel. That port
can however conflict with the port used by echo-c-host-container from
the connectivity checks, leading to CI flakes when run regularly.

A kernel comment on the function allocating the port [1] gives us a way out:

    * if snum is zero it means select any available local port.
    * We try to allocate an odd port (and leave even ports for connect())

So using an even-numbered port for echo-c-host-container in the
connectivity checks should strongly reduce the likelihood of this flake
happening again. Other hostns pods already use even-numbered ports.

1 - https://elixir.bootlin.com/linux/v5.12.1/source/net/ipv4/inet_connection_sock.c#L354
Signed-off-by: Paul Chaignon <paul@cilium.io>
@pchaigno pchaigno added area/CI Continuous Integration testing issue or flake release-note/ci This PR makes changes to the CI. ci/flake This is a known failure that occurs in the tree. Please investigate me! needs-backport/1.9 labels May 3, 2021
@pchaigno
Copy link
Member Author

pchaigno commented May 3, 2021

test-me-please

Copy link
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 Amazing find!

@pchaigno
Copy link
Member Author

pchaigno commented May 4, 2021

k8s-1.16-kernel-netnext failed with known flake #15455. Team reviews are covered. Marking as ready to merge.

christarazi added a commit to christarazi/cilium that referenced this pull request May 14, 2021
Following the same logic as cilium#15988,
we want to use an even-numbered port to reduce the likelihood that the
underlying kernel allocates a conflicting port for the nodePort.

Fixes: cilium#13071

Signed-off-by: Chris Tarazi <chris@isovalent.com>
twpayne pushed a commit that referenced this pull request May 20, 2021
Following the same logic as #15988,
we want to use an even-numbered port to reduce the likelihood that the
underlying kernel allocates a conflicting port for the nodePort.

Fixes: #13071

Signed-off-by: Chris Tarazi <chris@isovalent.com>
nbusseneau pushed a commit to nbusseneau/cilium that referenced this pull request May 21, 2021
[ upstream commit c983bd1 ]

Following the same logic as cilium#15988,
we want to use an even-numbered port to reduce the likelihood that the
underlying kernel allocates a conflicting port for the nodePort.

Fixes: cilium#13071

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
nbusseneau pushed a commit to nbusseneau/cilium that referenced this pull request May 23, 2021
[ upstream commit c983bd1 ]

Following the same logic as cilium#15988,
we want to use an even-numbered port to reduce the likelihood that the
underlying kernel allocates a conflicting port for the nodePort.

Fixes: cilium#13071

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
kaworu pushed a commit that referenced this pull request May 27, 2021
[ upstream commit c983bd1 ]

Following the same logic as #15988,
we want to use an even-numbered port to reduce the likelihood that the
underlying kernel allocates a conflicting port for the nodePort.

Fixes: #13071

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
aanm pushed a commit that referenced this pull request May 28, 2021
[ upstream commit c983bd1 ]

Following the same logic as #15988,
we want to use an even-numbered port to reduce the likelihood that the
underlying kernel allocates a conflicting port for the nodePort.

Fixes: #13071

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/CI Continuous Integration testing issue or flake ci/flake This is a known failure that occurs in the tree. Please investigate me! ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/ci This PR makes changes to the CI.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CI: K8sPolicyTest Multi-node policy test with L7 policy using connectivity-check to check datapath: connectivity-check pods are not ready after timeout
6 participants