borkmann
Member

@borkmann borkmann commented Dec 9, 2020

See commit msg.

@borkmann borkmann added the labels pending-review, area/daemon (Impacts operation of the Cilium daemon.), area/operator (Impacts the cilium-operator component), area/clustermesh (Relates to multi-cluster routing functionality in Cilium.), and area/kube-proxy (Issues related to kube-proxy (not the kube-proxy-free mode).) Dec 9, 2020
@borkmann borkmann requested review from brb, tklauser, aanm and a team December 9, 2020 13:40
@borkmann borkmann requested review from a team as code owners December 9, 2020 13:40
@borkmann borkmann requested a review from joestringer December 9, 2020 13:40
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label label (The author needs to describe the release impact of these changes.) Dec 9, 2020
@borkmann borkmann added the release-note/misc label (This PR makes changes that have no direct user impact.) Dec 9, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label label (The author needs to describe the release impact of these changes.) Dec 9, 2020
@borkmann
Member Author

borkmann commented Dec 9, 2020

test-me-please

Member

@rolinh rolinh left a comment

Nice catch! Could you please also update Hubble Relay as I think it's also affected (nodeport protection is global)? It also enables gops by default with the default listen address (see hubble-relay/cmd/serve/serve.go).

@borkmann
Member Author

borkmann commented Dec 9, 2020

> Nice catch! Could you please also update Hubble Relay as I think it's also affected (nodeport protection is global)? It also enables gops by default with the default listen address (see hubble-relay/cmd/serve/serve.go).

Ah, good point, that one fell through the cracks :/ will add.

Member

@aanm aanm left a comment

The overall changes LGTM; however, there are a couple of points that we need to take care of:

  1. It would likely be better to Fatal if we can't initialize gops. If gops isn't running at all, we won't be able to debug Cilium when we need to.
  2. Currently, the default is 127.0.0.1:0 and we are changing it to localhost:XXXX; maybe we should keep the 127.0.0.1.
  3. I don't see a "reuse port" in gops, so it is likely that we will hit something similar to #11573.

@tklauser
Member

Marked for backport to v1.8 as this also fixes an issue with port collision between the gops agent and the proxy, see #13400. I'll send a manual backport PR as cherry-picking is not straightforward.

@tklauser
Member

1.8 backport PR: #15634

@tklauser
Member

Apologies, I forgot to mark this as backport-pending/1.8 when opening #15634. Done so now.

aanm pushed a commit to tklauser/cilium that referenced this pull request Apr 17, 2021
[ upstream commit 7757d31 ]

Manually backported from cilium#14329 to address cilium#13400 for v1.8.

Lee reported that the kube-proxy log had a warning that its bind protection
couldn't bind a specific port in the nodeport range. Turns out gops was
using this particular port already through its auto-binding (127.0.0.1:0).
This means that if gops collides with a NodePort service, we might
not be able to pull gops data from that port, since either kube-proxy or
the kube-proxy-free variant will redirect us to the actual service instead.

Given that it is rather unpredictable which port the agent will bind for
gops, remap it to a fixed default port and add a user-configurable knob
that allows using a different one if necessary. Given that the agent, operator,
clustermesh-apiserver and hubble-relay all start the gops listener, add
the --gops-port flag to each of them. The CNI does not have gops enabled
by default but only in debug mode, hence no changes there for now given
it is unlikely to be used this way in production.

Fixes: cilium#14218
Reported-by: Lee Hu via Slack
Co-authored-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
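
The difference between auto-binding and a fixed, configurable port described in the commit message can be sketched roughly as follows. This is an illustration using only the Go standard library, not the code from the commit: `exampleGopsPort`, `gopsListenAddr`, and `inNodePortRange` are hypothetical names, the port value is made up for the example, and the range constants are the Kubernetes default NodePort range, which a cluster may configure differently.

```go
package main

import (
	"flag"
	"fmt"
	"log"
	"net"
	"strconv"
)

// exampleGopsPort is a hypothetical default used only in this sketch.
const exampleGopsPort = 9890

// Default Kubernetes NodePort range; clusters may use a different one.
const (
	nodePortMin = 30000
	nodePortMax = 32767
)

// gopsListenAddr builds a loopback listen address for a fixed port.
func gopsListenAddr(port int) string {
	return net.JoinHostPort("127.0.0.1", strconv.Itoa(port))
}

// inNodePortRange reports whether port falls inside the assumed NodePort range.
func inNodePortRange(port int) bool {
	return port >= nodePortMin && port <= nodePortMax
}

func main() {
	gopsPort := flag.Int("gops-port", exampleGopsPort, "port for the gops listener")
	flag.Parse()

	// Auto-binding: port 0 lets the kernel choose, so the result is not
	// known in advance and may or may not land in the NodePort range.
	auto, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		log.Fatalf("auto-bind failed: %v", err)
	}
	defer auto.Close()
	autoPort := auto.Addr().(*net.TCPAddr).Port
	fmt.Printf("auto-bound port %d, in NodePort range: %t\n", autoPort, inNodePortRange(autoPort))

	// Fixed port: the address is predictable and can be checked up front.
	fmt.Printf("fixed address %s, in NodePort range: %t\n",
		gopsListenAddr(*gopsPort), inNodePortRange(*gopsPort))
}
```

Running the sketch with `-gops-port=<n>` mirrors the idea of a user-configurable knob: the operator can pick a port known to be outside the cluster's NodePort range.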
aanm pushed a commit that referenced this pull request Apr 17, 2021
EricMountain pushed a commit to DataDog/cilium that referenced this pull request Feb 21, 2022
Labels
area/clustermesh Relates to multi-cluster routing functionality in Cilium. area/daemon Impacts operation of the Cilium daemon. area/kube-proxy Issues related to kube-proxy (not the kube-proxy-free mode). area/operator Impacts the cilium-operator component release-note/bug This PR fixes an issue in a previous release of Cilium.