Skip to content

Conversation

asauber
Copy link
Member

@asauber asauber commented Feb 10, 2025

Users of Cilium have begun to report that the cilium sysdump command encounters errors with the following form:

error with exec request ... : Upgrade request required: ""

Kubernetes v1.31 made Websocket the default streaming protocol for use by kubectl1. As described in KEP-40062, the prior protocol in use, SPDY, has been deprecated since 2015, and is no longer supported by many proxies, gateways and loadbalancers. Such proxies will result in clients observing errors like the above in cases where SPDY is not supported.

This PR changes the streaming protocol used by the Cilium CLI for commands such as cilium sysdump, to use the new Websocket executor supported by client-go.

It makes use of the fallback mechanism provided by the apimachinery package, to fallback to SPDY in the case that common protocol errors are observed.

cilium-cli: enable websockets for k8s exec

@asauber asauber added the release-note/misc This PR makes changes that have no direct user impact. label Feb 10, 2025
@asauber asauber requested a review from a team as a code owner February 10, 2025 21:52
@asauber asauber requested a review from tommyp1ckles February 10, 2025 21:52
@github-actions github-actions bot added cilium-cli This PR contains changes related with cilium-cli cilium-cli-exclusive This PR only impacts cilium-cli binary labels Feb 10, 2025
@asauber asauber force-pushed the pr/asauber/websocket-exec-for-sysdump branch from bb772d4 to e2a5972 Compare February 10, 2025 22:02
@asauber
Copy link
Member Author

asauber commented Feb 10, 2025

/test

@bimmlerd
Copy link
Member

As an aside, this will also reduce the likelihood of #31067 by quite a bit 🫶

@asauber asauber added the dont-merge/discussion A discussion is ongoing and should be resolved before merging, regardless of reviews & tests status. label Feb 17, 2025
@asauber asauber force-pushed the pr/asauber/websocket-exec-for-sysdump branch from e2a5972 to db0e00e Compare February 17, 2025 20:55
@asauber asauber removed the dont-merge/discussion A discussion is ongoing and should be resolved before merging, regardless of reviews & tests status. label Feb 17, 2025
Users of Cilium have begun to report that the `cilium sysdump` command
encounters errors with the following form:

    error with exec request ... : Upgrade request required: ""

Kubernetes v1.31 made Websocket the default streaming protocol for use
by kubectl[1]. As described in KEP-4006[2], the prior protocol in use,
SPDY, has been deprecated since 2015, and is no longer supported by many
proxies, gateways and loadbalancers. Such proxies will result in clients
observing errors like the above in cases where SPDY is not supported.

This PR changes the streaming protocol used by the Cilium CLI for
commands such as `cilium sysdump`, to use the new Websocket executor
supported by client-go.

It makes use of the fallback mechanism provided by the apimachinery
package, to fallback to SPDY in the case that common protocol errors are
observed.

[1]: https://kubernetes.io/blog/2024/08/20/websockets-transition/
[2]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/4006-transition-spdy-to-websockets

Signed-off-by: Andrew Sauber <andrew.sauber@isovalent.com>
@asauber asauber force-pushed the pr/asauber/websocket-exec-for-sysdump branch from db0e00e to 2b00ca4 Compare February 17, 2025 21:38
Users of Cilium have reported that the `cilium sysdump` command
encounters errors with the following form:

    failed to collect ...: Upgrade request required

As part of KEP-4006, a protocol translator proxy was implemented to
allow websocket-based connections to the kube-apiserver by kubectl to be
translated to SPDY-based connections to the kubelet by the
kube-apiserver, for the sake of features like kubectl port-forward.

In the case of either protocol for the backend-leg connection, we prefer
to use the websocket-based client connection to avoid errors like the
above.

This PR uses a fallback mechanism which was implemented in the
apimachinery package to support implementations in upgrading to
websocket clients, which was made default in Kubernetes 1.31.

[1]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/4006-transition-spdy-to-websockets
[2]: kubernetes/kubernetes#119186

Signed-off-by: Andrew Sauber <andrew.sauber@isovalent.com>
@asauber asauber force-pushed the pr/asauber/websocket-exec-for-sysdump branch from 2b00ca4 to 88089dd Compare February 17, 2025 21:40
@asauber
Copy link
Member Author

asauber commented Feb 17, 2025

/test

@asauber
Copy link
Member Author

asauber commented Feb 18, 2025

This is hitting #37682

@gandro gandro removed the request for review from tommyp1ckles February 18, 2025 13:40
@aanm aanm added this pull request to the merge queue Feb 18, 2025
Merged via the queue into main with commit 10136c4 Feb 18, 2025
208 of 211 checks passed
@aanm aanm deleted the pr/asauber/websocket-exec-for-sysdump branch February 18, 2025 14:42
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Feb 18, 2025
@julianwiedmann
Copy link
Member

@asauber 👋 did you have an opportunity to triage the errors in e2e-upgrade any further, prior to merge? I believe this is now causing regular flakes in CI, and needs either a fix or revert.

@asauber
Copy link
Member Author

asauber commented Mar 4, 2025

@julianwiedmann Yes, I traced the upstream websocket implementation, which has a hardcoded timeout after seeing no writes for 60 seconds. Since we switched to tcpdump writing to a file rather than stdout, the exec connection goes idle for long periods of time. This may be a fix to the issue, which I'm testing empirically. After compiling some more info I'll open an issue. If we don't have a fix or workaround a revert is in order.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cilium-cli This PR contains changes related with cilium-cli cilium-cli-exclusive This PR only impacts cilium-cli binary ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants