hubble-relay: Return underlying connection errors when connecting to peer manager #35632
…Error for peer manager client

When connecting to the Hubble peer manager, ensure we return the underlying connection error so it's easier to diagnose connection-related problems.

Currently the only error returned is the context timeout:

```
time="2024-09-18T21:13:13Z" level=warning msg="Failed to create peer client for peers synchronization; will try again after the timeout has expired" error="context deadline exceeded" subsys=hubble-relay target="hubble-peer.kube-system.svc.cluster.local.:443"
```

This will ensure the underlying connection-level error is returned when the context is cancelled.

In the future we should switch to not using grpc.WithBlock at all, which avoids many of these problems, but that requires more testing. In the short term, let's set these dial options to improve things now, in case the switch to non-blocking dials is delayed.

Signed-off-by: Chance Zibolski <chance.zibolski@gmail.com>
Pretty sure users will greatly appreciate the change 🙂
/test
Sorry, got confused with the `ClientConnBuilder` for a moment (vs `ClientBuilder`). So in effect, we've been using these options to connect to Hubble peers for a long time but never did for the connection to the peer manager itself. All good, let's merge this change.
I'm marking this for backport to v1.15 and v1.16 because the change is small and will greatly improve the errors users see in Hubble Relay logs when Hubble Relay has trouble connecting to the Cilium agent.