dnsproxy: shared_client: fix fail-safe mechanism #35589

Merged

bimmlerd
Member

@bimmlerd bimmlerd commented Oct 28, 2024

If a shared client exchange hit the one-minute fail-safe timeout and the handler loop later wrote to the now reader-less channel (because of an error, a close, or a very delayed response), that write would block forever and stall all future progress of this shared client. Prevent that from happening by buffering the channel for the one message it will receive.
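The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the actual cilium/dns code: the `response` type and the `exchange` and `handlerCompletes` functions are hypothetical names, and the one-minute fail-safe is shortened to milliseconds. It shows that a late send on an unbuffered channel blocks once the waiter has timed out, while a channel with capacity 1 lets the sender finish.

```go
package main

import (
	"fmt"
	"time"
)

// response is a hypothetical stand-in for the result a handler loop delivers.
type response struct {
	msg string
	err error
}

// exchange waits for a reply on ch and gives up after failSafe,
// mirroring the fail-safe timeout described in the PR.
func exchange(ch <-chan response, failSafe time.Duration) error {
	select {
	case r := <-ch:
		return r.err
	case <-time.After(failSafe):
		return fmt.Errorf("timeout waiting for response")
	}
}

// handlerCompletes reports whether a handler that replies only after the
// waiter has given up can still complete its send on a channel of the
// given capacity.
func handlerCompletes(bufSize int) bool {
	ch := make(chan response, bufSize)
	done := make(chan struct{})

	go func() {
		time.Sleep(50 * time.Millisecond) // the reply arrives after the fail-safe fired
		ch <- response{msg: "late"}       // blocks forever if unbuffered and nobody reads
		close(done)
	}()

	_ = exchange(ch, 10*time.Millisecond) // the waiter times out and stops reading

	select {
	case <-done:
		return true
	case <-time.After(200 * time.Millisecond):
		return false
	}
}

func main() {
	fmt.Println("unbuffered handler completes:", handlerCompletes(0))
	fmt.Println("buffered handler completes:", handlerCompletes(1))
}
```

A capacity of 1 is enough in this sketch because each channel receives at most one message, which matches the "one message it will receive" wording of the description.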

The corresponding change in cilium/dns is cilium/dns#15.

Note that this bug can produce symptoms that look like a goroutine leak if sustained traffic comes from or goes to the same five-tuple: each DNS request is handled in its own goroutine, but all of them share the same stuck shared client.

Cilium's DNS proxy no longer gets stuck for a specific five-tuple if a `timeout waiting for response` error is encountered.

If a shared client exchange fell into the fail-safe timeout of one
minute, but the handler loop (due to either an error, closing or a
_very_ delayed response) would write to the now reader-less channel, it
would block all future progress of this shared client. Prevent that from
happening by buffering the channel for the one message it will receive.

The corresponding, backportable change in cilium/dns is cilium/dns#15.

Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
@bimmlerd bimmlerd added kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. area/fqdn Affects the FQDN policies feature needs-backport/1.16 This PR / issue needs backporting to the v1.16 branch labels Oct 28, 2024
@github-actions github-actions bot added the sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. label Oct 28, 2024
@bimmlerd bimmlerd marked this pull request as ready for review October 28, 2024 11:19
@bimmlerd bimmlerd requested a review from a team as a code owner October 28, 2024 11:19
@bimmlerd bimmlerd requested a review from doniacld October 28, 2024 11:19
@bimmlerd
Member Author

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Oct 28, 2024
@julianwiedmann julianwiedmann added this pull request to the merge queue Oct 29, 2024
Merged via the queue into cilium:main with commit 392821c Oct 29, 2024
70 checks passed
@bimmlerd bimmlerd deleted the pr/bimmlerd/fix-stuck-shared-client branch October 29, 2024 07:35
@joamaki joamaki mentioned this pull request Nov 5, 2024
23 tasks
@joamaki joamaki added backport-pending/1.16 The backport for Cilium 1.16.x for this PR is in progress. and removed needs-backport/1.16 This PR / issue needs backporting to the v1.16 branch labels Nov 5, 2024
@github-actions github-actions bot added backport-done/1.16 The backport for Cilium 1.16.x for this PR is done. and removed backport-pending/1.16 The backport for Cilium 1.16.x for this PR is in progress. labels Nov 7, 2024
Labels
area/fqdn Affects the FQDN policies feature backport-done/1.16 The backport for Cilium 1.16.x for this PR is done. kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.