dnsproxy: shared_client: fix fail-safe mechanism #35589
Merged
If a shared client exchange hit the one-minute fail-safe timeout, and the handler loop later wrote to the now reader-less channel (because of an error, the client closing, or a very delayed response), that write would block forever and stall all future progress of the shared client. Prevent this by buffering the channel for the one message it will receive.
The corresponding change in cilium/dns is cilium/dns#15.
Note that this can look like a goroutine leak when sustained traffic flows to or from the same five-tuple: each DNS request is handled in its own goroutine, but all of them share the same stuck shared client.
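The blocking send described above can be reproduced in isolation. The following is a minimal sketch, not the actual shared client code: `exchange` and `handlerFinishes` are hypothetical names, and the timeouts are shortened from one minute to milliseconds so the example finishes quickly. It compares an unbuffered response channel with one buffered for a single message.

```go
package main

import (
	"fmt"
	"time"
)

// exchange is a hypothetical stand-in for a shared client exchange: it waits
// for a response on ch, but gives up once the fail-safe timeout expires.
func exchange(ch <-chan string, failSafe time.Duration) (string, bool) {
	select {
	case resp := <-ch:
		return resp, true
	case <-time.After(failSafe):
		// The caller walks away; from now on nobody reads from ch.
		return "", false
	}
}

// handlerFinishes reports whether a simulated handler loop can still deliver
// a late response after the waiter has timed out, for a response channel of
// capacity chanCap.
func handlerFinishes(chanCap int) bool {
	ch := make(chan string, chanCap)
	done := make(chan struct{})

	go func() {
		// Simulated handler loop: the response arrives only after the
		// waiter has already hit its fail-safe timeout.
		time.Sleep(50 * time.Millisecond)
		ch <- "late response" // blocks forever if ch is unbuffered and unread
		close(done)
	}()

	// Fail-safe shortened to 10ms for the demo (assumption; the PR
	// describes a one-minute timeout).
	if _, ok := exchange(ch, 10*time.Millisecond); ok {
		return true // not expected here: the response arrived in time
	}

	// Give the handler a generous window to finish its send.
	select {
	case <-done:
		return true
	case <-time.After(200 * time.Millisecond):
		return false // handler is stuck on the send
	}
}

func main() {
	fmt.Println("unbuffered handler finishes:", handlerFinishes(0))
	fmt.Println("buffered handler finishes:", handlerFinishes(1))
}
```

With capacity 0 the simulated handler never gets past its send, which in a real handler loop would stall every later request on that client. With capacity 1 the send completes even though no reader is left, and the unread message is simply garbage collected along with the channel.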