-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Fix etcd failure behavior when user or client context ends #12587
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…ncelled Connected() is being used for two use cases: 1. To wait for a particular client to be connected and reach quorum. 2. To wait for the overall agent to connect to a kvstore and reach quorum. The first use case is tied to a particular kvstore client. The second use case is not. The current implementation will block Connected() forever until isConnectedAndHasQuorum() returns success. If the context is cancelled or if the client is closed, this never happens as an error will always be returned. Change etcd's Connected() implementation to check the context and the etcd client context and return an error on the channel. This allows Watch() to return if this condition is met to properly return the Watch() call which is made against a particular client. This fixes a Watch() call never returning in case the etcd client is closed before connectivity was ever achieved. To account for the second use case, introduce an overall Connected() function which will block closing the channel until kvstore connectivity has been achieved. Signed-off-by: Thomas Graf <thomas@cilium.io>
When Watch() is called on an etcd client that is closing or the context has been cancelled then the Get() call will fail. The error handler will retry the loop with no chance of success as the context is already cancelled or the client has been closed. The retry will occur forever. The lack of sleep in the retry path will further put massive stress on the etcd rate limiter and thus will reduce the probabilit for other etcd operations to succeed in time. Fix Watch() to break out of the loop if the user or etcd client context has been cancelled. Signed-off-by: Thomas Graf <thomas@cilium.io>
Don't rely on timeout if the client is being cancelled. If the client is closing, a lock can never succeed again. Signed-off-by: Thomas Graf <thomas@cilium.io>
Don't rely on timeout of user context while waiting for initial connect if the client is closing. Signed-off-by: Thomas Graf <thomas@cilium.io>
test-me-please |
qmonnet
approved these changes
Jul 20, 2020
nebril
approved these changes
Jul 20, 2020
aanm
reviewed
Jul 20, 2020
case <-ctx.Done(): | ||
return ctx.Err() | ||
if err := <-kvstore.Client().Connected(ctx); err != nil { | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this intentional? If yes a comment would be nice to have here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this was not intentional. I'll fix it up.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
kind/bug
This is a bug in the Cilium logic.
ready-to-merge
This PR has passed all tests and received consensus from code owners to merge.
release-note/bug
This PR fixes an issue in a previous release of Cilium.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Several code paths which are blocking on other channels or waiting for certain conditions to be met do not properly respect the user and client context to abort.
The most serious consequence can be endless loops attempting to retry an etcd operation forever.