-
Notifications
You must be signed in to change notification settings - Fork 742
tso: fix tso proxy error propagation #9219
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: artem_danilov <artem.danilov@airbnb.com>
Hi @Tema. Thanks for your PR. I'm waiting for a tikv member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Now you can start all CI jobs with |
@rleungx @bufferflies @niubell PTAL |
/retest |
1 similar comment
/retest |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #9219 +/- ##
=======================================
Coverage 76.01% 76.02%
=======================================
Files 470 470
Lines 73034 73045 +11
=======================================
+ Hits 55516 55531 +15
+ Misses 14074 14069 -5
- Partials 3444 3445 +1
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, Maybe need to add intergrate test?
Signed-off-by: artem_danilov <artem.danilov@airbnb.com>
/retest |
pkg/utils/tsoutil/tso_dispatcher.go
Outdated
@@ -36,12 +37,20 @@ const ( | |||
maxMergeRequests = 10000 | |||
// DefaultTSOProxyTimeout defines the default timeout value of TSP Proxying | |||
DefaultTSOProxyTimeout = 3 * time.Second | |||
// TSOProxyStreamIdleTimeout defines how long Proxy stream will live if no request is received | |||
TSOProxyStreamIdleTimeout = 5 * time.Minute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This const doesn't need to be exported.
pkg/utils/tsoutil/tso_dispatcher.go
Outdated
} | ||
defer cancel() | ||
|
||
requests := make([]Request, maxMergeRequests+1) | ||
needUpdateServicePrimaryAddr := len(tsoPrimaryWatchers) > 0 && tsoPrimaryWatchers[0] != nil | ||
for { | ||
noProxyRequestsTimer := time.NewTimer(TSOProxyStreamIdleTimeout) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to create a timer inside the loop? Is it possible to reuse the same timer always?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, let me pull it outside of the loop and just reset at this line instead
pkg/utils/tsoutil/tso_dispatcher.go
Outdated
case <-dispatcherCtx.Done(): | ||
return | ||
} | ||
tsDeadlineCh <- dl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not check dispatcherCtx.Done()
anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dispatcherCtx
can be cancelled in this function only or through the parent, which is grpc service which lifetime aligns with the processes lifetime. Previously it was parented to grpc stream, but this is exactly what this PR tries to avoid.Let me know if it still make sense to check for it, I just felt that this is no point in it anymore and cleaned up the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand what you mean, but in all cases involving channel sending and other potential blocking points, we are used to pairing them with select to ensure proper exit. This change makes me a bit uneasy, but if logically there is no possibility of it happening, I am okay with this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. It make sense. It is also worth to have it in case the passing context evolves and become possible to cancel as the code evolves. I've returned it in the other potential blocking points, but this particular line is non-blocking so I don't see any reason to have it. But please let me know if I miss something and you strongly believe we should have it on this particular line as well.
pkg/utils/tsoutil/tso_dispatcher.go
Outdated
if !loaded { | ||
log.Info("Start new tso proxy dispatcher", zap.String("forwardedHost", req.getForwardedHost())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In PD's logs, we don't use capitalized initials; it's best to keep it consistent with other places. Please forgive my OCD 😄.
pkg/utils/tsoutil/tso_dispatcher.go
Outdated
} | ||
case <-dispatcherCtx.Done(): | ||
case <-noProxyRequestsTimer.C: | ||
log.Info("Close tso proxy as it is idle for a while") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
server/grpc_service.go
Outdated
var request *pdpb.TsoRequest | ||
var err error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var request *pdpb.TsoRequest | |
var err error | |
var ( | |
request *pdpb.TsoRequest | |
err error | |
) |
server/grpc_service.go
Outdated
forwarder = newTSOForwarder(stream) | ||
tsoStreamErr error | ||
// The following are tso forward stream related variables. | ||
tsoRequestProxyQueue context.Context |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about using a name to emphasize that it is a context rather than the queue itself? Like tsoRequestProxyQueueCtx
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yes, I forgot to rename it after changing struct to Context. Will do.
/retest |
1 similar comment
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reset LGTM.
pkg/utils/tsoutil/tso_dispatcher.go
Outdated
case <-dispatcherCtx.Done(): | ||
return | ||
} | ||
tsDeadlineCh <- dl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand what you mean, but in all cases involving channel sending and other potential blocking points, we are used to pairing them with select to ensure proper exit. This change makes me a bit uneasy, but if logically there is no possibility of it happening, I am okay with this change.
Signed-off-by: artem_danilov <artem.danilov@airbnb.com>
Signed-off-by: artem_danilov <artem.danilov@airbnb.com>
@bufferflies: You cannot manually add or delete the cherry pick branch category labels. It will be added automatically by bot when the PR is created. In response to adding label named type/cherry-pick-for-release-8.5. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository. |
close tikv#9188 Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
In response to a cherrypick label: new pull request created to branch |
@JmPotato @bufferflies I'm still working on comprehensive unit tests for tso proxy server and cut a separate PR soon. |
Great, thanks for your contributions for PD. |
close tikv#9188 Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
In response to a cherrypick label: new pull request created to branch |
In response to a cherrypick label: new pull request created to branch |
close tikv#9188 Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
close tikv#9188 Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
In response to a cherrypick label: new pull request created to branch |
close #9188 Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io> Signed-off-by: 童剑 <1045931706@qq.com> Co-authored-by: Artem Danilov <artem.danilov@airbnb.com> Co-authored-by: 童剑 <1045931706@qq.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#9188 Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io> Signed-off-by: JmPotato <github@ipotato.me>
What problem does this PR solve?
Issue Number: Close #9188
What is changed and how does it work?
The change fixes the problems outlined in #9188 (comment)
Tested using the same way as reported in the issue and observe subsecond impact only now:
Check List
Tests
Code changes
Side effects
Related changes
pingcap/docs
/pingcap/docs-cn
:pingcap/tiup
:Release note