-
Notifications
You must be signed in to change notification settings - Fork 742
api: support to query whether pd has loaded region #8749
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: lhy1024 <admin@liudos.us>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #8749 +/- ##
==========================================
- Coverage 76.26% 76.22% -0.04%
==========================================
Files 461 462 +1
Lines 70392 70429 +37
==========================================
+ Hits 53682 53687 +5
- Misses 13356 13383 +27
- Partials 3354 3359 +5
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
pkg/storage/storage.go
Outdated
regionLoadedFromDefault bool | ||
regionLoadedFromStorage bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to distinguish these two kinds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to avoid problems after ps.useRegionStorage switch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using string?
regionLoadedFrom
: "", "etcd", "leveldb"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, we just have these two ways now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer just using a single field to store this info like:
type regionSource int
const (
unloaded regionSource = iota
etcd
leveldb
)
server/api/status.go
Outdated
@@ -39,11 +40,13 @@ func newStatusHandler(svr *server.Server, rd *render.Render) *statusHandler { | |||
// @Success 200 {object} versioninfo.Status | |||
// @Router /status [get] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure whether it is suitable. ref #8748 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about /member or /health?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/members return all member info according etcd.ListEtcdMembers
/health is to query the status of each node through the /ping interface to return information about all the nodes, if we don't add a new api, we won't be able to get the result of whether the load region is complete or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
status is more like some fixed information to me, at least for the current definition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put it under /status, which is appropriate from the name, but currently status returns compiled information.
Yes, as I said in the issue.
pkg/versioninfo/versioninfo.go
Outdated
Version string `json:"version"` | ||
GitHash string `json:"git_hash"` | ||
StartTimestamp int64 `json:"start_timestamp"` | ||
AreRegionsLoaded bool `json:"are_regions_loaded"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prefer to use loaded
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Loaded bool
json:"loaded"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: lhy1024 <admin@liudos.us>
pkg/versioninfo/versioninfo.go
Outdated
@@ -31,6 +31,7 @@ type Status struct { | |||
Version string `json:"version"` | |||
GitHash string `json:"git_hash"` | |||
StartTimestamp int64 `json:"start_timestamp"` | |||
Loaded bool `json:"loaded"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer RegionLoaded
pkg/storage/storage.go
Outdated
regionLoadedFromDefault bool | ||
regionLoadedFromStorage bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using string?
regionLoadedFrom
: "", "etcd", "leveldb"
pkg/storage/storage.go
Outdated
if atomic.LoadInt32(&ps.useRegionStorage) == 0 { | ||
return ps.Storage.LoadRegions(ctx, f) | ||
err := ps.Storage.LoadRegions(ctx, f) | ||
if err == nil { | ||
ps.regionLoadedFromDefault = true | ||
} | ||
return err | ||
} | ||
|
||
ps.mu.Lock() | ||
defer ps.mu.Unlock() | ||
if !ps.regionLoaded { | ||
if !ps.regionLoadedFromStorage { | ||
if err := ps.regionStorage.LoadRegions(ctx, f); err != nil { | ||
return err | ||
} | ||
ps.regionLoaded = true | ||
ps.regionLoadedFromStorage = true | ||
} | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems we can use *coreStorage.LoadRegions()
directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't check ps.regionLoadedFromStorage
Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: lhy1024 <admin@liudos.us>
pkg/storage/storage.go
Outdated
useRegionStorage int32 | ||
regionLoaded bool | ||
mu syncutil.Mutex | ||
useRegionStorage int32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that safe to switch between using or not using RegionStore in the master branch or the latest releases now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
defaultUseRegionStorage is true and only check switch when it was just elected leader.
pkg/storage/storage.go
Outdated
useRegionStorage int32 | ||
regionLoaded bool | ||
mu syncutil.Mutex | ||
useRegionStorageFlag int32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use bool?
Signed-off-by: lhy1024 <admin@liudos.us>
/test pull-integration-realcluster-test |
server/api/status.go
Outdated
@@ -44,6 +45,7 @@ func (h *statusHandler) GetPDStatus(w http.ResponseWriter, _ *http.Request) { | |||
GitHash: versioninfo.PDGitHash, | |||
Version: versioninfo.PDReleaseVersion, | |||
StartTimestamp: h.svr.StartTimestamp(), | |||
RegionLoaded: storage.AreRegionsLoaded(h.svr.GetStorage()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have a more general field here like "Ready" which we can extend in the future if we find more conditions under which PD can't serve?
Does PDStatus API exposed by all components of disaggregated PD?
Any chance we introduce a new API /ready which we can use across all TiDB components in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Tema, thanks for your advice
Q1: A general field will be helpful to expand different scenarios, but we found that currently there is only RegionLoaded is needed. If there are other situations in the future, we tend to add new fields with more accurate meanings.
Q2: Standalone PD or Scheduling Service will expose this API(especially the RegionLoaded field), others such as TSO Service don't.
Q3: IMO, it's good we can use the same interface to query service status, but the fields may be different. The benefits of unification may not be obvious, but it is indeed a better way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally you want to decouple data plane (pd) from control plane (tidb-operator, tiUp), so that if you make changes to one, you don't have to upgrade other and it can start leveraging any improvements right away. That is why usually the common convention is to use /ready
across all components. This way the control-plane can have a common logic for rolling restart of all components, without need to necessarily overcustomize each of them. This is not just about PD microservices but many other components of TiDB cluster (tidb, dm, cdc, tikv). Some of them still require additional customization, but it would be nice to keep it to the minimum. Did you talk to tidb-operator team? Don't they want to have a /ready
across all components. I've herd there is a work on tidb-operator v2 or something like that. This might be a good opportunity to standartize protocol between control plane and data place as once it is out there, it is hard to change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PTAL @csuzhangxc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Tema, for TiDB Operator, it's better to use a general API (like /ready
) to check the status in most cases. If this API can't meet some other special cases in the future, then we can consider to add/use some other APIs.
For this RegionLoaded case, in fact, TiDB Operator just wants to know when it can restart the next instance safely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@niubell can we land this one and open a separate task for a more general /ready endpoint?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@niubell can we land this one and open a separate task for a more general /ready endpoint?
@Tema Agreed, we will merge this PR first. After the general /ready proposal for all components is finalized, PD will also add the /ready API, cc @zhangjinpeng87 @csuzhangxc
/hold wait for @zhangjinpeng87 |
pkg/versioninfo/versioninfo.go
Outdated
@@ -31,6 +31,7 @@ type Status struct { | |||
Version string `json:"version"` | |||
GitHash string `json:"git_hash"` | |||
StartTimestamp int64 `json:"start_timestamp"` | |||
RegionLoaded bool `json:"region_loaded"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the operator/tiup uses this API and PD doesn't provide this field? Will it block the upgrade process?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe this part is implemented yet, but it should be backward compatible and pass readiness check if it is not provided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After discussion, we will use /ready api directly.
Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: lhy1024 <admin@liudos.us>
/hold |
Let's wait for @okJiang to give an approve. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: okJiang, rleungx The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/hold cancel |
In response to a cherrypick label: new pull request created to branch |
close #8426, close #8748 Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io> Signed-off-by: lhy1024 <admin@liudos.us> Signed-off-by: artem_danilov <artem.danilov@airbnb.com> Co-authored-by: lhy1024 <liuhanyang@pingcap.com> Co-authored-by: lhy1024 <admin@liudos.us> Co-authored-by: Artem Danilov <artem.danilov@airbnb.com>
What problem does this PR solve?
Issue Number: Close #8748, close #8426
What is changed and how does it work?
Check List
Tests
Release note