-
Notifications
You must be signed in to change notification settings - Fork 742
Description
Enhancement Task
[2023/06/16 20:44:48.545 +00:00] [INFO] [tso_keyspace_group.go:172] ["enter into SplitKeyspaceGroupByID !!!!!!!!!!!!!!!!"]
[2023/06/16 20:44:48.545 +00:00] [INFO] [tso_keyspace_group.go:192] ["SplitKeyspaceGroupByID splitParams.Keyspaces !!!!!!!!!!!!!!!!"] [splitParams.Keyspaces="[4000]"]
[2023/06/16 20:44:48.545 +00:00] [INFO] [keyspace.go:673] ["SplitKeyspaceGroupByID after PatrolKeyspaceAssignment !!!!!!!!!!!!!!!!"]
[2023/06/16 20:44:48.545 +00:00] [INFO] [keyspace.go:676] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment !!!!!!!!!!!!!!!!"] [next-start-id=0]
[2023/06/16 20:44:48.561 +00:00] [INFO] [keyspace.go:676] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment !!!!!!!!!!!!!!!!"] [next-start-id=120]
[2023/06/16 20:44:48.572 +00:00] [INFO] [keyspace.go:676] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment !!!!!!!!!!!!!!!!"] [next-start-id=240]
[2023/06/16 20:44:48.599 +00:00] [INFO] [keyspace.go:676] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment !!!!!!!!!!!!!!!!"] [next-start-id=360]
...
[2023/06/16 20:44:48.843 +00:00] [INFO] [keyspace.go:676] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment !!!!!!!!!!!!!!!!"] [next-start-id=3803]
[2023/06/16 20:44:48.853 +00:00] [INFO] [keyspace.go:676] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment !!!!!!!!!!!!!!!!"] [next-start-id=4008]
There are more keyspaces to go, but it neither proceed nor exit from the assignment loop.
This might be related -- the keyspaces aren't consequtives.
Environment
dev env (dev-us-east-1-f01), namespace(tidb-serverless)
Root Cause Analysis
It's observed that it's blocked on manager.metaLock.Lock(keyspace.Id).
load 128 keyspaces (meta) from etcd (/pd/{cluster id}/keyspaces/meta/)
for _, keyspace := range keyspaces {
...
manager.metaLock.Lock(keyspace.Id)
...
metaLock is created with syncutil.NewLockGroup(syncutil.WithHash(MaskKeyspaceID)),
// MaskKeyspaceID is used to hash the spaceID inside the lockGroup.
// A simple mask is applied to spaceID to use its last byte as map key,
// limiting the maximum map length to 256.
// Since keyspaceID is sequentially allocated, this can also reduce the chance
// of collision when comparing with random hashes.
func MaskKeyspaceID(id uint32) uint32 {
return id & 0xFF
}
Which means that if two keyspace ids in the loaded batch have the same last byte, it encounters deadlock. In our case, the keyspace ids loaded from keyspace meta storage isn't consecutive -- though the keyspace id is created sequentially, the stored keyspace meta might be expired or deleted as time goes.
There are holes in the keyspace range in keyspace meta space:
...
/pd/7187976276065784319/keyspaces/meta/00000862
/pd/7187976276065784319/keyspaces/meta/00000863
/pd/7187976276065784319/keyspaces/meta/00000864
/pd/7187976276065784319/keyspaces/meta/00000880
/pd/7187976276065784319/keyspaces/meta/00000899
/pd/7187976276065784319/keyspaces/meta/00000920
/pd/7187976276065784319/keyspaces/meta/00000921
...
In our case, the process stuck at MetaLock.Lock(ks-id=4520) which has the same last byte 0xA8 as the first keyspace id 4008 in this batch, because MetaLock.Lock(ks-id=4008) already acquired.