The TSO request may have a high latency after the leader changes

## Enhancement Task

If a TSO request fails, the client tries to update the members to find the new leader:

https://github.com/tikv/pd/blob/41ec8dced0d363950a6541141109aaf605a6b499/client/tso_dispatcher.go#L403-L416
https://github.com/tikv/pd/blob/41ec8dced0d363950a6541141109aaf605a6b499/client/tso_dispatcher.go#L436
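
To illustrate how a failed request could ask a background loop to refresh the members, here is a minimal sketch of a non-blocking signal over a one-slot channel. The helper name `scheduleCheck` and the channel setup are assumptions for illustration only; they are not taken from the PD client code.

```go
package main

import "fmt"

// scheduleCheck is a hypothetical helper: it asks a member-update loop to run
// without blocking the caller. If a check is already pending in the one-slot
// channel, the new signal is dropped, so repeated failures coalesce into one check.
func scheduleCheck(ch chan struct{}) bool {
	select {
	case ch <- struct{}{}:
		return true
	default:
		return false
	}
}

func main() {
	checkCh := make(chan struct{}, 1)

	fmt.Println("first failure scheduled a check:", scheduleCheck(checkCh))
	fmt.Println("second failure scheduled a check:", scheduleCheck(checkCh))

	// The update loop consumes the pending signal, freeing the slot again.
	<-checkCh
	fmt.Println("after the loop ran, a new check can be scheduled:", scheduleCheck(checkCh))
}
```

With this pattern the failing request path never blocks on the update loop, but it also has no direct control over when the update actually completes.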

The member update is subject to a backoff with a minimum wait of 100ms:
https://github.com/tikv/pd/blob/41ec8dced0d363950a6541141109aaf605a6b499/client/pd_service_discovery.go#L532-L556
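
As a rough picture of what such a backoff costs, the sketch below computes the waits of a generic exponential backoff that starts at 100ms. The doubling policy, the 1s cap, and the helper `backoffDelays` are assumptions for illustration; the actual behavior of the PD client's backoffer may differ.

```go
package main

import (
	"fmt"
	"time"
)

// backoffDelays returns the wait before each retry for a simple exponential
// backoff: it starts at base, doubles after every attempt, and is capped at max.
// This is a hypothetical helper, not the PD client's retry.Backoffer.
func backoffDelays(base, max time.Duration, attempts int) []time.Duration {
	delays := make([]time.Duration, 0, attempts)
	next := base
	for i := 0; i < attempts; i++ {
		delays = append(delays, next)
		next *= 2
		if next > max {
			next = max
		}
	}
	return delays
}

func main() {
	// 100ms base as stated in the issue; the 1s cap is an assumed value.
	for i, d := range backoffDelays(100*time.Millisecond, time.Second, 5) {
		fmt.Printf("retry %d waits %v\n", i+1, d)
	}
}
```

Even in the best case, a single failed update already adds at least the base wait before the next attempt.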

Meanwhile, new requests can still be put into the channel, where they wait to be handled:
https://github.com/tikv/pd/blob/41ec8dced0d363950a6541141109aaf605a6b499/client/tso_client.go#L528

As a result, these requests may be delayed by the backoff, because they have to wait for the stream to be re-established.
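
The effect can be reproduced in miniature. In the sketch below a request is enqueued while the dispatcher is still sleeping (standing in for the wait before the stream is re-established), so the request's latency is at least the sleep time. The `request` type and `simulateStall` are hypothetical and only model the queueing behavior described here, not the real dispatcher.

```go
package main

import (
	"fmt"
	"time"
)

// request is a hypothetical stand-in for a queued TSO request.
type request struct {
	enqueuedAt time.Time
	done       chan time.Duration // receives the observed queueing latency
}

// simulateStall enqueues one request, then lets the dispatcher sleep for
// backoff before it serves the queue. It returns the latency seen by the request.
func simulateStall(backoff time.Duration) time.Duration {
	queue := make(chan *request, 1)

	// The caller can still enqueue while no stream is available.
	req := &request{enqueuedAt: time.Now(), done: make(chan time.Duration, 1)}
	queue <- req

	go func() {
		// The dispatcher is busy waiting for the stream to come back.
		time.Sleep(backoff)
		r := <-queue
		r.done <- time.Since(r.enqueuedAt)
	}()

	return <-req.done
}

func main() {
	latency := simulateStall(100 * time.Millisecond)
	fmt.Printf("request waited %v in the queue\n", latency)
}
```

The request itself never fails; it simply inherits the dispatcher's wait, which is why the symptom shows up as latency rather than as errors.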



Excerpt from `client/tso_dispatcher.go` (L403-L416):

	err = td.processRequests(stream, tsoBatchController, done)
	// If an error happens during TSO stream handling, reset the stream and run the next trial.
	if err == nil {
		// A nil error returned by `processRequests` indicates that the request batch is started successfully.
		// In this case, the `tsoBatchController` will be put back to the pool when the request is finished
		// asynchronously (either successful or not). This implies that the current `tsoBatchController` object will
		// be asynchronously accessed after the `processRequests` call. As a result, we need to use another
		// `tsoBatchController` for collecting the next batch. To do this, we set the `tsoBatchController` to nil so that
		// another one will be fetched from the pool at the beginning of the batching loop.
		// Otherwise, the `tsoBatchController` won't be processed in other goroutines concurrently, and it can be
		// reused in the next loop safely.
		tsoBatchController = nil
	} else {
		exit := !td.handleProcessRequestError(ctx, bo, streamURL, cancel, err)

Excerpt from `client/pd_service_discovery.go` (L532-L556):

	func (c *pdServiceDiscovery) updateMemberLoop() {
		defer c.wg.Done()

		ctx, cancel := context.WithCancel(c.ctx)
		defer cancel()
		ticker := time.NewTicker(memberUpdateInterval)
		defer ticker.Stop()

		bo := retry.InitialBackoffer(updateMemberBackOffBaseTime, updateMemberTimeout, updateMemberBackOffBaseTime)
		for {
			select {
			case <-ctx.Done():
				log.Info("[pd] exit member loop due to context canceled")
				return
			case <-ticker.C:
			case <-c.checkMembershipCh:
			}
			failpoint.Inject("skipUpdateMember", func() {
				failpoint.Continue()
			})
			if err := bo.Exec(ctx, c.updateMember); err != nil {
				log.Error("[pd] failed to update member", zap.Strings("urls", c.GetServiceURLs()), errs.ZapError(err))
			}
		}
	}

The TSO request may have a high latency after the leader changes #8835

Enhancement Task

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

The TSO request may have a high latency after the leader changes #8835

Description

Enhancement Task

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions