perf: prioritize handling of Terminated after updating ddata #32756
Follow-on to #32743 (in the theme of #30315).
When a shard region gracefully leaves, a rebalance is triggered which results in `RebalanceDone` messages. Each `RebalanceDone`, if the shard is still in state, results in a `ShardHomeDeallocated` event. In the ddata coordinator, while waiting for a ddata update to be confirmed, most incoming messages are stashed, including `Terminated` messages.

The shard region leaving will eventually result in a `Terminated` for that region, which results in a `ShardRegionTerminated` event. That event updates the state as if every remaining shard with that region as home was deallocated (those shards will have a `ShardHomeDeallocated` in their future: the `RebalanceDone` processing accounts for the `ShardRegionTerminated` being processed before the `RebalanceDone`).

Accordingly, if there are enough `RebalanceDone`s in a graceful exit (note that if typed cluster sharding is in use, there are likely 1000/cluster_size `RebalanceDone`s: more than 50 is reasonably likely, since that holds for any cluster of fewer than 20 nodes), the coordinator will spend most of the few hundred milliseconds afterwards waiting for confirmation from ddata. The current stashing behavior likely means that `Terminated` is stuck behind `RebalanceDone`s which, at this point, are superfluous (in terms of updating ddata) in light of the `Terminated`.

This change prioritizes the first `Terminated` of a region with shards that is received while waiting for a ddata update over anything else unstashed after that update. Future steps could batch the `Terminated`s and `RebalanceDone`s, but much of the benefit can be had from This One Trick.