leviramsey
Contributor

Follow-on to #32743 (in the theme of #30315).

When a shard region leaves gracefully, a rebalance is triggered, which results in RebalanceDone messages. Each RebalanceDone, if the shard is still in state, results in a ShardHomeDeallocated event. In the ddata coordinator, most incoming messages, including Terminated messages, are stashed while waiting for confirmation of an update.

The shard region leaving will eventually result in a Terminated for that region, which results in a ShardRegionTerminated event. That event updates the state as if every remaining shard with that region as home was deallocated (those shards will have ShardHomeDeallocated in their future; the RebalanceDone processing accounts for the ShardRegionTerminated being processed before the RebalanceDone).
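To make the ordering argument concrete, here is a minimal, dependency-free Python model of the state transitions described above. It is a sketch, not Akka code: `CoordinatorState`, `on_rebalance_done`, and the dictionary layout are hypothetical names chosen for illustration, and only the event names come from the description.

```python
from dataclasses import dataclass, field


@dataclass
class CoordinatorState:
    # Hypothetical simplified state: shard id -> region that is its home.
    shards: dict = field(default_factory=dict)

    def shard_home_deallocated(self, shard):
        # Models applying a ShardHomeDeallocated event for one shard.
        self.shards.pop(shard, None)

    def shard_region_terminated(self, region):
        # Models applying ShardRegionTerminated: every shard still homed
        # on the region is treated as deallocated in one step.
        for shard in [s for s, r in self.shards.items() if r == region]:
            del self.shards[shard]


def on_rebalance_done(state, shard):
    """Return True if this RebalanceDone would produce a ShardHomeDeallocated."""
    if shard in state.shards:
        state.shard_home_deallocated(shard)
        return True
    # The shard is no longer in state (e.g. its region already terminated),
    # so nothing needs to be written.
    return False


state = CoordinatorState({"s1": "A", "s2": "A", "s3": "B"})
state.shard_region_terminated("A")
late_s1 = on_rebalance_done(state, "s1")  # arrives after the termination
live_s3 = on_rebalance_done(state, "s3")  # shard still in state
```

In this model a RebalanceDone that arrives after the region's termination has nothing left to update, which is the case the description says the RebalanceDone processing already accounts for.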

Accordingly, if there are enough RebalanceDones in a graceful exit, the coordinator will spend most of the few hundred milliseconds afterwards waiting for confirmations from ddata. (Note that if typed cluster sharding is in use, there are likely 1000/cluster_size RebalanceDones, so more than 50 is reasonably likely.) The current stashing behavior likely means that the Terminated is stuck behind RebalanceDones which, at this point, are superfluous (in terms of updating ddata) in light of the Terminated.
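As a rough illustration of the 1000/cluster_size estimate, the following sketch computes the expected number of RebalanceDones for one leaving region. The function name is hypothetical, the figure of 1000 shards is taken from the text above, and an even distribution of shards across regions is assumed.

```python
def rebalance_dones_for_leaving_region(number_of_shards, cluster_size):
    # Assumes shards are spread evenly, so one region hosts roughly
    # number_of_shards / cluster_size of them.
    return number_of_shards // cluster_size


# Estimates for a few cluster sizes, with 1000 shards in total.
estimates = {n: rebalance_dones_for_leaving_region(1000, n) for n in (5, 10, 20, 40)}
```

Under these assumptions, any cluster with fewer than 20 regions sees more than 50 RebalanceDones when a single region leaves.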

This change prioritizes the first Terminated of a region with shards received while waiting for a ddata update over anything unstashed after that update. Future steps could batch the Terminateds and RebalanceDones, but much of the benefit can be had from This One Trick.
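The effect of the prioritization can be sketched with a small simulation. This is a simplified model rather than the actual coordinator: `run`, the tuple message encoding, and the assumption that every queued message arrives while the first update is pending are all illustrative choices, and each handled message that changes state is counted as one ddata update.

```python
def run(mailbox, shards, prioritize_terminated):
    """Simulate a coordinator that performs one ddata update at a time.

    Returns the updates in the order they would be written.
    """
    shards = dict(shards)    # shard id -> home region
    pending = list(mailbox)  # messages stashed behind the current update
    updates = []
    while pending:
        kind, arg = pending.pop(0)
        if kind == "RebalanceDone":
            if arg not in shards:
                continue  # shard already deallocated: no update needed
            del shards[arg]
            updates.append(("ShardHomeDeallocated", arg))
        else:  # "Terminated"
            for s in [s for s, r in shards.items() if r == arg]:
                del shards[s]
            updates.append(("ShardRegionTerminated", arg))
        if prioritize_terminated:
            # While waiting for the confirmation, move the first Terminated
            # of a region that still has shards ahead of everything else.
            idx = next(
                (i for i, (k, a) in enumerate(pending)
                 if k == "Terminated" and a in shards.values()),
                None,
            )
            if idx is not None:
                pending.insert(0, pending.pop(idx))
    return updates


shards = {f"s{i}": "A" for i in range(60)}
mailbox = [("RebalanceDone", s) for s in shards] + [("Terminated", "A")]
fifo = run(mailbox, shards, prioritize_terminated=False)
prioritized = run(mailbox, shards, prioritize_terminated=True)
```

With 60 shards on the leaving region, the first-in-first-out run writes 61 updates, while the prioritized run writes 2: one ShardHomeDeallocated for the RebalanceDone already in flight, then a single ShardRegionTerminated that makes the remaining 59 RebalanceDones no-ops.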

@leviramsey
Contributor Author

CI will fail until #32755 is incorporated into this branch.

Contributor

@patriknw patriknw left a comment

LGTM

@johanandren johanandren merged commit a9916f3 into akka:main Jul 1, 2025
9 of 13 checks passed
@johanandren johanandren added this to the 2.10.7 milestone Jul 1, 2025
@leviramsey leviramsey deleted the fast-termination branch July 1, 2025 16:54
3 participants