Optimize cleanup of shards, used in resharding #6085
Merged
Optimize the background task that cleans up shards, specifically the step that deletes points that no longer belong in a shard.
This change makes the cleanup process a lot faster, which means it'll complete in less wall clock time.
The primary change is in how we select which points to delete. Before, we used a hash ring filter during the scroll request to only scroll points that no longer belong in the shard. Now, we simply scroll all point IDs and filter the list after the scroll request. The key point is that the hash ring check is very expensive. Before, the check ran many more times, across all segments in the shard. Now we check each point ID only once.
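The "scroll everything, then filter once" idea can be sketched roughly as below. This is a minimal illustration, not the code from this PR: `ToyHashRing`, `shard_for`, and `points_to_delete` are hypothetical names, the ring is a plain hash-modulo stand-in for a real hash ring, and segments are modelled as plain vectors of point IDs that may overlap purely for the sake of the example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

type PointId = u64;
type ShardId = u32;

/// Hypothetical stand-in for a hash ring: maps a point ID to a shard.
/// A real hash ring is more involved; hash-modulo is enough for the sketch.
struct ToyHashRing {
    shard_count: u32,
}

impl ToyHashRing {
    /// The "expensive" check we want to run as rarely as possible.
    fn shard_for(&self, point_id: PointId) -> ShardId {
        let mut hasher = DefaultHasher::new();
        point_id.hash(&mut hasher);
        (hasher.finish() % u64::from(self.shard_count)) as ShardId
    }
}

/// Collect the IDs of points that no longer belong in `this_shard`.
fn points_to_delete(
    segments: &[Vec<PointId>],
    ring: &ToyHashRing,
    this_shard: ShardId,
) -> Vec<PointId> {
    // Step 1: "scroll" all point IDs with no hash ring filter attached,
    // keeping only the first occurrence of each ID.
    let mut seen = HashSet::new();
    let mut all_ids = Vec::new();
    for segment in segments {
        for &id in segment {
            if seen.insert(id) {
                all_ids.push(id);
            }
        }
    }

    // Step 2: filter after the scroll, so the hash ring check runs
    // exactly once per unique point ID.
    all_ids
        .into_iter()
        .filter(|&id| ring.shard_for(id) != this_shard)
        .collect()
}
```

Calling `points_to_delete(&segments, &ring, 0)` with a two-shard ring returns every unique ID that the toy ring maps to a shard other than `0`. The number of `shard_for` calls equals the number of unique IDs, regardless of how many segments there are.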
To summarize, this PR does two things:
I don't have a fancy graph, but I do have performance numbers. This is on a single node, with 5 million points, resharding from 1 to 2 shards, with 2 or 100 segments.
Time for shard cleanup to complete:
All Submissions:
dev branch. Did you create your branch from dev?
New Feature Submissions:
cargo +nightly fmt --all command prior to submission?
cargo clippy --all --all-features command?