Optimize cleanup of shards, used in resharding #6085

timvisee · 2025-02-28T14:47:07Z

Optimize the background task that cleans up shards, specifically by deleting points that no longer belong in the shard.

This change makes the cleanup process a lot faster, which means it'll complete in less wall clock time.

The primary change is in how we select which points to delete. Before, we applied a hash ring filter during the scroll request so that it only returned points that no longer belong in the shard. Now, we simply scroll all point IDs and filter the list after the scroll request. The key point is that the hash ring check is very expensive. Before, the check ran many more times because it was evaluated across all segments in the shard. Now we check each point ID only once.
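As a rough illustration of the "scroll everything, filter afterwards" approach, here is a minimal sketch. It is not the code from this PR: `HashRing`, `shard_for` and `points_to_delete` are hypothetical names, and the ring is a toy consistent hash ring built on the standard library hasher rather than Qdrant's own hash ring. The point it shows is that each scrolled point ID is checked against the ring exactly once, independent of the number of segments.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

type PointId = u64;
type ShardId = u32;

/// Toy consistent hash ring (assumption: a stand-in, not Qdrant's implementation).
struct HashRing {
    ring: BTreeMap<u64, ShardId>,
}

fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

impl HashRing {
    /// Place `virtual_nodes` entries on the ring for every shard.
    fn new(shards: &[ShardId], virtual_nodes: u32) -> Self {
        let mut ring = BTreeMap::new();
        for &shard in shards {
            for replica in 0..virtual_nodes {
                ring.insert(hash_of(&(shard, replica)), shard);
            }
        }
        Self { ring }
    }

    /// Owner of a point: the first ring entry at or after the point's hash,
    /// wrapping around to the start of the ring. `None` if the ring is empty.
    fn shard_for(&self, point: PointId) -> Option<ShardId> {
        let h = hash_of(&point);
        self.ring
            .range(h..)
            .next()
            .or_else(|| self.ring.iter().next())
            .map(|(_, &shard)| shard)
    }
}

/// Post-scroll filter: from one batch of scrolled IDs, keep the ones that
/// no longer map to `local_shard`. One ring lookup per ID.
fn points_to_delete(ring: &HashRing, local_shard: ShardId, scrolled: &[PointId]) -> Vec<PointId> {
    scrolled
        .iter()
        .copied()
        .filter(|&id| ring.shard_for(id) != Some(local_shard))
        .collect()
}
```

In this sketch a cleanup loop would call `points_to_delete` on every scrolled batch and pass the result to the delete operation.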

To summarize, this PR does two things:

- move hash ring check outside of scroll operation
- use wait=false on all delete operations, except the last one
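The second item can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `DeleteSink`, `RecordingSink` and `delete_in_batches` are hypothetical names, and the sketch assumes all IDs are known up front and that delete operations are applied in the order they are submitted, so waiting on the final batch is enough to know the earlier ones have been applied.

```rust
/// Hypothetical stand-in for a shard that accepts delete operations.
trait DeleteSink {
    /// `wait = true` means the call only returns once the delete is applied.
    fn delete_points(&mut self, ids: &[u64], wait: bool);
}

/// Submit deletes in batches, waiting only on the final batch.
fn delete_in_batches<S: DeleteSink>(sink: &mut S, ids: &[u64], batch_size: usize) {
    assert!(batch_size > 0, "batch_size must be non-zero");
    let batches = ids.chunks(batch_size);
    // Index of the last batch; unused when there are no batches at all.
    let last = batches.len().saturating_sub(1);
    for (i, batch) in batches.enumerate() {
        sink.delete_points(batch, i == last);
    }
}

/// Sink that records (batch length, wait flag) for each call.
struct RecordingSink {
    calls: Vec<(usize, bool)>,
}

impl DeleteSink for RecordingSink {
    fn delete_points(&mut self, ids: &[u64], wait: bool) {
        self.calls.push((ids.len(), wait));
    }
}
```

With 10 IDs and a batch size of 4, the recording sink sees three calls, and only the last one has `wait` set.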

I don't have a fancy graph, but I do have performance numbers. This is on a single node, with 5 million points, resharding from 1 to 2 shards, with 2 or 100 segments.

Time for shard cleanup to complete:

Before this PR:
- 2 segments: 115 seconds
- 100 segments: 72 seconds

With hash ring check outside scroll operation:
- 2 segments: 35 seconds
- 100 segments: 53 seconds

With wait=false:
- 2 segments: 20 seconds
- 100 segments: 43 seconds

All Submissions:

- Contributions should target the dev branch. Did you create your branch from dev?
- Have you followed the guidelines in our Contributing document?
- Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

- Does your submission pass tests?
- Have you formatted your code locally using cargo +nightly fmt --all command prior to submission?
- Have you checked your code using cargo clippy --all --all-features command?

agourlay · 2025-02-28T16:19:07Z

Out of curiosity, why is a check using the hash ring so expensive?

generall · 2025-02-28T16:55:11Z

I don't entirely understand why the number of comparisons is different. In scroll we do not check all points, only the points that are candidates for our limit+offset pool. Is that because we search top-N results in each segment independently?

generall · 2025-02-28T22:22:04Z

CPU usage before and after this fix:

timvisee · 2025-03-03T09:21:45Z

> Is that because we search top-N results in each segment independently?

Yes

> CPU usage before and after this fix:

Awesome! Thank you for sharing the results. The difference seems bigger than I expected.

I assume that part of the spikes in the new part of the graph are just because of indexing.


timvisee added 2 commits February 28, 2025 15:28

Don't filter hash ring during scroll, but after

42a0f50

Only use wait=true in last delete batch while cleaning points

7084054

timvisee requested review from agourlay, generall and ffuugoo February 28, 2025 14:47

Link to pull request

e40ad9c

timvisee changed the title ~~Optimize cleanup of shards, using in resharding~~ Optimize cleanup of shards, used in resharding Feb 28, 2025

github-actions bot mentioned this pull request Feb 28, 2025

Flaky test hnsw_discover_test::hnsw_discover_precision #2973

Open

generall approved these changes Feb 28, 2025


generall merged commit 6fd3bba into dev Feb 28, 2025
17 checks passed

generall deleted the resharding-optimize-clean-shard branch February 28, 2025 17:33

timvisee added a commit that referenced this pull request Mar 21, 2025

Optimize cleanup of shards, used in resharding (#6085)

5761852

* Don't filter hash ring during scroll, but after
* Only use wait=true in last delete batch while cleaning points
* Link to pull request

timvisee mentioned this pull request Mar 21, 2025

Bump version to 1.13.5 #6223

Merged
