
Conversation

@timvisee timvisee (Member) commented Mar 26, 2025

Optimize the resharding stream records shard transfer function.

This applies two techniques to optimize during resharding:

  1. preselect the list of point IDs before reading point data, preventing a lot of read data from being discarded
  2. oversample the batch size because we only send a fraction of points, preventing the transfer of small batches

Before this PR, a resharding transfer would read all points from the shard, including all vector and payload data. But only a fraction of those read points are actually transferred to the remote shard. That means that 50% (1->2 shards) or even 95% (19->20 shards) of the read data was immediately discarded. That's a waste of resources!

Reading all that data can be very expensive, especially on a cluster that has huge vectors/payloads or very high memory pressure. This PR changes the logic to scroll just point IDs first. We then make a preselection of point IDs by filtering them through the resharding hash ring. Finally, we retrieve vectors and payloads for exactly that preselection, so that 100% of the read data is transferred.

Along with that, this PR also applies oversampling to the batch size, accounting for the points that will be filtered out by the hash ring. In a resharding transfer going from 3 to 4 shards, only 25% of the points in the shard will be transferred. With a batch size of 100, we'd previously transfer only roughly 25 points per batch. Now the batch size is multiplied by 4, so we still transfer roughly 100 points per batch.

In code, I've extracted the batch-reading logic into two separate functions: one for regular batches, and one specialized for batching with a hash ring that applies the above optimizations.

A basic benchmark is included in a comment below.

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you formatted your code locally using cargo +nightly fmt --all command prior to submission?
  3. Have you checked your code using cargo clippy --all --all-features command?

@timvisee timvisee changed the title Resharding: only read vectors/payload of points to transfer Resharding: optimize resharding stream records transfer Mar 27, 2025
@timvisee timvisee marked this pull request as ready for review March 27, 2025 09:44
@Copilot Copilot AI (Contributor) left a comment


Pull Request Overview

This PR optimizes the resharding stream records transfer by reducing unnecessary data reads and applying an oversampling technique to preserve batch sizes when using a hash ring filter. It refactors the batching logic into two functions—one for regular transfers and one for transfers using hash rings—and applies oversampling to account for filtering out points.

Comments suppressed due to low confidence (2)

lib/collection/src/shards/forward_proxy_shard.rs:117

  • [nitpick] Consider rephrasing 'real number of transferred points' to 'actual number of transferred points' for clarity.
/// Returns new point offset and real number of transferred points. The new point offset can be

lib/collection/src/shards/forward_proxy_shard.rs:250

  • Although the design likely ensures 'new' is non-empty, it might be beneficial to assert or handle the case where the new shard list is empty to avoid a multiplication by zero in the oversample_factor calculation.
HashRingRouter::Resharding { old: _, new } => new.len(),


@KShivendu KShivendu (Member) left a comment


I think there's a small mistake. Rest LGTM

I wonder how much speedup we will observe in practice :D

@timvisee timvisee (Member, Author) commented Mar 27, 2025

I did some testing; the results are about what I expected.

Setup:

  • 4 peers (all on same host)
  • 3 shards, resharding to 4
  • replication factor 1
  • 512d vectors
  • integer and large text payload
  • everything on-disk
  • collection size differs per type of storage, and is not super relevant

I measured the wall-clock time for the three resharding transfers that happen when resharding up.

Result:

  • On ramdisk:
    • Before PR: 17s, 23s, 30s
    • With PR: 17s, 23s, 30s
  • On NVMe drive:
    • Before PR: 16s, 17s, 17s
    • With PR: 10s, 12s, 16s
  • On old SATA HDD:
    • Before PR: 81s, 110s, 116s
    • With PR: 81s, 85s, 88s

This is definitely not a scientific test, but this naive approach already shows an improvement. I expect (but did not test) the effect to be a lot greater with networked storage and/or with high memory pressure.

bfb command to set up the collection:

bfb --uri http://127.0.0.1:6334/,http://127.0.0.2:6334/,http://127.0.0.3:6334/,http://127.0.0.4:6334/ -t3 -p3 -n 500000 -d 512 --shards 3 --replication-factor 1 --indexing-threshold 0 --skip-wait-index --int-payloads 100 --text-payloads --text-payload-length 512 --timestamp-payload --retry 3 --on-disk-payload --on-disk-payload-index --on-disk-index true --on-disk-vectors true --timeout 20

@timvisee timvisee merged commit 9382df0 into dev Mar 28, 2025
17 checks passed
@timvisee timvisee deleted the resharding-only-read-points-to-transfer branch March 28, 2025 10:38
timvisee added a commit that referenced this pull request Mar 31, 2025
* In a resharding transfer, only read vectors/payloads of relevant points

* Extract batch reading logic into function

* Apply oversampling in case of resharding

* Apply review suggestions

* Fix parsing point IDs from first scroll batch
@timvisee timvisee mentioned this pull request Mar 31, 2025