Member

@timvisee timvisee commented Feb 28, 2025

Extension of #6074

Improve tracking of the number of records to transfer in shard transfers, specifically for the stream records and resharding stream records transfer methods.

The two main improvements are:

  • count records actually being transferred, not batch size
  • in case of resharding, estimate number of records to transfer based on resharding fraction

For example, if we're resharding from one to two shards, a resharding transfer will only transfer ~50% of the records, not all of them.
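The estimate in this example can be sketched as a small calculation. This is a minimal sketch, not the code from this PR: the function name `estimate_transfer_total` is hypothetical, and it assumes points are spread uniformly over the hash ring, so that a fixed fraction of the source shard moves.

```rust
/// Estimate how many records a resharding transfer will move.
///
/// `fraction` is the share of the source shard expected to move to the
/// target shard. Assumption: points are distributed uniformly, so the
/// result is an estimate and the real count may be higher or lower.
fn estimate_transfer_total(shard_points: usize, fraction: f32) -> usize {
    (shard_points as f32 * fraction).round() as usize
}
```

With the numbers from this description, a source shard of 500000 points and a fraction of 0.5 (one to two shards) gives an estimated total of 250000.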

The current resharding implementation still filters points with the hash ring after scrolling, not during scrolling itself. Applying the hash ring filter during scrolling is very expensive and makes the transfer significantly slower, increasing transfer time from 28 to 500 seconds.
Branch with it implemented: https://github.com/qdrant/qdrant/commits/improve-transfer-progress-tracker-hashring-filter/
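The difference between counting the batch size and counting the records actually transferred can be illustrated roughly as follows. This is only a sketch under stated assumptions: `belongs_to_target` is a hypothetical stand-in for the hash ring check (a plain modulo, not the real hash ring), and `filter_batch` is an illustrative name, not a function from this PR.

```rust
/// Hypothetical stand-in for the hash ring check. A point "belongs" to the
/// target shard if its ID modulo the shard count equals the target shard ID.
/// Assumption: `shard_count` is greater than zero.
fn belongs_to_target(point_id: u64, target_shard: u64, shard_count: u64) -> bool {
    point_id % shard_count == target_shard
}

/// Filter one scrolled batch after scrolling and return only the points
/// that should be sent to the target shard.
fn filter_batch(batch: &[u64], target_shard: u64, shard_count: u64) -> Vec<u64> {
    batch
        .iter()
        .copied()
        .filter(|&id| belongs_to_target(id, target_shard, shard_count))
        .collect()
}
```

Progress should then advance by the length of the filtered result, not by the length of the scrolled batch. Counting the batch size would report roughly twice as many records as were really sent when resharding from one to two shards.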

Example collection cluster info during a resharding transfer (now shows an estimated total of 250k and the correct number of transferred records):

{
  "peer_id": 6294971774567105,
  "shard_count": 2,
  "local_shards": [
    {
      "shard_id": 0,
      "points_count": 500000,
      "state": "Active"
    },
    {
      "shard_id": 1,
      "points_count": 145399,
      "state": "Resharding"
    }
  ],
  "shard_transfers": [
    {
      "shard_id": 0,
      "to_shard_id": 1,
      "from": 6294971774567105,
      "to": 6294971774567105,
      "sync": true,
      "method": "resharding_stream_records",
      "comment": "Transferring records (145399/250000), started 16s ago, ETA: 11.68s"
    }
  ],
  // ...
}
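The ETA in the transfer comment can be approximated from the transferred and total counts. The sketch below assumes simple linear extrapolation from the average rate so far; the PR does not show how the ETA is actually computed, and `estimate_eta` is a hypothetical name.

```rust
use std::time::Duration;

/// Estimate the remaining time by assuming the average transfer rate so far
/// stays constant (assumption: linear extrapolation).
/// Returns `None` while nothing has been transferred, as no rate is known yet.
fn estimate_eta(transferred: usize, total: usize, elapsed: Duration) -> Option<Duration> {
    if transferred == 0 {
        return None;
    }
    // If we already overshot the estimated total, nothing is left to wait for.
    let remaining = total.saturating_sub(transferred);
    let eta_secs = elapsed.as_secs_f64() * remaining as f64 / transferred as f64;
    Some(Duration::from_secs_f64(eta_secs))
}
```

For the example above (145399 of 250000 records after about 16 seconds) this gives roughly 11.5 seconds, in the same range as the reported ETA; the displayed elapsed time is presumably rounded.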

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you formatted your code locally using cargo +nightly fmt --all command prior to submission?
  3. Have you checked your code using cargo clippy --all --all-features command?

// - shards: 4 -> 3
// points: 25/25/25/25 -> 33/33/33
// transfer fraction of each shard: 1/3 = 0.333
Ordering::Greater => 1.0 / to as f32,
Member

@agourlay agourlay Mar 3, 2025

How do we know that to can't be zero here?

Member Author

@timvisee timvisee Mar 3, 2025

We can't have 0 shards. It is rejected on a higher level, and thus you cannot shard down to 0 shards either.
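For illustration, the fraction calculation with the invariant made explicit could look like the sketch below. Only the `Ordering::Greater` arm appears in the diff above. The function name `transfer_fraction`, the `Option` return type, and the `Less` and `Equal` arms are assumptions; the `Less` arm is chosen to match the one to two shards example (50%) from the description.

```rust
use std::cmp::Ordering;

/// Fraction of a source shard's points that a single resharding transfer is
/// expected to move, when going from `from` shards to `to` shards.
/// Returns `None` for zero shard counts, which the real code rejects at a
/// higher level according to the discussion above.
fn transfer_fraction(from: usize, to: usize) -> Option<f32> {
    if from == 0 || to == 0 {
        return None;
    }
    let fraction = match from.cmp(&to) {
        // Scaling up (assumption), e.g. 1 -> 2: each shard gives up 1/2.
        Ordering::Less => 1.0 / to as f32,
        // No change in shard count (assumption): treat as a full transfer.
        Ordering::Equal => 1.0,
        // Scaling down, as in the diff, e.g. 4 -> 3: 1/3 = 0.333.
        Ordering::Greater => 1.0 / to as f32,
    };
    Some(fraction)
}
```

Without such a guard, a `to` of zero would not panic in Rust, since floating point division by zero yields infinity, but it would produce a meaningless estimate.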


pub fn add(&mut self, delta: usize) {
    self.points_transferred += delta;
    self.points_total = max(self.points_total, self.points_transferred);
Member

@agourlay agourlay Mar 3, 2025

Why is points_total changing during the resharding?
Is it to account for some ongoing updates?

Member Author

@timvisee timvisee Mar 3, 2025

The total count is an estimate, as it's very expensive to do exact counting here.

The transferred count is therefore expected to overshoot the estimated total about 50% of the time.

Note that since #6074 we use non-exact counting in regular shard transfers as well, in which case the same might happen.

I'd prefer to update the total instead of showing some broken value like 123/100.
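A minimal sketch of a tracker with this behavior is shown below. The field names and the body of `add` follow the diff above; the struct name `TransferProgress` and the signature of `set` are assumptions, as the PR only mentions that `add` and `set` functions exist.

```rust
/// Hypothetical progress tracker for a shard transfer.
#[derive(Debug, Default)]
struct TransferProgress {
    points_transferred: usize,
    points_total: usize,
}

impl TransferProgress {
    /// Record `delta` newly transferred points. If the estimated total is
    /// overshot, raise it so that progress never reads like 123/100.
    fn add(&mut self, delta: usize) {
        self.points_transferred += delta;
        self.points_total = self.points_total.max(self.points_transferred);
    }

    /// Set both counters at once (assumed signature), keeping the same
    /// invariant: the total is never below the transferred count.
    fn set(&mut self, transferred: usize, total: usize) {
        self.points_transferred = transferred;
        self.points_total = total.max(transferred);
    }
}
```

Starting from an estimated total of 100, adding 123 points leaves the tracker at 123/123 instead of 123/100.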

@timvisee timvisee merged commit b01b36b into dev Mar 4, 2025
17 checks passed
@timvisee timvisee deleted the improve-transfer-progress-tracker branch March 4, 2025 13:40
timvisee added a commit that referenced this pull request Mar 21, 2025
…6084)

* Improve transfer progress tracker, use add and set functions

* Count actually transferred points in batch

* While resharding, estimate transfer size based on shard fraction

* Remove debug assertion causing test failures

* Reformat