Skip to content

Conversation

timvisee
Copy link
Member

@timvisee timvisee commented Apr 16, 2025

Fixes on top of #6364

The main things this solves is:

  • if resharding abort fails, don't forget to mark replica as dead
  • if aborting resharding, don't forget to also cancel related shard transfers
  • first mark replica as dead, then abort resharding
  • no more replica state blinking, don't mark as active first and then dead
  • actually delete points in replica even if in dead state

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you formatted your code locally using cargo +nightly fmt --all command prior to submission?
  3. Have you checked your code using cargo clippy --all --all-features command?

Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (3)

lib/collection/src/tests/points_dedup.rs:137

  • [nitpick] Consider introducing a named constant or inline documentation for the literal 'false' parameter to clarify its purpose in the call to update_local.
.update_local(op, true, HwMeasurementAcc::new(), false)

lib/collection/src/shards/shard_holder/resharding.rs:333

  • [nitpick] The literal 'true' passed into delete_local_points may benefit from a named constant or comment clarifying its semantic meaning, which can improve readability and maintainability.
.delete_local_points(filter, HwMeasurementAcc::disposable(), true) // Internal operation, no performance tracking needed

lib/collection/src/collection/mod.rs:464

  • [nitpick] The repeated use of literal 'false' for aborting resharding might be made clearer by extracting it into a well-named constant to better convey its purpose in the control flow.
abort_resharding_result = self.abort_resharding(state.key(), false).await;

Copy link
Contributor

@ffuugoo ffuugoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small comment regarding when to trigger abort_resharding, LGTM overall

@timvisee timvisee force-pushed the abort-resharding-on-replica-dead-timvisee branch from f3e0c73 to eb78314 Compare April 16, 2025 12:28
Copy link
Member

@KShivendu KShivendu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM. I'm testing it locally rn to find any potential problems.

@timvisee
Copy link
Member Author

Thanks! Also doing extensive testing now.

@timvisee timvisee merged commit c68280c into abort-resharding-on-replica-dead Apr 16, 2025
16 checks passed
@timvisee timvisee deleted the abort-resharding-on-replica-dead-timvisee branch April 16, 2025 13:44
KShivendu added a commit that referenced this pull request Apr 16, 2025
* Abort resharding if any Resharding replica is to be marked dead

* Abort resharding before marking shard as Dead

* add comments

* Abort resharding after marking as dead (#6394)

* Force delete points when aborting resharding, even if replica is dead

* Abort resharding after marking replica as dead

* Only abort resharding if we mark related replica as dead

* Just rely on shard replica state

* Propagate shard transfer error right away

---------

Co-authored-by: Tim Visée <tim+github@visee.me>
pull bot pushed a commit to kp-forks/qdrant that referenced this pull request Apr 21, 2025
…nt#6364)

* Abort resharding if any Resharding replica is to be marked dead

* Abort resharding before marking shard as Dead

* add comments

* Abort resharding after marking as dead (qdrant#6394)

* Force delete points when aborting resharding, even if replica is dead

* Abort resharding after marking replica as dead

* Only abort resharding if we mark related replica as dead

* Just rely on shard replica state

* Propagate shard transfer error right away

---------

Co-authored-by: Tim Visée <tim+github@visee.me>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants