Skip to content

Conversation

KShivendu
Copy link
Member

@KShivendu KShivendu commented May 8, 2025

Force deletes to a peer should also abort all shards transfers from this peer. Otherwise, they can remain stuck there forever.

This makes the test introduced in #6505 to pass

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

@@ -461,6 +461,26 @@ impl TableOfContent {
false
}

pub async fn abort_peer_transfers(&self, peer_id: PeerId) -> CollectionResult<()> {
for collection in self.collections.read().await.values() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vote to abort both to and from. Because it also does not make sense to transfer to a peer that is being removed.

Then I recommend to also explicitly abort the shard transfer task. Because removing it from the state here, does not stop the ongoing transfer task.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because removing it from the state here, does not stop the ongoing transfer task

Ahh right. I didn't realise that register_abort_transfer() is local and it's not pushing it to consensus. Thanks :)

Copy link
Contributor

@ffuugoo ffuugoo May 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(reposted as a separate comment)

Comment on lines 82 to 87
// If we reach here, either there are no shards or force removal is requested
if has_shards {
log::warn!("Aborting peer transfers for peer {peer_id} before force removal");
toc.abort_peer_transfers(dispatcher.clone(), access, peer_id)
.await?;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, this should be done differently. Instead of explicitly submitting ShardTransferOperations::Abort in the API, we should modify the code that is called when RemovePeer consensus operation is applied, and make it abort transfers as part of RemovePeer operation.

Copy link
Member

@timvisee timvisee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this a lot better! Simple and effective.

@KShivendu KShivendu requested a review from ffuugoo May 14, 2025 09:43
@timvisee timvisee merged commit 993f06a into force-delete-peer-test May 14, 2025
16 checks passed
@timvisee timvisee deleted the abort-transfer-on-force-peer-delete branch May 14, 2025 10:05
KShivendu added a commit that referenced this pull request May 15, 2025
* Test for reproducing force delete peer bug

* drop seed port

* use larger transfers to ensure killing before transfer is completed

* simplify test cases

* kill and wait for same commit before force delete

* Don't kill to avoid breaking consesnsus before force delete peer

* Abort transfer on force peer delete (#6507)

* Abort transfer on force peer delete

* ToC changes to abort internally

* Abort shard transfer using consensus not just locally

* fmt

* fix fmt

* Abort transfers internally when shard is being removed

* remove unused imports
@KShivendu KShivendu mentioned this pull request May 21, 2025
3 tasks
generall pushed a commit that referenced this pull request May 22, 2025
* Test for reproducing force delete peer bug

* drop seed port

* use larger transfers to ensure killing before transfer is completed

* simplify test cases

* kill and wait for same commit before force delete

* Don't kill to avoid breaking consesnsus before force delete peer

* Abort transfer on force peer delete (#6507)

* Abort transfer on force peer delete

* ToC changes to abort internally

* Abort shard transfer using consensus not just locally

* fmt

* fix fmt

* Abort transfers internally when shard is being removed

* remove unused imports
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants