Abort transfer on force peer delete and add test for bug #6505
📝 Walkthrough

This change introduces a new integration test in the consensus test suite that verifies forced deletion of a peer during active shard transfers in a distributed cluster. The test sets up a three-peer cluster with two shards and three replicas, creates a distributed collection, and inserts data to establish state. It simulates a peer failure by killing one peer process, triggers shard recovery with additional data insertion, and restarts the killed peer to initiate shard transfers. The test then forcibly deletes the source peer of a shard transfer via the cluster API with a force flag and asserts that ongoing shard transfers are aborted.
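The abort behavior described in the walkthrough can be modeled as a simple filter over the transfer list. This is a hypothetical sketch for illustration only, not Qdrant's actual implementation; the `from`/`to` keys mirror the shape of the `shard_transfers` entries returned by the collection cluster info endpoint.

```python
def abort_transfers_for_peer(transfers: list[dict], deleted_peer_id: int) -> list[dict]:
    """Return only the transfers that do not involve the deleted peer.

    Any transfer whose source ("from") or destination ("to") is the
    force-deleted peer is dropped, which is the outcome the test asserts.
    """
    return [
        t for t in transfers
        if t["from"] != deleted_peer_id and t["to"] != deleted_peer_id
    ]

# Example transfer list in the cluster-info shape (peer IDs are made up):
transfers = [
    {"shard_id": 0, "from": 111, "to": 333},
    {"shard_id": 1, "from": 222, "to": 333},
]

# Force-deleting peer 111 should abort the first transfer only.
remaining = abort_transfers_for_peer(transfers, 111)
```

The test in this PR exercises the real end-to-end path; this filter only captures the expected final state (no transfers referencing the deleted peer remain).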
Actionable comments posted: 3
🧹 Nitpick comments (5)
tests/consensus_tests/test_force_delete_peer.py (5)
11-16: Add docstring for helper function.

This function is well-implemented, but lacks documentation explaining its purpose and behavior.

```diff
 def force_delete_peer(peer_api_uri: str, peer_id: int):
+    """
+    Force deletes a peer from the cluster.
+
+    Args:
+        peer_api_uri: URI of the peer API to send the request to
+        peer_id: ID of the peer to delete
+
+    Raises:
+        AssertionError: If the request fails
+    """
     response = requests.delete(
         f"{peer_api_uri}/cluster/peer/{peer_id}?force=true",
     )
     assert response.status_code == 200, f"Failed to force delete peer: {response.text}"
```

🧰 Tools: 🪛 Ruff (0.8.2) — 12-12: `requests` may be undefined, or defined from star imports (F405)
18-22: Add docstring for get_peer_id function.

Similar to the previous function, this helper would benefit from documentation explaining its purpose and return value.

```diff
 def get_peer_id(peer_api_uri: str) -> int:
+    """
+    Gets the peer ID from the cluster info endpoint.
+
+    Args:
+        peer_api_uri: URI of the peer API to send the request to
+
+    Returns:
+        int: The ID of the peer
+
+    Raises:
+        AssertionError: If the request fails
+    """
     response = requests.get(f"{peer_api_uri}/cluster")
     assert response.status_code == 200, f"Failed to get peer ID: {response.text}"
     return response.json()["result"]["peer_id"]
```

🧰 Tools: 🪛 Ruff (0.8.2) — 19-19: `requests` may be undefined, or defined from star imports (F405)
42-46: Add comment explaining which peer is being killed.

The comment "Kill last peer" is somewhat ambiguous. Add more context about why this specific peer is chosen and what effect it will have.

```diff
-# Now
-# Kill last peer
+# Kill the last peer to simulate a node failure
+# This will cause its replicas to be marked as dead
 p = processes.pop()
 p.kill()
 sleep(1)  # Give killed peer time to release WAL lock
```

🧰 Tools: 🪛 Ruff (0.8.2) — 44-44: `processes` may be undefined, or defined from star imports (F405)
48-49: Clarify the purpose of additional data insertion.

The comment doesn't fully explain why inserting more points triggers a recovery transfer.

```diff
-# Upsert points to mark dummy replica as dead, that will trigger recovery transfer
+# Upsert points to trigger write operations that will mark the killed peer's replicas
+# as dead, which will then trigger recovery transfers when the peer is restarted
 upsert_random_points(peer_api_uris[0], 100)
```
78-84: Add assertion error message and consider using a wait function.

Consider adding an error message to make test failures more informative, and consider using a wait function instead of a fixed sleep to make the test more reliable.

```diff
-sleep(1)
+# Wait a moment for the force delete operation to be processed
+sleep(1)

 # We expect transfers to be aborted
 transfers = get_collection_cluster_info(peer_api_uris[0], COLLECTION_NAME)[
     "shard_transfers"
 ]
-assert len(transfers) == 0
+assert len(transfers) == 0, "Shard transfers were not aborted after force deleting the source peer"
```

Consider using a wait function for more reliable testing:

```python
import time  # needed for time.time() below

def wait_for_no_transfers(peer_api_uri: str, collection_name: str, timeout: int = 10):
    """Wait until there are no shard transfers for the collection."""
    start_time = time.time()
    while time.time() - start_time < timeout:
        transfers = get_collection_cluster_info(peer_api_uri, collection_name)["shard_transfers"]
        if len(transfers) == 0:
            return
        sleep(0.5)
    raise TimeoutError(f"Transfers were not aborted within {timeout} seconds")

# Then use it instead of sleep (it raises on timeout, so no assert is needed)
wait_for_no_transfers(peer_api_uris[0], COLLECTION_NAME)
```

🧰 Tools: 🪛 Ruff (0.8.2) — 81-81: `get_collection_cluster_info` may be undefined, or defined from star imports (F405)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (1)

- tests/consensus_tests/test_force_delete_peer.py (1 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (1)

tests/consensus_tests/test_force_delete_peer.py (2)

- tests/consensus_tests/fixtures.py (1): upsert_random_points (83-129)
- tests/consensus_tests/utils.py (6): assert_project_root (110-113), start_cluster (220-245), wait_collection_exists_and_active_on_all_peers (753-757), start_peer (147-180), wait_for_collection_shard_transfers_count (616-622), get_collection_cluster_info (294-298)
🪛 Ruff (0.8.2)

tests/consensus_tests/test_force_delete_peer.py

- 2-2: `from .utils import *` used; unable to detect undefined names (F403)
- 12-12: `requests` may be undefined, or defined from star imports (F405)
- 19-19: `requests` may be undefined, or defined from star imports (F405)
- 24-24: `assert_project_root` may be undefined, or defined from star imports (F405)
- 27-27: `start_cluster` may be undefined, or defined from star imports (F405)
- 30-30: `wait_collection_exists_and_active_on_all_peers` may be undefined, or defined from star imports (F405)
- 44-44: `processes` may be undefined, or defined from star imports (F405)
- 52-52: `start_peer` may be undefined, or defined from star imports (F405)
- 59-59: `wait_for_collection_shard_transfers_count` may be undefined, or defined from star imports (F405)
- 61-61: `get_collection_cluster_info` may be undefined, or defined from star imports (F405)
- 71-71: `processes` may be undefined, or defined from star imports (F405)
- 72-72: Local variable `url` is assigned to but never used; remove assignment to unused variable `url` (F841)
- 81-81: `get_collection_cluster_info` may be undefined, or defined from star imports (F405)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: integration-tests-consensus
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: test-consensus-compose
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: integration-tests
- GitHub Check: lint
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: storage-compat-test
🔇 Additional comments (4)

tests/consensus_tests/test_force_delete_peer.py (4)

6-9: LGTM - Clear constant definitions. The constants are well-defined with descriptive names that indicate their purpose in the test.

58-65: LGTM - Good verification of shard transfer state. This section correctly waits for and verifies the expected shard transfer activity.

75-76: LGTM - Force deletion of peer. This section correctly implements the force deletion of the source peer, which is the key operation being tested.
1-85: ❓ Verification inconclusive — Test verifies incorrect behavior.

Based on the PR objective, this test aims to verify that "Qdrant does not abort shard transfers if the source peer is force deleted." However, the test is written to expect that transfers are aborted (line 84: `assert len(transfers) == 0`). This contradicts the stated behavior in the PR description. Please verify whether the test is:

- Correctly implementing a new desired behavior (aborting transfers when a peer is force deleted), or
- Incorrectly expecting the opposite of the current behavior described in the PR

🏁 Script executed:

```shell
#!/bin/bash
# Look for any existing behavior or related issues in the codebase

# Check if there are any issues or pull requests mentioning force delete and shard transfers
echo "Searching for related issues or PRs:"
rg -i "force delete.*shard transfer|shard transfer.*force delete" -g "*.md" -g "*.txt"

# Look for any similar test cases
echo "Searching for similar test cases:"
rg -i "force_delete|shard_transfer" -g "*/test_*.py"
```

Length of output: 452

Inconsistent test assertion on shard transfer behavior after force delete

The test in tests/consensus_tests/test_force_delete_peer.py currently asserts that shard transfers are aborted after a peer is force-deleted (line 84: `assert len(transfers) == 0`). However, the PR description states that Qdrant should not abort ongoing shard transfers when the source peer is force-deleted. Please confirm the intended behavior and update the assertion accordingly (for example, expecting `len(transfers) == 1` if transfers must continue).
```python
# Stop the 'source' node to simulate an unreachable node
source_peer_url = peer_id_to_url[from_peer_id]
peer_idx = peer_api_uris.index(source_peer_url)
p = processes.pop(peer_idx)
url = peer_api_uris.pop(peer_idx)
sleep(1)  # Give killed peer time to release WAL lock
```
🛠️ Refactor suggestion

Handle the unused variable and clarify comment.

The variable `url` is assigned but never used, and the comment could be clearer about the purpose of stopping the source node.

```diff
-# Stop the 'source' node to simulate an unreachable node
+# Stop the source node of the shard transfer to simulate a failure during transfer
 source_peer_url = peer_id_to_url[from_peer_id]
 peer_idx = peer_api_uris.index(source_peer_url)
 p = processes.pop(peer_idx)
-url = peer_api_uris.pop(peer_idx)
+peer_api_uris.pop(peer_idx)  # Remove the URL of the killed peer from the list
 sleep(1)  # Give killed peer time to release WAL lock
```
```python
def test_force_delete_source_peer_during_transfers(tmp_path: pathlib.Path):
    assert_project_root()

    # seed port to reuse the same port for the restarted nodes
    peer_api_uris, peer_dirs, bootstrap_uri = start_cluster(tmp_path, N_PEERS, 20000)

    create_collection(peer_api_uris[0], shard_number=N_SHARDS, replication_factor=N_REPLICAS)
    wait_collection_exists_and_active_on_all_peers(
        collection_name=COLLECTION_NAME, peer_api_uris=peer_api_uris
    )

    peer_id_to_url = {}
    for peer_api_uri in peer_api_uris:
        peer_id = get_peer_id(peer_api_uri)
        peer_id_to_url[peer_id] = peer_api_uri
```
🛠️ Refactor suggestion

Add docstring to explain test purpose and steps.

The test function should include a docstring that explains its purpose and the steps it follows to test the behavior.

```diff
 def test_force_delete_source_peer_during_transfers(tmp_path: pathlib.Path):
+    """
+    Tests that shard transfers are aborted when the source peer is force deleted.
+
+    Test steps:
+    1. Start a cluster with multiple peers
+    2. Create a collection with multiple shards and replicas
+    3. Insert initial data
+    4. Kill one peer to simulate failure
+    5. Insert more data to trigger recovery transfer
+    6. Restart the killed peer
+    7. Wait for shard transfer to start
+    8. Force delete the source peer of the transfer
+    9. Verify that transfers are aborted
+    """
     assert_project_root()
```
Actionable comments posted: 1
♻️ Duplicate comments (3)
tests/consensus_tests/test_force_delete_peer.py (3)
2-2: Avoid wildcard imports.

The wildcard import (`from .utils import *`) makes it difficult to track which symbols are being imported and where they come from. This can lead to naming conflicts and makes the code harder to understand and maintain.

```diff
-from .utils import *
+from .utils import (assert_project_root, start_cluster, start_peer,
+                    wait_collection_exists_and_active_on_all_peers,
+                    wait_for_collection_shard_transfers_count,
+                    get_collection_cluster_info, processes)
```
23-23: Add docstring to explain test purpose and steps.

The test function should include a docstring that explains its purpose and the steps it follows to test the behavior (same docstring diff as suggested in the previous review round).
67-72: Handle the unused variable and clarify comment.

The variable `url` is assigned but never used, and the comment could be clearer about the purpose of stopping the source node.

```diff
-# Stop the 'source' node to simulate an unreachable node
+# Stop the source node of the shard transfer to simulate a failure during transfer
 source_peer_url = peer_id_to_url[from_peer_id]
 peer_idx = peer_api_uris.index(source_peer_url)
 p = processes.pop(peer_idx)
-url = peer_api_uris.pop(peer_idx)
+peer_api_uris.pop(peer_idx)  # Remove the URL of the killed peer from the list
 sleep(1)  # Give killed peer time to release WAL lock
```
🔇 Additional comments (4)

tests/consensus_tests/test_force_delete_peer.py (4)

11-16: Force delete peer function looks good. The function makes a DELETE request with the force flag and properly validates the response. This is the core functionality being tested.

18-22: Peer ID retrieval function looks good. The function correctly retrieves and parses the peer ID from the cluster API response.

77-83: Good validation of test expectation. The test correctly verifies that shard transfers are aborted after force deleting the source peer, which is the main behavior being tested.

38-56: Good simulation of peer failure and recovery scenario. This section effectively simulates a real-world scenario where a peer fails and is later restarted, triggering shard transfers. The comments clearly explain the purpose of each step.
Actionable comments posted: 0
♻️ Duplicate comments (3)
tests/consensus_tests/test_force_delete_peer.py (3)
1-4: Replace wildcard imports with explicit imports for better maintainability.

Using wildcard imports (`from .utils import *`) makes it difficult to track which symbols are being imported. This can lead to naming conflicts and makes the code harder to understand and maintain.

```diff
 import pathlib
-from .utils import *
+from .utils import (
+    assert_project_root, start_cluster, start_peer, processes,
+    wait_collection_exists_and_active_on_all_peers,
+    wait_for_collection_shard_transfers_count,
+    get_collection_cluster_info
+)
 from .fixtures import upsert_random_points, create_collection
 from time import sleep
+import requests
```
23-23: Add docstring to explain test purpose and steps.

Adding a docstring would make the test function's purpose and methodology clearer to other developers (same docstring diff as suggested in the earlier review rounds).
66-70: Fix unused variable and clarify comment about stopping the source node.

The variable `url` is assigned but never used. Also, the comment could be more specific about why the source node is stopped.

```diff
-# Stop the 'source' node to simulate an unreachable node
+# Stop the source node of the shard transfer to simulate a failure during transfer
 source_peer_url = peer_id_to_url[from_peer_id]
 peer_idx = peer_api_uris.index(source_peer_url)
 p = processes.pop(peer_idx)
-url = peer_api_uris.pop(peer_idx)
+peer_api_uris.pop(peer_idx)  # Remove the URL of the killed peer from the list
```
🧹 Nitpick comments (3)

tests/consensus_tests/test_force_delete_peer.py (3)

63-63: Improve assertion message with more context.

Add an assertion message to make it clearer what's being tested and why the last peer is expected to be the destination.

```diff
-assert transfers[0]["to"] == list(peer_id_to_url.keys())[-1]  # last peer was restarted
+assert transfers[0]["to"] == list(peer_id_to_url.keys())[-1], "Transfer should be to the restarted peer"
```

75-75: Replace arbitrary sleep with a more reliable wait mechanism.

Using a fixed sleep duration can make tests flaky. Consider implementing a polling mechanism that waits for a specific condition, or explain why the sleep is necessary in this case.

```diff
-sleep(1)
+# Wait briefly for cluster state to update after peer deletion
+sleep(1)  # This delay is necessary to ensure cluster state is updated before checking transfers
```

77-80: Add a detailed assertion message for test conclusion.

Make the final assertion more informative by explaining the expected behavior.

```diff
 # We expect transfers to be aborted
 wait_for_collection_shard_transfers_count(
-    peer_api_uris[0], COLLECTION_NAME, 0
+    peer_api_uris[0], COLLECTION_NAME, 0,
+    # Verification that after force-deleting the source peer, no shard transfers remain
 )
```
Actionable comments posted: 0
🔭 Outside diff range comments (1)
tests/consensus_tests/test_force_delete_peer.py (1)
67-71: 🛠️ Refactor suggestion — Add a timeout to the transfer abort verification.

If the transfers are not aborted as expected, the current wait function might hang indefinitely. Consider adding a timeout parameter or checking for specific failure conditions.

```diff
 # We expect transfers to be aborted
 wait_for_collection_shard_transfers_count(
-    peer_api_uris[0], COLLECTION_NAME, 0
+    peer_api_uris[0], COLLECTION_NAME, 0, max_wait=30  # Add a timeout of 30 seconds
 )
+
+# Verify the peer was actually deleted
+cluster_info = requests.get(f"{peer_api_uris[0]}/cluster").json()["result"]
+assert from_peer_id not in [peer["id"] for peer in cluster_info["peers"]], "Source peer was not actually deleted"
```
♻️ Duplicate comments (2)
tests/consensus_tests/test_force_delete_peer.py (2)
1-5: Add explicit imports for clarity and maintainability.

The code uses wildcard imports and implicitly relies on the `requests` module that's imported through the star import. For better code clarity, maintainability, and to avoid potential issues if the utility module changes, use explicit imports.

```diff
 import pathlib
 from .utils import *
 from .fixtures import upsert_random_points, create_collection
 from time import sleep
+import requests
```

Additionally, explicitly import the specific functions used from `.utils` rather than relying on a wildcard import:

```diff
-from .utils import *
+from .utils import (
+    assert_project_root, start_cluster, start_peer,
+    wait_collection_exists_and_active_on_all_peers,
+    wait_for_collection_shard_transfers_count,
+    get_collection_cluster_info, processes
+)
```
21-29: Add a docstring to explain test purpose and steps.

The test function should include a docstring that explains its purpose and the steps it follows to test the behavior.

```diff
 def test_force_delete_source_peer_during_transfers(tmp_path: pathlib.Path):
+    """
+    Tests that shard transfers are aborted when the source peer is force deleted.
+
+    Test steps:
+    1. Start a cluster with multiple peers
+    2. Create a collection with multiple shards and replicas
+    3. Insert initial data
+    4. Kill one peer to simulate failure
+    5. Insert more data to trigger recovery transfer
+    6. Restart the killed peer
+    7. Wait for shard transfer to start
+    8. Force delete the source peer of the transfer
+    9. Verify that transfers are aborted
+    """
     assert_project_root()

     peer_api_uris, peer_dirs, bootstrap_uri = start_cluster(tmp_path, N_PEERS)
```
🧹 Nitpick comments (4)
tests/consensus_tests/test_force_delete_peer.py (4)
`9-14`: Function looks good but add a docstring.

The helper function is well-implemented with proper error handling. Consider adding a docstring to explain the purpose of this function and its parameters.

```diff
 def force_delete_peer(peer_api_uri: str, peer_id: int):
+    """
+    Force delete a peer from the cluster.
+
+    Args:
+        peer_api_uri: The API URI of any active peer in the cluster
+        peer_id: The ID of the peer to be deleted
+
+    Raises:
+        AssertionError: If the request fails
+    """
     response = requests.delete(
         f"{peer_api_uri}/cluster/peer/{peer_id}?force=true",
     )
     assert response.status_code == 200, f"Failed to force delete peer: {response.text}"
```
🪛 Ruff (0.8.2)
10-10:
requests
may be undefined, or defined from star imports(F405)
`16-20`: Add a docstring to the helper function.

Similar to the previous function, adding a docstring would improve code readability and maintainability.

```diff
 def get_peer_id(peer_api_uri: str) -> int:
+    """
+    Get the ID of a peer by querying its cluster information.
+
+    Args:
+        peer_api_uri: The API URI of the peer
+
+    Returns:
+        int: The ID of the peer
+
+    Raises:
+        AssertionError: If the request fails
+    """
     response = requests.get(f"{peer_api_uri}/cluster")
     assert response.status_code == 200, f"Failed to get peer ID: {response.text}"
     return response.json()["result"]["peer_id"]
```
🪛 Ruff (0.8.2)
17-17:
requests
may be undefined, or defined from star imports(F405)
`39-43`: Clarify the peer selection strategy.

The test kills the last peer from the `processes` list, but there's no explicit verification that this corresponds to the last peer in `peer_api_uris`. A comment would help clarify this relationship.

```diff
-# Kill last peer
+# Kill last peer (processes and peer_api_uris are populated in the same order by start_cluster)
 p = processes.pop()
 p.kill()
 sleep(1)  # Give killed peer time to release WAL lock
```
🪛 Ruff (0.8.2)
40-40:
processes
may be undefined, or defined from star imports(F405)
`54-61`: Add verification of the shard transfer.

The test waits for a shard transfer to start and checks the transfer count, but it would be good to add more detailed verification of the transfer state and logging for debugging.

```diff
 # Wait for start of shard transfer
 wait_for_collection_shard_transfers_count(peer_api_uris[0], COLLECTION_NAME, 1)
 transfers = get_collection_cluster_info(peer_api_uris[0], COLLECTION_NAME)[
     "shard_transfers"
 ]
 assert len(transfers) == 1
+print(f"Observed shard transfer: {transfers[0]}")
 assert transfers[0]["to"] == list(peer_id_to_url.keys())[-1]  # last peer was restarted
+assert "from" in transfers[0], f"Missing source peer in transfer: {transfers[0]}"
 from_peer_id = transfers[0]["from"]
```
🪛 Ruff (0.8.2)
55-55:
wait_for_collection_shard_transfers_count
may be undefined, or defined from star imports(F405)
57-57:
get_collection_cluster_info
may be undefined, or defined from star imports(F405)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/consensus_tests/test_force_delete_peer.py
(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/consensus_tests/test_force_delete_peer.py (2)
tests/consensus_tests/fixtures.py (1)
- `upsert_random_points` (83-129)

tests/consensus_tests/utils.py (6)
- `assert_project_root` (110-113)
- `start_cluster` (220-245)
- `wait_collection_exists_and_active_on_all_peers` (753-757)
- `start_peer` (147-180)
- `wait_for_collection_shard_transfers_count` (616-622)
- `get_collection_cluster_info` (294-298)
🪛 Ruff (0.8.2)
tests/consensus_tests/test_force_delete_peer.py
2-2: from .utils import *
used; unable to detect undefined names
(F403)
10-10: requests
may be undefined, or defined from star imports
(F405)
17-17: requests
may be undefined, or defined from star imports
(F405)
22-22: assert_project_root
may be undefined, or defined from star imports
(F405)
24-24: start_cluster
may be undefined, or defined from star imports
(F405)
27-27: wait_collection_exists_and_active_on_all_peers
may be undefined, or defined from star imports
(F405)
40-40: processes
may be undefined, or defined from star imports
(F405)
48-48: start_peer
may be undefined, or defined from star imports
(F405)
55-55: wait_for_collection_shard_transfers_count
may be undefined, or defined from star imports
(F405)
57-57: get_collection_cluster_info
may be undefined, or defined from star imports
(F405)
68-68: wait_for_collection_shard_transfers_count
may be undefined, or defined from star imports
(F405)
⏰ Context from checks skipped due to timeout of 90000ms (11)
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: integration-tests-consensus
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: test-consistency
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: integration-tests
- GitHub Check: lint
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: storage-compat-test
Actionable comments posted: 0
♻️ Duplicate comments (3)
tests/consensus_tests/test_force_delete_peer.py (3)
`1-5`: Explicit imports should be used instead of wildcard imports.

The code currently uses a wildcard import (`from .utils import *`) which makes it difficult to track which symbols are being imported. This could lead to maintainability issues and potential naming conflicts. Additionally, the code uses the `requests` module without explicitly importing it.

```diff
 import pathlib
-from .utils import *
+from .utils import (assert_project_root, start_cluster, start_peer,
+                    wait_collection_exists_and_active_on_all_peers,
+                    wait_for_collection_shard_transfers_count,
+                    get_collection_cluster_info, wait_for_same_commit,
+                    processes)
 from .fixtures import upsert_random_points, create_collection
 from time import sleep
+import requests
```
🪛 Ruff (0.8.2)
2-2:
from .utils import *
used; unable to detect undefined names(F403)
`21-22`: Add a docstring to the test function.

A docstring would help explain the purpose of the test and the steps it takes, making it easier for other developers to understand the test case.

```diff
 def test_force_delete_source_peer_during_transfers(tmp_path: pathlib.Path):
+    """
+    Tests that shard transfers are aborted when the source peer is force deleted.
+
+    Test steps:
+    1. Start a cluster with multiple peers
+    2. Create a collection with multiple shards and replicas
+    3. Insert initial data
+    4. Kill one peer to simulate failure
+    5. Insert more data to trigger recovery transfer
+    6. Restart the killed peer
+    7. Wait for shard transfer to start
+    8. Force delete the source peer of the transfer
+    9. Verify that transfers are aborted
+    """
     assert_project_root()
```
🪛 Ruff (0.8.2)
22-22:
assert_project_root
may be undefined, or defined from star imports(F405)
`64-69`: Improve the comment and handling of the source peer removal.

The comment could be clearer and the code for killing the peer could be more explicit.

```diff
-# Stop the 'source' node to simulate an unreachable node which needs to be deleted
+# Stop the source node of the shard transfer to simulate a failure during transfer
 source_peer_url = peer_id_to_url[from_peer_id]
 peer_idx = peer_api_uris.index(source_peer_url)
-processes.pop(peer_idx).kill()  # Kill the source peer
-peer_api_uris.pop(peer_idx)  # Remove from urls so we don't try to call it
+source_process = processes.pop(peer_idx)
+source_process.kill()  # Kill the source peer
+peer_api_uris.pop(peer_idx)  # Remove from urls so we don't try to call it
```
🪛 Ruff (0.8.2)
67-67:
processes
may be undefined, or defined from star imports(F405)
🧹 Nitpick comments (3)
tests/consensus_tests/test_force_delete_peer.py (3)
`39-42`: Add a more robust way to identify the peer to kill.

The current implementation simply kills the last peer in the processes list, which might be brittle if the order of processes changes. Consider using a more explicit way to identify which peer to kill.

```diff
-# Kill last peer
-p = processes.pop()
-p.kill()
+# Kill the last peer to simulate a node failure
+last_peer_process = processes.pop()
+last_peer_process.kill()
 sleep(1)  # Give killed peer time to release WAL lock
```
🪛 Ruff (0.8.2)
40-40:
processes
may be undefined, or defined from star imports(F405)
`57-63`: Verify the data transfer properties more thoroughly.

The assertions are good, but could be more comprehensive to ensure the transfer is happening as expected. Consider adding assertions for other properties of the transfer.

```diff
 transfers = get_collection_cluster_info(peer_api_uris[0], COLLECTION_NAME)[
     "shard_transfers"
 ]
 assert len(transfers) == 1
+# Verify that transfer is going to the restarted peer
 assert transfers[0]["to"] == list(peer_id_to_url.keys())[-1]  # last peer was restarted
+# Verify that transfer is happening from a valid source peer
 from_peer_id = transfers[0]["from"]
+assert from_peer_id in peer_id_to_url, f"Invalid source peer ID: {from_peer_id}"
```
🪛 Ruff (0.8.2)
57-57:
get_collection_cluster_info
may be undefined, or defined from star imports(F405)
`75-78`: Add an assertion to verify cluster stability after force deleting a peer.

After verifying that transfers are aborted, it would be valuable to ensure the cluster remains stable and operational.

```diff
 # We expect transfers to be aborted
 wait_for_collection_shard_transfers_count(
     peer_api_uris[0], COLLECTION_NAME, 0
 )
+
+# Verify that the cluster remains stable after force deleting a peer
+# Try to insert some more points to ensure the cluster is still operational
+upsert_random_points(peer_api_uris[0], 50)
+
+# Verify that all remaining peers have the same view of the cluster
+wait_for_same_commit(peer_api_uris)
```
🪛 Ruff (0.8.2)
76-76:
wait_for_collection_shard_transfers_count
may be undefined, or defined from star imports(F405)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/consensus_tests/test_force_delete_peer.py
(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/consensus_tests/test_force_delete_peer.py (2)
tests/consensus_tests/fixtures.py (1)
- `upsert_random_points` (83-129)

tests/consensus_tests/utils.py (7)
- `assert_project_root` (110-113)
- `start_cluster` (220-245)
- `wait_collection_exists_and_active_on_all_peers` (753-757)
- `start_peer` (147-180)
- `wait_for_collection_shard_transfers_count` (616-622)
- `get_collection_cluster_info` (294-298)
- `wait_for_same_commit` (592-597)
🪛 Ruff (0.8.2)
tests/consensus_tests/test_force_delete_peer.py
2-2: from .utils import *
used; unable to detect undefined names
(F403)
10-10: requests
may be undefined, or defined from star imports
(F405)
17-17: requests
may be undefined, or defined from star imports
(F405)
22-22: assert_project_root
may be undefined, or defined from star imports
(F405)
24-24: start_cluster
may be undefined, or defined from star imports
(F405)
27-27: wait_collection_exists_and_active_on_all_peers
may be undefined, or defined from star imports
(F405)
40-40: processes
may be undefined, or defined from star imports
(F405)
48-48: start_peer
may be undefined, or defined from star imports
(F405)
55-55: wait_for_collection_shard_transfers_count
may be undefined, or defined from star imports
(F405)
57-57: get_collection_cluster_info
may be undefined, or defined from star imports
(F405)
67-67: processes
may be undefined, or defined from star imports
(F405)
70-70: wait_for_same_commit
may be undefined, or defined from star imports
(F405)
76-76: wait_for_collection_shard_transfers_count
may be undefined, or defined from star imports
(F405)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-consensus-compose
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-consistency
- GitHub Check: test-low-resources
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: integration-tests-consensus
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: integration-tests
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: storage-compat-test
- GitHub Check: lint
🔇 Additional comments (4)
tests/consensus_tests/test_force_delete_peer.py (4)
`9-14`: LGTM: Helper function for force deleting a peer.

The function is well-defined with proper type hints and error handling. It's a good abstraction for the test.
🧰 Tools
🪛 Ruff (0.8.2)
10-10:
requests
may be undefined, or defined from star imports(F405)
`16-19`: LGTM: Helper function for retrieving peer ID.

This utility function provides a clean abstraction for getting a peer's ID with proper error handling and type annotations.
🧰 Tools
🪛 Ruff (0.8.2)
17-17:
requests
may be undefined, or defined from star imports(F405)
`54-56`: LGTM: Waiting for shard transfer initialization.

Good usage of the `wait_for_collection_shard_transfers_count` utility function to ensure the test waits for the expected condition before proceeding.

🧰 Tools
🪛 Ruff (0.8.2)
55-55:
wait_for_collection_shard_transfers_count
may be undefined, or defined from star imports(F405)
`72-73`: LGTM: Force deleting the source peer.

Correctly uses the helper function to force delete the source peer of the shard transfer.
Actionable comments posted: 1
♻️ Duplicate comments (2)
tests/consensus_tests/test_force_delete_peer.py (2)
`1-5`: 🛠️ Refactor suggestion

Clean up imports to improve code maintainability.

The code uses wildcard imports (`from .utils import *`) which makes it difficult to track which symbols are imported. It also uses the `requests` module without explicitly importing it.

```diff
 import pathlib
-from .utils import *
+from .utils import (assert_project_root, start_cluster, start_peer,
+                    wait_collection_exists_and_active_on_all_peers,
+                    wait_for_collection_shard_transfers_count,
+                    get_collection_cluster_info, processes)
 from .fixtures import upsert_random_points, create_collection
 from time import sleep
+import requests
```
🪛 Ruff (0.8.2)
2-2:
from .utils import *
used; unable to detect undefined names(F403)
`21-22`: 🛠️ Refactor suggestion

Add docstring to explain test purpose and steps.

The test function should include a docstring that explains its purpose and the steps it follows to test the behavior.

```diff
 def test_force_delete_source_peer_during_transfers(tmp_path: pathlib.Path):
+    """
+    Tests that shard transfers are aborted when the source peer is force deleted.
+
+    Test steps:
+    1. Start a cluster with 3 peers
+    2. Create a collection with 2 shards and 3 replicas
+    3. Insert initial data
+    4. Kill one peer to simulate failure
+    5. Insert more data to trigger recovery transfer
+    6. Restart the killed peer
+    7. Wait for shard transfer to start
+    8. Force delete the source peer of the transfer
+    9. Verify that transfers are aborted
+    """
     assert_project_root()
```
🪛 Ruff (0.8.2)
22-22:
assert_project_root
may be undefined, or defined from star imports(F405)
🧹 Nitpick comments (5)
tests/consensus_tests/test_force_delete_peer.py (5)
`9-14`: Function looks good, but add docstring.

The function to force delete a peer is well-implemented, but missing a docstring to explain its purpose and parameters.

```diff
 def force_delete_peer(peer_api_uri: str, peer_id: int):
+    """
+    Force delete a peer from the cluster with the specified ID.
+
+    Args:
+        peer_api_uri: URI of any active peer in the cluster
+        peer_id: ID of the peer to be forcibly deleted
+    """
     response = requests.delete(
         f"{peer_api_uri}/cluster/peer/{peer_id}?force=true",
     )
     assert response.status_code == 200, f"Failed to force delete peer: {response.text}"
```
🪛 Ruff (0.8.2)
10-10:
requests
may be undefined, or defined from star imports(F405)
`16-20`: Function looks good, but add docstring.

The function to get peer ID is well-implemented, but missing a docstring to explain its purpose and parameters.

```diff
 def get_peer_id(peer_api_uri: str) -> int:
+    """
+    Get the peer ID for a peer given its API URI.
+
+    Args:
+        peer_api_uri: URI of the peer
+
+    Returns:
+        The peer ID as an integer
+    """
     response = requests.get(f"{peer_api_uri}/cluster")
     assert response.status_code == 200, f"Failed to get peer ID: {response.text}"
     return response.json()["result"]["peer_id"]
```
🪛 Ruff (0.8.2)
17-17:
requests
may be undefined, or defined from star imports(F405)
`39-42`: Improve comment clarity for killed peer.

The comment "Kill last peer" could be clearer about which peer is being killed and why.

```diff
-# Kill last peer
+# Kill the last peer to simulate a node failure
 p = processes.pop()
 p.kill()
 sleep(1)  # Give killed peer time to release WAL lock
```
🪛 Ruff (0.8.2)
40-40:
processes
may be undefined, or defined from star imports(F405)
`44-45`: Clarify why upserting points triggers recovery.

The comment mentions marking a dummy replica as dead, but it would be helpful to explain how upserting points accomplishes this.

```diff
-# Upsert points to mark dummy replica as dead, that will trigger recovery transfer
+# Upsert points - since one peer is down, these updates won't reach it
+# This will mark its replicas as stale, triggering recovery transfer when it rejoins
 upsert_random_points(peer_api_uris[0], 100)
```
`54-62`: Good verification of transfer target, consider verifying shard ID.

The code correctly waits for and verifies the shard transfer, but it might be useful to capture and verify the specific shard ID being transferred.

```diff
 # Wait for start of shard transfer
 wait_for_collection_shard_transfers_count(peer_api_uris[0], COLLECTION_NAME, 1)
 transfers = get_collection_cluster_info(peer_api_uris[0], COLLECTION_NAME)[
     "shard_transfers"
 ]
 assert len(transfers) == 1
 assert transfers[0]["to"] == list(peer_id_to_url.keys())[-1]  # last peer was restarted
 from_peer_id = transfers[0]["from"]
+shard_id = transfers[0]["shard_id"]  # Capture shard ID for debugging/verification
```
🧰 Tools
🪛 Ruff (0.8.2)
55-55:
wait_for_collection_shard_transfers_count
may be undefined, or defined from star imports(F405)
57-57:
get_collection_cluster_info
may be undefined, or defined from star imports(F405)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/consensus_tests/test_force_delete_peer.py
(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/consensus_tests/test_force_delete_peer.py (2)
tests/consensus_tests/fixtures.py (1)
- `upsert_random_points` (83-129)

tests/consensus_tests/utils.py (6)
- `assert_project_root` (110-113)
- `start_cluster` (220-245)
- `wait_collection_exists_and_active_on_all_peers` (753-757)
- `start_peer` (147-180)
- `wait_for_collection_shard_transfers_count` (616-622)
- `get_collection_cluster_info` (294-298)
🪛 Ruff (0.8.2)
tests/consensus_tests/test_force_delete_peer.py
2-2: from .utils import *
used; unable to detect undefined names
(F403)
10-10: requests
may be undefined, or defined from star imports
(F405)
17-17: requests
may be undefined, or defined from star imports
(F405)
22-22: assert_project_root
may be undefined, or defined from star imports
(F405)
24-24: start_cluster
may be undefined, or defined from star imports
(F405)
27-27: wait_collection_exists_and_active_on_all_peers
may be undefined, or defined from star imports
(F405)
40-40: processes
may be undefined, or defined from star imports
(F405)
48-48: start_peer
may be undefined, or defined from star imports
(F405)
55-55: wait_for_collection_shard_transfers_count
may be undefined, or defined from star imports
(F405)
57-57: get_collection_cluster_info
may be undefined, or defined from star imports
(F405)
73-73: wait_for_collection_shard_transfers_count
may be undefined, or defined from star imports
(F405)
⏰ Context from checks skipped due to timeout of 90000ms (12)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: integration-tests-consensus
- GitHub Check: test-consensus-compose
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: test-consistency
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: lint
- GitHub Check: integration-tests
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: storage-compat-test
🔇 Additional comments (4)
tests/consensus_tests/test_force_delete_peer.py (4)
`31-35`: LGTM! Useful mapping between peer IDs and URLs.

The code builds a mapping between peer IDs and their API URLs, which is essential for later operations in the test.
`47-53`: LGTM! Good approach for restarting the peer.

The code correctly restarts the previously killed peer, which should trigger shard transfers to synchronize the data it missed while being down.
🧰 Tools
🪛 Ruff (0.8.2)
48-48:
start_peer
may be undefined, or defined from star imports(F405)
`69-71`: LGTM! Force deleting the peer is the key test action.

This is the core action being tested - forcing deletion of a peer while it's the source of an ongoing shard transfer.
`72-75`: LGTM! Good verification of expected behavior.

The test correctly verifies that shard transfers are aborted after force deleting the source peer, which is the behavior this test aims to verify.
🧰 Tools
🪛 Ruff (0.8.2)
73-73:
wait_for_collection_shard_transfers_count
may be undefined, or defined from star imports(F405)
* Abort transfer on force peer delete
* ToC changes to abort internally
* Abort shard transfer using consensus not just locally
* fmt
* fix fmt
* Abort transfers internally when shard is being removed
* remove unused imports
* Test for reproducing force delete peer bug
* drop seed port
* use larger transfers to ensure killing before transfer is completed
* simplify test cases
* kill and wait for same commit before force delete
* Don't kill to avoid breaking consensus before force delete peer
* Abort transfer on force peer delete (#6507)
  * Abort transfer on force peer delete
  * ToC changes to abort internally
  * Abort shard transfer using consensus not just locally
  * fmt
  * fix fmt
  * Abort transfers internally when shard is being removed
  * remove unused imports
Add new test for force deleting peer while transfers are ongoing. Currently, it looks like Qdrant doesn't abort shard transfers if the source peer is force deleted.
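The cluster call at the heart of the test is a single DELETE request; a minimal sketch of how its URL is formed (the helper name is illustrative, but the `/cluster/peer/{id}?force=true` endpoint matches the test code reviewed above):

```python
def force_delete_peer_url(base_url: str, peer_id: int) -> str:
    # A DELETE request to this URL removes the peer even while it is the
    # source of an ongoing shard transfer; with this change, Qdrant aborts
    # such transfers instead of leaving them hanging.
    return f"{base_url}/cluster/peer/{peer_id}?force=true"

url = force_delete_peer_url("http://127.0.0.1:6333", 42)
# → "http://127.0.0.1:6333/cluster/peer/42?force=true"
```

In the test itself the request is issued with `requests.delete(...)` and asserted to return HTTP 200 before checking that the transfer count drops to zero.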
All Submissions:

* Did you create your branch from `dev`?