-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Fix PointId Hash #6932
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix PointId Hash #6932
Conversation
fn hash<H: std::hash::Hasher>(&self, state: &mut H) { | ||
// WARN: Changing this implementation might break backward compatibility | ||
// Of hash-ring and shard routing | ||
let __self_discr = std::mem::discriminant(self); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have doubts about this: I used "Expand Macro" from Rust Playground, but it generates nightly. This version is copilot suggestion of how to fix Nightly build
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rust 1.87 generates the same (std::mem::discriminant
).
📝 WalkthroughWalkthroughA new unit test named Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Note ⚡️ Unit Test Generation - BetaCodeRabbit's unit test generation is now available in Beta! Automatically generate comprehensive unit tests for your code changes, ensuring better test coverage and catching edge cases you might miss. Our AI analyzes your code structure and creates tests that follow best practices and your project's testing patterns. Learn more here, or just try it under ✨ Finishing Touches. 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
lib/collection/src/operations/point_ops.rs (1)
936-1003
: Excellent regression test for hash consistency.This test effectively addresses the PR objective by ensuring that shard mapping remains deterministic after the Hash implementation fix. The mix of UUID and numeric IDs provides good coverage.
Consider refactoring for better maintainability:
+ let expected_mappings = vec![ + (id1, 2), (id2, 1), (id3, 2), (id4, 2), (id5, 0), + (id6, 0), (id7, 0), (id8, 2), (id9, 1) + ]; + match operation_to_shard { OperationToShard::ByShard(by_shard) => { + let mut actual_mappings = HashMap::new(); for (shard_id, points) in by_shard { for point in points { - // Important: This mapping should not change with new updates! - if point.id == id1 { - assert_eq!(shard_id, 2); - } - // ... (repeat for all IDs) + actual_mappings.insert(point.id.clone(), shard_id); } } + + for (id, expected_shard) in expected_mappings { + assert_eq!( + actual_mappings.get(&id), + Some(&expected_shard), + "Point ID {:?} should map to shard {}, but was assigned to {:?}", + id, expected_shard, actual_mappings.get(&id) + ); + } }This approach makes the expected mappings more visible and provides better error messages.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
lib/collection/src/operations/point_ops.rs
(1 hunks)lib/segment/src/types.rs
(2 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: generall
PR: qdrant/qdrant#6854
File: lib/segment/src/index/query_estimator.rs:320-327
Timestamp: 2025-07-11T11:35:21.549Z
Learning: In test code for Qdrant's query estimator (lib/segment/src/index/query_estimator.rs), simplified ID resolution logic using `id.to_string().parse().unwrap()` is acceptable for testing purposes and doesn't need to match production code's `id_tracker.internal_id()` approach. Test code can use mock implementations that serve the testing goals.
Learnt from: timvisee
PR: qdrant/qdrant#6444
File: lib/segment/src/index/field_index/map_index/immutable_map_index.rs:373-375
Timestamp: 2025-05-08T10:34:42.223Z
Learning: In the Qdrant codebase, storage backends may have different error handling approaches. Specifically, the mmap variant of `remove_point` method does not return errors, while the RocksDB variant does propagate errors using the `?` operator. These differences are intentional by design.
lib/collection/src/operations/point_ops.rs (4)
Learnt from: generall
PR: #6854
File: lib/segment/src/index/query_estimator.rs:320-327
Timestamp: 2025-07-11T11:35:21.549Z
Learning: In test code for Qdrant's query estimator (lib/segment/src/index/query_estimator.rs), simplified ID resolution logic using id.to_string().parse().unwrap()
is acceptable for testing purposes and doesn't need to match production code's id_tracker.internal_id()
approach. Test code can use mock implementations that serve the testing goals.
Learnt from: timvisee
PR: #6444
File: lib/segment/src/index/field_index/map_index/immutable_map_index.rs:373-375
Timestamp: 2025-05-08T10:34:42.223Z
Learning: In the Qdrant codebase, storage backends may have different error handling approaches. Specifically, the mmap variant of remove_point
method does not return errors, while the RocksDB variant does propagate errors using the ?
operator. These differences are intentional by design.
Learnt from: timvisee
PR: #6546
File: lib/segment/src/index/field_index/full_text_index/text_index.rs:497-518
Timestamp: 2025-05-19T09:25:54.285Z
Learning: In the Qdrant project, using .expect()
and .unwrap()
is considered acceptable in test code since test failures should be loud and obvious, and proper error propagation adds unnecessary complexity to test code.
Learnt from: coszio
PR: #6528
File: lib/posting_list/src/view.rs:118-118
Timestamp: 2025-05-19T14:40:20.068Z
Learning: In the bitpacking crate, the BitPacker::decompress_strictly_sorted function takes an Option as its first parameter, which means using checked_sub(1) without unwrapping is intentional and correct.
lib/segment/src/types.rs (1)
Learnt from: generall
PR: #6854
File: lib/segment/src/index/query_estimator.rs:320-327
Timestamp: 2025-07-11T11:35:21.549Z
Learning: In test code for Qdrant's query estimator (lib/segment/src/index/query_estimator.rs), simplified ID resolution logic using id.to_string().parse().unwrap()
is acceptable for testing purposes and doesn't need to match production code's id_tracker.internal_id()
approach. Test code can use mock implementations that serve the testing goals.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: test-consistency
- GitHub Check: e2e-tests
- GitHub Check: lint
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-consensus-compose
- GitHub Check: integration-tests
- GitHub Check: integration-tests-consensus
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: storage-compat-test
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
- GitHub Check: rust-tests (windows-latest)
🔇 Additional comments (3)
lib/segment/src/types.rs (2)
149-149
: LGTM: Correctly removed Hash from derive attributesRemoving
Hash
from the derive macro is the correct approach when providing a manual implementation of theHash
trait.
158-170
: Excellent manual Hash implementation with proper backward compatibility considerationsThis manual
Hash
implementation correctly addresses the UUID crate Hash changes mentioned in the PR objectives:
- Proper enum hashing: First hashes the discriminant to distinguish between
NumId
andUuid
variants- Deterministic UUID hashing: Uses
uuid.as_bytes()
for consistent byte-level hashing- Backward compatibility warning: The comment appropriately warns about the impact on hash-ring and shard routing
This ensures consistent shard mapping regardless of changes in the UUID crate's Hash implementation.
lib/collection/src/operations/point_ops.rs (1)
929-929
: LGTM: Necessary imports for the test.The imports are correctly added to support the new test functionality.
Also applies to: 932-932
UUID crate changed Hash implementation, which silently broke our
HashRing
routing#6595
This PR adds manual implementation of Hash for
ExtendedPointId
and unit-test which pins shard mapping.