-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Fix default replication factor #6632
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
lib/storage/src/content_manager/toc/mod.rs (2)
366-366
: Fix typo in error message.There's a typo in the error message: "NUmber" should be "Number".
- NonZeroU32::new(number_of_peers as u32).expect("NUmber of peers must be at least 1"); + NonZeroU32::new(number_of_peers as u32).expect("Number of peers must be at least 1");
368-371
: Consider improving error handling to avoid potential panics.The
expect()
calls could panic if the configuration contains invalid values (e.g., 0 for shard number). While theCollectionConfigDefaults
struct has validation attributes, runtime validation failures could still occur.Consider using more graceful error handling:
- let suggested_shard_number = collection_defaults - .map(|cd| cd.get_shard_number(number_of_peers as u32)) - .map(|x| NonZeroU32::new(x).expect("Shard number must be at least 1")) - .unwrap_or_else(|| default_shard_number().saturating_mul(non_zero_number_of_peers)); + let suggested_shard_number = collection_defaults + .map(|cd| cd.get_shard_number(number_of_peers as u32)) + .and_then(NonZeroU32::new) + .unwrap_or_else(|| default_shard_number().saturating_mul(non_zero_number_of_peers));This would gracefully fall back to the default calculation if the configured shard number is invalid, rather than panicking.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
lib/collection/src/config.rs
(2 hunks)lib/segment/src/data_types/collection_defaults.rs
(1 hunks)lib/segment/src/data_types/mod.rs
(1 hunks)lib/segment/src/types.rs
(0 hunks)lib/storage/src/content_manager/toc/collection_meta_ops.rs
(1 hunks)lib/storage/src/content_manager/toc/create_collection.rs
(2 hunks)lib/storage/src/content_manager/toc/mod.rs
(4 hunks)lib/storage/src/dispatcher.rs
(1 hunks)lib/storage/src/types.rs
(1 hunks)
💤 Files with no reviewable changes (1)
- lib/segment/src/types.rs
🧰 Additional context used
🧬 Code Graph Analysis (1)
lib/storage/src/content_manager/toc/create_collection.rs (1)
lib/collection/src/config.rs (1)
default_replication_factor
(186-188)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-consensus-compose
- GitHub Check: integration-tests
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: integration-tests-consensus
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: lint
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: storage-compat-test
🔇 Additional comments (10)
lib/storage/src/types.rs (1)
19-19
: LGTM: Import path updated correctly.The import path for
CollectionConfigDefaults
has been correctly updated to reflect its relocation to the newsegment::data_types::collection_defaults
module.lib/segment/src/data_types/mod.rs (1)
2-2
: LGTM: New module declaration added correctly.The
collection_defaults
module is properly declared and placed in alphabetical order with the existing modules.lib/storage/src/content_manager/toc/create_collection.rs (3)
89-89
: LGTM: Correct use ofand_then
for optional field access.The change from
.map()
to.and_then()
correctly handles the case whereshard_number
is now an optional field within the optionalCollectionConfigDefaults
, flatteningOption<Option<T>>
toOption<T>
.
120-120
: LGTM: Correct use ofand_then
for optional field access.The change from
.map()
to.and_then()
correctly handles the case wherereplication_factor
is now an optional field within the optionalCollectionConfigDefaults
, flatteningOption<Option<T>>
toOption<T>
.
124-124
: LGTM: Correct use ofand_then
for optional field access.The change from
.map()
to.and_then()
correctly handles the case wherewrite_consistency_factor
is now an optional field within the optionalCollectionConfigDefaults
, flatteningOption<Option<T>>
toOption<T>
.lib/storage/src/content_manager/toc/collection_meta_ops.rs (1)
49-52
: LGTM: Proper fallback logic for shard number determination.The code correctly prioritizes explicit shard number from the operation request and falls back to the dynamically computed suggested shard number. This maintains backward compatibility while enabling the new default calculation logic.
lib/storage/src/dispatcher.rs (1)
88-95
: LGTM! Code simplification improves maintainability.The refactoring delegates shard number calculation to the
suggest_shard_distribution
method, which is a good separation of concerns. This removes manual calculation logic and makes the code more maintainable.lib/collection/src/config.rs (1)
183-192
: LGTM! Simplified fallback defaults align with refactoring.The hardcoded defaults of 1 serve as appropriate fallbacks, while the main configuration logic is now handled by
CollectionConfigDefaults
. This change supports the broader refactoring to make default values more configurable and dynamic.lib/segment/src/data_types/collection_defaults.rs (1)
32-44
: Well-designed conflict resolution logic.The
get_shard_number
method properly handles the potential conflict betweenshard_number
andshard_number_per_node
by:
- Logging a warning when both are set (good observability)
- Prioritizing
shard_number
in conflicts (reasonable precedence)- Falling back to
number_of_peers
when neither is configured (sensible default)This provides clear, predictable behavior for configuration conflicts.
lib/storage/src/content_manager/toc/mod.rs (1)
359-411
: Excellent refactoring improves configurability.The new
suggest_shard_distribution
method signature and implementation provide much better flexibility by:
- Accepting dynamic configuration through
CollectionConfigDefaults
- Supporting cluster-aware shard number calculation
- Maintaining backward compatibility with fallback logic
This change successfully centralizes collection configuration logic while preserving existing behavior.
let collection_defaults = self.storage_config.collection.as_ref(); | ||
|
||
let suggested_shard_number = | ||
collection_defaults.map(|config| config.get_shard_number(1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
LGTM: Improved shard number calculation using dynamic defaults.
The refactoring replaces static shard number calculation with dynamic computation using the new CollectionConfigDefaults.get_shard_number()
method. This approach better integrates with cluster-aware configuration defaults.
Verify that the parameter 1
passed to get_shard_number(1)
is appropriate for this context. Let me check the implementation of this method:
🏁 Script executed:
#!/bin/bash
# Description: Find the get_shard_number method implementation to understand the parameter meaning
# Expected: Implementation showing what the parameter represents (peer count, multiplier, etc.)
ast-grep --pattern $'impl $_ {
$$$
fn get_shard_number($_, $_) -> $_ {
$$$
}
$$$
}'
Length of output: 96
🏁 Script executed:
#!/bin/bash
set -e
# Locate the get_shard_number function definition
rg -n "fn get_shard_number" .
# Fallback: search for uses to find its origin
rg -n "get_shard_number" .
Length of output: 583
🏁 Script executed:
#!/bin/bash
# Display get_shard_number implementation with context
rg -n -C3 "pub fn get_shard_number" lib/segment/src/data_types/collection_defaults.rs
Length of output: 421
Fix hard-coded peer count in shard calculation
The get_shard_number(&self, number_of_peers: u32)
method uses the cluster’s peer count to scale shard_number_per_node
. By passing the literal 1
you’re effectively forcing a single-node assumption, which will yield incorrect shard counts when running in multi-peer mode.
Locations to update:
- lib/storage/src/content_manager/toc/collection_meta_ops.rs:47
Suggested change:
- let suggested_shard_number =
- collection_defaults.map(|config| config.get_shard_number(1));
+ let suggested_shard_number =
+ collection_defaults.map(|config| config.get_shard_number(number_of_peers as u32));
If number_of_peers
isn’t in scope here, pull it in from your cluster context or pass it through this function.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In lib/storage/src/content_manager/toc/collection_meta_ops.rs around lines 44 to
47, the call to get_shard_number currently passes a hard-coded value of 1 for
the number_of_peers parameter, which assumes a single-node setup and can cause
incorrect shard calculations in multi-peer clusters. To fix this, replace the
literal 1 with the actual number_of_peers value obtained from the cluster
context or pass it through the function so that get_shard_number receives the
correct peer count dynamically.
Fixes #6623
(also fixes the same problem for configurable number of shards)