Skip to content

Conversation

generall
Copy link
Member

@generall generall commented Jun 2, 2025

Fixes #6623

(also fixes the same problem for configurable number of shards)

@generall generall requested a review from JojiiOfficial June 2, 2025 20:32

This comment was marked as resolved.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
lib/storage/src/content_manager/toc/mod.rs (2)

366-366: Fix typo in error message.

There's a typo in the error message: "NUmber" should be "Number".

-            NonZeroU32::new(number_of_peers as u32).expect("NUmber of peers must be at least 1");
+            NonZeroU32::new(number_of_peers as u32).expect("Number of peers must be at least 1");

368-371: Consider improving error handling to avoid potential panics.

The expect() calls could panic if the configuration contains invalid values (e.g., 0 for shard number). While the CollectionConfigDefaults struct has validation attributes, runtime validation failures could still occur.

Consider using more graceful error handling:

-        let suggested_shard_number = collection_defaults
-            .map(|cd| cd.get_shard_number(number_of_peers as u32))
-            .map(|x| NonZeroU32::new(x).expect("Shard number must be at least 1"))
-            .unwrap_or_else(|| default_shard_number().saturating_mul(non_zero_number_of_peers));
+        let suggested_shard_number = collection_defaults
+            .map(|cd| cd.get_shard_number(number_of_peers as u32))
+            .and_then(NonZeroU32::new)
+            .unwrap_or_else(|| default_shard_number().saturating_mul(non_zero_number_of_peers));

This would gracefully fall back to the default calculation if the configured shard number is invalid, rather than panicking.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f0525a3 and 62e7546.

📒 Files selected for processing (9)
  • lib/collection/src/config.rs (2 hunks)
  • lib/segment/src/data_types/collection_defaults.rs (1 hunks)
  • lib/segment/src/data_types/mod.rs (1 hunks)
  • lib/segment/src/types.rs (0 hunks)
  • lib/storage/src/content_manager/toc/collection_meta_ops.rs (1 hunks)
  • lib/storage/src/content_manager/toc/create_collection.rs (2 hunks)
  • lib/storage/src/content_manager/toc/mod.rs (4 hunks)
  • lib/storage/src/dispatcher.rs (1 hunks)
  • lib/storage/src/types.rs (1 hunks)
💤 Files with no reviewable changes (1)
  • lib/segment/src/types.rs
🧰 Additional context used
🧬 Code Graph Analysis (1)
lib/storage/src/content_manager/toc/create_collection.rs (1)
lib/collection/src/config.rs (1)
  • default_replication_factor (186-188)
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-consensus-compose
  • GitHub Check: integration-tests
  • GitHub Check: test-low-resources
  • GitHub Check: test-consistency
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: integration-tests-consensus
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: lint
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: storage-compat-test
🔇 Additional comments (10)
lib/storage/src/types.rs (1)

19-19: LGTM: Import path updated correctly.

The import path for CollectionConfigDefaults has been correctly updated to reflect its relocation to the new segment::data_types::collection_defaults module.

lib/segment/src/data_types/mod.rs (1)

2-2: LGTM: New module declaration added correctly.

The collection_defaults module is properly declared and placed in alphabetical order with the existing modules.

lib/storage/src/content_manager/toc/create_collection.rs (3)

89-89: LGTM: Correct use of and_then for optional field access.

The change from .map() to .and_then() correctly handles the case where shard_number is now an optional field within the optional CollectionConfigDefaults, flattening Option<Option<T>> to Option<T>.


120-120: LGTM: Correct use of and_then for optional field access.

The change from .map() to .and_then() correctly handles the case where replication_factor is now an optional field within the optional CollectionConfigDefaults, flattening Option<Option<T>> to Option<T>.


124-124: LGTM: Correct use of and_then for optional field access.

The change from .map() to .and_then() correctly handles the case where write_consistency_factor is now an optional field within the optional CollectionConfigDefaults, flattening Option<Option<T>> to Option<T>.

lib/storage/src/content_manager/toc/collection_meta_ops.rs (1)

49-52: LGTM: Proper fallback logic for shard number determination.

The code correctly prioritizes explicit shard number from the operation request and falls back to the dynamically computed suggested shard number. This maintains backward compatibility while enabling the new default calculation logic.

lib/storage/src/dispatcher.rs (1)

88-95: LGTM! Code simplification improves maintainability.

The refactoring delegates shard number calculation to the suggest_shard_distribution method, which is a good separation of concerns. This removes manual calculation logic and makes the code more maintainable.

lib/collection/src/config.rs (1)

183-192: LGTM! Simplified fallback defaults align with refactoring.

The hardcoded defaults of 1 serve as appropriate fallbacks, while the main configuration logic is now handled by CollectionConfigDefaults. This change supports the broader refactoring to make default values more configurable and dynamic.

lib/segment/src/data_types/collection_defaults.rs (1)

32-44: Well-designed conflict resolution logic.

The get_shard_number method properly handles the potential conflict between shard_number and shard_number_per_node by:

  • Logging a warning when both are set (good observability)
  • Prioritizing shard_number in conflicts (reasonable precedence)
  • Falling back to number_of_peers when neither is configured (sensible default)

This provides clear, predictable behavior for configuration conflicts.

lib/storage/src/content_manager/toc/mod.rs (1)

359-411: Excellent refactoring improves configurability.

The new suggest_shard_distribution method signature and implementation provide much better flexibility by:

  • Accepting dynamic configuration through CollectionConfigDefaults
  • Supporting cluster-aware shard number calculation
  • Maintaining backward compatibility with fallback logic

This change successfully centralizes collection configuration logic while preserving existing behavior.

Comment on lines 44 to 47
let collection_defaults = self.storage_config.collection.as_ref();

let suggested_shard_number =
collection_defaults.map(|config| config.get_shard_number(1));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

LGTM: Improved shard number calculation using dynamic defaults.

The refactoring replaces static shard number calculation with dynamic computation using the new CollectionConfigDefaults.get_shard_number() method. This approach better integrates with cluster-aware configuration defaults.

Verify that the parameter 1 passed to get_shard_number(1) is appropriate for this context. Let me check the implementation of this method:


🏁 Script executed:

#!/bin/bash
# Description: Find the get_shard_number method implementation to understand the parameter meaning
# Expected: Implementation showing what the parameter represents (peer count, multiplier, etc.)

ast-grep --pattern $'impl $_ {
  $$$
  fn get_shard_number($_, $_) -> $_ {
    $$$
  }
  $$$
}'

Length of output: 96


🏁 Script executed:

#!/bin/bash
set -e
# Locate the get_shard_number function definition
rg -n "fn get_shard_number" .

# Fallback: search for uses to find its origin
rg -n "get_shard_number" .

Length of output: 583


🏁 Script executed:

#!/bin/bash
# Display get_shard_number implementation with context
rg -n -C3 "pub fn get_shard_number" lib/segment/src/data_types/collection_defaults.rs

Length of output: 421


Fix hard-coded peer count in shard calculation

The get_shard_number(&self, number_of_peers: u32) method uses the cluster’s peer count to scale shard_number_per_node. By passing the literal 1 you’re effectively forcing a single-node assumption, which will yield incorrect shard counts when running in multi-peer mode.

Locations to update:

  • lib/storage/src/content_manager/toc/collection_meta_ops.rs:47

Suggested change:

- let suggested_shard_number =
-     collection_defaults.map(|config| config.get_shard_number(1));
+ let suggested_shard_number =
+     collection_defaults.map(|config| config.get_shard_number(number_of_peers as u32));

If number_of_peers isn’t in scope here, pull it in from your cluster context or pass it through this function.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In lib/storage/src/content_manager/toc/collection_meta_ops.rs around lines 44 to
47, the call to get_shard_number currently passes a hard-coded value of 1 for
the number_of_peers parameter, which assumes a single-node setup and can cause
incorrect shard calculations in multi-peer clusters. To fix this, replace the
literal 1 with the actual number_of_peers value obtained from the cluster
context or pass it through the function so that get_shard_number receives the
correct peer count dynamically.

@generall generall merged commit 59c034f into dev Jun 4, 2025
17 checks passed
@generall generall deleted the fix-default-replication-factor branch June 4, 2025 08:04
generall added a commit that referenced this pull request Jul 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants