-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Fix UnindexedField infra to handle lookup&range index requirements #6496
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
This looks hacky. It seems like we are missing one abstraction. We have:
Just because we have customizable integer index, I feel like we should have one which only captures functionality, following the internal enum FieldIndexType {
Int,
Datetime,
IntMap,
Keyword,
Float,
Geo,
FullText,
Bool,
Uuid,
UuidMap,
Null
} This way we should be able to select one of these when checking the filter, and then validate against schema if it is fulfilled or not. If it is not, it should be trivial to implement a conversion to a payload schema. |
369c2e2
to
a3443ac
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
lib/collection/src/problems/unindexed_field.rs (2)
413-427
: Fix documentation comment forupdate_needs_range
.The documentation comment for
update_needs_range
incorrectly refers to "Match capability" instead of "Range capability".-/// Capture that an index with Match capability is required. +/// Capture that an index with Range capability is required. /// Does not toggle off if has been set once already. fn update_needs_range(&mut self, is_range: bool) {
415-419
: Consider optimizing the conditional logic in update methods.Both helper methods have a conditional that checks if the flag is false before setting it based on the input. Since these methods are only called to potentially set the flag to true (never to set it back to false), you could simplify them:
fn update_needs_match(&mut self, is_match: bool) { - if !self.needs_match_index { - self.needs_match_index = is_match - } + self.needs_match_index |= is_match; } fn update_needs_range(&mut self, is_range: bool) { - if !self.needs_range_index { - self.needs_range_index = is_range - } + self.needs_range_index |= is_range; }This uses the bitwise OR assignment operator to set the flag to true if either the current value is true or
is_match
/is_range
is true, which achieves the same result more concisely.Also applies to: 424-427
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
lib/collection/src/problems/unindexed_field.rs
(5 hunks)lib/segment/src/types.rs
(1 hunks)tests/openapi/test_strictmode.py
(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/openapi/test_strictmode.py
- lib/segment/src/types.rs
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: integration-tests
- GitHub Check: test-consensus-compose
- GitHub Check: integration-tests-consensus
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: lint
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: storage-compat-test
🔇 Additional comments (4)
lib/collection/src/problems/unindexed_field.rs (4)
312-313
: Good addition of capability tracking flags.These new boolean fields explicitly track whether the query needs indexes that support match or range operations, which aligns with the PR objective of capturing specific indexing requirements.
373-374
: LGTM! Well-placed capability detection.You're correctly updating the match and range requirements based on condition predicates. This approach is more explicit and robust than trying to infer requirements solely from schema type comparisons.
432-439
: Good implementation of capability-based index requirement detection.This is a key improvement that addresses the PR objective. The code now explicitly checks if the current index supports the required capabilities before falling back to comparing schema types.
440-440
: Variable renaming improves code clarity.Renaming the variable to
index_info_kind
makes it clearer that this represents the type/kind of the index being evaluated.
Thanks for the feedback 👍 I agree that my solution is hacky, I will explore your idea to make the solution more generic. |
a3443ac
to
af248be
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
lib/collection/src/problems/unindexed_field.rs (1)
578-605
: Consider a more generic approach in a future refactoring.While this implementation addresses the immediate needs, consider exploring the abstraction mentioned in the PR comments for a future refactoring. A proper abstraction could make the code more maintainable and reduce the need for explicit conversions between types.
A future improvement could define a trait that represents index functionality and implements it for both
PayloadFieldSchema
andFieldIndexType
, allowing for more flexible comparisons without explicit conversions. This would align with the suggestion in the PR comments about introducing a new abstraction focused solely on capturing the functionality of the index.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
lib/collection/src/problems/unindexed_field.rs
(11 hunks)
🔇 Additional comments (12)
lib/collection/src/problems/unindexed_field.rs (12)
11-11
: Good addition of required imports.The new imports for
IntegerIndexParams
andEnumIter
trait are necessary for the refactored code to handle parametrized integer indexes and iterate over the newFieldIndexType
enum.Also applies to: 18-18
165-167
: LGTM: Refactored to use new FieldIndexType enum.The function now returns an iterator of the new
FieldIndexType
enum instead ofPayloadFieldSchema
, which better represents specific index capabilities rather than complete schema definitions.
169-189
: Improved index inference logic with explicit capability types.The function now correctly returns specific index capabilities (
FieldIndexType
variants) based on the match value type, which creates a clearer separation between index functionality requirements and schema definitions.
191-211
: Well-structured index type inference from AnyVariants.Similar to the previous function, this now correctly maps value types to their required index capabilities using the new
FieldIndexType
enum, maintaining consistent behavior with the refactored design.
213-256
: Comprehensive refactoring of index type inference from field conditions.This function properly maps various field conditions to their required index capabilities, taking into account different query types like matches, ranges, geo queries, etc. The changes align well with the overall refactoring approach.
349-392
: Updated condition handling to use the new FieldIndexType.The function now properly collects required indexes as
FieldIndexType
values and converts them toPayloadFieldSchema
only when updating the unindexed schema map, which maintains a cleaner separation of concerns.
395-408
: Enhanced index capability checking.The
needs_index
method now correctly checks if the existing schema supports any of the required index capabilities, rather than making direct schema comparisons. This addresses the issue mentioned in the PR description about handling lookup and range requirements.
410-517
: Updated expression parsing to use FieldIndexType.The update from expression method now uses the new
FieldIndexType
enum, maintaining consistency with the refactored approach throughout the codebase.
520-533
: Well-designed FieldIndexType enum.The new enum provides a focused representation of index capabilities, aligning with the PR's goal of explicitly capturing indexing requirements. This is a good step toward addressing the "missing abstraction" mentioned in the PR comments.
535-573
: Key improvement: Properly handling parametrized integer indexes.The
schema_capacities
function correctly maps fromPayloadFieldSchema
to its supported capabilities, with special handling forIntegerIndexParams
that checks both lookup and range flags. This directly addresses the core issue of handling lookup and range index requirements.
557-560
: Good debug assertion for parameter validation.The debug assertion ensures that either lookup or range must be true for an Integer payload index, which helps catch potential configuration issues during development.
575-606
: Well-implemented conversion from capability type to schema.The
From<FieldIndexType> for PayloadFieldSchema
implementation correctly creates appropriate field schemas based on the required capability type, completing the two-way conversion between capabilities and schemas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/openapi/test_strictmode.py
(4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (11)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: integration-tests-consensus
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: integration-tests
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: storage-compat-test
- GitHub Check: lint
🔇 Additional comments (3)
tests/openapi/test_strictmode.py (3)
165-165
: Appropriate test renaming for clarity.The renaming of this test from
test_strict_mode_unindexed_filter_read_validation
totest_strict_mode_unindexed_filter_keyword_read_validation
improves clarity by specifying the field type being tested. This aligns well with the new test for integer fields.
220-303
: Well-structured test for integer field indexing requirements.This new test properly validates strict mode behavior with integer fields, specifically testing the lookup capability requirements. The test covers all necessary scenarios:
- Verifying behavior without strict mode
- Testing with strict mode but allowing unindexed filtering
- Testing with strict mode and disabling unindexed filtering
- Creating an index with lookup capability and verifying success
- Removing lookup capability and verifying failure
The test appropriately verifies behavior after each index configuration change, addressing the previously raised concern.
1456-1475
: Formula expression test extended to cover integer fields.The formula expression test has been appropriately extended to test integer fields with lookup capabilities. The test creates an index on
discount_price
with the necessary lookup capability and verifies that the query succeeds after index creation. This aligns with the PR objective of handling lookup requirements for integer fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
lib/collection/src/problems/unindexed_field.rs (1)
608-623
: Extend unit test coverage for integer index parametersThe current unit test only covers one case: when both lookup and range are true. Based on the past review comments, there are additional cases that should be tested:
- lookup = true, range = false => only IntMatch is inserted
- lookup = false, range = true => only IntRange is inserted
- lookup = false, range = false => triggers the debug assertion
#[cfg(test)] mod tests { use super::*; #[test] fn integer_index_capacities() { let params = PayloadSchemaParams::Integer(IntegerIndexParams { lookup: Some(true), range: Some(true), ..Default::default() }); let schema = PayloadFieldSchema::FieldParams(params); let index_types = schema_capacities(&schema); assert!(index_types.contains(&FieldIndexType::IntMatch)); assert!(index_types.contains(&FieldIndexType::IntRange)); } + + #[test] + fn integer_index_lookup_only() { + let params = PayloadSchemaParams::Integer(IntegerIndexParams { + lookup: Some(true), + range: Some(false), + ..Default::default() + }); + let schema = PayloadFieldSchema::FieldParams(params); + let index_types = schema_capacities(&schema); + assert!(index_types.contains(&FieldIndexType::IntMatch)); + assert!(!index_types.contains(&FieldIndexType::IntRange)); + } + + #[test] + fn integer_index_range_only() { + let params = PayloadSchemaParams::Integer(IntegerIndexParams { + lookup: Some(false), + range: Some(true), + ..Default::default() + }); + let schema = PayloadFieldSchema::FieldParams(params); + let index_types = schema_capacities(&schema); + assert!(!index_types.contains(&FieldIndexType::IntMatch)); + assert!(index_types.contains(&FieldIndexType::IntRange)); + } + + #[test] + #[should_panic(expected = "lookup or range must be true")] + fn integer_index_no_capabilities() { + let params = PayloadSchemaParams::Integer(IntegerIndexParams { + lookup: Some(false), + range: Some(false), + ..Default::default() + }); + let schema = PayloadFieldSchema::FieldParams(params); + let _index_types = schema_capacities(&schema); + } }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
lib/collection/src/problems/unindexed_field.rs
(11 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: integration-tests-consensus
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: integration-tests
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: lint
- GitHub Check: test-consensus-compose
- GitHub Check: storage-compat-test
🔇 Additional comments (8)
lib/collection/src/problems/unindexed_field.rs (8)
520-533
: Well-designed abstraction withFieldIndexType
enumThe introduction of the
FieldIndexType
enum is an excellent improvement that explicitly represents the different index capabilities. This addresses the core issue by separating index functionality from schema representation, making it easier to track specific index requirements like lookup and range capabilities.This aligns well with the suggestion from coszio's comment about introducing a focused abstraction for index functionality.
535-573
: Clear mapping between schema and capabilitiesThe
schema_capacities
function provides a clean mapping fromPayloadFieldSchema
to the specific index capabilities it supports. The special handling for parametrized integer indexes is particularly important, as it correctly extracts whether the index supports lookup, range, or both capabilities.The debug assertion at lines 557-560 is a good safety check to ensure that at least one of lookup or range is enabled for integer indexes.
549-563
: Comprehensive handling of integer index parametersThe implementation of integer index parameters handling is well-designed, checking for both lookup and range capabilities explicitly. This is the core of the fix for detecting missing indexes based on actual requirements.
Note that the implementation properly handles the feedback mentioned in past reviews where the suggested approach couldn't be used due to other fields like
is_principal
&on_disk
preventing a complete equality test.
575-606
: Clean conversion from index type to schemaThe
From<FieldIndexType> for PayloadFieldSchema
implementation provides a clean way to convert from internal index types back to the corresponding schema representation. The implementation for integer indexes correctly creates appropriateIntegerIndexParams
with the specific capability enabled.
395-408
: Improved index capability matchingThe updated
needs_index
method now performs a more granular capability check rather than a simple kind equality check. This allows for proper detection of missing indexes based on specific capabilities (lookup vs range).
169-256
: Consistent index type inferenceThe changes to functions like
infer_index_from_match_value
,infer_index_from_any_variants
, andinfer_index_from_field_condition
are consistent with the new design, correctly returningVec<FieldIndexType>
that explicitly represent required index capabilities.
349-392
: Clear separation of concerns in extractionThe
update_from_condition
method now has a cleaner separation of concerns: it collects the required index types first, and only converts them to schema representations when adding them to theunindexed_schema
. This makes the code more maintainable and easier to understand.
510-517
: Consistent index handling in expressionsThe changes to
update_from_expression
maintain consistency with the approach used inupdate_from_condition
, ensuring that the handling of index requirements is uniform across different contexts.
@coszio can you please have a look and tell me if the approach is what you had in mind 🙏 |
FieldIndexType::IntMatch => { | ||
let params = IntegerIndexParams { | ||
lookup: Some(true), | ||
..Default::default() | ||
}; | ||
PayloadFieldSchema::FieldParams(PayloadSchemaParams::Integer(params)) | ||
} | ||
FieldIndexType::IntRange => { | ||
let params = IntegerIndexParams { | ||
range: Some(true), | ||
..Default::default() | ||
}; | ||
PayloadFieldSchema::FieldParams(PayloadSchemaParams::Integer(params)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should treat the parametrized versions as advanced usage. Since we don't issue a recommendation for on_disk/is_tenant/etc settings, I'd say we can just convert to unparametrized type index here.
Moreover, this change would make it more obvious to convert to PayloadSchemaType
instead.
Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 8bc3033 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
lib/collection/src/problems/unindexed_field.rs (1)
591-604
: Consider handling more parameter combinationsYour implementation only provides conversions for single-capability integer indexes (either lookup or range). Consider adding a combined option that sets both capabilities when needed.
match val { FieldIndexType::IntMatch => { let params = IntegerIndexParams { lookup: Some(true), ..Default::default() }; PayloadFieldSchema::FieldParams(PayloadSchemaParams::Integer(params)) } FieldIndexType::IntRange => { let params = IntegerIndexParams { range: Some(true), ..Default::default() }; PayloadFieldSchema::FieldParams(PayloadSchemaParams::Integer(params)) } + // Consider adding this combined option if needed in the future + // FieldIndexType::IntBoth => { + // let params = IntegerIndexParams { + // lookup: Some(true), + // range: Some(true), + // ..Default::default() + // }; + // PayloadFieldSchema::FieldParams(PayloadSchemaParams::Integer(params)) + // } FieldIndexType::KeywordMatch => {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
lib/collection/src/problems/unindexed_field.rs
(11 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-low-resources
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: integration-tests-consensus
- GitHub Check: test-consistency
- GitHub Check: integration-tests
- GitHub Check: test-consensus-compose
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: storage-compat-test
- GitHub Check: lint
🔇 Additional comments (6)
lib/collection/src/problems/unindexed_field.rs (6)
524-537
: Good abstraction with the newFieldIndexType
enumCreating a dedicated enum for index capabilities aligns well with the PR objectives to explicitly capture indexing requirements. This makes the code more self-documenting and easier to maintain compared to the previous approach.
539-586
: Robust schema capability mapping implementationThe
schema_capabilities
function provides a clear mapping between schema types and their index capabilities. The special handling for parametrized integer indexes (lines 560-573) correctly implements the lookup and range requirements.Note that the debug assertion on line 567-570 ensures that integer indexes always have at least one capability enabled, which is a good safeguard.
588-619
: Effective conversion fromFieldIndexType
toPayloadFieldSchema
The
From
implementation correctly maps each index type to the appropriate schema structure. For theIntMatch
andIntRange
cases, you've correctly set only the required parameter (lookup or range) to true and left other parameters at their defaults.
395-408
: Improved index capability checkingThe updated
needs_index
method now correctly compares index capabilities more granularly instead of the previous simplistic kind equality check. This ensures that fields with specific index requirements (like integer fields with only lookup enabled) are properly detected when missing those capabilities.
591-604
: Parametrized index handling aligns with PR objectivesThe explicit handling of parametrized integer indexes with separate lookups for
lookup
andrange
capabilities directly addresses the PR objective of handling these requirements separately, as mentioned in the PR summary.
164-167
: Improved iterator implementation with EnumIterUsing the
EnumIter
trait fromstrum
to iterate over all enum variants is a clean approach that reduces manual maintenance when adding new index types.
PayloadSchemaParams::Integer(integer_index_params) => { | ||
if integer_index_params.lookup == Some(true) { | ||
index_types.insert(FieldIndexType::IntMatch); | ||
} | ||
if integer_index_params.range == Some(true) { | ||
index_types.insert(FieldIndexType::IntRange); | ||
} | ||
debug_assert!( | ||
!index_types.is_empty(), | ||
"lookup or range must be true for Integer payload index" | ||
); | ||
// unifying match arm types | ||
true | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Consider handling the case when both lookup and range are false
Currently the debug assertion triggers when both lookup and range are false, but this could potentially happen in real-world scenarios. Consider how to handle this gracefully rather than relying on the debug assertion.
Run this script to check if there are any existing integer indexes with both lookup and range set to false:
🏁 Script executed:
#!/bin/bash
# Search for integer index parameters where both lookup and range might be false
# Look for places where IntegerIndexParams is created or used
rg "IntegerIndexParams" --type rust -A 5 -B 5
Length of output: 24820
🏁 Script executed:
#!/bin/bash
# Extract the Validate impl for qdrant::IntegerIndexParams
rg -n "impl Validate for super::qdrant::IntegerIndexParams" -A 20 -B 2 lib/api/src/grpc/validate.rs
Length of output: 753
``
🏁 Script executed:
#!/bin/bash
# Locate the definition of validate_integer_index_params
rg -n "validate_integer_index_params" -A 10 -B 5 lib/api/src/grpc/validate.rs
Length of output: 1095
🏁 Script executed:
#!/bin/bash
# Locate the definition of validate_integer_index_params in segment data types
rg -n "validate_integer_index_params" -A 20 -B 5 lib/segment/src/data_types/index.rs
Length of output: 975
Handle invalid IntegerIndexParams when neither lookup nor range is set
In release builds the debug_assert!
is a no-op, and the code will silently proceed with an empty index_types
if both lookup
and range
are unset (or None
). We should explicitly catch “no capabilities enabled” and surface it as an error (or return false
), rather than relying on debug_assert!
.
Locations to update:
- In
lib/collection/src/problems/unindexed_field.rs
(around lines 560–573), replace thedebug_assert!
with explicit handling for the case where neitherlookup == Some(true)
norrange == Some(true)
—for example, returnfalse
or emit anIssue
. - (Optional) Align the shared validation in
segment::data_types::index::validate_integer_index_params
(and the gRPCValidate
impl) to also treatNone
/None
as invalid, or document whyNone
defaults are acceptable.
🤖 Prompt for AI Agents
In lib/collection/src/problems/unindexed_field.rs around lines 560 to 573,
replace the debug_assert that checks if either lookup or range is true with
explicit handling for the case when both are false or None. Instead of
asserting, check if index_types is empty after evaluating lookup and range, and
if so, return false or emit an appropriate Issue to indicate invalid
IntegerIndexParams. This ensures the code gracefully handles invalid
configurations in release builds rather than silently proceeding.
afa5f13
to
092a3f9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
lib/collection/src/problems/unindexed_field.rs (1)
539-586
:⚠️ Potential issueReplace debug_assert with explicit error handling
The
debug_assert
on lines 567-570 could be problematic in release builds, as assertions are stripped out during release compilation. For integer indexes, the case where neither lookup nor range is true should be handled explicitly.if integer_index_params.range == Some(true) { index_types.insert(FieldIndexType::IntRange); } -debug_assert!( - !index_types.is_empty(), - "lookup or range must be true for Integer payload index" -); +// Handle the case where neither lookup nor range is enabled +if index_types.is_empty() { + // Either return false or add a sensible default + log::warn!("Integer index with neither lookup nor range capability detected"); + // Default to providing at least one capability to avoid silent failure + index_types.insert(FieldIndexType::IntMatch); +} // unifying match arm types true
🧹 Nitpick comments (1)
lib/collection/src/problems/unindexed_field.rs (1)
588-619
: Consider simplifyingFrom<FieldIndexType>
to use unparametrized typesAs mentioned in a previous comment by coszio, it might be better to convert to unparametrized types for simplicity, especially since we don't make recommendations for other parameters like
on_disk
oris_tenant
.For integer indexes, consider using
PayloadFieldSchema::FieldType(PayloadSchemaType::Integer)
instead of creating parametrized versions, or provide a clearer explanation in a comment about why parametrized versions are needed here.impl From<FieldIndexType> for PayloadFieldSchema { fn from(val: FieldIndexType) -> Self { match val { - FieldIndexType::IntMatch => { - let params = IntegerIndexParams { - lookup: Some(true), - ..Default::default() - }; - PayloadFieldSchema::FieldParams(PayloadSchemaParams::Integer(params)) - } - FieldIndexType::IntRange => { - let params = IntegerIndexParams { - range: Some(true), - ..Default::default() - }; - PayloadFieldSchema::FieldParams(PayloadSchemaParams::Integer(params)) - } + FieldIndexType::IntMatch | FieldIndexType::IntRange => { + // Using unparametrized type as we're just recommending the basic index type + PayloadFieldSchema::FieldType(PayloadSchemaType::Integer) + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
lib/collection/src/problems/unindexed_field.rs
(11 hunks)tests/openapi/test_strictmode.py
(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/openapi/test_strictmode.py
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-consensus-compose
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: integration-tests-consensus
- GitHub Check: test-consistency
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: storage-compat-test
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: integration-tests
- GitHub Check: lint
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: rust-tests (windows-latest)
🔇 Additional comments (4)
lib/collection/src/problems/unindexed_field.rs (4)
523-537
: Good introduction ofFieldIndexType
enumThe new
FieldIndexType
enum clearly defines the distinct index capabilities, which is a good abstraction that separates type from functionality requirements. This aligns with the feedback received in the PR comments about creating a more structured representation of index functionality.
386-392
: Good use of iterators for schema conversionThe process of converting from
FieldIndexType
toPayloadFieldSchema
and extending the schema collection is well implemented using iterators, which is efficient and idiomatic Rust.
395-408
: Improved index capability check logicThe
needs_index
method has been improved to check specific index capabilities rather than just the schema kind. This allows for more granular checking of whether an existing index supports the required operations.
169-256
: Good refactoring of inference functionsThe index inference functions have been updated to return vectors of
FieldIndexType
which makes the code more explicit about the required index capabilities. This is a good improvement over the previous version which returnedPayloadFieldSchema
directly.
#[cfg(test)] | ||
mod tests { | ||
use super::*; | ||
#[test] | ||
fn integer_index_capacities() { | ||
let params = PayloadSchemaParams::Integer(IntegerIndexParams { | ||
lookup: Some(true), | ||
range: Some(true), | ||
..Default::default() | ||
}); | ||
let schema = PayloadFieldSchema::FieldParams(params); | ||
let index_types = schema_capabilities(&schema); | ||
assert!(index_types.contains(&FieldIndexType::IntMatch)); | ||
assert!(index_types.contains(&FieldIndexType::IntRange)); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Expand test coverage for index capabilities
The current test only verifies that integer indexes with both lookup and range set to true map to both IntMatch and IntRange capabilities. Tests for other cases are missing, as mentioned in previous review comments.
Please add these additional test cases to thoroughly validate the integer index mapping logic:
#[test]
fn integer_index_lookup_only() {
let params = PayloadSchemaParams::Integer(IntegerIndexParams {
lookup: Some(true),
range: Some(false),
..Default::default()
});
let schema = PayloadFieldSchema::FieldParams(params);
let index_types = schema_capabilities(&schema);
assert!(index_types.contains(&FieldIndexType::IntMatch));
assert!(!index_types.contains(&FieldIndexType::IntRange));
}
#[test]
fn integer_index_range_only() {
let params = PayloadSchemaParams::Integer(IntegerIndexParams {
lookup: Some(false),
range: Some(true),
..Default::default()
});
let schema = PayloadFieldSchema::FieldParams(params);
let index_types = schema_capabilities(&schema);
assert!(!index_types.contains(&FieldIndexType::IntMatch));
assert!(index_types.contains(&FieldIndexType::IntRange));
}
#[test]
fn integer_index_none_enabled() {
// Test case for when neither lookup nor range is enabled
let params = PayloadSchemaParams::Integer(IntegerIndexParams {
lookup: Some(false),
range: Some(false),
..Default::default()
});
let schema = PayloadFieldSchema::FieldParams(params);
let index_types = schema_capabilities(&schema);
// This assertion will depend on how you handle the empty case
// after removing the debug_assert
}
🏁 Script executed:
#!/bin/bash
# Let's check how comprehensive the current testing is for the functionality
# First, check if there are any other tests in the file
rg -l "fn.*test" lib/collection/src/problems/unindexed_field.rs
# Second, check if there are tests for other combinations of lookup/range flags
rg -A5 -B5 "lookup.*range|range.*lookup" --type rust lib/collection/src/problems/unindexed_field.rs
Length of output: 610
Expand test coverage for integer index capabilities
The existing test only covers the case where both lookup
and range
are Some(true)
. We should add the following tests to ensure all flag combinations are validated:
• integer_index_lookup_only
– only lookup = true
• integer_index_range_only
– only range = true
• integer_index_none_enabled
– both flags false (see handling of the empty‐set case)
Suggested additions in lib/collection/src/problems/unindexed_field.rs
:
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn integer_index_capacities() {
let params = PayloadSchemaParams::Integer(IntegerIndexParams {
lookup: Some(true),
range: Some(true),
..Default::default()
});
let schema = PayloadFieldSchema::FieldParams(params);
let index_types = schema_capabilities(&schema);
assert!(index_types.contains(&FieldIndexType::IntMatch));
assert!(index_types.contains(&FieldIndexType::IntRange));
}
#[test]
fn integer_index_lookup_only() {
let params = PayloadSchemaParams::Integer(IntegerIndexParams {
lookup: Some(true),
range: Some(false),
..Default::default()
});
let schema = PayloadFieldSchema::FieldParams(params);
let index_types = schema_capabilities(&schema);
assert!(index_types.contains(&FieldIndexType::IntMatch));
assert!(!index_types.contains(&FieldIndexType::IntRange));
}
#[test]
fn integer_index_range_only() {
let params = PayloadSchemaParams::Integer(IntegerIndexParams {
lookup: Some(false),
range: Some(true),
..Default::default()
});
let schema = PayloadFieldSchema::FieldParams(params);
let index_types = schema_capabilities(&schema);
assert!(!index_types.contains(&FieldIndexType::IntMatch));
assert!(index_types.contains(&FieldIndexType::IntRange));
}
#[test]
#[should_panic(expected = "lookup or range must be true for Integer payload index")]
fn integer_index_none_enabled() {
let params = PayloadSchemaParams::Integer(IntegerIndexParams {
lookup: Some(false),
range: Some(false),
..Default::default()
});
let schema = PayloadFieldSchema::FieldParams(params);
// debug_assert enforces at least one true; expecting a panic here
let _ = schema_capabilities(&schema);
}
}
🤖 Prompt for AI Agents
In lib/collection/src/problems/unindexed_field.rs around lines 621 to 636, the
existing test only covers the case where both lookup and range are true for
integer index capabilities. Add three new tests: one where only lookup is true
and range is false, asserting IntMatch is present but IntRange is not; one where
only range is true and lookup is false, asserting IntRange is present but
IntMatch is not; and one where both lookup and range are false, which should
trigger a panic due to the debug_assert in schema_capabilities. Implement these
tests inside the #[cfg(test)] mod tests block to fully validate all combinations
of integer index flags.
092a3f9
to
9d3029c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (3)
lib/collection/src/problems/unindexed_field.rs (3)
621-636
: 🛠️ Refactor suggestionExpand test coverage for integer index parameters
The test for integer index capabilities is a good start, but it only covers the case where both
lookup
andrange
aretrue
. As mentioned in previous review comments, you should add tests for other combinations: lookup-only, range-only, and neither enabled (which should trigger the debug assertion).#[test] fn integer_index_lookup_only() { let params = PayloadSchemaParams::Integer(IntegerIndexParams { lookup: Some(true), range: Some(false), ..Default::default() }); let schema = PayloadFieldSchema::FieldParams(params); let index_types = schema_capabilities(&schema); assert!(index_types.contains(&FieldIndexType::IntMatch)); assert!(!index_types.contains(&FieldIndexType::IntRange)); } #[test] fn integer_index_range_only() { let params = PayloadSchemaParams::Integer(IntegerIndexParams { lookup: Some(false), range: Some(true), ..Default::default() }); let schema = PayloadFieldSchema::FieldParams(params); let index_types = schema_capabilities(&schema); assert!(!index_types.contains(&FieldIndexType::IntMatch)); assert!(index_types.contains(&FieldIndexType::IntRange)); } #[test] #[should_panic(expected = "lookup or range must be true for Integer payload index")] fn integer_index_none_enabled() { let params = PayloadSchemaParams::Integer(IntegerIndexParams { lookup: Some(false), range: Some(false), ..Default::default() }); let schema = PayloadFieldSchema::FieldParams(params); let _ = schema_capabilities(&schema); }
560-573
:⚠️ Potential issueHandle case when both lookup and range are false
The current implementation uses a debug assertion that is only enforced in debug builds. In release builds, this assertion is ignored, potentially allowing an invalid state where both lookup and range are false.
As noted in the previous review comment, you should replace this debug assertion with explicit error handling to ensure the code behaves correctly in all build configurations.
if integer_index_params.lookup == Some(true) { index_types.insert(FieldIndexType::IntMatch); } if integer_index_params.range == Some(true) { index_types.insert(FieldIndexType::IntRange); } - debug_assert!( - !index_types.is_empty(), - "lookup or range must be true for Integer payload index" - ); - // unifying match arm types - true + if index_types.is_empty() { + // Both lookup and range are false or None, which is an invalid configuration + log::warn!("Invalid integer index configuration: both lookup and range are disabled"); + false + } else { + true + }
539-586
: 🛠️ Refactor suggestionStrong implementation of schema capability mapping
The
schema_capabilities
function properly mapsPayloadFieldSchema
to index capabilities, with good handling of parametrized integer indexes that separately track lookup and range capabilities.However, there's a potential issue with the debug assertion at lines 567-570. In release builds, this assertion would be compiled out, potentially allowing both
lookup
andrange
to be false without any error handling.- debug_assert!( - !index_types.is_empty(), - "lookup or range must be true for Integer payload index" - ); - // unifying match arm types - true + if index_types.is_empty() { + // Handle the case where both lookup and range are false + log::warn!("Integer payload index with neither lookup nor range capability"); + // Return false to indicate no capabilities, or consider adding a default capability + false + } else { + true + }
🧹 Nitpick comments (1)
lib/collection/src/problems/unindexed_field.rs (1)
588-619
: Implementation creates appropriate schema recommendationsThe
From<FieldIndexType>
implementation properly creates schema representations for issue recommendations. For parameterizable types likeIntMatch
andIntRange
, it creates field parameters with appropriate options, and for basic types it uses field type schemas.A minor observation: both
UuidMatch
andUuidRange
convert to the samePayloadSchemaType::Uuid
(lines 611-612). This appears to match the system design where UUID fields inherently support both matching and range operations, but it might be worth adding a comment to clarify this intentional behavior.FieldIndexType::KeywordMatch => { PayloadFieldSchema::FieldType(PayloadSchemaType::Keyword) } FieldIndexType::FloatRange => PayloadFieldSchema::FieldType(PayloadSchemaType::Float), FieldIndexType::Text => PayloadFieldSchema::FieldType(PayloadSchemaType::Text), FieldIndexType::BoolMatch => PayloadFieldSchema::FieldType(PayloadSchemaType::Bool), FieldIndexType::UuidMatch => PayloadFieldSchema::FieldType(PayloadSchemaType::Uuid), - FieldIndexType::UuidRange => PayloadFieldSchema::FieldType(PayloadSchemaType::Uuid), + FieldIndexType::UuidRange => { + // UUID fields inherently support both match and range operations + PayloadFieldSchema::FieldType(PayloadSchemaType::Uuid) + }, FieldIndexType::DatetimeRange => { PayloadFieldSchema::FieldType(PayloadSchemaType::Datetime) }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
lib/collection/src/problems/unindexed_field.rs
(11 hunks)tests/openapi/test_strictmode.py
(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/openapi/test_strictmode.py
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-low-resources
- GitHub Check: test-consensus-compose
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-consistency
- GitHub Check: integration-tests-consensus
- GitHub Check: integration-tests
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: lint
- GitHub Check: storage-compat-test
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: rust-tests (ubuntu-latest)
🔇 Additional comments (3)
lib/collection/src/problems/unindexed_field.rs (3)
524-537
: Well-designed enum for explicit capability representationThe introduction of
FieldIndexType
is an excellent design choice that explicitly models index capabilities instead of relying on schema representations. This aligns well with coszio's suggestion to introduce an abstraction focused on the functionality of indexes, making the code more maintainable and easier to understand.
386-392
: Simplified index generation and schema update logicThe updated approach cleanly separates index capability detection from schema generation. The code now first determines required capabilities (as
FieldIndexType
), then converts them to appropriate schema recommendations only when updating the unindexed schema map. This is much clearer than the previous approach.
395-407
: Improved granularity in index capability checksThe revised
needs_index
method now checks if the existing payload schema supports any of the requiredFieldIndexType
s by mapping the schema to its capabilities. This provides more granular control over index requirements matching and aligns with coszio's suggestion to focus on the functionality aspects of indexes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (3)
lib/collection/src/problems/unindexed_field.rs (3)
610-628
: 🛠️ Refactor suggestionExpand test coverage for integer index parameters
The current test only verifies that integer indexes with both lookup and range set to true map to both
IntMatch
andIntRange
capabilities. Tests for other combinations are missing.Add tests for the following cases:
- lookup = true, range = false → only
FieldIndexType::IntMatch
is inserted- lookup = false, range = true → only
FieldIndexType::IntRange
is inserted- lookup = false, range = false → should trigger the assertion or handle the empty case
#[test] fn integer_index_capacities() { let params = PayloadSchemaParams::Integer(IntegerIndexParams { lookup: Some(true), range: Some(true), ..Default::default() }); let schema = PayloadFieldSchema::FieldParams(params); let index_types = schema_capabilities(&schema); assert!(index_types.contains(&FieldIndexType::IntMatch)); assert!(index_types.contains(&FieldIndexType::IntRange)); } + +#[test] +fn integer_index_lookup_only() { + let params = PayloadSchemaParams::Integer(IntegerIndexParams { + lookup: Some(true), + range: Some(false), + ..Default::default() + }); + let schema = PayloadFieldSchema::FieldParams(params); + let index_types = schema_capabilities(&schema); + assert!(index_types.contains(&FieldIndexType::IntMatch)); + assert!(!index_types.contains(&FieldIndexType::IntRange)); +} + +#[test] +fn integer_index_range_only() { + let params = PayloadSchemaParams::Integer(IntegerIndexParams { + lookup: Some(false), + range: Some(true), + ..Default::default() + }); + let schema = PayloadFieldSchema::FieldParams(params); + let index_types = schema_capabilities(&schema); + assert!(!index_types.contains(&FieldIndexType::IntMatch)); + assert!(index_types.contains(&FieldIndexType::IntRange)); +}
564-578
:⚠️ Potential issueHandle the case when both lookup and range are false in IntegerIndexParams
This is a response to the previous review comment about handling invalid IntegerIndexParams. As suggested, we should explicitly handle the case when both lookup and range are false (or None) rather than relying on the debug assertion.
Replace the debug assertion with explicit handling:
if integer_index_params.lookup == Some(true) { index_types.insert(FieldIndexType::IntMatch); } if integer_index_params.range == Some(true) { index_types.insert(FieldIndexType::IntRange); } -debug_assert!( - !index_types.is_empty(), - "lookup or range must be true for Integer payload index" -); +if index_types.is_empty() { + // Either return false to indicate this does not satisfy indexing requirements + // or insert a default capability + return false; +} // unifying match arm types true
544-591
: 🛠️ Refactor suggestionAdd explicit error handling for invalid integer index parameters
Currently, the
debug_assert!
on lines 572-575 checks that at least one oflookup
orrange
is true for integer indexes. However, this assertion is skipped in release builds, potentially allowing invalid configurations through.Consider replacing the debug assertion with explicit error handling:
if integer_index_params.lookup == Some(true) { index_types.insert(FieldIndexType::IntMatch); } if integer_index_params.range == Some(true) { index_types.insert(FieldIndexType::IntRange); } -debug_assert!( - !index_types.is_empty(), - "lookup or range must be true for Integer payload index" -); +if index_types.is_empty() { + log::warn!("Neither lookup nor range is enabled for Integer payload index"); + // Provide a sensible default or return false to indicate invalid configuration + index_types.insert(FieldIndexType::IntRange); // Default to range capability +} // unifying match arm types true
🧹 Nitpick comments (1)
lib/collection/src/problems/unindexed_field.rs (1)
593-608
: Consider preserving index capability information in the schema conversionThe current conversion from
FieldIndexType
toPayloadSchemaType
loses information about whether an integer index needs lookup or range capability. This might lead to generating suboptimal schema recommendations.You could consider extending the conversion to preserve capability information:
impl From<FieldIndexType> for PayloadSchemaType { fn from(val: FieldIndexType) -> Self { match val { - FieldIndexType::IntMatch => PayloadSchemaType::Integer, - FieldIndexType::IntRange => PayloadSchemaType::Integer, + FieldIndexType::IntMatch | FieldIndexType::IntRange => PayloadSchemaType::Integer, FieldIndexType::KeywordMatch => PayloadSchemaType::Keyword, FieldIndexType::FloatRange => PayloadSchemaType::Float, FieldIndexType::Text => PayloadSchemaType::Text, FieldIndexType::BoolMatch => PayloadSchemaType::Bool, - FieldIndexType::UuidMatch => PayloadSchemaType::Uuid, - FieldIndexType::UuidRange => PayloadSchemaType::Uuid, + FieldIndexType::UuidMatch | FieldIndexType::UuidRange => PayloadSchemaType::Uuid, FieldIndexType::DatetimeRange => PayloadSchemaType::Datetime, FieldIndexType::Geo => PayloadSchemaType::Geo, } } }Then, you could also consider implementing
From<FieldIndexType> for PayloadSchemaParams
to generate more specific schema recommendations:impl From<FieldIndexType> for PayloadSchemaParams { fn from(val: FieldIndexType) -> Self { match val { FieldIndexType::IntMatch => PayloadSchemaParams::Integer(IntegerIndexParams { lookup: Some(true), range: Some(false), ..Default::default() }), FieldIndexType::IntRange => PayloadSchemaParams::Integer(IntegerIndexParams { lookup: Some(false), range: Some(true), ..Default::default() }), // ... other conversions ... } } }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
lib/collection/src/problems/unindexed_field.rs
(11 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-consensus-compose
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-consistency
- GitHub Check: test-low-resources
- GitHub Check: integration-tests-consensus
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: integration-tests
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: lint
- GitHub Check: storage-compat-test
🔇 Additional comments (3)
lib/collection/src/problems/unindexed_field.rs (3)
529-542
: Good introduction of theFieldIndexType
enum!This enum clearly represents specific index capabilities, making it easier to explicitly track lookup and range requirements. This approach is more maintainable than using
PayloadFieldSchema
vectors directly.
163-166
: Good refactoring of theall_indexes
function using enum iterationUsing
FieldIndexType::iter()
makes the code more maintainable, as adding new index types to the enum will automatically include them in the returned iterator.
397-410
:❌ Incorrect review comment
Consider checking all required index types instead of any
The
needs_index
method checks if any of the required index types are supported by the schema. In cases where multiple capabilities are needed, it might be more accurate to check if all required index types are supported.
🏁 Script executed:
#!/bin/bash # Let's check if there are cases where multiple index types are required rg -A5 -B5 "required_index.push\(.*\).*required_index.push\(" --type rust lib/collection/src/problems/unindexed_field.rsLength of output: 122
I’d like to inspect how
required_indexes
is built and whereneeds_index
is called. Let’s pull in those contexts:
🏁 Script executed:
#!/bin/bash # 1) Find all invocations of .needs_index() rg -n -C3 "\.needs_index" --type rust lib/collection # 2) Locate where `required_indexes` is initialized and populated rg -n -C5 "let required_indexes" --type rust lib/collection/src/problemsLength of output: 1284
I want to see how
required_index
is constructed to know if it can ever contain multiple entries:
🏁 Script executed:
#!/bin/bash # Find where `required_index` is defined and populated rg -n 'required_index' -C5 --type rust lib/collection/src/problems/unindexed_field.rsLength of output: 6069
.any()
is correct for OR-based index requirementsThe
required_indexes
slice isn’t a list of all capabilities you must have, but rather a set of acceptable index types (i.e. any one of them will satisfy the query). Throughoutinfer_index_from_*
andall_indexes()
, you’re assembling a list of alternatives, not conjunctive requirements. Using.any()
here correctly skips adding an index as soon as you find one supported type.No changes needed.
Likely an incorrect or invalid review comment.
…6496) * Fix UnindexedField infra to handle lookup&range index requirements * more generic approach * better test * restore * small unit test for sanity * review nits * Do not recommend parametrized index --------- Co-authored-by: Luis Cossío <luis.cossio@outlook.com>
…6496) * Fix UnindexedField infra to handle lookup&range index requirements * more generic approach * better test * restore * small unit test for sanity * review nits * Do not recommend parametrized index --------- Co-authored-by: Luis Cossío <luis.cossio@outlook.com>
Follow up to #6363 (comment)
This PR fills a hole in the missing index detection where the
lookup
&range
capabilities were not taken into account.Also no fix required for
Formula
because of #6494