-
Notifications
You must be signed in to change notification settings - Fork 1.8k
[strict mode | score boosting] Strict mode for formula queries #6317
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
📝 WalkthroughWalkthroughThe changes update multiple areas of the codebase. The type for the ✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (6)
lib/storage/src/issues_subscribers.rs (1)
18-18
: Consistent module path update for method call.The method call has been updated to use the fully qualified path
collection::problems::UnindexedField::submit_possible_suspects
instead of just using the importedUnindexedField
.Since you've already imported
UnindexedField
on line 4, you could simplify this to:- collection::problems::UnindexedField::submit_possible_suspects( + UnindexedField::submit_possible_suspects(lib/collection/src/operations/verification/query.rs (2)
108-136
: Strict mode check for unindexed fields inCollectionQueryRequest
This method correctly returns an error when an expression references an unindexed field withunindexed_filtering_retrieve == Some(false)
. As an optional improvement, consider logging or reporting metrics when such errors occur repeatedly.
160-187
: Duplicate strict mode logic inCollectionQueryGroupsRequest
This block mirrors theCollectionQueryRequest
logic. Factoring out a shared helper method could reduce repetition and potential future divergence.lib/collection/src/collection/payload_index_schema.rs (1)
107-110
: Delegate toone_unindexed_filter_key
The originalone_unindexed_key
function now hands off filter checks to a more specialized method. Consider renaming for clarity since it’s focused on filter usage.lib/collection/src/problems/unindexed_field.rs (2)
405-431
:needs_index
method
Compares inferred schemas to existing indexes, returningtrue
only if the appropriate schema kind is missing. The TODO regarding parameter comparisons for integer indexes is valid.
433-541
:update_from_expression
coverage
Thoroughly enumerates everyExpressionInternal
variant, extracting relevant JSON paths. Consider unit tests for complex nested scenarios to ensure correctness.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lock
is excluded by!**/*.lock
📒 Files selected for processing (14)
lib/api/src/rest/schema.rs
(1 hunks)lib/collection/Cargo.toml
(1 hunks)lib/collection/src/collection/payload_index_schema.rs
(3 hunks)lib/collection/src/collection/search.rs
(1 hunks)lib/collection/src/lib.rs
(1 hunks)lib/collection/src/operations/universal_query/formula.rs
(3 hunks)lib/collection/src/operations/universal_query/shard_query.rs
(1 hunks)lib/collection/src/operations/verification/query.rs
(3 hunks)lib/collection/src/problems/unindexed_field.rs
(5 hunks)lib/collection/src/tests/payload.rs
(5 hunks)lib/segment/Cargo.toml
(0 hunks)lib/segment/src/lib.rs
(0 hunks)lib/storage/src/issues_subscribers.rs
(2 hunks)src/issues_setup.rs
(1 hunks)
💤 Files with no reviewable changes (2)
- lib/segment/src/lib.rs
- lib/segment/Cargo.toml
🧰 Additional context used
🧬 Code Definitions (4)
lib/collection/src/tests/payload.rs (1)
lib/collection/src/collection/payload_index_schema.rs (1)
one_unindexed_filter_key
(157-162)
lib/collection/src/collection/search.rs (1)
lib/collection/src/problems/unindexed_field.rs (1)
slow_query_threshold
(36-38)
lib/storage/src/issues_subscribers.rs (1)
lib/collection/src/problems/unindexed_field.rs (1)
submit_possible_suspects
(90-103)
lib/collection/src/collection/payload_index_schema.rs (1)
lib/collection/src/problems/unindexed_field.rs (2)
new
(275-286)new
(331-336)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test (macos-latest)
- GitHub Check: test-consensus-compose
- GitHub Check: test-consensus
- GitHub Check: test (windows-latest)
- GitHub Check: test
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (22)
lib/collection/src/lib.rs (1)
12-12
: Addition of problems module completes the module relocationThis change makes the new
problems
module publicly accessible from the collection crate, which aligns with the PR objective of relocating the module from segment to collection to facilitate evaluation of variables withinExpressionInternal
.lib/collection/Cargo.toml (1)
61-61
: Appropriate addition of HTTP dependencyAdding the HTTP crate as a workspace dependency is necessary to support the
JsonPath
type usage in the REST API's datetime key expression. This aligns with the PR's goal of enhancing formula query handling.src/issues_setup.rs (1)
4-4
: Import path updated to reflect module relocationThe import path has been correctly updated to use the relocated
problems
module from the collection crate instead of the segment crate. This change maintains consistency with the architectural changes in this PR.lib/api/src/rest/schema.rs (1)
685-685
: Type enhancement for datetime_key from String to JsonPathChanging the type from
String
toJsonPath
provides better type safety and validation for datetime keys in formula expressions. This change directly supports the PR objective of implementing strict mode for formula queries, by ensuring that datetime keys are properly validated as JSON paths.lib/collection/src/collection/search.rs (1)
320-320
: Update to import path for consistent module structure.The code has been updated to use
crate::problems::UnindexedField
instead ofsegment::problems::UnindexedField
, reflecting the relocation of theproblems
module from the segment to the collection crate as mentioned in the PR objectives.lib/collection/src/operations/universal_query/shard_query.rs (1)
404-409
: Improved type safety and error handling for datetime keys.The implementation now parses the
dt_key
string into aJsonPath
object instead of using it directly. This change:
- Adds validation at parsing time, rejecting invalid keys
- Provides clear error messages for invalid inputs
- Uses a strongly-typed
JsonPath
object instead of a raw string- Aligns with the PR objective of using
JsonPath
directly for datetime keysThis is a good improvement for type safety and error handling.
lib/storage/src/issues_subscribers.rs (1)
4-4
: Updated import to reflect relocated module.The import has been updated to use
collection::problems::UnindexedField
instead ofsegment::problems::UnindexedField
, consistent with the module relocation mentioned in the PR objectives.lib/collection/src/tests/payload.rs (1)
72-72
: Updated method name to reflect specialized functionality.All instances of
one_unindexed_key()
have been updated toone_unindexed_filter_key()
to match the renamed method inPayloadIndexSchema
. This aligns with the code changes where the original method was refactored to handle both filters and expressions using thePotentiallyUnindexed
enum, with specialized wrapper methods for specific use cases.The update is consistent across all occurrences in the test, ensuring continued correct functionality while supporting the PR objective to implement checks for unindexed fields in formula expressions.
Also applies to: 91-91, 114-114, 143-143
lib/collection/src/operations/universal_query/formula.rs (3)
11-11
: Check import usage
This newly added import forCondition
andGeoPoint
is straightforward and consistent with the rest of the file’s usage.
32-32
: Switch fromDatetime(String)
toDatetimeKey(JsonPath)
Replacing the string-based approach with a typedJsonPath
clarifies the payload reference while reducing parsing overhead.
90-93
: NewDatetimeKey
variant logic
Inserting the JSON path intopayload_vars
and wrapping it inDatetimeExpression::PayloadVariable
looks correct. This allows date-time extraction directly from the payload.lib/collection/src/operations/verification/query.rs (2)
6-6
: Added import forCollectionError
andCollectionResult
No issues detected.
8-8
: Importing new query types
BringsCollectionQueryGroupsRequest
,CollectionQueryRequest
, andQuery
into scope for strict mode checks.lib/collection/src/collection/payload_index_schema.rs (7)
11-11
: Added import forExpressionInternal
Enables expression-based logic within payload index checks.
13-13
: Importunindexed_field
Aligns with the new unindexed logic integration.
111-119
: Introducingone_unindexed_expression_key
Allows external code to detect unindexed keys in expressions. The method name is descriptive, making the API more intuitive.
122-125
: New enumPotentiallyUnindexed
A flexible solution for handling both filters and expressions with the same unindexed key detection logic.
131-134
: Refactoredone_unindexed_key
Switching to the generic parameterPotentiallyUnindexed
ensures this function can evolve to handle multiple input types seamlessly.
137-144
: Match-based call to update methods
Explicitly differentiating betweenFilter
andExpression
withupdate_from_filter_once
orupdate_from_expression
keeps the code clean and cohesive.
154-169
: Public wrappers for filters and expressions
one_unindexed_filter_key
andone_unindexed_expression_key
provide clear, dedicated entry points for each type of unindexed key check.lib/collection/src/problems/unindexed_field.rs (2)
10-20
: Refactored imports
These imports now reference modules insegment
, aligning with the broader reorganization.
171-185
: Extended handling ofValueVariants
Correctly matchesString
,Integer
, andBool
for inference. This approach is consistent with the existing schema inference logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
lib/collection/src/problems/unindexed_field.rs (1)
412-424
: Consider improving index comparison in a follow-upThe TODO comment correctly identifies a limitation in the current approach of comparing just the index type (kind) rather than the full index parameters. This could lead to false negatives as described in the comment.
Consider implementing a more precise comparison for parametrized indexes in a follow-up PR. This would involve:
- Enhancing the inferring step to output valid parametrized indexes
- Comparing those directly rather than just comparing the kind
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
lib/collection/src/problems/unindexed_field.rs
(5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test (macos-latest)
- GitHub Check: test-consensus-compose
- GitHub Check: test (windows-latest)
- GitHub Check: test-consensus
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (4)
lib/collection/src/problems/unindexed_field.rs (4)
10-19
: Imports updated to reflect module relocationThe import paths have been updated to reference the
segment
module directly, which aligns with the PR objective of relocating theproblems
module from thesegment
to thecollection
crate.
171-171
: ValueVariants references updated correctlyReferences to
ValueVariants
have been properly updated to use the fully qualified path from thesegment
module, maintaining consistency with the import changes.Also applies to: 182-182, 185-185
397-431
: Good refactoring of indexing check logicExtracting the indexing check logic into a separate
needs_index
method improves code organization and reusability. This refactoring will make the code easier to maintain.Regarding the TODO comment on lines 412-424: The identified limitation in comparing parametrized indexes could lead to false negatives, particularly with integer indexes that are parametrized with either lookup or range. Consider prioritizing this improvement in a future update.
433-541
: Comprehensive implementation of expression analysisThe new
update_from_expression
method thoroughly handles various expression types for identifying unindexed fields in formula queries. This implementation:
- Correctly processes different expression types including constants, variables, conditions, and mathematical operations
- Appropriately extracts fields and infers schemas for numeric and geo operations
- Recursively handles composite expressions
- Properly updates the unindexed schema collection
This is a key implementation for the PR's objective of enforcing strict mode for formula queries.
match var { | ||
VariableId::Score(_) => return, | ||
VariableId::Payload(json_path) => { | ||
key = json_path; | ||
inferred = vec![ | ||
PayloadFieldSchema::FieldType(PayloadSchemaType::Integer), | ||
PayloadFieldSchema::FieldType(PayloadSchemaType::Float), | ||
]; | ||
} | ||
VariableId::Condition(_) => return, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Consider adding a test for VariableId handling
The parsing and handling of VariableId
types looks correct, but this is a critical part of the formula expression evaluation. Ensure there are comprehensive tests covering the different variable types (Score, Payload, Condition) to validate this logic.
🏁 Script executed:
#!/bin/bash
# Search for tests related to formula expressions and VariableId
echo "Searching for tests related to formula expressions..."
rg -A 3 -B 3 "VariableId" --glob "**/*test*.rs"
echo "Searching for tests that would exercise the update_from_expression method..."
rg -A 3 -B 3 "update_from_expression" --glob "**/*test*.rs"
Length of output: 387
ACTION REQUIRED: Add Comprehensive Tests for VariableId Handling
It appears that our test suite currently lacks coverage for the different VariableId
variants. We were unable to locate any tests exercising the formula expression update logic, including the handling of Score, Payload, and Condition types. Please add comprehensive tests to ensure that each variant is correctly parsed and processed.
- Ensure tests cover:
VariableId::Score
(should bypass further processing)VariableId::Payload
(valid payload schema assignment)VariableId::Condition
(should also bypass processing)
- Consider updating or creating tests in your test suite to validate the behavior in critical formula evaluation paths.
Thank you very much for adding this 👍 Please add an integration test in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
tests/openapi/test_strictmode.py (1)
1357-1357
: Remove debug print statement.There's a debugging print statement that should be removed before merging this PR.
- print(query_fail.json())
lib/collection/src/operations/verification/query.rs (2)
80-97
: Consider using consistent return type.The
check_custom
method forCollectionPrefetch
usesResult<(), crate::operations::types::CollectionError>
as its return type, while the same method for other types uses the type aliasCollectionResult<()>
. For consistency, consider using the type alias here as well.- ) -> Result<(), crate::operations::types::CollectionError> { + ) -> CollectionResult<()> {
40-78
: Consider reducing code duplication in check_custom implementations.The
check_custom
implementations forCollectionQueryRequest
,CollectionPrefetch
, andCollectionQueryGroupsRequest
contain similar logic. Consider extracting the common pattern into a helper function to reduce duplication.You could create a helper trait method that takes a query option and prefetch option:
trait StrictModeHelpers { fn check_query_and_prefetch( &self, query: Option<&Query>, prefetch: Option<&[CollectionPrefetch]>, collection: &Collection, strict_mode_config: &StrictModeConfig, ) -> CollectionResult<()>; } impl StrictModeHelpers for StrictModeVerification { fn check_query_and_prefetch( &self, query: Option<&Query>, prefetch: Option<&[CollectionPrefetch]>, collection: &Collection, strict_mode_config: &StrictModeConfig, ) -> CollectionResult<()> { if let Some(prefetch_items) = prefetch { for prefetch in prefetch_items { Box::pin(prefetch.check_strict_mode(collection, strict_mode_config)).await?; } } if let Some(query) = query { query.check_strict_mode(collection, strict_mode_config)?; } Ok(()) } }Then in each implementation:
async fn check_custom(&self, collection: &Collection, strict_mode_config: &StrictModeConfig) -> CollectionResult<()> { self.check_query_and_prefetch( self.query.as_ref(), Some(&self.prefetch), collection, strict_mode_config ) }This approach would need to be adapted to your specific requirements and may require async syntax adjustments.
Also applies to: 80-118, 120-152
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
lib/collection/src/operations/verification/query.rs
(3 hunks)tests/openapi/test_strictmode.py
(2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/openapi/test_strictmode.py
1326-1326: Redefinition of unused collection_name
from line 3
(F811)
🔇 Additional comments (3)
tests/openapi/test_strictmode.py (2)
1326-1361
: Good addition of test for strict mode formula expressions.This test verifies that formula expressions are properly restricted in strict mode when referencing unindexed fields. It follows the same pattern as other strict mode tests and provides good coverage for the new functionality.
🧰 Tools
🪛 Ruff (0.8.2)
1326-1326: Redefinition of unused
collection_name
from line 3(F811)
1458-1458
: The assertion matches expected behavior.The assertion correctly verifies that unsetting the write rate limit isn't currently supported, as mentioned in the comment on line 1457.
lib/collection/src/operations/verification/query.rs (1)
10-38
: Well-implemented formula expression restriction.The implementation correctly checks for unindexed fields in formula expressions when strict mode is enabled with
unindexed_filtering_retrieve
set tofalse
. The error message is clear and provides helpful guidance on how to resolve the issue.
cb71472
to
c89ca7b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
lib/collection/src/operations/verification/query.rs (1)
40-57
: Return type inconsistency in check_custom implementations.While the implementation logic is sound, there's an inconsistency in the return types.
CollectionPrefetch::check_custom
returnsResult<(), crate::operations::types::CollectionError>
while the other implementations returnCollectionResult<()>
. SinceCollectionResult
is likely a type alias for the same Result type, these should be consistent.- ) -> Result<(), crate::operations::types::CollectionError> { + ) -> CollectionResult<()> {Also applies to: 80-97, 120-132
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
lib/collection/src/operations/verification/query.rs
(3 hunks)tests/openapi/test_strictmode.py
(2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/openapi/test_strictmode.py
1326-1326: Redefinition of unused collection_name
from line 3
(F811)
🔇 Additional comments (4)
tests/openapi/test_strictmode.py (2)
1326-1366
: Good addition of a strict mode test for formula expressions!This test effectively verifies that formula expressions containing unindexed payload fields are rejected when strict mode is enabled with unindexed filtering restrictions. The assertion checking for both "discount_price" and "formula expression" in the error message is comprehensive.
🧰 Tools
🪛 Ruff (0.8.2)
1326-1326: Redefinition of unused
collection_name
from line 3(F811)
1462-1462
: Consistent assertion matching implementation behavior.The assertion is now correctly checking that
write_rate_limit
equals 1, which aligns with the comment on line 1461 indicating that write rate limit unset operation is not supported.lib/collection/src/operations/verification/query.rs (2)
10-38
: Well-implemented strict mode check for formula expressions.The implementation effectively identifies unindexed fields in formula expressions and generates a helpful error message that includes both the problematic key and suggested actions for resolution.
60-60
: Fixed return type in query_limit method.The method now correctly returns
Some(self.limit)
instead of justself.limit
, which aligns with the declared return type ofOption<usize>
and makes it consistent with similar implementations for other structs.
c89ca7b
to
4c2bf2f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
tests/openapi/test_strictmode.py (1)
1326-1326
: Redefinition of unused parameterThe function parameter
collection_name
is redefined here but is already imported fromconftest
at line 3.-def test_strict_mode_formula_expression(collection_name): +def test_strict_mode_formula_expression():🧰 Tools
🪛 Ruff (0.8.2)
1326-1326: Redefinition of unused
collection_name
from line 3(F811)
lib/collection/src/operations/verification/query.rs (1)
40-57
: Implementation of StrictModeVerification for CollectionQueryRequestThe implementation correctly verifies both prefetch and query fields, passing on the strict mode configuration. The method now returns CollectionResult<()> for consistent error handling throughout the codebase.
Note that there's repeated code across the different implementations (CollectionQueryRequest, CollectionPrefetch, and CollectionQueryGroupsRequest). Consider extracting this common logic to reduce duplication in the future.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
lib/collection/src/operations/verification/query.rs
(3 hunks)tests/openapi/test_strictmode.py
(2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/openapi/test_strictmode.py
1326-1326: Redefinition of unused collection_name
from line 3
(F811)
🔇 Additional comments (8)
tests/openapi/test_strictmode.py (2)
1326-1364
: Well-structured test for formula expression strict mode verificationThe test case is well-designed and follows the established pattern of other strict mode tests in this file:
- First verifies the query works without restrictions
- Then enables strict mode with the relevant setting
- Finally verifies the query fails with appropriate error messages
This test properly validates the new functionality for checking unindexed fields in formula expressions.
🧰 Tools
🪛 Ruff (0.8.2)
1326-1326: Redefinition of unused
collection_name
from line 3(F811)
1461-1461
: Assertion consistent with expected behaviorThe assertion correctly verifies that the write_rate_limit remains 1, which aligns with the comment at line 1460 indicating that unsetting is not currently supported.
lib/collection/src/operations/verification/query.rs (6)
5-5
: Updated import to include CollectionResultAppropriate addition of CollectionResult to the import to support the new return type used in the check_strict_mode function.
7-7
: Added Query to the import listQuery needs to be imported as it's used in the implementation below.
10-37
: Well-implemented formula field verification for strict modeThis implementation correctly checks for unindexed fields in formula expressions, and returns a helpful error message indicating which fields need to be indexed. The error message includes the possible schema types for the field, helping users understand how to fix the issue.
The code follows the existing pattern for strict mode verification while adding this new specific check for formula expressions.
59-61
: Simplified query_limit methodThis change ensures the method always returns an Option by wrapping the limit in Some. This is more consistent with the method signature and seems to be a straightforward improvement.
80-96
: Implementation of StrictModeVerification for CollectionPrefetchThis implementation correctly checks both prefetch and query components. It follows the same pattern as the implementation for CollectionQueryRequest.
120-131
: Implementation of StrictModeVerification for CollectionQueryGroupsRequestThe implementation correctly verifies the query field, passing on the strict mode configuration. Unlike the other implementations, this one doesn't need to check prefetch since that field doesn't exist on this type.
There is still some code duplication across these implementations that could be addressed in the future.
* verify CollectionQuery's Prefetch * remove implementation for REST query * remove implementation for REST query groups
4c2bf2f
to
9390319
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
lib/collection/src/operations/verification/query.rs (1)
80-97
: CollectionPrefetch implementation looks correctYou've properly implemented the check_custom method for CollectionPrefetch, making sure that all prefetch queries and their formulas are checked for unindexed fields.
Consider using the same return type signature across all implementations:
- ) -> Result<(), crate::operations::types::CollectionError> { + ) -> CollectionResult<()> {Since you've imported
CollectionResult
at the top, this would make the code more consistent with the other implementations.lib/collection/src/problems/unindexed_field.rs (2)
433-542
: Consider breaking down the update_from_expression method for better maintainability.While the implementation correctly handles all expression types, the method is quite large and complex. Consider refactoring it into smaller helper methods for each category of expressions (mathematical operations, functions, etc.) to improve readability and maintainability.
You could restructure it like this:
pub fn update_from_expression(&mut self, expression: &ExpressionInternal) { match expression { ExpressionInternal::Constant(_) => return, ExpressionInternal::Variable(variable) => self.process_variable(variable), ExpressionInternal::Condition(condition) => { self.update_from_condition(None, condition); return; }, ExpressionInternal::GeoDistance { origin: _, to } => self.process_geo_distance(to), ExpressionInternal::Datetime(_) => return, ExpressionInternal::DatetimeKey(variable) => self.process_datetime_key(variable), // Group mathematical operations expr @ (ExpressionInternal::Mult(_) | ExpressionInternal::Sum(_) | ExpressionInternal::Div { .. }) => self.process_math_operation(expr), // Group mathematical functions expr @ (ExpressionInternal::Neg(_) | ExpressionInternal::Sqrt(_) | ExpressionInternal::Pow { .. } | ExpressionInternal::Exp(_) | ExpressionInternal::Log10(_) | ExpressionInternal::Ln(_) | ExpressionInternal::Abs(_)) => self.process_math_function(expr), ExpressionInternal::Decay { .. } => self.process_decay_function(expression), } } // Helper methods for each expression type...
412-424
: Address the TODO for better comparisons of parameterized indexes.There's a TODO comment indicating a potential false negative in the index comparison logic. Integer indexes parameterized with "lookup" or "range" might lead to incorrect assessments of whether a field is properly indexed. Consider implementing the suggested improvement or clarifying when this will be addressed.
Would you like me to suggest a more robust comparison approach that could address this TODO?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lock
is excluded by!**/*.lock
📒 Files selected for processing (16)
lib/api/src/rest/schema.rs
(1 hunks)lib/collection/Cargo.toml
(1 hunks)lib/collection/src/collection/payload_index_schema.rs
(3 hunks)lib/collection/src/collection/search.rs
(1 hunks)lib/collection/src/lib.rs
(1 hunks)lib/collection/src/operations/universal_query/formula.rs
(3 hunks)lib/collection/src/operations/universal_query/shard_query.rs
(1 hunks)lib/collection/src/operations/verification/query.rs
(3 hunks)lib/collection/src/problems/unindexed_field.rs
(5 hunks)lib/collection/src/tests/payload.rs
(5 hunks)lib/segment/Cargo.toml
(0 hunks)lib/segment/src/lib.rs
(0 hunks)lib/storage/src/issues_subscribers.rs
(2 hunks)src/actix/api/query_api.rs
(4 hunks)src/issues_setup.rs
(1 hunks)tests/openapi/test_strictmode.py
(1 hunks)
💤 Files with no reviewable changes (2)
- lib/segment/src/lib.rs
- lib/segment/Cargo.toml
🚧 Files skipped from review as they are similar to previous changes (10)
- lib/collection/src/lib.rs
- lib/collection/src/tests/payload.rs
- lib/collection/Cargo.toml
- lib/collection/src/collection/search.rs
- lib/collection/src/operations/universal_query/shard_query.rs
- lib/api/src/rest/schema.rs
- src/issues_setup.rs
- lib/collection/src/operations/universal_query/formula.rs
- src/actix/api/query_api.rs
- lib/storage/src/issues_subscribers.rs
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/openapi/test_strictmode.py
1326-1326: Redefinition of unused collection_name
from line 3
(F811)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test-consensus-compose
- GitHub Check: test (macos-latest)
- GitHub Check: test-consensus
- GitHub Check: test (windows-latest)
- GitHub Check: test
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (12)
tests/openapi/test_strictmode.py (1)
1326-1364
:✅ Verification successful
Good addition of test for formula expression strict mode!
This new test correctly verifies that formula expressions using unindexed fields are properly restricted when strict mode is enabled with
unindexed_filtering_retrieve
set toFalse
. The test first confirms the query works without restrictions, then correctly verifies that it fails with specific error messages when strict mode is enabled.
🏁 Script executed:
#!/bin/bash # Check the static analysis warning about collection_name redefinition # Look for the collection_name import grep -n "from .conftest import collection_name" tests/openapi/test_strictmode.py # Check the test function signature grep -n "def test_strict_mode_formula_expression" tests/openapi/test_strictmode.pyLength of output: 270
Test correctly validates the strict mode enforcement for formula expressions.
- The test first confirms that a query using an unindexed field succeeds without strict mode.
- Once strict mode is enabled (with
unindexed_filtering_retrieve
set toFalse
), the same query fails, and the error message appropriately mentions both"discount_price"
and"formula expression"
.- The use of the
collection_name
fixture—imported fromconftest
and then injected as a parameter in the test—is intentional and in line with our testing conventions, so the static analysis warning about redefinition is expected.🧰 Tools
🪛 Ruff (0.8.2)
1326-1326: Redefinition of unused
collection_name
from line 3(F811)
lib/collection/src/operations/verification/query.rs (3)
5-38
: Implementation of formula field indexing verification is solidThe implementation correctly adds the check for unindexed fields in formula expressions and provides a helpful error message to guide users. I like that the error message includes both what's wrong and how to fix it.
40-57
: Good implementation of formula verification in CollectionQueryRequestThe changes correctly add formula expression checking for CollectionQueryRequest, ensuring that unindexed fields in formulas are properly validated when strict mode is enabled.
120-131
:✅ Verification successful
CollectionQueryGroupsRequest implementation is good
The check_custom method implementation for CollectionQueryGroupsRequest is correct, checking unindexed fields in formula queries.
🏁 Script executed:
#!/bin/bash # Check if there's similar duplicate code that could be consolidated # See if the method bodies in check_custom implementations are similar echo "Checking CollectionQueryRequest check_custom:" sed -n '41,57p' lib/collection/src/operations/verification/query.rs echo -e "\nChecking CollectionPrefetch check_custom:" sed -n '81,97p' lib/collection/src/operations/verification/query.rs echo -e "\nChecking CollectionQueryGroupsRequest check_custom:" sed -n '121,131p' lib/collection/src/operations/verification/query.rsLength of output: 2087
CollectionQueryGroupsRequest Implementation Confirmed as Correct
After reviewing the implementation against similar methods in the file, the check for unindexed fields in formula queries is performed appropriately. The absence of a prefetch loop compared to the other implementations is intentional and aligns with the expected behavior for this request type.
- File:
lib/collection/src/operations/verification/query.rs
- Lines: 121-131 — The code correctly handles the strict mode check for queries.
lib/collection/src/collection/payload_index_schema.rs (4)
11-13
: Clean import adjustments for module relocation.The import changes correctly reflect the relocation of the
problems
module fromsegment
tocollection
crate as mentioned in the PR description. This architectural change helps facilitate the evaluation of variables withinExpressionInternal
.
107-119
: Well-structured Collection implementation update.The changes to the Collection implementation are clean and maintain backward compatibility while adding support for expression checking. The original method now delegates to the new specialized method, and a parallel method for expressions has been added with a clear, consistent naming convention.
122-125
: Good enum design for type-safe handling.The
PotentiallyUnindexed
enum is a well-designed approach for handling bothFilter
andExpressionInternal
types in a type-safe manner. This creates a clear abstraction that allows the underlying implementation to handle both types without code duplication.
131-169
: Clean method refactoring with good encapsulation.The refactoring of
one_unindexed_key
into a private method with two public wrapper methods is excellent. This approach:
- Maintains a clean public API
- Provides type-specific methods for clients
- Reuses implementation logic to avoid duplication
- Properly encapsulates the implementation details
This is a good example of following the Open/Closed principle - extending functionality without modifying existing behavior.
lib/collection/src/problems/unindexed_field.rs (4)
6-20
: Clean import adjustments reflecting module restructuring.The import changes correctly update the paths following the module relocation. Adding the
http
crate andVariableId
imports supports the new functionality for formula expression evaluation.
169-189
: Consistent path updates in infer_schema_from_match_value.The function has been properly updated to use the fully-qualified path
segment::types::ValueVariants
consistently with the import changes. This ensures the code continues to work correctly after the module restructuring.
405-431
: Good extraction of common logic to needs_index method.Extracting the indexing check logic to a separate method improves code organization by following the DRY principle. This refactoring makes the code more maintainable and allows reuse of this logic in the new
update_from_expression
method.
446-456
: Add tests for VariableId handling in the update_from_expression method.As noted in a previous review, there appears to be a lack of comprehensive tests for the different
VariableId
variants in the formula expression evaluation. This is a critical part of the implementation, especially for ensuring the correct handling of Score, Payload, and Condition types.#!/bin/bash # Search for tests that would validate update_from_expression behavior echo "Searching for tests related to update_from_expression..." rg -A 5 -B 5 "update_from_expression" --glob "**/*test*.rs" echo "Searching for tests related to formula expressions..." rg -A 5 -B 5 "ExpressionInternal" --glob "**/*test*.rs" | grep -i test echo "Checking for test cases covering VariableId variants..." rg -A 5 -B 5 "VariableId::Score|VariableId::Payload|VariableId::Condition" --glob "**/*test*.rs"
…t#6317) * move problems.rs to collection crate * change type on datetime_key expression * extract from expression * check for unindexed fields in formula * clippy * dedup code * Only implement verification for `CollectionQuery` types (qdrant#6319) * verify CollectionQuery's Prefetch * remove implementation for REST query * remove implementation for REST query groups * add basic openapi test
Implements the check for unindexed fields in formula expressions.
Even if formulas are only allowed as a reranking step, there is no guarantee that the preselection will be small enough for making reading payloads directly negligible.
With this check, only correctly-indexed payload fields will be allowed in the formula expressions.
A needed change was to move the
problems
module fromsegment
to thecollection
crate, because we needed to evaluate variables insideExpressionInternal
.Also realized that we could use
JsonPath
directly as the type for REST'sdatetime_key
expression.