[strict mode | score boosting] Strict mode for formula queries #6317

coszio · 2025-04-03T17:27:54Z

Implements the check for unindexed fields in formula expressions.

Even if formulas are only allowed as a reranking step, there is no guarantee that the preselection will be small enough for making reading payloads directly negligible.

With this check, only correctly-indexed payload fields will be allowed in the formula expressions.

A needed change was to move the problems module from segment to the collection crate, because we needed to evaluate variables inside ExpressionInternal.

Also realized that we could use JsonPath directly as the type for REST's datetime_key expression.

coderabbitai · 2025-04-03T17:32:15Z

📝 Walkthrough

Walkthrough

The changes update multiple areas of the codebase. The type for the datetime_key has been changed from String to JsonPath in both the REST schema and the universal query formulas, modifying the parsing logic and error handling for these keys. The PayloadIndexSchema has been refactored: a method has been updated to accept an enum to accommodate both filters and expressions, and two new methods have been introduced to wrap this functionality. Dependency management has been adjusted by adding the http crate in one Cargo.toml while removing it from another. Module interfaces and import paths have also been updated by introducing a new public module in one library and relocating import references for unindexed field handling from the segment module to the collection module. Additionally, new asynchronous methods for strict mode verification and modifications in tests reflect these structural changes consistently across the codebase.

✨ Finishing Touches

📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai plan to trigger planning for file edits and PR creation.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (6)

lib/storage/src/issues_subscribers.rs (1)
18-18: Consistent module path update for method call.

The method call has been updated to use the fully qualified path collection::problems::UnindexedField::submit_possible_suspects instead of just using the imported UnindexedField.

Since you've already imported UnindexedField on line 4, you could simplify this to:
-            collection::problems::UnindexedField::submit_possible_suspects(
+            UnindexedField::submit_possible_suspects(
lib/collection/src/operations/verification/query.rs (2)

108-136: Strict mode check for unindexed fields in CollectionQueryRequest
This method correctly returns an error when an expression references an unindexed field with unindexed_filtering_retrieve == Some(false). As an optional improvement, consider logging or reporting metrics when such errors occur repeatedly.

160-187: Duplicate strict mode logic in CollectionQueryGroupsRequest
This block mirrors the CollectionQueryRequest logic. Factoring out a shared helper method could reduce repetition and potential future divergence.

lib/collection/src/collection/payload_index_schema.rs (1)

107-110: Delegate to one_unindexed_filter_key
The original one_unindexed_key function now hands off filter checks to a more specialized method. Consider renaming for clarity since it’s focused on filter usage.

lib/collection/src/problems/unindexed_field.rs (2)

405-431: needs_index method
Compares inferred schemas to existing indexes, returning true only if the appropriate schema kind is missing. The TODO regarding parameter comparisons for integer indexes is valid.

433-541: update_from_expression coverage
Thoroughly enumerates every ExpressionInternal variant, extracting relevant JSON paths. Consider unit tests for complex nested scenarios to ensure correctness.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 821aa0a and 0ceb2c4.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (14)

lib/api/src/rest/schema.rs (1 hunks)
lib/collection/Cargo.toml (1 hunks)
lib/collection/src/collection/payload_index_schema.rs (3 hunks)
lib/collection/src/collection/search.rs (1 hunks)
lib/collection/src/lib.rs (1 hunks)
lib/collection/src/operations/universal_query/formula.rs (3 hunks)
lib/collection/src/operations/universal_query/shard_query.rs (1 hunks)
lib/collection/src/operations/verification/query.rs (3 hunks)
lib/collection/src/problems/unindexed_field.rs (5 hunks)
lib/collection/src/tests/payload.rs (5 hunks)
lib/segment/Cargo.toml (0 hunks)
lib/segment/src/lib.rs (0 hunks)
lib/storage/src/issues_subscribers.rs (2 hunks)
src/issues_setup.rs (1 hunks)

💤 Files with no reviewable changes (2)

lib/segment/src/lib.rs
lib/segment/Cargo.toml

🧰 Additional context used

🧬 Code Definitions (4)

lib/collection/src/tests/payload.rs (1)

lib/collection/src/collection/payload_index_schema.rs (1)

one_unindexed_filter_key (157-162)

lib/collection/src/collection/search.rs (1)

lib/collection/src/problems/unindexed_field.rs (1)

slow_query_threshold (36-38)

lib/storage/src/issues_subscribers.rs (1)

lib/collection/src/problems/unindexed_field.rs (1)

submit_possible_suspects (90-103)

lib/collection/src/collection/payload_index_schema.rs (1)

lib/collection/src/problems/unindexed_field.rs (2)

new (275-286)

new (331-336)

⏰ Context from checks skipped due to timeout of 90000ms (13)

GitHub Check: Basic TLS/HTTPS tests
GitHub Check: test-snapshot-operations-s3-minio
GitHub Check: test-shard-snapshot-api-s3-minio
GitHub Check: test-low-resources
GitHub Check: test-consistency
GitHub Check: test (macos-latest)
GitHub Check: test-consensus-compose
GitHub Check: test-consensus
GitHub Check: test (windows-latest)
GitHub Check: test
GitHub Check: test (ubuntu-latest)
GitHub Check: test
GitHub Check: test

🔇 Additional comments (22)

lib/collection/src/lib.rs (1)

12-12: Addition of problems module completes the module relocation

This change makes the new problems module publicly accessible from the collection crate, which aligns with the PR objective of relocating the module from segment to collection to facilitate evaluation of variables within ExpressionInternal.

lib/collection/Cargo.toml (1)

61-61: Appropriate addition of HTTP dependency

Adding the HTTP crate as a workspace dependency is necessary to support the JsonPath type usage in the REST API's datetime key expression. This aligns with the PR's goal of enhancing formula query handling.

src/issues_setup.rs (1)

4-4: Import path updated to reflect module relocation

The import path has been correctly updated to use the relocated problems module from the collection crate instead of the segment crate. This change maintains consistency with the architectural changes in this PR.

lib/api/src/rest/schema.rs (1)

685-685: Type enhancement for datetime_key from String to JsonPath

Changing the type from String to JsonPath provides better type safety and validation for datetime keys in formula expressions. This change directly supports the PR objective of implementing strict mode for formula queries, by ensuring that datetime keys are properly validated as JSON paths.

lib/collection/src/collection/search.rs (1)

320-320: Update to import path for consistent module structure.

The code has been updated to use crate::problems::UnindexedField instead of segment::problems::UnindexedField, reflecting the relocation of the problems module from the segment to the collection crate as mentioned in the PR objectives.

lib/collection/src/operations/universal_query/shard_query.rs (1)

404-409: Improved type safety and error handling for datetime keys.

The implementation now parses the dt_key string into a JsonPath object instead of using it directly. This change:

Adds validation at parsing time, rejecting invalid keys

Provides clear error messages for invalid inputs

Uses a strongly-typed JsonPath object instead of a raw string

Aligns with the PR objective of using JsonPath directly for datetime keys

This is a good improvement for type safety and error handling.

lib/storage/src/issues_subscribers.rs (1)

4-4: Updated import to reflect relocated module.

The import has been updated to use collection::problems::UnindexedField instead of segment::problems::UnindexedField, consistent with the module relocation mentioned in the PR objectives.

lib/collection/src/tests/payload.rs (1)

72-72: Updated method name to reflect specialized functionality.

All instances of one_unindexed_key() have been updated to one_unindexed_filter_key() to match the renamed method in PayloadIndexSchema. This aligns with the code changes where the original method was refactored to handle both filters and expressions using the PotentiallyUnindexed enum, with specialized wrapper methods for specific use cases.

The update is consistent across all occurrences in the test, ensuring continued correct functionality while supporting the PR objective to implement checks for unindexed fields in formula expressions.

Also applies to: 91-91, 114-114, 143-143

lib/collection/src/operations/universal_query/formula.rs (3)

11-11: Check import usage
This newly added import for Condition and GeoPoint is straightforward and consistent with the rest of the file’s usage.

32-32: Switch from Datetime(String) to DatetimeKey(JsonPath)
Replacing the string-based approach with a typed JsonPath clarifies the payload reference while reducing parsing overhead.

90-93: New DatetimeKey variant logic
Inserting the JSON path into payload_vars and wrapping it in DatetimeExpression::PayloadVariable looks correct. This allows date-time extraction directly from the payload.

lib/collection/src/operations/verification/query.rs (2)

6-6: Added import for CollectionError and CollectionResult
No issues detected.

8-8: Importing new query types
Brings CollectionQueryGroupsRequest, CollectionQueryRequest, and Query into scope for strict mode checks.

lib/collection/src/collection/payload_index_schema.rs (7)

11-11: Added import for ExpressionInternal
Enables expression-based logic within payload index checks.

13-13: Import unindexed_field
Aligns with the new unindexed logic integration.

111-119: Introducing one_unindexed_expression_key
Allows external code to detect unindexed keys in expressions. The method name is descriptive, making the API more intuitive.

122-125: New enum PotentiallyUnindexed
A flexible solution for handling both filters and expressions with the same unindexed key detection logic.

131-134: Refactored one_unindexed_key
Switching to the generic parameter PotentiallyUnindexed ensures this function can evolve to handle multiple input types seamlessly.

137-144: Match-based call to update methods
Explicitly differentiating between Filter and Expression with update_from_filter_once or update_from_expression keeps the code clean and cohesive.

154-169: Public wrappers for filters and expressions
one_unindexed_filter_key and one_unindexed_expression_key provide clear, dedicated entry points for each type of unindexed key check.

lib/collection/src/problems/unindexed_field.rs (2)

10-20: Refactored imports
These imports now reference modules in segment, aligning with the broader reorganization.

171-185: Extended handling of ValueVariants
Correctly matches String, Integer, and Bool for inference. This approach is consistent with the existing schema inference logic.

lib/collection/src/operations/verification/query.rs

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

lib/collection/src/problems/unindexed_field.rs (1)

412-424: Consider improving index comparison in a follow-up

The TODO comment correctly identifies a limitation in the current approach of comparing just the index type (kind) rather than the full index parameters. This could lead to false negatives as described in the comment.

Consider implementing a more precise comparison for parametrized indexes in a follow-up PR. This would involve:

Enhancing the inferring step to output valid parametrized indexes

Comparing those directly rather than just comparing the kind

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ceb2c4 and 9a3ccc9.

📒 Files selected for processing (1)

lib/collection/src/problems/unindexed_field.rs (5 hunks)

⏰ Context from checks skipped due to timeout of 90000ms (13)

GitHub Check: Basic TLS/HTTPS tests
GitHub Check: test-snapshot-operations-s3-minio
GitHub Check: test-shard-snapshot-api-s3-minio
GitHub Check: test-low-resources
GitHub Check: test-consistency
GitHub Check: test (macos-latest)
GitHub Check: test-consensus-compose
GitHub Check: test (windows-latest)
GitHub Check: test-consensus
GitHub Check: test (ubuntu-latest)
GitHub Check: test
GitHub Check: test
GitHub Check: test

🔇 Additional comments (4)

lib/collection/src/problems/unindexed_field.rs (4)

10-19: Imports updated to reflect module relocation

The import paths have been updated to reference the segment module directly, which aligns with the PR objective of relocating the problems module from the segment to the collection crate.

171-171: ValueVariants references updated correctly

References to ValueVariants have been properly updated to use the fully qualified path from the segment module, maintaining consistency with the import changes.

Also applies to: 182-182, 185-185

397-431: Good refactoring of indexing check logic

Extracting the indexing check logic into a separate needs_index method improves code organization and reusability. This refactoring will make the code easier to maintain.

Regarding the TODO comment on lines 412-424: The identified limitation in comparing parametrized indexes could lead to false negatives, particularly with integer indexes that are parametrized with either lookup or range. Consider prioritizing this improvement in a future update.

433-541: Comprehensive implementation of expression analysis

The new update_from_expression method thoroughly handles various expression types for identifying unindexed fields in formula queries. This implementation:

Correctly processes different expression types including constants, variables, conditions, and mathematical operations

Appropriately extracts fields and infers schemas for numeric and geo operations

Recursively handles composite expressions

Properly updates the unindexed schema collection

This is a key implementation for the PR's objective of enforcing strict mode for formula queries.

coderabbitai · 2025-04-03T17:47:46Z

lib/collection/src/problems/unindexed_field.rs

+                match var {
+                    VariableId::Score(_) => return,
+                    VariableId::Payload(json_path) => {
+                        key = json_path;
+                        inferred = vec![
+                            PayloadFieldSchema::FieldType(PayloadSchemaType::Integer),
+                            PayloadFieldSchema::FieldType(PayloadSchemaType::Float),
+                        ];
+                    }
+                    VariableId::Condition(_) => return,
+                }


💡 Verification agent

🧩 Analysis chain

Consider adding a test for VariableId handling

The parsing and handling of VariableId types looks correct, but this is a critical part of the formula expression evaluation. Ensure there are comprehensive tests covering the different variable types (Score, Payload, Condition) to validate this logic.

🏁 Script executed:

#!/bin/bash # Search for tests related to formula expressions and VariableId echo "Searching for tests related to formula expressions..." rg -A 3 -B 3 "VariableId" --glob "**/*test*.rs" echo "Searching for tests that would exercise the update_from_expression method..." rg -A 3 -B 3 "update_from_expression" --glob "**/*test*.rs"

Length of output: 387

ACTION REQUIRED: Add Comprehensive Tests for VariableId Handling

It appears that our test suite currently lacks coverage for the different VariableId variants. We were unable to locate any tests exercising the formula expression update logic, including the handling of Score, Payload, and Condition types. Please add comprehensive tests to ensure that each variant is correctly parsed and processed.

Ensure tests cover:

VariableId::Score (should bypass further processing)

VariableId::Payload (valid payload schema assignment)

VariableId::Condition (should also bypass processing)

Consider updating or creating tests in your test suite to validate the behavior in critical formula evaluation paths.

agourlay · 2025-04-04T06:44:36Z

Thank you very much for adding this 👍

Please add an integration test in tests/openapi/test_strictmode.py

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

tests/openapi/test_strictmode.py (1)
1357-1357: Remove debug print statement.

There's a debugging print statement that should be removed before merging this PR.
-    print(query_fail.json())
lib/collection/src/operations/verification/query.rs (2)
80-97: Consider using consistent return type.

The check_custom method for CollectionPrefetch uses Result<(), crate::operations::types::CollectionError> as its return type, while the same method for other types uses the type alias CollectionResult<()>. For consistency, consider using the type alias here as well.
-    ) -> Result<(), crate::operations::types::CollectionError> {
+    ) -> CollectionResult<()> {
40-78: Consider reducing code duplication in check_custom implementations.

The check_custom implementations for CollectionQueryRequest, CollectionPrefetch, and CollectionQueryGroupsRequest contain similar logic. Consider extracting the common pattern into a helper function to reduce duplication.

You could create a helper trait method that takes a query option and prefetch option:
trait StrictModeHelpers {
    fn check_query_and_prefetch(
        &self,
        query: Option<&Query>,
        prefetch: Option<&[CollectionPrefetch]>,
        collection: &Collection,
        strict_mode_config: &StrictModeConfig,
    ) -> CollectionResult<()>;
}

impl StrictModeHelpers for StrictModeVerification {
    fn check_query_and_prefetch(
        &self,
        query: Option<&Query>,
        prefetch: Option<&[CollectionPrefetch]>,
        collection: &Collection, 
        strict_mode_config: &StrictModeConfig,
    ) -> CollectionResult<()> {
        if let Some(prefetch_items) = prefetch {
            for prefetch in prefetch_items {
                Box::pin(prefetch.check_strict_mode(collection, strict_mode_config)).await?;
            }
        }
        
        if let Some(query) = query {
            query.check_strict_mode(collection, strict_mode_config)?;
        }
        
        Ok(())
    }
}
Then in each implementation:
async fn check_custom(&self, collection: &Collection, strict_mode_config: &StrictModeConfig) -> CollectionResult<()> {
    self.check_query_and_prefetch(
        self.query.as_ref(),
        Some(&self.prefetch),
        collection,
        strict_mode_config
    )
}
This approach would need to be adapted to your specific requirements and may require async syntax adjustments.

Also applies to: 80-118, 120-152

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d0879fd and cb71472.

📒 Files selected for processing (2)

lib/collection/src/operations/verification/query.rs (3 hunks)
tests/openapi/test_strictmode.py (2 hunks)

🧰 Additional context used

🪛 Ruff (0.8.2)

tests/openapi/test_strictmode.py

1326-1326: Redefinition of unused collection_name from line 3

(F811)

🔇 Additional comments (3)

tests/openapi/test_strictmode.py (2)

1326-1361: Good addition of test for strict mode formula expressions.

This test verifies that formula expressions are properly restricted in strict mode when referencing unindexed fields. It follows the same pattern as other strict mode tests and provides good coverage for the new functionality.

🧰 Tools

🪛 Ruff (0.8.2)

1326-1326: Redefinition of unused collection_name from line 3

(F811)

1458-1458: The assertion matches expected behavior.

The assertion correctly verifies that unsetting the write rate limit isn't currently supported, as mentioned in the comment on line 1457.

lib/collection/src/operations/verification/query.rs (1)

10-38: Well-implemented formula expression restriction.

The implementation correctly checks for unindexed fields in formula expressions when strict mode is enabled with unindexed_filtering_retrieve set to false. The error message is clear and provides helpful guidance on how to resolve the issue.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

lib/collection/src/operations/verification/query.rs (1)
40-57: Return type inconsistency in check_custom implementations.

While the implementation logic is sound, there's an inconsistency in the return types. CollectionPrefetch::check_custom returns Result<(), crate::operations::types::CollectionError> while the other implementations return CollectionResult<()>. Since CollectionResult is likely a type alias for the same Result type, these should be consistent.
-    ) -> Result<(), crate::operations::types::CollectionError> {
+    ) -> CollectionResult<()> {
Also applies to: 80-97, 120-132

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cb71472 and c89ca7b.

📒 Files selected for processing (2)

lib/collection/src/operations/verification/query.rs (3 hunks)
tests/openapi/test_strictmode.py (2 hunks)

🧰 Additional context used

🪛 Ruff (0.8.2)

tests/openapi/test_strictmode.py

1326-1326: Redefinition of unused collection_name from line 3

(F811)

🔇 Additional comments (4)

tests/openapi/test_strictmode.py (2)

1326-1366: Good addition of a strict mode test for formula expressions!

This test effectively verifies that formula expressions containing unindexed payload fields are rejected when strict mode is enabled with unindexed filtering restrictions. The assertion checking for both "discount_price" and "formula expression" in the error message is comprehensive.

🧰 Tools

🪛 Ruff (0.8.2)

1326-1326: Redefinition of unused collection_name from line 3

(F811)

1462-1462: Consistent assertion matching implementation behavior.

The assertion is now correctly checking that write_rate_limit equals 1, which aligns with the comment on line 1461 indicating that write rate limit unset operation is not supported.

lib/collection/src/operations/verification/query.rs (2)

10-38: Well-implemented strict mode check for formula expressions.

The implementation effectively identifies unindexed fields in formula expressions and generates a helpful error message that includes both the problematic key and suggested actions for resolution.

60-60: Fixed return type in query_limit method.

The method now correctly returns Some(self.limit) instead of just self.limit, which aligns with the declared return type of Option<usize> and makes it consistent with similar implementations for other structs.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

tests/openapi/test_strictmode.py (1)
1326-1326: Redefinition of unused parameter

The function parameter collection_name is redefined here but is already imported from conftest at line 3.
-def test_strict_mode_formula_expression(collection_name):
+def test_strict_mode_formula_expression():
🧰 Tools

🪛 Ruff (0.8.2)

1326-1326: Redefinition of unused collection_name from line 3

(F811)
lib/collection/src/operations/verification/query.rs (1)

40-57: Implementation of StrictModeVerification for CollectionQueryRequest

The implementation correctly verifies both prefetch and query fields, passing on the strict mode configuration. The method now returns CollectionResult<()> for consistent error handling throughout the codebase.

Note that there's repeated code across the different implementations (CollectionQueryRequest, CollectionPrefetch, and CollectionQueryGroupsRequest). Consider extracting this common logic to reduce duplication in the future.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c89ca7b and 4c2bf2f.

📒 Files selected for processing (2)

lib/collection/src/operations/verification/query.rs (3 hunks)
tests/openapi/test_strictmode.py (2 hunks)

🧰 Additional context used

🪛 Ruff (0.8.2)

tests/openapi/test_strictmode.py

1326-1326: Redefinition of unused collection_name from line 3

(F811)

🔇 Additional comments (8)

tests/openapi/test_strictmode.py (2)

1326-1364: Well-structured test for formula expression strict mode verification

The test case is well-designed and follows the established pattern of other strict mode tests in this file:

First verifies the query works without restrictions

Then enables strict mode with the relevant setting

Finally verifies the query fails with appropriate error messages

This test properly validates the new functionality for checking unindexed fields in formula expressions.

🧰 Tools

🪛 Ruff (0.8.2)

1326-1326: Redefinition of unused collection_name from line 3

(F811)

1461-1461: Assertion consistent with expected behavior

The assertion correctly verifies that the write_rate_limit remains 1, which aligns with the comment at line 1460 indicating that unsetting is not currently supported.

lib/collection/src/operations/verification/query.rs (6)

5-5: Updated import to include CollectionResult

Appropriate addition of CollectionResult to the import to support the new return type used in the check_strict_mode function.

7-7: Added Query to the import list

Query needs to be imported as it's used in the implementation below.

10-37: Well-implemented formula field verification for strict mode

This implementation correctly checks for unindexed fields in formula expressions, and returns a helpful error message indicating which fields need to be indexed. The error message includes the possible schema types for the field, helping users understand how to fix the issue.

The code follows the existing pattern for strict mode verification while adding this new specific check for formula expressions.

59-61: Simplified query_limit method

This change ensures the method always returns an Option by wrapping the limit in Some. This is more consistent with the method signature and seems to be a straightforward improvement.

80-96: Implementation of StrictModeVerification for CollectionPrefetch

This implementation correctly checks both prefetch and query components. It follows the same pattern as the implementation for CollectionQueryRequest.

120-131: Implementation of StrictModeVerification for CollectionQueryGroupsRequest

The implementation correctly verifies the query field, passing on the strict mode configuration. Unlike the other implementations, this one doesn't need to check prefetch since that field doesn't exist on this type.

There is still some code duplication across these implementations that could be addressed in the future.

* verify CollectionQuery's Prefetch * remove implementation for REST query * remove implementation for REST query groups

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

lib/collection/src/operations/verification/query.rs (1)
80-97: CollectionPrefetch implementation looks correct

You've properly implemented the check_custom method for CollectionPrefetch, making sure that all prefetch queries and their formulas are checked for unindexed fields.

Consider using the same return type signature across all implementations:
-    ) -> Result<(), crate::operations::types::CollectionError> {
+    ) -> CollectionResult<()> {
Since you've imported CollectionResult at the top, this would make the code more consistent with the other implementations.
lib/collection/src/problems/unindexed_field.rs (2)
433-542: Consider breaking down the update_from_expression method for better maintainability.

While the implementation correctly handles all expression types, the method is quite large and complex. Consider refactoring it into smaller helper methods for each category of expressions (mathematical operations, functions, etc.) to improve readability and maintainability.

You could restructure it like this:
pub fn update_from_expression(&mut self, expression: &ExpressionInternal) {
    match expression {
        ExpressionInternal::Constant(_) => return,
        ExpressionInternal::Variable(variable) => self.process_variable(variable),
        ExpressionInternal::Condition(condition) => {
            self.update_from_condition(None, condition);
            return;
        },
        ExpressionInternal::GeoDistance { origin: _, to } => self.process_geo_distance(to),
        ExpressionInternal::Datetime(_) => return,
        ExpressionInternal::DatetimeKey(variable) => self.process_datetime_key(variable),
        // Group mathematical operations
        expr @ (ExpressionInternal::Mult(_) | 
                ExpressionInternal::Sum(_) | 
                ExpressionInternal::Div { .. }) => self.process_math_operation(expr),
        // Group mathematical functions
        expr @ (ExpressionInternal::Neg(_) |
                ExpressionInternal::Sqrt(_) |
                ExpressionInternal::Pow { .. } |
                ExpressionInternal::Exp(_) |
                ExpressionInternal::Log10(_) |
                ExpressionInternal::Ln(_) |
                ExpressionInternal::Abs(_)) => self.process_math_function(expr),
        ExpressionInternal::Decay { .. } => self.process_decay_function(expression),
    }
}

// Helper methods for each expression type...
412-424: Address the TODO for better comparisons of parameterized indexes.

There's a TODO comment indicating a potential false negative in the index comparison logic. Integer indexes parameterized with "lookup" or "range" might lead to incorrect assessments of whether a field is properly indexed. Consider implementing the suggested improvement or clarifying when this will be addressed.

Would you like me to suggest a more robust comparison approach that could address this TODO?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c2bf2f and 9390319.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (16)

lib/api/src/rest/schema.rs (1 hunks)
lib/collection/Cargo.toml (1 hunks)
lib/collection/src/collection/payload_index_schema.rs (3 hunks)
lib/collection/src/collection/search.rs (1 hunks)
lib/collection/src/lib.rs (1 hunks)
lib/collection/src/operations/universal_query/formula.rs (3 hunks)
lib/collection/src/operations/universal_query/shard_query.rs (1 hunks)
lib/collection/src/operations/verification/query.rs (3 hunks)
lib/collection/src/problems/unindexed_field.rs (5 hunks)
lib/collection/src/tests/payload.rs (5 hunks)
lib/segment/Cargo.toml (0 hunks)
lib/segment/src/lib.rs (0 hunks)
lib/storage/src/issues_subscribers.rs (2 hunks)
src/actix/api/query_api.rs (4 hunks)
src/issues_setup.rs (1 hunks)
tests/openapi/test_strictmode.py (1 hunks)

💤 Files with no reviewable changes (2)

lib/segment/src/lib.rs
lib/segment/Cargo.toml

🚧 Files skipped from review as they are similar to previous changes (10)

lib/collection/src/lib.rs
lib/collection/src/tests/payload.rs
lib/collection/Cargo.toml
lib/collection/src/collection/search.rs
lib/collection/src/operations/universal_query/shard_query.rs
lib/api/src/rest/schema.rs
src/issues_setup.rs
lib/collection/src/operations/universal_query/formula.rs
src/actix/api/query_api.rs
lib/storage/src/issues_subscribers.rs

🧰 Additional context used

🪛 Ruff (0.8.2)

tests/openapi/test_strictmode.py

1326-1326: Redefinition of unused collection_name from line 3

(F811)

⏰ Context from checks skipped due to timeout of 90000ms (13)

GitHub Check: Basic TLS/HTTPS tests
GitHub Check: test-snapshot-operations-s3-minio
GitHub Check: test-shard-snapshot-api-s3-minio
GitHub Check: test-low-resources
GitHub Check: test-consistency
GitHub Check: test-consensus-compose
GitHub Check: test (macos-latest)
GitHub Check: test-consensus
GitHub Check: test (windows-latest)
GitHub Check: test
GitHub Check: test (ubuntu-latest)
GitHub Check: test
GitHub Check: test

🔇 Additional comments (12)

tests/openapi/test_strictmode.py (1)

1326-1364:
✅ Verification successful

Good addition of test for formula expression strict mode!

This new test correctly verifies that formula expressions using unindexed fields are properly restricted when strict mode is enabled with unindexed_filtering_retrieve set to False. The test first confirms the query works without restrictions, then correctly verifies that it fails with specific error messages when strict mode is enabled.

🏁 Script executed:
#!/bin/bash
# Check the static analysis warning about collection_name redefinition

# Look for the collection_name import
grep -n "from .conftest import collection_name" tests/openapi/test_strictmode.py
# Check the test function signature 
grep -n "def test_strict_mode_formula_expression" tests/openapi/test_strictmode.py
Length of output: 270
Test correctly validates the strict mode enforcement for formula expressions.

The test first confirms that a query using an unindexed field succeeds without strict mode.

Once strict mode is enabled (with unindexed_filtering_retrieve set to False), the same query fails, and the error message appropriately mentions both "discount_price" and "formula expression".

The use of the collection_name fixture—imported from conftest and then injected as a parameter in the test—is intentional and in line with our testing conventions, so the static analysis warning about redefinition is expected.

🧰 Tools

🪛 Ruff (0.8.2)

1326-1326: Redefinition of unused collection_name from line 3

(F811)
lib/collection/src/operations/verification/query.rs (3)

5-38: Implementation of formula field indexing verification is solid

The implementation correctly adds the check for unindexed fields in formula expressions and provides a helpful error message to guide users. I like that the error message includes both what's wrong and how to fix it.

40-57: Good implementation of formula verification in CollectionQueryRequest

The changes correctly add formula expression checking for CollectionQueryRequest, ensuring that unindexed fields in formulas are properly validated when strict mode is enabled.

120-131:
✅ Verification successful

CollectionQueryGroupsRequest implementation is good

The check_custom method implementation for CollectionQueryGroupsRequest is correct, checking unindexed fields in formula queries.

🏁 Script executed:
#!/bin/bash
# Check if there's similar duplicate code that could be consolidated

# See if the method bodies in check_custom implementations are similar
echo "Checking CollectionQueryRequest check_custom:"
sed -n '41,57p' lib/collection/src/operations/verification/query.rs

echo -e "\nChecking CollectionPrefetch check_custom:"
sed -n '81,97p' lib/collection/src/operations/verification/query.rs

echo -e "\nChecking CollectionQueryGroupsRequest check_custom:"
sed -n '121,131p' lib/collection/src/operations/verification/query.rs
Length of output: 2087
CollectionQueryGroupsRequest Implementation Confirmed as Correct

After reviewing the implementation against similar methods in the file, the check for unindexed fields in formula queries is performed appropriately. The absence of a prefetch loop compared to the other implementations is intentional and aligns with the expected behavior for this request type.

File: lib/collection/src/operations/verification/query.rs

Lines: 121-131 — The code correctly handles the strict mode check for queries.
lib/collection/src/collection/payload_index_schema.rs (4)

11-13: Clean import adjustments for module relocation.

The import changes correctly reflect the relocation of the problems module from segment to collection crate as mentioned in the PR description. This architectural change helps facilitate the evaluation of variables within ExpressionInternal.

107-119: Well-structured Collection implementation update.

The changes to the Collection implementation are clean and maintain backward compatibility while adding support for expression checking. The original method now delegates to the new specialized method, and a parallel method for expressions has been added with a clear, consistent naming convention.

122-125: Good enum design for type-safe handling.

The PotentiallyUnindexed enum is a well-designed approach for handling both Filter and ExpressionInternal types in a type-safe manner. This creates a clear abstraction that allows the underlying implementation to handle both types without code duplication.

131-169: Clean method refactoring with good encapsulation.

The refactoring of one_unindexed_key into a private method with two public wrapper methods is excellent. This approach:

Maintains a clean public API

Provides type-specific methods for clients

Reuses implementation logic to avoid duplication

Properly encapsulates the implementation details

This is a good example of following the Open/Closed principle - extending functionality without modifying existing behavior.
lib/collection/src/problems/unindexed_field.rs (4)

6-20: Clean import adjustments reflecting module restructuring.

The import changes correctly update the paths following the module relocation. Adding the http crate and VariableId imports supports the new functionality for formula expression evaluation.

169-189: Consistent path updates in infer_schema_from_match_value.

The function has been properly updated to use the fully-qualified path segment::types::ValueVariants consistently with the import changes. This ensures the code continues to work correctly after the module restructuring.

405-431: Good extraction of common logic to needs_index method.

Extracting the indexing check logic to a separate method improves code organization by following the DRY principle. This refactoring makes the code more maintainable and allows reuse of this logic in the new update_from_expression method.

446-456: Add tests for VariableId handling in the update_from_expression method.

As noted in a previous review, there appears to be a lack of comprehensive tests for the different VariableId variants in the formula expression evaluation. This is a critical part of the implementation, especially for ensuring the correct handling of Score, Payload, and Condition types.
#!/bin/bash
# Search for tests that would validate update_from_expression behavior
echo "Searching for tests related to update_from_expression..."
rg -A 5 -B 5 "update_from_expression" --glob "**/*test*.rs"

echo "Searching for tests related to formula expressions..."
rg -A 5 -B 5 "ExpressionInternal" --glob "**/*test*.rs" | grep -i test

echo "Checking for test cases covering VariableId variants..."
rg -A 5 -B 5 "VariableId::Score|VariableId::Payload|VariableId::Condition" --glob "**/*test*.rs"

…t#6317) * move problems.rs to collection crate * change type on datetime_key expression * extract from expression * check for unindexed fields in formula * clippy * dedup code * Only implement verification for `CollectionQuery` types (qdrant#6319) * verify CollectionQuery's Prefetch * remove implementation for REST query * remove implementation for REST query groups * add basic openapi test

coszio requested review from agourlay and JojiiOfficial April 3, 2025 17:27

coszio modified the milestones: score boosting, Strict mode Apr 3, 2025

coderabbitai bot reviewed Apr 3, 2025

View reviewed changes

JojiiOfficial approved these changes Apr 3, 2025

View reviewed changes

lib/collection/src/operations/verification/query.rs Outdated Show resolved Hide resolved

coderabbitai bot reviewed Apr 3, 2025

View reviewed changes

coderabbitai bot reviewed Apr 4, 2025

View reviewed changes

coszio force-pushed the strict-mode-for-formulas branch from cb71472 to c89ca7b Compare April 4, 2025 14:16

coderabbitai bot reviewed Apr 4, 2025

View reviewed changes

coszio force-pushed the strict-mode-for-formulas branch from c89ca7b to 4c2bf2f Compare April 4, 2025 14:28

coderabbitai bot reviewed Apr 4, 2025

View reviewed changes

coszio and others added 8 commits April 4, 2025 11:34

move problems.rs to collection crate

a2c2b91

change type on datetime_key expression

799f94c

extract from expression

9c330a7

check for unindexed fields in formula

c78be8b

clippy

45f2a05

dedup code

4e5ae6a

Only implement verification for CollectionQuery types (#6319)

247c58e

* verify CollectionQuery's Prefetch * remove implementation for REST query * remove implementation for REST query groups

add basic openapi test

9390319

coszio force-pushed the strict-mode-for-formulas branch from 4c2bf2f to 9390319 Compare April 4, 2025 14:35

coderabbitai bot reviewed Apr 4, 2025

View reviewed changes

agourlay approved these changes Apr 4, 2025

View reviewed changes

coszio merged commit f9a5371 into dev Apr 4, 2025
17 checks passed

coszio deleted the strict-mode-for-formulas branch April 4, 2025 16:52

coderabbitai bot mentioned this pull request Apr 9, 2025

Strict mode prevents fullscan for multivector #6351

Closed

This was referenced Apr 29, 2025

lib/api/src/rest/models : There’s a typo in the code #6464

Closed

Strict mode detects full scan on query #6473

Merged

coderabbitai bot mentioned this pull request May 14, 2025

Fix UnindexedField infra to handle lookup&range index requirements #6496

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[strict mode | score boosting] Strict mode for formula queries #6317

[strict mode | score boosting] Strict mode for formula queries #6317

Uh oh!

coszio commented Apr 3, 2025

Uh oh!

coderabbitai bot commented Apr 3, 2025 •

edited

Loading

Walkthrough

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Apr 3, 2025

Uh oh!

agourlay commented Apr 4, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

[strict mode | score boosting] Strict mode for formula queries #6317

[strict mode | score boosting] Strict mode for formula queries #6317

Uh oh!

Conversation

coszio commented Apr 3, 2025

Uh oh!

coderabbitai bot commented Apr 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 3, 2025

Choose a reason for hiding this comment

Uh oh!

agourlay commented Apr 4, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot commented Apr 3, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)