Skip to content

Conversation

agourlay
Copy link
Member

@agourlay agourlay commented May 6, 2025

In a previous PR, fullscan was prohibited in strict mode when using search_allow_exact=false.

However we want to allow querying when no global HNSW is present if the query leverage a multitenant HNSW based payload index.

Copy link
Contributor

coderabbitai bot commented May 7, 2025

📝 Walkthrough

"""

Walkthrough

The check_fullscan method in the Query implementation was updated to accept an optional filter parameter and now includes logic to allow full scans when filtering on multitenant indexed payload keys, even if HNSW vector indexing is disabled. The method retrieves vector-specific and payload-specific HNSW configuration values, falling back to global settings. It checks whether the filter's conditions target a multitenant payload key using the collection's payload key index schema and enforces strict mode restrictions accordingly. All invocations of check_fullscan were updated to pass the optional filter. Additionally, a targeted_key method was added to the Condition enum to identify the payload key targeted by a condition. Two new tests were added to verify strict mode behavior for full scans and filtered queries involving multitenant payload indexes under various HNSW index configurations.

Possibly related PRs

Suggested reviewers

  • coszio
  • JojiiOfficial
    """
✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@agourlay agourlay force-pushed the strict-mode-allow-fullscan-multitenant branch from bcd1b0a to fbcb82a Compare May 7, 2025 17:11
@agourlay agourlay requested a review from generall May 7, 2025 20:16
@agourlay
Copy link
Member Author

Dear reviewers, please review this PR 🙏

Copy link
Contributor

@JojiiOfficial JojiiOfficial left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only one minor nit but otherwise ✅

@agourlay agourlay force-pushed the strict-mode-allow-fullscan-multitenant branch from 3e4d3c0 to a707190 Compare May 14, 2025 10:13
Copy link
Member

@generall generall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see test

@agourlay agourlay force-pushed the strict-mode-allow-fullscan-multitenant branch from fc57363 to ae02365 Compare May 19, 2025 08:06
@agourlay agourlay requested a review from generall May 19, 2025 09:13
generall

This comment was marked as resolved.

Copy link
Member

@generall generall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see tests

@agourlay agourlay force-pushed the strict-mode-allow-fullscan-multitenant branch from 44f9bc9 to 4e69463 Compare May 20, 2025 10:44
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/openapi/test_strictmode.py (1)

2131-2244: Complete validation of multitenant payload index with strict mode fullscans.

This comprehensive test covers the core functionality of this PR by validating that:

  1. Filtered queries work initially with disabled HNSW indexes
  2. Enabling strict mode prevents filtered queries when both vector and payload HNSW are disabled
  3. Adding standard or multitenant payload indexes alone doesn't allow queries in strict mode
  4. Enabling HNSW payload-based indexing (payload_m=1) allows filtered queries to succeed

The test thoroughly verifies all combinations, ensuring the feature works as expected.

Consider adding a few inline comments explaining the key test stages, especially for the multitenant payload index verification steps, to make the test's logic more immediately clear to future readers.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 44f9bc9 and 4e69463.

📒 Files selected for processing (3)
  • lib/collection/src/operations/verification/query.rs (6 hunks)
  • lib/segment/src/types.rs (1 hunks)
  • tests/openapi/test_strictmode.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • lib/collection/src/operations/verification/query.rs
  • lib/segment/src/types.rs
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test-consistency
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-consensus-compose
  • GitHub Check: integration-tests-consensus
  • GitHub Check: integration-tests
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: storage-compat-test
  • GitHub Check: lint
🔇 Additional comments (1)
tests/openapi/test_strictmode.py (1)

2073-2129: Nicely structured test for basic fullscan behavior in strict mode.

This test effectively validates that fullscans are allowed in strict mode with search_allow_exact=False only when the HNSW index is enabled. It confirms that disabling the HNSW index (m=0) properly prevents fullscans when strict mode is enabled.

@agourlay agourlay force-pushed the strict-mode-allow-fullscan-multitenant branch from 5c6d387 to 19d0ce3 Compare May 21, 2025 12:55
Copy link
Member

@generall generall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PATCH collections/benchmark
{
    "strict_mode_config": {
        "enabled": true,
        "search_allow_exact": false
    }
}


PATCH collections/benchmark
{
    "vectors": {
        "": {
            "hnsw_config": {
                "ef_construct": 128
            }
        }
    }
}

POST /collections/benchmark/points/query
{
  "query": 1
}

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
lib/collection/src/operations/verification/query.rs (3)

73-77: Consider adding a comment explaining the .flatten() operation

The .flatten() method is used to simplify the nested Option structure that results from the vector configuration. While the code is correct, adding a clarifying comment would help explain the double-option unwrapping pattern you're using.

 let vector_hnsw_m = vector_hnsw_config
     .map(|hnsw_config| hnsw_config.m)
-    .flatten()
+    .flatten() // Flatten nested Option from hnsw_config.m
     .unwrap_or(config.hnsw_config.m);

78-81: Consider adding a comment to explain the fallback logic for payload_m

The fallback logic for vector_hnsw_payload_m is intuitive but somewhat complex. Adding a comment would help future maintainers understand why we fall back to vector_hnsw_m if the payload-specific setting isn't found.

 let vector_hnsw_payload_m = vector_hnsw_config
     .map(|hnsw_config| hnsw_config.payload_m)
     .flatten()
-    .unwrap_or(config.hnsw_config.payload_m.unwrap_or(vector_hnsw_m));
+    .unwrap_or(config.hnsw_config.payload_m.unwrap_or(vector_hnsw_m)); // Fall back to global payload_m or vector_hnsw_m

116-124: Fix comment on line 117

The comment on line 117 states "HNSW disabled AND no filters" which is incorrect at this point in the code. By the time we reach this check, we've already confirmed that a multitenant filter exists (passed the check on line 106).

 if vector_hnsw_payload_m == 0 {
-    // HNSW disabled AND no filters
+    // HNSW disabled even with multitenant filter present
     return Err(CollectionError::strict_mode(
         format!(
             "Request is forbidden{vector_error_label} because vector indexing is disabled (hnsw_config.m = 0 and hnsw_config.payload_m = 0)"
         ),
         "Enable vector indexing, use a prefetch query with indexed vectors or enable strict mode `search_allow_exact` option",
     ));
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 95c71fd and 993c366.

📒 Files selected for processing (1)
  • lib/collection/src/operations/verification/query.rs (6 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
lib/collection/src/operations/verification/query.rs (6)
lib/segment/src/types.rs (1)
  • filter (2708-2710)
lib/segment/src/entry/entry_point.rs (1)
  • config (251-251)
lib/segment/src/segment/entry.rs (1)
  • config (583-585)
lib/collection/src/collection_manager/holders/proxy_segment.rs (1)
  • config (1086-1088)
lib/collection/src/operations/types.rs (1)
  • strict_mode (1120-1123)
lib/collection/src/collection/collection_ops.rs (1)
  • strict_mode_config (277-283)
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-consensus-compose
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: lint
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: storage-compat-test
  • GitHub Check: test-consistency
  • GitHub Check: test-low-resources
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: integration-tests
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: integration-tests-consensus
🔇 Additional comments (2)
lib/collection/src/operations/verification/query.rs (2)

95-104: Good implementation of multitenant filter detection

The approach to detect whether a filter targets multitenant indexed payload keys is clean and well-structured. This is the core functionality that enables the PR's objective of allowing fullscan for multitenant payload indexes in strict mode.


151-156: Correctly updated all method invocations

All invocations of the check_fullscan method have been properly updated to include the filter parameter. This ensures consistency throughout the codebase.

Also applies to: 203-208, 250-255

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
tests/openapi/test_strictmode.py (2)

2237-2260: Consider adding descriptive comments for test sections

While the test is well-structured, adding comments that explain the purpose of each section would improve readability and make it easier for future developers to understand the test's intent.

 def test_strict_mode_multitenant_full_scan(full_collection_name):
     collection_name = full_collection_name
 
+    # Helper function that performs a filtered query using city=Berlin
     def filtered_query():
         return request_with_validation(
             api='/collections/{collection_name}/points/query',
             method="POST",
             path_params={'collection_name': collection_name},
             body={
                 "query": 2,
                 "filter": {
                     "must": [
                         {
                             "key": "city",
                             "match": {
                                 "value": "Berlin"
                             }
                         }
                     ]
                 },
                 "using": "dense-multi",
                 "limit": 5
             }
         )

2310-2324: Consider verifying the index state after configuration

Adding assertions to verify that the multitenant payload index was successfully created would make the test more robust against potential configuration failures.

     # add multitenant payload index
-    request_with_validation(
+    response = request_with_validation(
         api='/collections/{collection_name}/index',
         method="PUT",
         path_params={'collection_name': collection_name},
         query_params={'wait': 'true'},
         body={
             "field_name": "city",
             "field_schema": {
                 "type": "keyword",
                 "is_tenant": True,
             }
         }
-    ).raise_for_status()
+    )
+    assert response.ok
+    
+    # Verify the index was created with multitenant flag
+    indices_response = request_with_validation(
+        api='/collections/{collection_name}/indexes',
+        method="GET",
+        path_params={'collection_name': collection_name},
+    )
+    assert indices_response.ok
+    city_index = next((idx for idx in indices_response.json()['result'] if idx['field_name'] == 'city'), None)
+    assert city_index is not None
+    assert city_index['options'].get('is_tenant', False) is True
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 993c366 and 1eed5a8.

📒 Files selected for processing (1)
  • tests/openapi/test_strictmode.py (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-consistency
  • GitHub Check: lint
  • GitHub Check: integration-tests-consensus
  • GitHub Check: storage-compat-test
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: integration-tests
🔇 Additional comments (4)
tests/openapi/test_strictmode.py (4)

2120-2121: Updated error message provides better guidance

The improved error message now offers more specific guidance by suggesting multiple ways to resolve the issue, including using tenant-specific filters, enabling global vector indexing, or enabling the strict mode search_allow_exact option. This is a helpful improvement for users encountering this restriction.


2175-2176: Similar error message update for consistency

This error message update follows the same pattern as the previous one, maintaining consistency in the error messaging throughout the codebase.


2178-2235: LGTM: New test verifies basic strict mode full scan behavior

This new test clearly demonstrates that full scan queries are initially allowed but are forbidden with a 400 error after disabling the HNSW index when strict mode disallows exact search. It establishes the baseline behavior before testing the multitenant index exception.


2236-2350: Well-structured test for multitenant payload index exception

This comprehensive test correctly verifies the PR's core functionality by:

  1. Confirming filtered queries initially work
  2. Showing they fail with strict mode + disabled HNSW
  3. Proving standard payload indexing doesn't resolve the issue
  4. Demonstrating that multitenant payload indexing requires payload_m > 0
  5. Verifying that the combination of multitenant payload index + enabled HNSW payload indexing allows queries to succeed

The test methodology is thorough and covers all the key scenarios needed to validate the new behavior.

@agourlay agourlay merged commit 2bba101 into dev May 22, 2025
17 checks passed
@agourlay agourlay deleted the strict-mode-allow-fullscan-multitenant branch May 22, 2025 08:42
generall added a commit that referenced this pull request May 22, 2025
* Strict mode allows fullscan for multitenant payload index

* different error for missing payload index in multitenant case

* handle payload_m

* add simple test

* handle hnsw.m not set

* add another simple test

* fix test

* fallback to global HSNW config

* new error status code

* do not block on global HNSW and improver error reporting

* clearer error reporting

* review fixes

* fix test error messages

---------

Co-authored-by: generall <andrey@vasnetsov.com>
n0x29a pushed a commit that referenced this pull request May 23, 2025
* Strict mode allows fullscan for multitenant payload index

* different error for missing payload index in multitenant case

* handle payload_m

* add simple test

* handle hnsw.m not set

* add another simple test

* fix test

* fallback to global HSNW config

* new error status code

* do not block on global HNSW and improver error reporting

* clearer error reporting

* review fixes

* fix test error messages

---------

Co-authored-by: generall <andrey@vasnetsov.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants