Strict mode allows fullscan for multitenant payload index #6498

agourlay · 2025-05-06T14:38:39Z

In a previous PR, fullscan was prohibited in strict mode when using search_allow_exact=false.

However we want to allow querying when no global HNSW is present if the query leverage a multitenant HNSW based payload index.

coderabbitai · 2025-05-07T11:35:22Z

📝 Walkthrough

"""

Walkthrough

The check_fullscan method in the Query implementation was updated to accept an optional filter parameter and now includes logic to allow full scans when filtering on multitenant indexed payload keys, even if HNSW vector indexing is disabled. The method retrieves vector-specific and payload-specific HNSW configuration values, falling back to global settings. It checks whether the filter's conditions target a multitenant payload key using the collection's payload key index schema and enforces strict mode restrictions accordingly. All invocations of check_fullscan were updated to pass the optional filter. Additionally, a targeted_key method was added to the Condition enum to identify the payload key targeted by a condition. Two new tests were added to verify strict mode behavior for full scans and filtered queries involving multitenant payload indexes under various HNSW index configurations.

Possibly related PRs

Strict mode detects full scan on query #6473: Introduced the original check_fullscan method and strict mode verification logic, which this PR extends by adding filter awareness and multitenant key checks.

Suggested reviewers

coszio
JojiiOfficial
"""

✨ Finishing Touches

📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Explain this complex logic.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai explain this code block.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and explain its main purpose.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

lib/collection/src/operations/verification/query.rs

agourlay · 2025-05-13T09:26:00Z

Dear reviewers, please review this PR 🙏

JojiiOfficial

Only one minor nit but otherwise ✅

lib/segment/src/types.rs

generall

see test

generall

see tests

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

tests/openapi/test_strictmode.py (1)

2131-2244: Complete validation of multitenant payload index with strict mode fullscans.

This comprehensive test covers the core functionality of this PR by validating that:

Filtered queries work initially with disabled HNSW indexes

Enabling strict mode prevents filtered queries when both vector and payload HNSW are disabled

Adding standard or multitenant payload indexes alone doesn't allow queries in strict mode

Enabling HNSW payload-based indexing (payload_m=1) allows filtered queries to succeed

The test thoroughly verifies all combinations, ensuring the feature works as expected.

Consider adding a few inline comments explaining the key test stages, especially for the multitenant payload index verification steps, to make the test's logic more immediately clear to future readers.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 44f9bc9 and 4e69463.

📒 Files selected for processing (3)

lib/collection/src/operations/verification/query.rs (6 hunks)
lib/segment/src/types.rs (1 hunks)
tests/openapi/test_strictmode.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

lib/collection/src/operations/verification/query.rs
lib/segment/src/types.rs

⏰ Context from checks skipped due to timeout of 90000ms (13)

GitHub Check: test-snapshot-operations-s3-minio
GitHub Check: test-shard-snapshot-api-s3-minio
GitHub Check: test-low-resources
GitHub Check: test-consistency
GitHub Check: Basic TLS/HTTPS tests
GitHub Check: test-consensus-compose
GitHub Check: integration-tests-consensus
GitHub Check: integration-tests
GitHub Check: rust-tests (ubuntu-latest)
GitHub Check: rust-tests (windows-latest)
GitHub Check: rust-tests (macos-latest)
GitHub Check: storage-compat-test
GitHub Check: lint

🔇 Additional comments (1)

tests/openapi/test_strictmode.py (1)

2073-2129: Nicely structured test for basic fullscan behavior in strict mode.

This test effectively validates that fullscans are allowed in strict mode with search_allow_exact=False only when the HNSW index is enabled. It confirms that disabling the HNSW index (m=0) properly prevents fullscans when strict mode is enabled.

generall

PATCH collections/benchmark
{
    "strict_mode_config": {
        "enabled": true,
        "search_allow_exact": false
    }
}


PATCH collections/benchmark
{
    "vectors": {
        "": {
            "hnsw_config": {
                "ef_construct": 128
            }
        }
    }
}

POST /collections/benchmark/points/query
{
  "query": 1
}

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

lib/collection/src/operations/verification/query.rs (3)
73-77: Consider adding a comment explaining the .flatten() operation

The .flatten() method is used to simplify the nested Option structure that results from the vector configuration. While the code is correct, adding a clarifying comment would help explain the double-option unwrapping pattern you're using.
 let vector_hnsw_m = vector_hnsw_config
     .map(|hnsw_config| hnsw_config.m)
-    .flatten()
+    .flatten() // Flatten nested Option from hnsw_config.m
     .unwrap_or(config.hnsw_config.m);
78-81: Consider adding a comment to explain the fallback logic for payload_m

The fallback logic for vector_hnsw_payload_m is intuitive but somewhat complex. Adding a comment would help future maintainers understand why we fall back to vector_hnsw_m if the payload-specific setting isn't found.
 let vector_hnsw_payload_m = vector_hnsw_config
     .map(|hnsw_config| hnsw_config.payload_m)
     .flatten()
-    .unwrap_or(config.hnsw_config.payload_m.unwrap_or(vector_hnsw_m));
+    .unwrap_or(config.hnsw_config.payload_m.unwrap_or(vector_hnsw_m)); // Fall back to global payload_m or vector_hnsw_m
116-124: Fix comment on line 117

The comment on line 117 states "HNSW disabled AND no filters" which is incorrect at this point in the code. By the time we reach this check, we've already confirmed that a multitenant filter exists (passed the check on line 106).
 if vector_hnsw_payload_m == 0 {
-    // HNSW disabled AND no filters
+    // HNSW disabled even with multitenant filter present
     return Err(CollectionError::strict_mode(
         format!(
             "Request is forbidden{vector_error_label} because vector indexing is disabled (hnsw_config.m = 0 and hnsw_config.payload_m = 0)"
         ),
         "Enable vector indexing, use a prefetch query with indexed vectors or enable strict mode `search_allow_exact` option",
     ));
 }

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 95c71fd and 993c366.

📒 Files selected for processing (1)

lib/collection/src/operations/verification/query.rs (6 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (1)

lib/collection/src/operations/verification/query.rs (6)

lib/segment/src/types.rs (1)

filter (2708-2710)

lib/segment/src/entry/entry_point.rs (1)

config (251-251)

lib/segment/src/segment/entry.rs (1)

config (583-585)

lib/collection/src/collection_manager/holders/proxy_segment.rs (1)

config (1086-1088)

lib/collection/src/operations/types.rs (1)

strict_mode (1120-1123)

lib/collection/src/collection/collection_ops.rs (1)

strict_mode_config (277-283)

⏰ Context from checks skipped due to timeout of 90000ms (13)

GitHub Check: test-snapshot-operations-s3-minio
GitHub Check: test-consensus-compose
GitHub Check: Basic TLS/HTTPS tests
GitHub Check: lint
GitHub Check: test-shard-snapshot-api-s3-minio
GitHub Check: storage-compat-test
GitHub Check: test-consistency
GitHub Check: test-low-resources
GitHub Check: rust-tests (windows-latest)
GitHub Check: integration-tests
GitHub Check: rust-tests (ubuntu-latest)
GitHub Check: rust-tests (macos-latest)
GitHub Check: integration-tests-consensus

🔇 Additional comments (2)

lib/collection/src/operations/verification/query.rs (2)

95-104: Good implementation of multitenant filter detection

The approach to detect whether a filter targets multitenant indexed payload keys is clean and well-structured. This is the core functionality that enables the PR's objective of allowing fullscan for multitenant payload indexes in strict mode.

151-156: Correctly updated all method invocations

All invocations of the check_fullscan method have been properly updated to include the filter parameter. This ensures consistency throughout the codebase.

Also applies to: 203-208, 250-255

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

tests/openapi/test_strictmode.py (2)

2237-2260: Consider adding descriptive comments for test sections

While the test is well-structured, adding comments that explain the purpose of each section would improve readability and make it easier for future developers to understand the test's intent.

 def test_strict_mode_multitenant_full_scan(full_collection_name):
     collection_name = full_collection_name
 
+    # Helper function that performs a filtered query using city=Berlin
     def filtered_query():
         return request_with_validation(
             api='/collections/{collection_name}/points/query',
             method="POST",
             path_params={'collection_name': collection_name},
             body={
                 "query": 2,
                 "filter": {
                     "must": [
                         {
                             "key": "city",
                             "match": {
                                 "value": "Berlin"
                             }
                         }
                     ]
                 },
                 "using": "dense-multi",
                 "limit": 5
             }
         )

2310-2324: Consider verifying the index state after configuration

Adding assertions to verify that the multitenant payload index was successfully created would make the test more robust against potential configuration failures.

     # add multitenant payload index
-    request_with_validation(
+    response = request_with_validation(
         api='/collections/{collection_name}/index',
         method="PUT",
         path_params={'collection_name': collection_name},
         query_params={'wait': 'true'},
         body={
             "field_name": "city",
             "field_schema": {
                 "type": "keyword",
                 "is_tenant": True,
             }
         }
-    ).raise_for_status()
+    )
+    assert response.ok
+    
+    # Verify the index was created with multitenant flag
+    indices_response = request_with_validation(
+        api='/collections/{collection_name}/indexes',
+        method="GET",
+        path_params={'collection_name': collection_name},
+    )
+    assert indices_response.ok
+    city_index = next((idx for idx in indices_response.json()['result'] if idx['field_name'] == 'city'), None)
+    assert city_index is not None
+    assert city_index['options'].get('is_tenant', False) is True

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 993c366 and 1eed5a8.

📒 Files selected for processing (1)

tests/openapi/test_strictmode.py (3 hunks)

⏰ Context from checks skipped due to timeout of 90000ms (11)

GitHub Check: test-shard-snapshot-api-s3-minio
GitHub Check: test-low-resources
GitHub Check: test-snapshot-operations-s3-minio
GitHub Check: test-consistency
GitHub Check: lint
GitHub Check: integration-tests-consensus
GitHub Check: storage-compat-test
GitHub Check: rust-tests (ubuntu-latest)
GitHub Check: rust-tests (windows-latest)
GitHub Check: rust-tests (macos-latest)
GitHub Check: integration-tests

🔇 Additional comments (4)

tests/openapi/test_strictmode.py (4)

2120-2121: Updated error message provides better guidance

The improved error message now offers more specific guidance by suggesting multiple ways to resolve the issue, including using tenant-specific filters, enabling global vector indexing, or enabling the strict mode search_allow_exact option. This is a helpful improvement for users encountering this restriction.

2175-2176: Similar error message update for consistency

This error message update follows the same pattern as the previous one, maintaining consistency in the error messaging throughout the codebase.

2178-2235: LGTM: New test verifies basic strict mode full scan behavior

This new test clearly demonstrates that full scan queries are initially allowed but are forbidden with a 400 error after disabling the HNSW index when strict mode disallows exact search. It establishes the baseline behavior before testing the multitenant index exception.

2236-2350: Well-structured test for multitenant payload index exception

This comprehensive test correctly verifies the PR's core functionality by:

Confirming filtered queries initially work

Showing they fail with strict mode + disabled HNSW

Proving standard payload indexing doesn't resolve the issue

Demonstrating that multitenant payload indexing requires payload_m > 0

Verifying that the combination of multitenant payload index + enabled HNSW payload indexing allows queries to succeed

The test methodology is thorough and covers all the key scenarios needed to validate the new behavior.

* Strict mode allows fullscan for multitenant payload index * different error for missing payload index in multitenant case * handle payload_m * add simple test * handle hnsw.m not set * add another simple test * fix test * fallback to global HSNW config * new error status code * do not block on global HNSW and improver error reporting * clearer error reporting * review fixes * fix test error messages --------- Co-authored-by: generall <andrey@vasnetsov.com>

github-actions bot mentioned this pull request May 6, 2025

Flaky test multivector_filtrable_hnsw_test::test_multi_filterable_hnsw::case_8_recosumscores_multi #6313

Closed

agourlay marked this pull request as ready for review May 7, 2025 11:33

agourlay requested review from JojiiOfficial, coszio and generall May 7, 2025 11:33

github-actions bot mentioned this pull request May 7, 2025

Flaky test index::tests::hw_counter_test::test_hw_counter_for_plain_sparse_search #6231

Closed

generall reviewed May 7, 2025

View reviewed changes

lib/collection/src/operations/verification/query.rs Outdated Show resolved Hide resolved

agourlay force-pushed the strict-mode-allow-fullscan-multitenant branch from bcd1b0a to fbcb82a Compare May 7, 2025 17:11

agourlay requested a review from generall May 7, 2025 20:16

JojiiOfficial approved these changes May 14, 2025

View reviewed changes

lib/segment/src/types.rs Show resolved Hide resolved

agourlay force-pushed the strict-mode-allow-fullscan-multitenant branch from 3e4d3c0 to a707190 Compare May 14, 2025 10:13

generall requested changes May 18, 2025

View reviewed changes

agourlay force-pushed the strict-mode-allow-fullscan-multitenant branch from fc57363 to ae02365 Compare May 19, 2025 08:06

agourlay requested a review from generall May 19, 2025 09:13

This comment was marked as resolved.

Sign in to view

generall requested changes May 19, 2025

View reviewed changes

agourlay force-pushed the strict-mode-allow-fullscan-multitenant branch from 44f9bc9 to 4e69463 Compare May 20, 2025 10:44

coderabbitai bot reviewed May 20, 2025

View reviewed changes

agourlay and others added 8 commits May 21, 2025 14:45

Strict mode allows fullscan for multitenant payload index

0a240eb

different error for missing payload index in multitenant case

54d6f7c

handle payload_m

5c516b2

add simple test

adcfa3d

handle hnsw.m not set

2d9f6a4

add another simple test

41dad78

fix test

dc39f40

fallback to global HSNW config

19d0ce3

agourlay force-pushed the strict-mode-allow-fullscan-multitenant branch from 5c6d387 to 19d0ce3 Compare May 21, 2025 12:55

new error status code

eaa3768

agourlay added 2 commits May 21, 2025 16:14

do not block on global HNSW and improver error reporting

680acf0

clearer error reporting

95c71fd

generall requested changes May 21, 2025

View reviewed changes

review fixes

993c366

coderabbitai bot reviewed May 21, 2025

View reviewed changes

fix test error messages

1eed5a8

coderabbitai bot reviewed May 21, 2025

View reviewed changes

generall approved these changes May 22, 2025

View reviewed changes

agourlay merged commit 2bba101 into dev May 22, 2025
17 checks passed

agourlay deleted the strict-mode-allow-fullscan-multitenant branch May 22, 2025 08:42

coderabbitai bot mentioned this pull request May 23, 2025

Add inference usage tracking to REST API #6563

Merged

Strict mode allows fullscan for multitenant payload index #6498

Strict mode allows fullscan for multitenant payload index #6498

Uh oh!

Conversation

agourlay commented May 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai bot commented May 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Possibly related PRs

Suggested reviewers

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

Uh oh!

agourlay commented May 13, 2025

Uh oh!

JojiiOfficial left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

generall left a comment

Choose a reason for hiding this comment

Uh oh!

This comment was marked as resolved.

Uh oh!

generall left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

generall left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

agourlay commented May 6, 2025 •

edited

Loading

coderabbitai bot commented May 7, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)