-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Strict mode allows fullscan for multitenant payload index #6498
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
📝 Walkthrough""" WalkthroughThe Possibly related PRs
Suggested reviewers
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
bcd1b0a
to
fbcb82a
Compare
Dear reviewers, please review this PR 🙏 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only one minor nit but otherwise ✅
3e4d3c0
to
a707190
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see test
fc57363
to
ae02365
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see tests
44f9bc9
to
4e69463
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/openapi/test_strictmode.py (1)
2131-2244
: Complete validation of multitenant payload index with strict mode fullscans.This comprehensive test covers the core functionality of this PR by validating that:
- Filtered queries work initially with disabled HNSW indexes
- Enabling strict mode prevents filtered queries when both vector and payload HNSW are disabled
- Adding standard or multitenant payload indexes alone doesn't allow queries in strict mode
- Enabling HNSW payload-based indexing (
payload_m=1
) allows filtered queries to succeedThe test thoroughly verifies all combinations, ensuring the feature works as expected.
Consider adding a few inline comments explaining the key test stages, especially for the multitenant payload index verification steps, to make the test's logic more immediately clear to future readers.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
lib/collection/src/operations/verification/query.rs
(6 hunks)lib/segment/src/types.rs
(1 hunks)tests/openapi/test_strictmode.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- lib/collection/src/operations/verification/query.rs
- lib/segment/src/types.rs
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-consensus-compose
- GitHub Check: integration-tests-consensus
- GitHub Check: integration-tests
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: storage-compat-test
- GitHub Check: lint
🔇 Additional comments (1)
tests/openapi/test_strictmode.py (1)
2073-2129
: Nicely structured test for basic fullscan behavior in strict mode.This test effectively validates that fullscans are allowed in strict mode with
search_allow_exact=False
only when the HNSW index is enabled. It confirms that disabling the HNSW index (m=0) properly prevents fullscans when strict mode is enabled.
5c6d387
to
19d0ce3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PATCH collections/benchmark
{
"strict_mode_config": {
"enabled": true,
"search_allow_exact": false
}
}
PATCH collections/benchmark
{
"vectors": {
"": {
"hnsw_config": {
"ef_construct": 128
}
}
}
}
POST /collections/benchmark/points/query
{
"query": 1
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
lib/collection/src/operations/verification/query.rs (3)
73-77
: Consider adding a comment explaining the.flatten()
operationThe
.flatten()
method is used to simplify the nestedOption
structure that results from the vector configuration. While the code is correct, adding a clarifying comment would help explain the double-option unwrapping pattern you're using.let vector_hnsw_m = vector_hnsw_config .map(|hnsw_config| hnsw_config.m) - .flatten() + .flatten() // Flatten nested Option from hnsw_config.m .unwrap_or(config.hnsw_config.m);
78-81
: Consider adding a comment to explain the fallback logic for payload_mThe fallback logic for
vector_hnsw_payload_m
is intuitive but somewhat complex. Adding a comment would help future maintainers understand why we fall back tovector_hnsw_m
if the payload-specific setting isn't found.let vector_hnsw_payload_m = vector_hnsw_config .map(|hnsw_config| hnsw_config.payload_m) .flatten() - .unwrap_or(config.hnsw_config.payload_m.unwrap_or(vector_hnsw_m)); + .unwrap_or(config.hnsw_config.payload_m.unwrap_or(vector_hnsw_m)); // Fall back to global payload_m or vector_hnsw_m
116-124
: Fix comment on line 117The comment on line 117 states "HNSW disabled AND no filters" which is incorrect at this point in the code. By the time we reach this check, we've already confirmed that a multitenant filter exists (passed the check on line 106).
if vector_hnsw_payload_m == 0 { - // HNSW disabled AND no filters + // HNSW disabled even with multitenant filter present return Err(CollectionError::strict_mode( format!( "Request is forbidden{vector_error_label} because vector indexing is disabled (hnsw_config.m = 0 and hnsw_config.payload_m = 0)" ), "Enable vector indexing, use a prefetch query with indexed vectors or enable strict mode `search_allow_exact` option", )); }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
lib/collection/src/operations/verification/query.rs
(6 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
lib/collection/src/operations/verification/query.rs (6)
lib/segment/src/types.rs (1)
filter
(2708-2710)lib/segment/src/entry/entry_point.rs (1)
config
(251-251)lib/segment/src/segment/entry.rs (1)
config
(583-585)lib/collection/src/collection_manager/holders/proxy_segment.rs (1)
config
(1086-1088)lib/collection/src/operations/types.rs (1)
strict_mode
(1120-1123)lib/collection/src/collection/collection_ops.rs (1)
strict_mode_config
(277-283)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-consensus-compose
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: lint
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: storage-compat-test
- GitHub Check: test-consistency
- GitHub Check: test-low-resources
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: integration-tests
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: integration-tests-consensus
🔇 Additional comments (2)
lib/collection/src/operations/verification/query.rs (2)
95-104
: Good implementation of multitenant filter detectionThe approach to detect whether a filter targets multitenant indexed payload keys is clean and well-structured. This is the core functionality that enables the PR's objective of allowing fullscan for multitenant payload indexes in strict mode.
151-156
: Correctly updated all method invocationsAll invocations of the
check_fullscan
method have been properly updated to include the filter parameter. This ensures consistency throughout the codebase.Also applies to: 203-208, 250-255
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
tests/openapi/test_strictmode.py (2)
2237-2260
: Consider adding descriptive comments for test sectionsWhile the test is well-structured, adding comments that explain the purpose of each section would improve readability and make it easier for future developers to understand the test's intent.
def test_strict_mode_multitenant_full_scan(full_collection_name): collection_name = full_collection_name + # Helper function that performs a filtered query using city=Berlin def filtered_query(): return request_with_validation( api='/collections/{collection_name}/points/query', method="POST", path_params={'collection_name': collection_name}, body={ "query": 2, "filter": { "must": [ { "key": "city", "match": { "value": "Berlin" } } ] }, "using": "dense-multi", "limit": 5 } )
2310-2324
: Consider verifying the index state after configurationAdding assertions to verify that the multitenant payload index was successfully created would make the test more robust against potential configuration failures.
# add multitenant payload index - request_with_validation( + response = request_with_validation( api='/collections/{collection_name}/index', method="PUT", path_params={'collection_name': collection_name}, query_params={'wait': 'true'}, body={ "field_name": "city", "field_schema": { "type": "keyword", "is_tenant": True, } } - ).raise_for_status() + ) + assert response.ok + + # Verify the index was created with multitenant flag + indices_response = request_with_validation( + api='/collections/{collection_name}/indexes', + method="GET", + path_params={'collection_name': collection_name}, + ) + assert indices_response.ok + city_index = next((idx for idx in indices_response.json()['result'] if idx['field_name'] == 'city'), None) + assert city_index is not None + assert city_index['options'].get('is_tenant', False) is True
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/openapi/test_strictmode.py
(3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (11)
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-consistency
- GitHub Check: lint
- GitHub Check: integration-tests-consensus
- GitHub Check: storage-compat-test
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: integration-tests
🔇 Additional comments (4)
tests/openapi/test_strictmode.py (4)
2120-2121
: Updated error message provides better guidanceThe improved error message now offers more specific guidance by suggesting multiple ways to resolve the issue, including using tenant-specific filters, enabling global vector indexing, or enabling the strict mode
search_allow_exact
option. This is a helpful improvement for users encountering this restriction.
2175-2176
: Similar error message update for consistencyThis error message update follows the same pattern as the previous one, maintaining consistency in the error messaging throughout the codebase.
2178-2235
: LGTM: New test verifies basic strict mode full scan behaviorThis new test clearly demonstrates that full scan queries are initially allowed but are forbidden with a 400 error after disabling the HNSW index when strict mode disallows exact search. It establishes the baseline behavior before testing the multitenant index exception.
2236-2350
: Well-structured test for multitenant payload index exceptionThis comprehensive test correctly verifies the PR's core functionality by:
- Confirming filtered queries initially work
- Showing they fail with strict mode + disabled HNSW
- Proving standard payload indexing doesn't resolve the issue
- Demonstrating that multitenant payload indexing requires payload_m > 0
- Verifying that the combination of multitenant payload index + enabled HNSW payload indexing allows queries to succeed
The test methodology is thorough and covers all the key scenarios needed to validate the new behavior.
* Strict mode allows fullscan for multitenant payload index * different error for missing payload index in multitenant case * handle payload_m * add simple test * handle hnsw.m not set * add another simple test * fix test * fallback to global HSNW config * new error status code * do not block on global HNSW and improver error reporting * clearer error reporting * review fixes * fix test error messages --------- Co-authored-by: generall <andrey@vasnetsov.com>
* Strict mode allows fullscan for multitenant payload index * different error for missing payload index in multitenant case * handle payload_m * add simple test * handle hnsw.m not set * add another simple test * fix test * fallback to global HSNW config * new error status code * do not block on global HNSW and improver error reporting * clearer error reporting * review fixes * fix test error messages --------- Co-authored-by: generall <andrey@vasnetsov.com>
In a previous PR, fullscan was prohibited in strict mode when using
search_allow_exact=false
.However we want to allow querying when no global HNSW is present if the query leverage a multitenant HNSW based payload index.