[query] Optimize fetching payloads/vectors at local shard #6279
Conversation
with_vector: Some(false.into()), // will be fetched after aggregating from segments
with_payload: Some(false.into()), // will be fetched after aggregating from segments
We can look later into removing these params from `CoreSearchRequest` and `QueryScrollRequest`, but for now we can just disable them here.
📝 Walkthrough
This pull request restructures the query planning and rescoring logic by introducing a new `RootPlan`.
Actionable comments posted: 0
🧹 Nitpick comments (1)
lib/collection/src/shards/local_shard/query.rs (1)
90-100: Consider concurrency-limiting if the number of root plans is large.
Mapping each root plan into its own future and then joining them is valid. However, if `request.root_plans` is large, this might spawn too many concurrent tasks, which can degrade performance. Consider using a concurrency-limited approach like a bounded semaphore or a streaming combinator if it becomes an issue.
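For reference, a minimal sketch of the bounded-concurrency alternative using the `futures` crate's `buffered` combinator; the helper name and where it would be wired in are assumptions, not code from this PR:

```rust
use futures::stream::{self, StreamExt};

// Illustrative only: resolve at most `max_in_flight` root plans concurrently
// instead of joining all of them at once.
async fn join_with_limit<F, T>(plan_futures: Vec<F>, max_in_flight: usize) -> Vec<T>
where
    F: std::future::Future<Output = T>,
{
    stream::iter(plan_futures)
        .buffered(max_in_flight) // keeps input order; at most `max_in_flight` futures polled at once
        .collect()
        .await
}
```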
📜 Review details
📒 Files selected for processing (5)
- lib/collection/src/operations/universal_query/planned_query.rs (18 hunks)
- lib/collection/src/shards/local_shard/formula_rescore.rs (1 hunks)
- lib/collection/src/shards/local_shard/query.rs (10 hunks)
- lib/segment/src/data_types/query_context.rs (1 hunks)
- lib/segment/src/segment/entry.rs (1 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
lib/collection/src/operations/universal_query/planned_query.rs (3)
- lib/collection/src/shards/local_shard/query.rs (1): rescore (271-386)
- lib/collection/src/operations/universal_query/shard_query.rs (8): from (500-505), from (509-514), from (518-522), from (526-530), from (566-585), from (589-609), from (613-633), from (637-667)
- lib/collection/src/operations/universal_query/collection_query.rs (2): from (609-614), from (618-622)
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-consensus
- GitHub Check: test (macos-latest)
- GitHub Check: test-consensus-compose
- GitHub Check: test (windows-latest)
- GitHub Check: test (ubuntu-latest)
🔇 Additional comments (28)
lib/segment/src/segment/entry.rs (1)

100-100: Confirm the forced disable of payload and vector retrieval
By calling `self.process_search_result(..., &false.into(), &false.into(), hw_counter)`, the code effectively disables both payload and vector fetching. This is likely intentional to avoid unnecessary data retrieval during the rescore step. However, please verify there are no downstream consumers that rely on payload or vector data in this scenario.

lib/collection/src/shards/local_shard/formula_rescore.rs (1)

7-7: Minor import addition
The import of `ScoredPoint` appears aligned with the function's return type and usage. No issues detected.

lib/segment/src/data_types/query_context.rs (1)

13-14: Removal of WithPayload and WithVector from imports
Eliminating `WithPayload` and `WithVector` from these imports is consistent with removing those fields from `FormulaContext`. This simplification avoids unused imports.

lib/collection/src/operations/universal_query/planned_query.rs (15)

21-21: New field for root plans
Introducing `pub root_plans: Vec<RootPlan>` clarifies how merges and final retrieval parameters (vectors/payload) are handled at the root level of each query. This aligns well with the PR's restructuring.

71-76: Introduction of RootPlan struct
Defining `RootPlan` with `merge_plan`, `with_vector`, and `with_payload` centralizes key query configuration in one place. This design appears maintainable and consistent with the shift away from passing these parameters everywhere.
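For context, a rough sketch of what such a struct could look like based on the fields named here; the exact field types are an assumption, not copied from the PR:

```rust
// Assumed shape, inferred from this review: scoring/merging stays in `MergePlan`,
// while the final retrieval flags live only at the root of each query.
pub struct RootPlan {
    /// How to merge (and optionally rescore) the intermediate results.
    pub merge_plan: MergePlan,
    /// Whether vectors should be fetched once, after the shard-level merge.
    pub with_vector: WithVector,
    /// Whether payloads should be fetched once, after the shard-level merge.
    pub with_payload: WithPayloadInterface,
}
```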
119-126: Construct MergePlan followed by RootPlan
Here, `rescore_params` is set to `None` within `MergePlan`, then wrapped in a new `RootPlan`. This flow supports passing the final retrieval flags (`with_vector`, `with_payload`) separately from any scoring logic.

140-150: MergePlan with RescoreParams
When the scoring query needs intermediate results, `Some(RescoreParams)` is created, and then wrapped in `RootPlan`. The approach neatly partitions the scoring logic from data retrieval details.

164-166: Explicitly disabling vector/payload in the initial search
Using `with_vector: Some(false.into())` and `with_payload: Some(false.into())` offloads retrieval until after the partial results are combined. This prevents redundant data loading, matching the PR's performance goals.

187-188: OrderBy scroll request with no immediate fetch
Similarly, `with_vector: false.into()` and `with_payload: false.into()` ensure that large data is not needlessly fetched.

223-224: Random scroll request with no immediate fetch
Again, disabling payload and vector here avoids unnecessary overhead until final results are determined.

262-264: Defaulting prefetch payload/vector to false
Establishing `with_payload` and `with_vector` as `false` within `prefetches` is consistent with deferring retrieval until post-merge or final scoring steps.

283-284: Recursion into nested prefetches
Invoking `recurse_prefetches` here continues the prefetch logic. The updated function signature nicely ensures all nested sources follow the same approach for deferring payload/vector retrieval.

494-529: Unit test verifying multiple vector scoring layers
This test ensures that nested merges and final retrieval flags (`with_vector`/`with_payload`) are handled properly. The adjusted assertions confirm that final queries use the new `RootPlan` model.

658-658: Disable immediate vector retrieval in test
Applying `with_vector: Some(WithVector::Bool(false))` aligns with the new pattern of delaying vector data retrieval to later steps.

671-672: Disable immediate vector retrieval in the second core search
Same reasoning as above: skipping vector retrieval at this stage can improve performance by deferring large data loading.

989-995: First RootPlan in mass test
The creation of `RootPlan` references an initial search index. No rescore parameters are applied. Well-structured for a simpler top-level query scenario.

997-1005: Second RootPlan in mass test
Same pattern: building a root plan with no rescore. Helps confirm that multiple queries can coexist without retrieving vector/payload prematurely.

1006-1023: Third RootPlan with nested Prefetch
Demonstrates a more complex scenario that includes an internal `MergePlan` plus final assembly without immediate retrieval. Good coverage of multi-layer merges.

lib/collection/src/shards/local_shard/query.rs (10)
12-12: Use of new types for payload and vector handling looks proper.
No issues found. This import line cleanly aligns with the refactoring approach described in the summary.

121-121: Flattening nested structures is appropriate.
Flattening the results is correct given `ShardQueryResponse` is a `Vec<Vec<ScoredPoint>>`. No issues found with this approach.

141-154: Re-collecting and enforcing record consistency looks good.
The logic properly filters out points that no longer exist or have been deleted. This maintains data integrity during retrieval. Implementation details appear sound.

161-199: Consolidated plan resolution structure is clear and maintainable.
Introducing `resolve_plan` provides a concise, centralized approach to merge prefetch results and fill payloads/vectors. This design improves readability and ensures consistent handling of plan logic.
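A rough sketch of the per-root flow this describes; the helper names and parameter lists here (including how `fill_with_payload_or_vectors` is invoked) are hypothetical stand-ins for the shard's actual methods:

```rust
// Hypothetical outline: merge/rescore first with fetching disabled,
// then fetch payloads and vectors exactly once per root plan.
async fn resolve_plan(&self, root: RootPlan) -> CollectionResult<Vec<ScoredPoint>> {
    // 1. Intermediate searches/scrolls run with with_payload/with_vector set to false,
    //    so segments return only ids and scores.
    let merged = self.merge_prefetches(root.merge_plan).await?;

    // 2. A single fetch of payloads/vectors, only for the points that survived the merge.
    self.fill_with_payload_or_vectors(merged, root.with_payload, root.with_vector)
        .await
}
```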
289-289: Method signature update for fusion rescore is consistent.
The updated call to `fusion_rescore()` reflects the new scoring approach. No issues detected.

301-302: Using default false for payload/vector is aligned with refactoring.
Hardcoding `with_payload` and `with_vector` to false here matches the design goal of deferring payload/vector fetching until final resolution.

329-330: Switching to None for payload and vector selection is consistent.
This change matches the newly introduced optional approach, ensuring these fields are no longer forced on every search request.

353-354: Invoking formula-based rescoring appears valid.
Calls to `self.rescore_with_formula` now align with the revised parameters. Confirm that the new formula-based approach is tested for all relevant use cases.

412-412: No substantial changes in logic.
This line simply returns the fused data. No issues to report.

108-113: ✅ Verification successful
Validate external calls to updated function signature.
The parameter and return type have changed from a `Vec<ScoredPoint>` to `ShardQueryResponse`, which is expected given the restructured design. Ensure that all callers handle the updated type without breaking any downstream logic.

🏁 Script executed:

#!/bin/bash
# Verify that all calls to fill_with_payload_or_vectors have been updated accordingly
rg -A 3 "fill_with_payload_or_vectors"

Length of output: 678

Verified Update: Function Signature Change Confirmed
The updates in lib/collection/src/shards/local_shard/query.rs reflect the new signature for `fill_with_payload_or_vectors` correctly. The function now accepts a `ShardQueryResponse` rather than a `Vec<ScoredPoint>`, and the call site within the same file passes parameters consistently with this change. No additional caller appears out of sync. Please ensure that all downstream logic that consumes the return value of this function properly handles the new `ShardQueryResponse` type.
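For illustration, a caller that previously consumed a flat `Vec<ScoredPoint>` now receives one list per root query and can flatten it where a single list is needed; this is a minimal sketch with a stand-in `ScoredPoint`, not the actual call site:

```rust
// Stand-in type for the sketch; the real ScoredPoint lives in the segment crate.
#[derive(Debug, Clone)]
struct ScoredPoint {
    id: u64,
    score: f32,
}

// As described in this review: one Vec<ScoredPoint> per root query.
type ShardQueryResponse = Vec<Vec<ScoredPoint>>;

fn main() {
    let response: ShardQueryResponse = vec![
        vec![ScoredPoint { id: 1, score: 0.9 }],
        vec![ScoredPoint { id: 2, score: 0.7 }, ScoredPoint { id: 3, score: 0.5 }],
    ];

    // Downstream logic that still expects a flat list can flatten the per-root results.
    let flat: Vec<ScoredPoint> = response.into_iter().flatten().collect();
    println!("{} points total", flat.len());
}
```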
with_vector: Some(false.into()), // will be fetched after aggregating from segments
with_payload: Some(false.into()), // will be fetched after aggregating from segments
Nit:
- with_vector: Some(false.into()), // will be fetched after aggregating from segments
- with_payload: Some(false.into()), // will be fetched after aggregating from segments
+ with_vector: Some(false.into()), // if requested, will be fetched after aggregating from segments
+ with_payload: Some(false.into()), // if requested, will be fetched after aggregating from segments
@@ -99,7 +97,7 @@ impl SegmentEntry for Segment {
    hw_counter,
)?;

- self.process_search_result(internal_results, with_payload, with_vector, hw_counter)
+ self.process_search_result(internal_results, &false.into(), &false.into(), hw_counter)
It was a parameter before, and it's false now.
Can we use false here, because if a user specified true, it'll be fetched in the separate step that is implemented in this PR?
Thanks!
Tested the scenario I had before, using hardware counters to measure the number of reads. This fixes the scenario, reading the expected amount of payloads. 🙌
* introduce `RootPlan`
* handle changes at local shard
Changes the structure of the planned query to make sure we fetch payloads and vectors only at the last step, right before returning the results from a shard.
This mitigates the impact of fetching many large payloads.
For instance, before this change, a single search with limit 100 across 500 segments would fetch 50,000 payloads, while we only needed 100 of them.
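To make those numbers concrete, a small back-of-the-envelope illustration (the figures are the ones quoted above, not new measurements):

```rust
fn main() {
    let segments = 500;
    let limit = 100;

    // Before: every segment resolved payloads for its own top `limit` hits.
    let payload_reads_before = segments * limit; // 50_000

    // After: payloads are fetched once, only for the shard's final top `limit` hits.
    let payload_reads_after = limit; // 100

    println!("payload reads: before = {payload_reads_before}, after = {payload_reads_after}");
}
```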