[query] Optimize fetching payloads/vectors at local shard #6279
Conversation
with_vector: Some(false.into()), // will be fetched after aggregating from segments
with_payload: Some(false.into()), // will be fetched after aggregating from segments
We can look later into removing these params from `CoreSearchRequest` and `QueryScrollRequest`, but for now we can just disable them here.
📝 Walkthrough
This pull request restructures the query planning and rescoring logic by introducing a new `RootPlan`.
Actionable comments posted: 0
🧹 Nitpick comments (1)
lib/collection/src/shards/local_shard/query.rs (1)
90-100: Consider concurrency-limiting if the number of root plans is large.
Mapping each root plan into its own future and then joining them is valid. However, if `request.root_plans` is large, this might spawn too many concurrent tasks, which can degrade performance. Consider using a concurrency-limited approach like a bounded semaphore or a streaming combinator if it becomes an issue.
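For reference, a minimal sketch of the bounded-concurrency alternative using the `futures` crate's `buffered` combinator; the helper name and where it would be wired in are assumptions, not code from this PR:

```rust
use futures::stream::{self, StreamExt};

// Illustrative only: resolve at most `max_in_flight` root plans concurrently
// instead of joining all of them at once.
async fn join_with_limit<F, T>(plan_futures: Vec<F>, max_in_flight: usize) -> Vec<T>
where
    F: std::future::Future<Output = T>,
{
    stream::iter(plan_futures)
        .buffered(max_in_flight) // keeps input order; at most `max_in_flight` futures polled at once
        .collect()
        .await
}
```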
📜 Review details
📒 Files selected for processing (5)
- lib/collection/src/operations/universal_query/planned_query.rs (18 hunks)
- lib/collection/src/shards/local_shard/formula_rescore.rs (1 hunks)
- lib/collection/src/shards/local_shard/query.rs (10 hunks)
- lib/segment/src/data_types/query_context.rs (1 hunks)
- lib/segment/src/segment/entry.rs (1 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
lib/collection/src/operations/universal_query/planned_query.rs (3)
- lib/collection/src/shards/local_shard/query.rs (1): rescore (271-386)
- lib/collection/src/operations/universal_query/shard_query.rs (8): from (500-505), from (509-514), from (518-522), from (526-530), from (566-585), from (589-609), from (613-633), from (637-667)
- lib/collection/src/operations/universal_query/collection_query.rs (2): from (609-614), from (618-622)
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-consensus
- GitHub Check: test (macos-latest)
- GitHub Check: test-consensus-compose
- GitHub Check: test (windows-latest)
- GitHub Check: test (ubuntu-latest)
🔇 Additional comments (28)
lib/segment/src/segment/entry.rs (1)

100-100: Confirm the forced disable of payload and vector retrieval
By calling `self.process_search_result(..., &false.into(), &false.into(), hw_counter)`, the code effectively disables both payload and vector fetching. This is likely intentional to avoid unnecessary data retrieval during the rescore step. However, please verify there are no downstream consumers that rely on payload or vector data in this scenario.

lib/collection/src/shards/local_shard/formula_rescore.rs (1)

7-7: Minor import addition
The import of `ScoredPoint` appears aligned with the function's return type and usage. No issues detected.

lib/segment/src/data_types/query_context.rs (1)

13-14: Removal of WithPayload and WithVector from imports
Eliminating `WithPayload` and `WithVector` from these imports is consistent with removing those fields from `FormulaContext`. This simplification avoids unused imports.

lib/collection/src/operations/universal_query/planned_query.rs (15)

21-21: New field for root plans
Introducing `pub root_plans: Vec<RootPlan>` clarifies how merges and final retrieval parameters (vectors/payload) are handled at the root level of each query. This aligns well with the PR's restructuring.

71-76: Introduction of RootPlan struct
Defining `RootPlan` with `merge_plan`, `with_vector`, and `with_payload` centralizes key query configuration in one place. This design appears maintainable and consistent with the shift away from passing these parameters everywhere.
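For context, a rough sketch of what such a struct could look like based on the fields named here; the exact field types are an assumption, not copied from the PR:

```rust
// Assumed shape, inferred from this review: scoring/merging stays in `MergePlan`,
// while the final retrieval flags live only at the root of each query.
pub struct RootPlan {
    /// How to merge (and optionally rescore) the intermediate results.
    pub merge_plan: MergePlan,
    /// Whether vectors should be fetched once, after the shard-level merge.
    pub with_vector: WithVector,
    /// Whether payloads should be fetched once, after the shard-level merge.
    pub with_payload: WithPayloadInterface,
}
```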
119-126: Construct MergePlan followed by RootPlan
Here, `rescore_params` is set to `None` within `MergePlan`, then wrapped in a new `RootPlan`. This flow supports passing the final retrieval flags (`with_vector`, `with_payload`) separately from any scoring logic.

140-150: MergePlan with RescoreParams
When the scoring query needs intermediate results, `Some(RescoreParams)` is created, and then wrapped in `RootPlan`. The approach neatly partitions the scoring logic from data retrieval details.

164-166: Explicitly disabling vector/payload in the initial search
Using `with_vector: Some(false.into())` and `with_payload: Some(false.into())` offloads retrieval until after the partial results are combined. This prevents redundant data loading, matching the PR's performance goals.

187-188: OrderBy scroll request with no immediate fetch
Similarly, `with_vector: false.into()` and `with_payload: false.into()` ensure that large data is not needlessly fetched.

223-224: Random scroll request with no immediate fetch
Again, disabling payload and vector here avoids unnecessary overhead until final results are determined.

262-264: Defaulting prefetch payload/vector to false
Establishing `with_payload` and `with_vector` as `false` within `prefetches` is consistent with deferring retrieval until post-merge or final scoring steps.

283-284: Recursion into nested prefetches
Invoking `recurse_prefetches` here continues the prefetch logic. The updated function signature nicely ensures all nested sources follow the same approach for deferring payload/vector retrieval.

494-529: Unit test verifying multiple vector scoring layers
This test ensures that nested merges and final retrieval flags (`with_vector`/`with_payload`) are handled properly. The adjusted assertions confirm that final queries use the new `RootPlan` model.

658-658: Disable immediate vector retrieval in test
Applying `with_vector: Some(WithVector::Bool(false))` aligns with the new pattern of delaying vector data retrieval to later steps.

671-672: Disable immediate vector retrieval in the second core search
Same reasoning as above: skipping vector retrieval at this stage can improve performance by deferring large data loading.

989-995: First RootPlan in mass test
The creation of `RootPlan` references an initial search index. No rescore parameters are applied. Well-structured for a simpler top-level query scenario.

997-1005: Second RootPlan in mass test
Same pattern: building a root plan with no rescore. Helps confirm that multiple queries can coexist without retrieving vector/payload prematurely.

1006-1023: Third RootPlan with nested Prefetch
Demonstrates a more complex scenario that includes an internal `MergePlan` plus final assembly without immediate retrieval. Good coverage of multi-layer merges.

lib/collection/src/shards/local_shard/query.rs (10)
12-12: Use of new types for payload and vector handling looks proper.
No issues found. This import line cleanly aligns with the refactoring approach described in the summary.

121-121: Flattening nested structures is appropriate.
Flattening the results is correct given `ShardQueryResponse` is a `Vec<Vec<ScoredPoint>>`. No issues found with this approach.

141-154: Re-collecting and enforcing record consistency looks good.
The logic properly filters out points that no longer exist or have been deleted. This maintains data integrity during retrieval. Implementation details appear sound.

161-199: Consolidated plan resolution structure is clear and maintainable.
Introducing `resolve_plan` provides a concise, centralized approach to merge prefetch results and fill payloads/vectors. This design improves readability and ensures consistent handling of plan logic.
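A rough sketch of the per-root flow this describes; the helper names and parameter lists here (including how `fill_with_payload_or_vectors` is invoked) are hypothetical stand-ins for the shard's actual methods:

```rust
// Hypothetical outline: merge/rescore first with fetching disabled,
// then fetch payloads and vectors exactly once per root plan.
async fn resolve_plan(&self, root: RootPlan) -> CollectionResult<Vec<ScoredPoint>> {
    // 1. Intermediate searches/scrolls run with with_payload/with_vector set to false,
    //    so segments return only ids and scores.
    let merged = self.merge_prefetches(root.merge_plan).await?;

    // 2. A single fetch of payloads/vectors, only for the points that survived the merge.
    self.fill_with_payload_or_vectors(merged, root.with_payload, root.with_vector)
        .await
}
```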
289-289: Method signature update for fusion rescore is consistent.
The updated call to `fusion_rescore()` reflects the new scoring approach. No issues detected.

301-302: Using default false for payload/vector is aligned with refactoring.
Hardcoding `with_payload` and `with_vector` to false here matches the design goal of deferring payload/vector fetching until final resolution.

329-330: Switching to None for payload and vector selection is consistent.
This change matches the newly introduced optional approach, ensuring these fields are no longer forced on every search request.

353-354: Invoking formula-based rescoring appears valid.
Calls to `self.rescore_with_formula` now align with the revised parameters. Confirm that the new formula-based approach is tested for all relevant use cases.

412-412: No substantial changes in logic.
This line simply returns the fused data. No issues to report.

108-113: ✅ Verification successful
Validate external calls to updated function signature.
The parameter and return type have changed from a `Vec<ScoredPoint>` to `ShardQueryResponse`, which is expected given the restructured design. Ensure that all callers handle the updated type without breaking any downstream logic.

🏁 Script executed:

#!/bin/bash
# Verify that all calls to fill_with_payload_or_vectors have been updated accordingly
rg -A 3 "fill_with_payload_or_vectors"

Length of output: 678

Verified Update: Function Signature Change Confirmed
The updates in lib/collection/src/shards/local_shard/query.rs reflect the new signature for `fill_with_payload_or_vectors` correctly. The function now accepts a `ShardQueryResponse` rather than a `Vec<ScoredPoint>`, and the call site within the same file passes parameters consistently with this change. No additional caller appears out of sync. Please ensure that all downstream logic that consumes the return value of this function properly handles the new `ShardQueryResponse` type.
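For illustration, a caller that previously consumed a flat `Vec<ScoredPoint>` now receives one list per root query and can flatten it where a single list is needed; this is a minimal sketch with a stand-in `ScoredPoint`, not the actual call site:

```rust
// Stand-in type for the sketch; the real ScoredPoint lives in the segment crate.
#[derive(Debug, Clone)]
struct ScoredPoint {
    id: u64,
    score: f32,
}

// As described in this review: one Vec<ScoredPoint> per root query.
type ShardQueryResponse = Vec<Vec<ScoredPoint>>;

fn main() {
    let response: ShardQueryResponse = vec![
        vec![ScoredPoint { id: 1, score: 0.9 }],
        vec![ScoredPoint { id: 2, score: 0.7 }, ScoredPoint { id: 3, score: 0.5 }],
    ];

    // Downstream logic that still expects a flat list can flatten the per-root results.
    let flat: Vec<ScoredPoint> = response.into_iter().flatten().collect();
    println!("{} points total", flat.len());
}
```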
with_vector: Some(false.into()), // will be fetched after aggregating from segments
with_payload: Some(false.into()), // will be fetched after aggregating from segments
Nit:
- with_vector: Some(false.into()), // will be fetched after aggregating from segments
- with_payload: Some(false.into()), // will be fetched after aggregating from segments
+ with_vector: Some(false.into()), // if requested, will be fetched after aggregating from segments
+ with_payload: Some(false.into()), // if requested, will be fetched after aggregating from segments
@@ -99,7 +97,7 @@ impl SegmentEntry for Segment {
    hw_counter,
)?;

- self.process_search_result(internal_results, with_payload, with_vector, hw_counter)
+ self.process_search_result(internal_results, &false.into(), &false.into(), hw_counter)
It was a parameter before, and it's false now.
Can we use false here, because if a user specified true, it'll be fetched in the separate step that is implemented in this PR?
Thanks!
Tested the scenario I had before, using hardware counters to measure the number of reads. This fixes the scenario, reading the expected amount of payloads. 🙌
* introduce `RootPlan`
* handle changes at local shard
Changes the structure of the planned query to make sure we fetch payloads and vectors only at the last step, right before returning the results from a shard.
This mitigates the impact of fetching many large payloads.
For instance, before this change, a single search with limit 100 across 500 segments would fetch 50,000 payloads, while we only needed 100 of them.
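To make those numbers concrete, a small back-of-the-envelope illustration (the figures are the ones quoted above, not new measurements):

```rust
fn main() {
    let segments = 500;
    let limit = 100;

    // Before: every segment resolved payloads for its own top `limit` hits.
    let payload_reads_before = segments * limit; // 50_000

    // After: payloads are fetched once, only for the shard's final top `limit` hits.
    let payload_reads_after = limit; // 100

    println!("payload reads: before = {payload_reads_before}, after = {payload_reads_after}");
}
```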