Shard-level limit undersampling #6056
Conversation
📝 Walkthrough
The changes introduce two new constants and a method in the collection query logic to manage undersampling when high limits are requested across multiple shards. The method checks conditions such as the number of shards and auto-sharding, adjusting batch query requests with a probabilistic sampling function when necessary. Additionally, the visibility of the module for probabilistic search sampling has been changed from private to public, and the corresponding import path in the segments searcher file has been updated to reflect the renamed module.
Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Collection as Collection
    participant Modifier as modify_shard_query_for_undersampling_limits
    participant Shards as Shard Queries
    Client->>Collection: batch_query_shards_concurrently(batch_request, num_shards, auto_sharding)
    Collection->>Modifier: Invoke modify_shard_query_for_undersampling_limits(...)
    Modifier-->>Collection: Return modified batch request
    Collection->>Shards: Dispatch modified queries concurrently
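To make the summary above concrete, below is a minimal, self-contained sketch of the described flow. It is not the actual Qdrant code: the `ShardQueryRequest` struct, the function signature, and the `sampling_limit` helper (a normal approximation standing in for `find_search_sampling_over_point_distribution`) are simplified assumptions; only the two constants and the `limit + offset` handling follow the PR diff discussed below.

```rust
const SHARD_QUERY_SUBSAMPLING_LIMIT: usize = 128; // constant introduced by this PR
const MORE_ENSURANCE_FACTOR: f64 = 1.2; // safety margin introduced by this PR

struct ShardQueryRequest {
    limit: usize,
    offset: usize,
}

/// Stand-in for `find_search_sampling_over_point_distribution`: given the expected
/// number of relevant points per shard (lambda), pick a per-shard limit that is
/// only exceeded with negligible probability (normal approximation of a high
/// Poisson quantile; the real helper may compute this differently).
fn sampling_limit(lambda: f64) -> usize {
    (lambda + 4.0 * lambda.sqrt()).ceil() as usize
}

fn modify_shard_query_for_undersampling_limits(
    mut batch_request: Vec<ShardQueryRequest>,
    num_shards: usize,
    is_auto_sharding: bool,
) -> Vec<ShardQueryRequest> {
    // Undersampling only pays off with several shards and a random point
    // distribution (auto-sharding); otherwise requests are left untouched.
    if num_shards <= 1 || !is_auto_sharding {
        return batch_request;
    }
    for request in batch_request.iter_mut() {
        let requested = request.limit + request.offset;
        if requested <= SHARD_QUERY_SUBSAMPLING_LIMIT {
            continue; // small limits are not worth undersampling
        }
        let probability = (1.0 / num_shards as f64) * MORE_ENSURANCE_FACTOR;
        let lambda = probability * requested as f64;
        request.limit = sampling_limit(lambda);
        request.offset = 0; // the original offset is re-applied at the collection level
    }
    batch_request
}

fn main() {
    let modified = modify_shard_query_for_undersampling_limits(
        vec![ShardQueryRequest { limit: 1000, offset: 0 }],
        10,
        true,
    );
    println!("per-shard limit: {}", modified[0].limit); // ~164 with this approximation
}
```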
let batch_request = Self::modify_shard_query_for_undersampling_limits(
    batch_request,
    target_shards.len(),
    is_auto_sharding,
);
I was trying to find a better place for this modification, but was not able to, because:
- Higher-level functions need to know the original limit and offset of the query for the proper merge operation, so we can't modify the original query higher in the call stack.
- Modifying the limit lower in the call stack is also problematic, as we need to know the "context", which depends on the shard selection. And a shard should not know about the shard selector.
Actionable comments posted: 0
🧹 Nitpick comments (2)
lib/collection/src/collection/query.rs (2)
64-64: Consider making the subsampling limit configurable.
SHARD_QUERY_SUBSAMPLING_LIMIT is currently a constant (128). If different deployments require tuning for optimal performance, consider making it a configuration parameter or environment variable rather than a hard-coded constant.
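If this suggestion were adopted, one possible shape is sketched below; this is not part of the PR, and the environment variable name is hypothetical.

```rust
use std::env;
use std::sync::OnceLock;

// Sketch of the reviewer's suggestion: keep 128 as the default, but allow an
// override. The variable name below is hypothetical, not an existing setting.
fn shard_query_subsampling_limit() -> usize {
    static LIMIT: OnceLock<usize> = OnceLock::new();
    *LIMIT.get_or_init(|| {
        env::var("QDRANT__SHARD_QUERY_SUBSAMPLING_LIMIT")
            .ok()
            .and_then(|value| value.parse().ok())
            .unwrap_or(128)
    })
}
```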
143-148: Check the potential overhead of counting unique shard keys.
Using iter().unique().count() is appropriate here but introduces a small overhead. Since this runs frequently, ensure performance is acceptable for large numbers of shards.
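If that overhead ever shows up in profiles, a short-circuiting check is possible. The sketch below uses Itertools::all_equal with simplified types; unlike the `== 1` count, it returns true for an empty iterator, so that case would need a separate guard.

```rust
use itertools::Itertools;

// Stops at the first differing key instead of hashing and counting every key.
fn has_single_shard_key<K: PartialEq>(shard_keys: impl IntoIterator<Item = K>) -> bool {
    shard_keys.into_iter().all_equal()
}
```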
📒 Files selected for processing (3)
- lib/collection/src/collection/query.rs (3 hunks)
- lib/collection/src/collection_manager/mod.rs (1 hunks)
- lib/collection/src/collection_manager/segments_searcher.rs (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- lib/collection/src/collection_manager/segments_searcher.rs
🔇 Additional comments (4)
lib/collection/src/collection_manager/mod.rs (1)
6-6: Make sure public module visibility is intentional.
Changing the module from private to public allows external crates or modules to directly reference its contents. Verify that this increased visibility is desired, and consider adding module-level documentation for clarity.
lib/collection/src/collection/query.rs (3)
18-18: Consistent import path rename.
Importing find_search_sampling_over_point_distribution from the renamed probabilistic_search_sampling module aligns with the new public module declaration.
66-128: Ensure offset handling remains correct under subsampling.
This function resets the offset to zero, assuming offsets are handled at the collection level. Confirm that the final result set accurately respects user-specified offsets after merging shards.
151-158: Auto-sharding logic clarity.
is_auto_sharding is set to true when there is exactly one unique shard key. This condition works for typical auto-sharding scenarios, but ensure it also covers corner cases (e.g., partial user-specified shard subsets).
How to test: create a collection with 3 shards (can be local) and 10000 points. Disable the HNSW index. Compare exact search vs non-exact search with limit > 128; the difference should be within 0.999 (1 error per 100 requests with a 10-shard distribution). I am not sure how, or whether, to build an automated test, since it would require moving high volumes of fixture data.
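As a rough illustration of the comparison described above, the accuracy check could measure how much of the exact top-k survives undersampling. This is a sketch only; creating the collection and issuing the exact and non-exact searches are omitted.

```rust
use std::collections::HashSet;

// Fraction of the exact top-k ids that the undersampled search also returned.
fn recall_at_k(exact_ids: &[u64], sampled_ids: &[u64]) -> f64 {
    let sampled: HashSet<&u64> = sampled_ids.iter().collect();
    let hits = exact_ids.iter().filter(|id| sampled.contains(id)).count();
    hits as f64 / exact_ids.len() as f64
}

fn main() {
    // Simulate a search with limit 200 where one point was missed.
    let exact: Vec<u64> = (0..200).collect();
    let mut sampled = exact.clone();
    sampled.remove(57);
    println!("recall: {}", recall_at_k(&exact, &sampled)); // 0.995
}
```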
Actionable comments posted: 0
🧹 Nitpick comments (2)
lib/collection/src/collection/query.rs (2)
120-120: Consider making the ensurance factor a constant.
The magic number 1.2 would be better as a named constant to improve maintainability and make it easier to tune if needed.

+ const MORE_ENSURANCE_FACTOR: f64 = 1.2;
- let more_ensurance_factor = 1.2;
152-160: Consider extracting auto-sharding detection logic.
The auto-sharding detection logic and its explanation in comments would be clearer as a separate method with a descriptive name.

+ fn is_auto_sharding_enabled(target_shards: &[(Arc<Shard>, Option<ShardKey>)]) -> bool {
+     // Auto-sharding happens when we are only querying shards with the same shard key.
+     // This occurs either when:
+     // 1. Querying a specific shard key, or
+     // 2. Querying all shards with no shard keys specified
+     target_shards
+         .iter()
+         .map(|(_, shard_key)| shard_key)
+         .unique()
+         .count() == 1
+ }
+
- let num_unique_shard_keys = target_shards
-     .iter()
-     .map(|(_, shard_key)| shard_key)
-     .unique()
-     .count();
- // Auto-sharding happens when we are only querying shards with _the_same_ shard key.
- // It either might be when we are querying a specific shard key
- // OR when we are querying all shards with no shard keys specified.
- let is_auto_sharding = num_unique_shard_keys == 1;
+ let is_auto_sharding = Self::is_auto_sharding_enabled(&target_shards);
📒 Files selected for processing (1)
- lib/collection/src/collection/query.rs (3 hunks)
🔇 Additional comments (4)
lib/collection/src/collection/query.rs (4)
18-18: LGTM!
The import is correctly placed and aligns with the PR's objective of implementing shard-level limit undersampling.
66-78: LGTM! Clear and well-documented rationale.
The documentation clearly explains the optimization strategy and its benefits over segment-level sampling, particularly noting the guaranteed random distribution in auto-sharding.
131-133: ✅ Verification successful
Verify offset handling implications.
The comment states that "Offset is handled on the collection level", but we should verify this doesn't affect result accuracy:
- Does resetting offset to 0 maintain result ordering?
- Could this affect pagination in edge cases?
🏁 Script executed:
#!/bin/bash
# Search for offset handling in collection-level code
rg -A 5 "offset.*collection level" --type rust

Length of output: 1685
Offset Handling Implementation Validated
The investigation confirms that resetting new_request.offset to 0 in lib/collection/src/collection/query.rs is intentional. The code in lib/collection/src/operations/universal_query/planned_query.rs further clarifies that final offset adjustments occur at the collection level. This design ensures that result ordering and pagination are preserved without adverse side effects.
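To illustrate why this is safe, here is a minimal sketch (assumed structure, not the actual Qdrant merge code) of the offset being applied only after the per-shard results are merged:

```rust
// Sketch: each shard returns its top (limit + offset) candidates; the collection
// merges them by score and only then applies the user's offset and limit globally,
// so a per-shard offset of 0 does not change the final page of results.
fn merge_with_offset(per_shard: Vec<Vec<(u64, f32)>>, offset: usize, limit: usize) -> Vec<(u64, f32)> {
    let mut all: Vec<(u64, f32)> = per_shard.into_iter().flatten().collect();
    all.sort_by(|a, b| b.1.total_cmp(&a.1)); // higher score first
    all.into_iter().skip(offset).take(limit).collect()
}

fn main() {
    let shard_a = vec![(1, 0.9), (3, 0.7)];
    let shard_b = vec![(2, 0.8), (4, 0.6)];
    // Page 2 with page size 2: points 3 and 4, regardless of which shard returned them.
    println!("{:?}", merge_with_offset(vec![shard_a, shard_b], 2, 2));
}
```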
64-64: ❓ Verification inconclusive
Verify the subsampling limit threshold.
The threshold of 128 seems reasonable, but it would be helpful to validate this choice:
- Is this value derived from empirical testing?
- How does it compare to similar thresholds in segment-level sampling?
🌐 Web query:
What is a typical threshold for query result sampling in distributed search systems?
💡 Result:
Threshold estimation for query result sampling in distributed search systems often involves trade-offs between accuracy, computational cost, and overestimate risks. Common approaches include:
1. Sample Rate Thresholds
- Random sampling typically uses rates between 1%–10% of the collection[1][2]. For example:
- A 1% sample might be used for preliminary estimates in large-scale systems[1]
- Higher rates (5–10%) improve accuracy at increased computational cost[2][4]
2. Statistical Confidence Thresholds
Methods like QBS-PL (Query-Based Sampling with Predictive Likelihood) use improvement thresholds (ε) to determine when to stop sampling:
- ε = 0.05–0.10 is common for judging sufficient convergence[4]
- This means sampling continues until additional iterations improve estimates by <5–10%
3. Hybrid Quantile-Sampling Methods
Modern approaches combine:
- Quantile thresholds (precomputed top-k scores for term combinations)[1][2]
- Adaptive sampling budgets (e.g., retrieving 80% of target documents from sample prefixes)[1][2]
For top-k processing algorithms like WAND/MaxScore:
# Typical parameterization from research
sample_rate = 0.01  # 1% sampling
k_prime = max(100, int(1.5 * k))  # Adjusted sample target
epsilon = 0.05  # 5% improvement threshold [4]

Key considerations:
- Lower thresholds (ε=0.01) increase accuracy but require 2–3× more samples[4]
- Systems handling diverse queries often use dynamic thresholds based on query complexity[1][2]
- State-of-the-art methods achieve mean underprediction fractions (MUF) of 0.15–0.25 compared to ideal thresholds[2]
Current research shows optimal results with hybrid approaches combining quantile-based lower bounds with controlled random sampling[2][4].
Citations:
- 1: http://engineering.nyu.edu/~suel/papers/thresh-cikm20.pdf
- 2: https://arxiv.org/html/2412.10701v1
- 3: https://hal.science/hal-04373202/document
- 4: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=6fefbc6c9602c089dce4112f8abb859f9603fe33
- 5: https://community.splunk.com/t5/Monitoring-Splunk/Is-it-possible-to-get-a-quot-sample-quot-set-of-search-results/m-p/15697
- 6: https://www.ibm.com/docs/bg/SSEV5M_6.0.0/com.ibm.iis.gnm.searchingnames.doc/topics/gnr_nh_api_compparms_name_threshold.html
- 7: https://academic.oup.com/comjnl/article-pdf/34/6/551/984688/34-6-551.pdf
- 8: http://www.cs.iit.edu/~yee/classes/cs595f03/100903_3.html
Confirm the rationale behind the subsampling threshold.
The constant SHARD_QUERY_SUBSAMPLING_LIMIT = 128 is set as an absolute limit for when undersampling is applied. However, typical distributed search systems often base such thresholds on dynamic, percentage-based, or empirically derived values, with sample rates usually ranging from 1% to 10% or using adaptive mechanisms for convergence. Please clarify:
- Is the value 128 derived from empirical testing or performance benchmarks?
- How does this fixed threshold compare to similar values (or adaptive thresholds) used in segment-level sampling?
- Would a dynamic or configurable threshold be more appropriate given the trade-offs between accuracy and computational cost?
Actionable comments posted: 0
🧹 Nitpick comments (2)
lib/collection/src/collection/query.rs (2)
73-139: Well-structured implementation with room for documentation improvements.
The implementation is solid with appropriate early returns and clear logic flow. Consider these documentation improvements:

 /// Unlike segments, however, the cost of re-requesting the data is much higher for shards.
-/// So we "accept" the risk of not getting enough results.
+/// So we accept a small probability of getting fewer results than requested,
+/// as the cost of re-requesting data from shards outweighs the benefit of perfect accuracy.
-/// Example: 1000 limit, 10 shards
-/// 1.0 / 10 * 1.2 = 0.12
-/// lambda = 0.12 * 1000 = 120
-/// Which is equal to 171 limit per shard
+/// Example calculation:
+/// - Input: limit = 1000, num_shards = 10
+/// - Base probability: 1.0 / num_shards = 0.1
+/// - Adjusted probability: 0.1 * MORE_ENSURANCE_FACTOR = 0.12
+/// - Lambda: 0.12 * limit = 120
+/// - Result: undersample_limit = 171 per shard
+/// This ensures we get enough results while reducing data transfer.
+/// Note: Offset is handled at the collection level to ensure correct
+/// global ordering of results across all shards.
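To sanity-check the numbers in this example, the Poisson quantile idea can be sketched as below. This is not the actual find_search_sampling_over_point_distribution helper, and the confidence levels are assumptions; for lambda = 120 it produces per-shard limits roughly in the 150–175 range, the same ballpark as the 171 quoted above.

```rust
// Smallest n such that a Poisson(lambda) variable is <= n with the given confidence.
fn poisson_quantile(lambda: f64, confidence: f64) -> usize {
    let mut pmf = (-lambda).exp(); // P(X = 0)
    let mut cdf = pmf;
    let mut n = 0usize;
    while cdf < confidence {
        n += 1;
        pmf *= lambda / n as f64; // P(X = n) from P(X = n - 1)
        cdf += pmf;
    }
    n
}

fn main() {
    // limit = 1000, 10 shards, ensurance factor 1.2 => lambda = 120
    let lambda = 1000.0 * (1.0 / 10.0) * 1.2;
    for confidence in [0.999, 0.9999, 0.999_999] {
        println!("confidence {confidence}: per-shard limit {}", poisson_quantile(lambda, confidence));
    }
}
```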
154-168: Consider clarifying the auto-sharding detection logic.
The auto-sharding detection is correct but could benefit from more explicit documentation.

- // Auto-sharding happens when we are only querying shards with _the_same_ shard key.
- // It either might be when we are querying a specific shard key
- // OR when we are querying all shards with no shard keys specified.
+ // Auto-sharding is detected when all target shards have the same shard key.
+ // This occurs in two scenarios:
+ // 1. When explicitly querying a specific shard key
+ // 2. When querying all shards with no shard key specified (default case)
+ // In both cases, we can safely apply undersampling as the data distribution
+ // is guaranteed to be random across these shards.
  let is_auto_sharding = num_unique_shard_keys == 1;
📒 Files selected for processing (1)
- lib/collection/src/collection/query.rs (3 hunks)
🔇 Additional comments (2)
lib/collection/src/collection/query.rs (2)
18-18: LGTM!
The import is correctly placed and aligns with the PR's objective of implementing shard-level undersampling.
64-71: LGTM! Constants are well-chosen and documented.
The values align with the PR objectives:
- SHARD_QUERY_SUBSAMPLING_LIMIT = 128 matches the testing threshold mentioned in the PR
- MORE_ENSURANCE_FACTOR = 1.2 provides a reasonable 20% safety margin

Please verify these constants through the testing approach mentioned in the PR:
- Create a collection with 3 shards and 10,000 points
- Compare exact vs non-exact search results with limit > 128
- Confirm difference is within 0.999 (max 1 error per 100 requests)
I have two high-level questions:
/// Unlike segments, however, the cost of re-requesting the data is much higher for shards.
/// So we "accept" the risk of not getting enough results.
If it's so unlikely, can we still re-request in case we don't get enough results?
I imagine a latency spike would be a bit better than getting fewer results. Getting fewer results might make the user incorrectly think they have 'reached the end' or 'exhausted all'.
If we basically never expect this to happen, a latency spike might be the 'safer' option.
Wdyt?
We can; it is a bit annoying to implement with our batching structure, and it would be at least twice as many changes.
Fair enough. And, a user would still be able to use exact=true to bypass this.
for request in batch_request.iter() {
    let mut new_request = request.clone();
    let request_limit = new_request.limit + new_request.offset;
Should we avoid going lower than the offset, so that we don't destabilize results before our offset?
I understand that this wasn't 100% accurate or deterministic before, but I wonder if this might make it worse, amplifying inaccuracies, making the results all over the place.
To compensate for the accuracy, the user might increase hnsw_ef; it will have the same effect as higher limits, but won't cause network/IO saturation as badly as a higher limit does.
Do you maybe have some performance numbers to share, based on a large realistic test deployment?
Yes: on a 10-machine cluster, a request with a 10k limit takes 2 seconds, while with a 1k limit it takes 0.1 seconds, everything in RAM (including payload).
@@ -60,6 +61,83 @@ impl Collection {
    Ok(results.into_iter().next().unwrap())
}

/// If the query limit above this value, it will be a subject to undersampling.
const SHARD_QUERY_SUBSAMPLING_LIMIT: usize = 128;
Why specifically this number? For segments we use params.hnsw_ef
no specific reason
hnsw_ef was the right choice for segments, because we wanted to minimize HNSW overhead for large limits. This is not relevant at the shard level, where the bottleneck is the networking.
@agourlay added a debug message
* poisson-based undersampling for shard level search
* fmt
* give a bit more ensurance to undersampling
* review fixes
* add debug log in case undersampling can lead to accuracy loss
Apply the same under-sampling as we do for segments, but on the shard level.
Motivation: large-scale deployments with high-limit queries could benefit from this optimization, as data transfer between shards becomes a bottleneck.
Some considerations: