
Conversation

generall (Member)
This PR fixes rare but possible loss of points in scroll results:

Due to the 2-stage search+retrieve approach in scroll, we can't guarantee that points retrieved in the first stage will still be available and unchanged in the second stage. This can result in either returning changed payloads that no longer satisfy the filters, or compromising the number of returned points.

This is especially critical for scroll requests, which rely on the number of returned points to generate a "next_page" offset and are used in internal operations like shard streaming.

What I tried that didn't work:

  • Read versions along with point IDs and check versions on retrieve - won't help with deleted points
Current approach drawbacks:

  • Usage of the scroll lock is not enforced by the compiler, so we have to rely on carefulness
  • Update operations might be slightly delayed compared to the original version
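The lock pattern the PR describes can be sketched with the synchronous std equivalent. This is an illustrative model with hypothetical names; the actual code uses an async tokio::sync::RwLock on SegmentHolder.

```rust
use std::sync::{Arc, RwLock};

// Illustrative model of the scroll lock (hypothetical names; the real code
// uses an async tokio::sync::RwLock<()> on SegmentHolder).
struct Segments {
    // Guards scroll vs. update; holds no data, only synchronization.
    scroll_read_lock: Arc<RwLock<()>>,
    points: Vec<(u64, &'static str)>, // (id, payload)
}

impl Segments {
    // A scroll holds the read guard across both stages, so no update can
    // delete or change points between selecting IDs and retrieving payloads.
    fn scroll(&self) -> Vec<(u64, &'static str)> {
        let _guard = self.scroll_read_lock.read().unwrap();
        let ids: Vec<u64> = self.points.iter().map(|(id, _)| *id).collect(); // stage 1
        ids.iter()
            .filter_map(|id| self.points.iter().find(|(pid, _)| pid == id).copied())
            .collect() // stage 2: every selected ID is still present
    }

    // An update takes the write guard, so it waits for in-flight scrolls.
    fn update(&mut self, id: u64, payload: &'static str) {
        let lock = Arc::clone(&self.scroll_read_lock);
        let _guard = lock.write().unwrap();
        self.points.push((id, payload));
    }
}
```

Because scrolls share the read side, concurrent scrolls do not block each other; only updates are serialized against them.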

@generall generall requested review from timvisee and coszio March 31, 2025 23:26

coderabbitai bot commented Mar 31, 2025

📝 Walkthrough

The changes introduce a new synchronization mechanism across several components handling collection and segment operations:

  • A write lock is now acquired in the update process, using a blocking mechanism to lock a shared resource before proceeding with update actions.
  • A new public field for the scroll lock has been added to the SegmentHolder structure, exposing an asynchronous read-write lock for managing concurrent access.
  • The scroll operations in the LocalShard now explicitly acquire a lock when reading segments, ensuring segment data is safely accessed, and promptly release the lock after use.
  • A function has been modified to be a constant function, enabling compile-time evaluation.

Overall, these changes focus on enhancing thread safety during concurrent read and write operations, without modifying the high-level control flow or public APIs beyond the addition of the new field.

Suggested reviewers

  • generall

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3b65d00 and 7f19e87.

📒 Files selected for processing (3)
  • lib/collection/src/operations/verification/mod.rs (1 hunks)
  • lib/storage/src/content_manager/snapshots/recover.rs (1 hunks)
  • lib/storage/src/content_manager/toc/mod.rs (2 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
lib/storage/src/content_manager/snapshots/recover.rs (1)
lib/storage/src/dispatcher.rs (1)
  • toc (48-54)
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test-consistency
  • GitHub Check: test-consensus-compose
  • GitHub Check: test (macos-latest)
  • GitHub Check: test-consensus
  • GitHub Check: test (windows-latest)
  • GitHub Check: test
  • GitHub Check: test (ubuntu-latest)
  • GitHub Check: test
  • GitHub Check: test
🔇 Additional comments (3)
lib/storage/src/content_manager/toc/mod.rs (1)

690-692: New method added for runtime handle access.

This addition exposes the general_runtime handle to other parts of the crate, enabling better control over where asynchronous tasks are executed. This aligns with the PR's goal of ensuring operation consistency by giving more control over task scheduling.

lib/collection/src/operations/verification/mod.rs (1)

23-23: Conversion to a const function enhances compile-time evaluation.

Converting new_unchecked_verification_pass to a const fn allows it to be evaluated at compile time, so it can be used to initialize constants and statics; runtime behavior is unchanged.
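What such a conversion looks like can be sketched with a stand-in type. The struct below is hypothetical; only the const fn mechanism mirrors the real change to new_unchecked_verification_pass.

```rust
// Hypothetical stand-in for the real verification-pass marker:
// a zero-sized type whose constructor is now `const fn`.
struct VerificationPass(());

impl VerificationPass {
    // `const fn` means this call can be evaluated at compile time...
    const fn new_unchecked() -> Self {
        VerificationPass(())
    }
}

// ...so it can now initialize constants and statics:
const PASS: VerificationPass = VerificationPass::new_unchecked();
```

The function can still be called at runtime exactly as before; const only widens where it may be used.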

lib/storage/src/content_manager/snapshots/recover.rs (1)

62-71: Improved task scheduling for snapshot recovery operations.

This change ensures that the snapshot recovery task is explicitly scheduled on the TableOfContent's general runtime rather than the default tokio runtime. This is important because:

  1. It provides more controlled and predictable execution by using a specific runtime
  2. It ensures consistent runtime context for operations that may interact with other components
  3. It supports the overall PR goal of maintaining operation consistency by better managing concurrency

The implementation correctly handles both the potential failure of task spawning and the internal operation result.


@coderabbitai coderabbitai bot left a comment (Contributor)

Actionable comments posted: 0

🧹 Nitpick comments (3)
lib/collection/src/collection_manager/holders/segment_holder.rs (1)

159-166: Good addition for scroll operation consistency.

The introduction of a dedicated lock for scroll operations is a sound approach to ensure data consistency. This helps prevent the problem where points might change between the scroll and retrieve phases.

Some implementation considerations:

  • The lock is an async RwLock which allows for efficient concurrent scrolling operations while blocking updates.
  • Documentation clearly explains the purpose and usage pattern.

The empty tuple () as a lock resource is a common pattern for mutually exclusive access without the need for data. This approach ensures proper synchronization with minimal overhead.
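The RwLock<()> behavior described above can be demonstrated with the std equivalent of the async lock (the real lock is tokio's):

```rust
use std::sync::RwLock;

// Demonstration of RwLock<()> as a pure synchronization primitive, using the
// std equivalent of the async lock described above.
fn rwlock_unit_demo() -> (bool, bool) {
    let lock: RwLock<()> = RwLock::new(());
    // Many read guards (concurrent scrolls) may coexist...
    let r1 = lock.read().unwrap();
    let r2 = lock.read().unwrap();
    // ...while any writer (an update) is excluded until they are dropped.
    let writer_blocked = lock.try_write().is_err();
    drop((r1, r2));
    let writer_free = lock.try_write().is_ok();
    (writer_blocked, writer_free)
}
```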

lib/collection/src/collection_manager/collection_updater.rs (1)

47-48: Acquire write lock on scroll operations for thread safety.

This implementation ensures that no scroll operations can run while an update is in progress, preventing potential race conditions. The lock is held for the entire duration of the update operation.

Consider renaming _scroll_lock to _scroll_lock_guard to make it clear this is a guard pattern and the lock remains held until this variable is dropped. Additionally, since this is a blocking operation in an otherwise non-blocking context, it might be worth adding a comment explaining why this synchronous pattern is necessary.

- let _scroll_lock = futures::executor::block_on(scroll_lock.write());
+ // Block current thread to acquire write lock, ensuring no ongoing scroll operations
+ // can be affected by this update
+ let _scroll_lock_guard = futures::executor::block_on(scroll_lock.write());
lib/collection/src/shards/local_shard/scroll.rs (1)

463-463: Consistent lock release pattern applied to all scroll methods.

The same explicit lock dropping pattern is applied here, which is excellent for consistency.

Consider adding a brief comment in each scroll method explaining the importance of holding the lock throughout the entire scroll+retrieve operation. This would help future maintainers understand why this locking pattern is crucial.
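The hold-through-both-stages, then explicit-drop pattern the comment describes can be sketched as follows (illustrative names, std lock in place of the async one):

```rust
use std::sync::RwLock;

// Sketch of the pattern described in the review comment: hold the read guard
// through both scroll stages, then drop it explicitly so updates are not
// blocked during post-processing.
fn scroll_then_format(scroll_read_lock: &RwLock<()>, point_ids: &[u64]) -> String {
    let guard = scroll_read_lock.read().unwrap();
    let selected: Vec<u64> = point_ids.to_vec(); // stage 1: select IDs
    let retrieved: Vec<u64> = selected;          // stage 2: retrieve records
    drop(guard); // release as soon as retrieve is complete
    // Post-processing happens outside the lock.
    retrieved
        .iter()
        .map(|id| id.to_string())
        .collect::<Vec<_>>()
        .join(",")
}
```

The explicit drop() makes the guard's lifetime obvious to future maintainers, which is exactly what the suggested comment would document.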

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2df0404 and ed55783.

📒 Files selected for processing (3)
  • lib/collection/src/collection_manager/collection_updater.rs (1 hunks)
  • lib/collection/src/collection_manager/holders/segment_holder.rs (1 hunks)
  • lib/collection/src/shards/local_shard/scroll.rs (6 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
lib/collection/src/collection_manager/collection_updater.rs (2)
lib/collection/src/collection_manager/holders/segment_holder.rs (2)
  • segments (1447-1450)
  • segments (1996-1999)
lib/collection/src/shards/local_shard/mod.rs (1)
  • segments (241-243)
🔇 Additional comments (3)
lib/collection/src/shards/local_shard/scroll.rs (3)

152-159: Consistent scroll lock implementation for thread safety.

This implementation safely acquires the lock and segment state in a way that prevents race conditions. First retrieving all necessary data within a read lock scope, then acquiring a longer-term read lock on the scroll_read_lock specifically.

The same pattern is consistently applied to all scroll methods in this file, which is excellent for maintainability.


215-215: Explicit lock release improves resource management.

Explicitly dropping the scroll lock as soon as retrieve is complete ensures that update operations aren't blocked any longer than necessary.

This pattern is consistently applied across all scroll methods, ensuring resource locks are held for the minimum required time.


317-317: Consistent explicit lock release pattern.

Same pattern as in scroll_by_id - explicitly releasing the lock when no longer needed.


timvisee commented Apr 1, 2025

> Tried but failed: Read version along with point ids and check version on retrieve - won't help with deleted points

Could you elaborate on why this doesn't help with deleted points? Can't we detect deleted points by comparing point count between the two calls?

The only way we could resolve the situation in that case is to re-fetch. I'm not suggesting that's a good idea, I'm just curious why it didn't work out.

@timvisee timvisee self-assigned this Apr 1, 2025
@@ -44,6 +44,9 @@ impl CollectionUpdater {
) -> CollectionResult<usize> {
// Allow only one update at a time, ensure no data races between segments.
// let _lock = self.update_lock.lock().unwrap();
let scroll_lock = segments.read().scroll_read_lock.clone();
let _scroll_lock = futures::executor::block_on(scroll_lock.write());
This can deadlock, because it might block the current async exec thread

@coszio
Copy link
Contributor

coszio commented Apr 1, 2025

> Could you elaborate on why this doesn't help with deleted points? Can't we detect deleted points by comparing point count between the two calls?

I think the versions solve the problem of returning a different payload than what the initial selection matched. However, they don't solve the problem of returning fewer points than initially selected. In regular queries this shouldn't be problematic, because we don't rely on the last point to provide a next_page_offset, but in scroll we do.

@timvisee
Copy link
Member

timvisee commented Apr 1, 2025

> I think this solves the problem of returning different payload than what the initial selection stated. However, it doesn't solve the problem of returning less points than initially selected. In regular queries it shouldn't be problematic, because we don't rely on the last point to provide a next_page_offset, but in scroll we do.

@coszio I don't get why this is a problem.

We can only get fewer points in the second call. We can detect that and act accordingly. So what's the problem?

I imagine this:

  • scroll IDs (got 50)
  • delete some point
  • fetch payloads for IDs (got 49)
  • we detect deleted point
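The detection step in the scenario above can be sketched directly (illustrative helper, not the PR's code): compare the IDs selected in stage 1 with those retrieved in stage 2; anything missing was deleted in between.

```rust
use std::collections::HashSet;

// Sketch of the detection idea (illustrative, not the PR's code): compare
// the IDs selected in stage 1 with those retrieved in stage 2; anything
// missing was deleted in between.
fn detect_deleted(selected: &[u64], retrieved: &[u64]) -> Vec<u64> {
    let got: HashSet<u64> = retrieved.iter().copied().collect();
    selected.iter().copied().filter(|id| !got.contains(id)).collect()
}
```

Detection alone still leaves the question of what to do next (re-fetch, or return a short page), which is the gap the scroll lock closes.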

@timvisee timvisee requested review from ffuugoo and coszio April 1, 2025 15:09
@coszio coszio left a comment (Contributor)

Since we rely on scrolling for transfers, we need them to be accurate, so I cannot oppose this change.

However, we should be aware that this can hurt some performance under specific workloads.

In manual testing, using perf builds, I created a collection with

bfb --segments 100 -n 100000 --on-disk-payload --text-payloads --text-payload-length 500

Then I ran scrolls and upserts simultaneously

  • Upsert: bfb --skip-create -n 100000 --text-payloads --text-payload-length 500 --batch-size 1 --throttle 1000.
    • this didn't trigger optimizations, so that it reflects only the upsert effects
  • Scroll: bfb --skip-create --skip-upload --skip-wait-index --scroll

I observed some degradation in RPS:

| Branch      | Upsert RPS (throttled to 1000) | Scroll RPS |
| ----------- | ------------------------------ | ---------- |
| dev         | 999.97                         | 1,996.85   |
| scroll-lock | 866.80                         | 1,722.63   |

As a baseline, scrolls without simultaneous upserts reached up to around 2,300 RPS.

@timvisee

timvisee commented Apr 2, 2025

> Since we rely on scrolling for transfers, we need them to be accurate, so I cannot oppose this change.
>
> However, we should be aware that this can hurt some performance under specific workloads.

Thank you very much for the careful review. It'd be nice to not have this impact, but consistency is more important here so we don't really have a choice.

Maybe some of our future efforts to clean locks around this area up a bit will have a positive effect 🤞

@timvisee timvisee merged commit 4d107b8 into dev Apr 2, 2025
17 checks passed
@timvisee timvisee deleted the scroll-lock branch April 2, 2025 08:38
pull bot pushed a commit to kp-forks/qdrant that referenced this pull request Apr 21, 2025
…trieve (qdrant#6289)

* Scroll lock: make sure no segments are modified between scroll and retrieve

* Take write lock in block_in_place to prevent blocking tokio runtime

* Use blocking write

* Recover collection snapshot on general runtime

* Make function const

---------

Co-authored-by: timvisee <tim@visee.me>