Skip to content

Conversation

generall
Copy link
Member

@generall generall commented Apr 4, 2025

Done:

  • explicit functions for populate and clear mmap cache for: Vector storage, Vector Index, Payload Storage, Payload Index, QuantizedVectorStorage
  • Remove delayed mmap population, as it seems it pollutes cache
  • Implement cache clearing for dropping segment

How it was tested:

  1. Create collection with bfb:
docker run --rm -it --network=host qdrant/bfb:dev ./bfb -n 500000 --segments 20 --quantization scalar --quantization-in-ram true -d 128 --indexing-threshold 100 --on-disk-vectors true --on-disk-payload
  1. Reload service with clear caches and mem limit:
sudo bash -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
docker run --rm -it -m 600Mb --network=host -v $(pwd)/qdrant-storage:/qdrant/storage qdrant-cache-test

(here caches are full)

sudo python3 smaps-view.py $(pidof qdrant) '.*/vector_index/.*' -v
  1. Run single-threaded indexation:
PATCH collections/benchmark
{
  "hnsw_config": {
    "ef_construct": 101
  },
  "optimizers_config": {
    "max_optimization_threads": 1
  }
}
  1. Wait indexation finish, observe loss of disk cache:
sudo python3 smaps-view.py $(pidof qdrant) '.*/vector_index/.*' -v

> Cache percentages for pattern '.*/vector_index/.*':
>  r--s: 33.23%

With this PR cache is not lost.

Comment on lines 177 to 198
pub fn clear_disk_cache(file_path: &Path) -> io::Result<()> {
#[cfg(target_os = "linux")]
{
use std::os::unix::io::AsRawFd;

use libc::{POSIX_FADV_DONTNEED, posix_fadvise};

let file = std::fs::OpenOptions::new().read(true).open(file_path)?;
let fd = file.as_raw_fd();

// Clear the cache, offset 0, length 0 (whole file)
let ret = unsafe { posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED) };

if ret != 0 {
Err(io::Error::from_raw_os_error(ret))
} else {
Ok(())
}
}
#[cfg(not(target_os = "linux"))]
Ok(())
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a similar function here:

/// Advise the operating system that the file is no longer needed to be in the page cache.
pub fn advise_dontneed(path: &Path) -> Result<()> {

Copy link
Member Author

@generall generall Apr 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

advise_dontneed doesn't seem to be used anywhere. I propose to remove it and keep a new one.

Otherwise we would need to make cross-crate dependencies

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I propose to copy the old definition:

  • It uses safe wrapper nix::fcntl instead of unsafe libc call.
  • It works on any platform that support posix_fadvise, not just Linux.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@generall generall marked this pull request as ready for review April 7, 2025 23:08
Copy link
Contributor

coderabbitai bot commented Apr 7, 2025

📝 Walkthrough

Walkthrough

This change set makes modifications across various parts of the codebase, primarily affecting memory mapping and cache management. In the build configuration, updates to the Docker ignore file and Cargo manifests add new dependencies. Numerous modules have removed the old prefaulting mechanism and replaced it with new methods such as populate and clear_cache to explicitly manage memory-mapped pages and disk caches. Methods like is_on_disk, along with new storage accessors in several structures, have been introduced across index, vector storage, payload storage, and quantization modules. These modifications standardize and centralize how memory mapping, disk caching, and on-disk state checks are handled, affecting various components including grid store, segment management, HNSW indexing, and multiple storage backends. Overall, the changes refactor memory-related operations without altering the core control flows or underlying data structures.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 58ecd87 and 6d5c0d6.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • lib/common/io/Cargo.toml (0 hunks)
  • lib/common/io/src/file_operations.rs (0 hunks)
  • lib/common/memory/Cargo.toml (1 hunks)
  • lib/common/memory/src/madvise.rs (2 hunks)
💤 Files with no reviewable changes (2)
  • lib/common/io/Cargo.toml
  • lib/common/io/src/file_operations.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • lib/common/memory/Cargo.toml
  • lib/common/memory/src/madvise.rs
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test-consistency
  • GitHub Check: test-consensus-compose
  • GitHub Check: test (macos-latest)
  • GitHub Check: test (windows-latest)
  • GitHub Check: test-consensus
  • GitHub Check: test
  • GitHub Check: lint
  • GitHub Check: test (ubuntu-latest)
  • GitHub Check: test

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai plan to trigger planning for file edits and PR creation.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
lib/common/memory/src/madvise.rs (1)

174-198: New function to clear disk cache

Well-implemented function that provides platform-specific behavior for clearing disk cache, using posix_fadvise with POSIX_FADV_DONTNEED on Linux and gracefully handling non-Linux platforms. This function is crucial for the PR's objective of improving disk cache hygiene.

Note: As previously mentioned in review comments, there appears to be a similar function elsewhere in the codebase (advise_dontneed), but this implementation is kept to avoid cross-crate dependencies.

🧹 Nitpick comments (4)
lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs (1)

121-123: Consider consolidating duplicate accessor methods

The new storage() method appears to provide similar functionality to the existing inner_storage() method on line 221. While this addition is likely part of standardizing interfaces across storage types for the cache management improvements, consider consolidating these methods or documenting the difference between them to avoid confusion.

pub fn storage(&self) -> &QuantizedStorage {
    &self.quantized_storage
}

- pub fn inner_storage(&self) -> &QuantizedStorage {
-     &self.quantized_storage
- }
+ // Use storage() method instead
+ #[deprecated(since = "1.13.7", note = "Use storage() method instead")]
+ pub fn inner_storage(&self) -> &QuantizedStorage {
+     self.storage()
+ }
lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs (1)

5-5: Imported Madviseable trait.

This trait is imported but doesn't appear to be directly used in the changes. It might be a requirement for calling populate() on the mmap field.

Consider removing this import if it's not being used, or add a comment explaining why it's necessary.

lib/gridstore/src/page.rs (1)

116-120: Consider handling potential errors in populate method

Unlike the clear_cache method and some other populate implementations in the codebase that return Result types and handle errors, this implementation doesn't check or propagate potential errors from self.mmap.populate().

For consistency with other implementations in the codebase, consider modifying the method to:

-pub fn populate(&self) {
-    self.mmap.populate();
-}
+pub fn populate(&self) -> std::io::Result<()> {
+    self.mmap.populate()?;
+    Ok(())
+}
lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mmap_postings.rs (1)

240-242: Consider returning an OperationResult for consistency

This populate method calls self.mmap.populate() but doesn't return an OperationResult<()> like other similar methods in the codebase. While this might work correctly with the underlying implementation, it creates a slight inconsistency with the pattern followed elsewhere in the codebase.

Consider updating the method signature to return an OperationResult<()> for consistency with other components:

-pub fn populate(&self) {
+pub fn populate(&self) -> OperationResult<()> {
    self.mmap.populate();
+   Ok(())
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d46c4bf and 6f0ea90.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (62)
  • .dockerignore (1 hunks)
  • Cargo.toml (1 hunks)
  • lib/collection/src/collection_manager/optimizers/segment_optimizer.rs (0 hunks)
  • lib/collection/src/shards/local_shard/mod.rs (1 hunks)
  • lib/common/common/Cargo.toml (1 hunks)
  • lib/common/common/benches/mmap_hashmap.rs (1 hunks)
  • lib/common/common/src/mmap_hashmap.rs (5 hunks)
  • lib/common/memory/Cargo.toml (1 hunks)
  • lib/common/memory/src/chunked_utils.rs (2 hunks)
  • lib/common/memory/src/madvise.rs (2 hunks)
  • lib/common/memory/src/mmap_ops.rs (1 hunks)
  • lib/common/memory/src/mmap_type.rs (4 hunks)
  • lib/gridstore/src/bitmask/gaps.rs (2 hunks)
  • lib/gridstore/src/bitmask/mod.rs (2 hunks)
  • lib/gridstore/src/gridstore.rs (1 hunks)
  • lib/gridstore/src/page.rs (2 hunks)
  • lib/quantization/src/encoded_vectors_binary.rs (1 hunks)
  • lib/quantization/src/encoded_vectors_pq.rs (1 hunks)
  • lib/quantization/src/encoded_vectors_u8.rs (1 hunks)
  • lib/segment/benches/multi_vector_search.rs (1 hunks)
  • lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs (1 hunks)
  • lib/segment/src/index/field_index/bool_index/mod.rs (2 hunks)
  • lib/segment/src/index/field_index/field_index_base.rs (1 hunks)
  • lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mmap_postings.rs (2 hunks)
  • lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs (3 hunks)
  • lib/segment/src/index/field_index/full_text_index/mmap_text_index.rs (1 hunks)
  • lib/segment/src/index/field_index/full_text_index/text_index.rs (1 hunks)
  • lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs (3 hunks)
  • lib/segment/src/index/field_index/geo_index/mod.rs (1 hunks)
  • lib/segment/src/index/field_index/map_index/mmap_map_index.rs (3 hunks)
  • lib/segment/src/index/field_index/map_index/mod.rs (1 hunks)
  • lib/segment/src/index/field_index/mmap_point_to_values.rs (5 hunks)
  • lib/segment/src/index/field_index/null_index/mmap_null_index.rs (1 hunks)
  • lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs (3 hunks)
  • lib/segment/src/index/field_index/numeric_index/mod.rs (2 hunks)
  • lib/segment/src/index/hnsw_index/graph_layers.rs (1 hunks)
  • lib/segment/src/index/hnsw_index/graph_links.rs (2 hunks)
  • lib/segment/src/index/hnsw_index/hnsw.rs (7 hunks)
  • lib/segment/src/index/sparse_index/sparse_vector_index.rs (1 hunks)
  • lib/segment/src/index/struct_payload_index.rs (1 hunks)
  • lib/segment/src/index/vector_index_base.rs (2 hunks)
  • lib/segment/src/payload_storage/mmap_payload_storage.rs (1 hunks)
  • lib/segment/src/payload_storage/payload_storage_enum.rs (1 hunks)
  • lib/segment/src/segment/mod.rs (1 hunks)
  • lib/segment/src/segment/segment_ops.rs (1 hunks)
  • lib/segment/src/segment_constructor/segment_builder.rs (4 hunks)
  • lib/segment/src/types.rs (2 hunks)
  • lib/segment/src/vector_storage/chunked_mmap_vectors.rs (2 hunks)
  • lib/segment/src/vector_storage/chunked_vector_storage.rs (1 hunks)
  • lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (1 hunks)
  • lib/segment/src/vector_storage/dense/dynamic_mmap_flags.rs (2 hunks)
  • lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (2 hunks)
  • lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (2 hunks)
  • lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (1 hunks)
  • lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (1 hunks)
  • lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs (1 hunks)
  • lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs (1 hunks)
  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs (2 hunks)
  • lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (1 hunks)
  • lib/segment/src/vector_storage/vector_storage_base.rs (2 hunks)
  • lib/sparse/src/index/inverted_index/inverted_index_compressed_mmap.rs (2 hunks)
  • lib/sparse/src/index/inverted_index/inverted_index_mmap.rs (2 hunks)
💤 Files with no reviewable changes (1)
  • lib/collection/src/collection_manager/optimizers/segment_optimizer.rs
🧰 Additional context used
🧬 Code Definitions (44)
lib/quantization/src/encoded_vectors_pq.rs (3)
lib/quantization/src/encoded_vectors_binary.rs (1)
  • storage (170-172)
lib/quantization/src/encoded_vectors_u8.rs (1)
  • storage (37-39)
lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs (1)
  • storage (121-123)
lib/quantization/src/encoded_vectors_binary.rs (3)
lib/quantization/src/encoded_vectors_pq.rs (1)
  • storage (45-47)
lib/quantization/src/encoded_vectors_u8.rs (1)
  • storage (37-39)
lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs (1)
  • storage (121-123)
lib/common/memory/src/chunked_utils.rs (9)
lib/segment/src/vector_storage/vector_storage_base.rs (2)
  • std (141-141)
  • populate (433-460)
lib/gridstore/src/bitmask/gaps.rs (1)
  • populate (259-262)
lib/gridstore/src/gridstore.rs (1)
  • populate (533-539)
lib/common/memory/src/mmap_type.rs (3)
  • populate (186-189)
  • populate (304-307)
  • populate (410-413)
lib/segment/src/vector_storage/chunked_vector_storage.rs (1)
  • populate (66-66)
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (1)
  • populate (116-118)
lib/gridstore/src/bitmask/mod.rs (1)
  • populate (506-510)
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (1)
  • populate (61-65)
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (1)
  • populate (180-184)
lib/segment/src/vector_storage/chunked_vector_storage.rs (7)
lib/segment/src/vector_storage/dense/dynamic_mmap_flags.rs (2)
  • populate (290-293)
  • clear_cache (296-300)
lib/common/memory/src/chunked_utils.rs (1)
  • populate (45-47)
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (2)
  • populate (116-118)
  • clear_cache (120-122)
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (2)
  • populate (61-65)
  • clear_cache (68-72)
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (2)
  • populate (76-80)
  • clear_cache (83-87)
lib/segment/src/vector_storage/vector_storage_base.rs (2)
  • populate (433-460)
  • clear_cache (462-489)
lib/segment/src/index/hnsw_index/hnsw.rs (1)
  • clear_cache (1142-1147)
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (4)
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (2)
  • populate (45-50)
  • clear_cache (53-57)
lib/common/memory/src/chunked_utils.rs (1)
  • populate (45-47)
lib/segment/src/vector_storage/chunked_vector_storage.rs (2)
  • populate (66-66)
  • clear_cache (69-69)
lib/segment/src/vector_storage/vector_storage_base.rs (2)
  • populate (433-460)
  • clear_cache (462-489)
lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mmap_postings.rs (16)
lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs (1)
  • populate (290-295)
lib/segment/src/index/field_index/bool_index/mod.rs (1)
  • populate (95-101)
lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs (1)
  • populate (146-151)
lib/segment/src/index/field_index/full_text_index/mmap_text_index.rs (1)
  • populate (70-73)
lib/segment/src/index/field_index/map_index/mod.rs (1)
  • populate (448-455)
lib/segment/src/index/field_index/field_index_base.rs (1)
  • populate (441-455)
lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs (1)
  • populate (407-413)
lib/segment/src/index/field_index/full_text_index/text_index.rs (1)
  • populate (274-281)
lib/segment/src/index/field_index/map_index/mmap_map_index.rs (1)
  • populate (343-347)
lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs (1)
  • populate (365-369)
lib/segment/src/index/field_index/null_index/mmap_null_index.rs (1)
  • populate (187-191)
lib/segment/src/index/field_index/geo_index/mod.rs (1)
  • populate (356-363)
lib/segment/src/index/field_index/numeric_index/mod.rs (1)
  • populate (435-442)
lib/segment/src/index/hnsw_index/graph_links.rs (1)
  • populate (149-155)
lib/common/common/src/mmap_hashmap.rs (1)
  • populate (342-345)
lib/segment/src/index/field_index/mmap_point_to_values.rs (1)
  • populate (373-375)
lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs (5)
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (1)
  • populate (941-961)
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (1)
  • populate (45-50)
lib/segment/src/index/vector_index_base.rs (1)
  • populate (119-134)
lib/sparse/src/index/inverted_index/inverted_index_mmap.rs (1)
  • populate (282-285)
lib/sparse/src/index/inverted_index/inverted_index_compressed_mmap.rs (1)
  • populate (352-355)
lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs (3)
lib/segment/src/index/field_index/bool_index/mod.rs (3)
  • is_on_disk (86-91)
  • populate (95-101)
  • clear_cache (104-110)
lib/segment/src/index/field_index/full_text_index/mmap_text_index.rs (3)
  • is_on_disk (64-66)
  • populate (70-73)
  • clear_cache (76-79)
lib/segment/src/index/field_index/null_index/mmap_null_index.rs (3)
  • is_on_disk (181-183)
  • populate (187-191)
  • clear_cache (194-199)
lib/segment/src/index/field_index/full_text_index/mmap_text_index.rs (2)
lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs (3)
  • is_on_disk (284-286)
  • populate (290-295)
  • clear_cache (298-303)
lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs (3)
  • is_on_disk (140-142)
  • populate (146-151)
  • clear_cache (154-161)
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (5)
lib/segment/src/payload_storage/mmap_payload_storage.rs (2)
  • populate (65-68)
  • clear_cache (71-74)
lib/segment/src/vector_storage/chunked_vector_storage.rs (2)
  • populate (66-66)
  • clear_cache (69-69)
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (2)
  • populate (116-118)
  • clear_cache (120-122)
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (2)
  • populate (61-65)
  • clear_cache (68-72)
lib/segment/src/vector_storage/vector_storage_base.rs (2)
  • populate (433-460)
  • clear_cache (462-489)
lib/gridstore/src/bitmask/mod.rs (5)
lib/common/memory/src/madvise.rs (1)
  • clear_disk_cache (177-198)
lib/common/memory/src/chunked_utils.rs (1)
  • populate (45-47)
lib/gridstore/src/bitmask/gaps.rs (2)
  • populate (259-262)
  • clear_cache (265-268)
lib/gridstore/src/gridstore.rs (2)
  • populate (533-539)
  • clear_cache (542-548)
lib/gridstore/src/page.rs (2)
  • populate (118-120)
  • clear_cache (123-126)
lib/segment/src/vector_storage/chunked_mmap_vectors.rs (8)
lib/common/memory/src/madvise.rs (5)
  • madvise (94-96)
  • madvise (102-102)
  • madvise (108-114)
  • madvise (132-138)
  • clear_disk_cache (177-198)
lib/segment/src/index/hnsw_index/hnsw.rs (2)
  • populate (1137-1139)
  • clear_cache (1142-1147)
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (1)
  • populate (247-250)
lib/segment/src/vector_storage/chunked_vector_storage.rs (2)
  • populate (66-66)
  • clear_cache (69-69)
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (2)
  • populate (116-118)
  • clear_cache (120-122)
lib/segment/src/vector_storage/vector_storage_base.rs (2)
  • populate (433-460)
  • clear_cache (462-489)
lib/sparse/src/index/inverted_index/inverted_index_mmap.rs (1)
  • clear_cache (288-290)
lib/sparse/src/index/inverted_index/inverted_index_compressed_mmap.rs (1)
  • clear_cache (358-360)
lib/gridstore/src/page.rs (6)
lib/common/memory/src/madvise.rs (1)
  • clear_disk_cache (177-198)
lib/common/memory/src/chunked_utils.rs (1)
  • populate (45-47)
lib/gridstore/src/bitmask/gaps.rs (2)
  • populate (259-262)
  • clear_cache (265-268)
lib/gridstore/src/gridstore.rs (2)
  • populate (533-539)
  • clear_cache (542-548)
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (2)
  • populate (116-118)
  • clear_cache (120-122)
lib/gridstore/src/bitmask/mod.rs (2)
  • populate (506-510)
  • clear_cache (513-517)
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (3)
lib/common/memory/src/madvise.rs (1)
  • clear_disk_cache (177-198)
lib/segment/src/index/hnsw_index/hnsw.rs (2)
  • populate (1137-1139)
  • clear_cache (1142-1147)
lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs (1)
  • populate (13-15)
lib/segment/src/index/struct_payload_index.rs (16)
lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs (3)
  • populate (290-295)
  • index (556-556)
  • clear_cache (298-303)
lib/segment/src/index/field_index/bool_index/mod.rs (2)
  • populate (95-101)
  • clear_cache (104-110)
lib/segment/src/index/field_index/full_text_index/mmap_text_index.rs (2)
  • populate (70-73)
  • clear_cache (76-79)
lib/segment/src/index/field_index/map_index/mod.rs (2)
  • populate (448-455)
  • clear_cache (458-465)
lib/segment/src/index/field_index/field_index_base.rs (2)
  • populate (441-455)
  • clear_cache (458-472)
lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs (2)
  • populate (407-413)
  • clear_cache (416-430)
lib/segment/src/index/field_index/full_text_index/text_index.rs (2)
  • populate (274-281)
  • clear_cache (284-291)
lib/segment/src/index/field_index/map_index/mmap_map_index.rs (2)
  • populate (343-347)
  • clear_cache (350-359)
lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs (2)
  • populate (365-369)
  • clear_cache (372-382)
lib/segment/src/index/field_index/null_index/mmap_null_index.rs (2)
  • populate (187-191)
  • clear_cache (194-199)
lib/segment/src/index/field_index/geo_index/mod.rs (2)
  • populate (356-363)
  • clear_cache (366-373)
lib/segment/src/index/field_index/numeric_index/mod.rs (3)
  • populate (435-442)
  • index (143-159)
  • clear_cache (445-452)
lib/segment/src/payload_storage/mmap_payload_storage.rs (2)
  • populate (65-68)
  • clear_cache (71-74)
lib/segment/src/vector_storage/dense/dynamic_mmap_flags.rs (2)
  • populate (290-293)
  • clear_cache (296-300)
lib/segment/src/payload_storage/payload_storage_enum.rs (2)
  • populate (213-222)
  • clear_cache (225-234)
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (2)
  • populate (180-184)
  • clear_cache (187-191)
lib/segment/src/vector_storage/vector_storage_base.rs (10)
lib/segment/src/index/hnsw_index/hnsw.rs (2)
  • populate (1137-1139)
  • clear_cache (1142-1147)
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (1)
  • populate (247-250)
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (2)
  • populate (45-50)
  • clear_cache (53-57)
lib/segment/src/vector_storage/dense/dynamic_mmap_flags.rs (2)
  • populate (290-293)
  • clear_cache (296-300)
lib/segment/src/vector_storage/chunked_vector_storage.rs (2)
  • populate (66-66)
  • clear_cache (69-69)
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (2)
  • populate (116-118)
  • clear_cache (120-122)
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (2)
  • populate (61-65)
  • clear_cache (68-72)
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (2)
  • populate (180-184)
  • clear_cache (187-191)
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (2)
  • populate (76-80)
  • clear_cache (83-87)
lib/segment/src/index/vector_index_base.rs (2)
  • populate (119-134)
  • clear_cache (136-151)
lib/segment/src/index/vector_index_base.rs (12)
lib/segment/src/index/sparse_index/sparse_vector_index.rs (1)
  • inverted_index (243-245)
lib/sparse/src/index/inverted_index/inverted_index_mmap.rs (3)
  • is_on_disk (64-66)
  • populate (282-285)
  • clear_cache (288-290)
lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (1)
  • is_on_disk (344-346)
lib/sparse/src/index/inverted_index/inverted_index_immutable_ram.rs (1)
  • is_on_disk (27-29)
lib/sparse/src/index/inverted_index/inverted_index_compressed_immutable_ram.rs (1)
  • is_on_disk (36-38)
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (1)
  • is_on_disk (211-213)
lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (1)
  • is_on_disk (181-183)
lib/segment/src/fixtures/index_fixtures.rs (1)
  • is_on_disk (63-65)
lib/segment/src/index/hnsw_index/graph_links.rs (1)
  • populate (149-155)
lib/segment/src/index/hnsw_index/graph_layers.rs (1)
  • populate (344-347)
lib/segment/src/vector_storage/chunked_vector_storage.rs (1)
  • clear_cache (69-69)
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (1)
  • clear_cache (120-122)
lib/sparse/src/index/inverted_index/inverted_index_mmap.rs (10)
lib/common/memory/src/madvise.rs (8)
  • madvise (94-96)
  • madvise (102-102)
  • madvise (108-114)
  • madvise (132-138)
  • clear_disk_cache (177-198)
  • populate (104-104)
  • populate (116-128)
  • populate (140-152)
lib/segment/src/index/hnsw_index/hnsw.rs (1)
  • populate (1137-1139)
lib/segment/src/index/hnsw_index/graph_links.rs (1)
  • populate (149-155)
lib/segment/src/index/hnsw_index/graph_layers.rs (1)
  • populate (344-347)
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (1)
  • populate (247-250)
lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs (1)
  • populate (13-15)
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (1)
  • populate (61-65)
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (1)
  • populate (180-184)
lib/segment/src/index/vector_index_base.rs (1)
  • populate (119-134)
lib/segment/src/vector_storage/vector_storage_base.rs (2)
  • populate (433-460)
  • std (141-141)
lib/segment/src/vector_storage/dense/dynamic_mmap_flags.rs (9)
lib/common/memory/src/madvise.rs (1)
  • clear_disk_cache (177-198)
lib/segment/src/index/field_index/bool_index/mod.rs (2)
  • populate (95-101)
  • clear_cache (104-110)
lib/segment/src/index/field_index/field_index_base.rs (2)
  • populate (441-455)
  • clear_cache (458-472)
lib/segment/src/vector_storage/chunked_vector_storage.rs (2)
  • populate (66-66)
  • clear_cache (69-69)
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (2)
  • populate (61-65)
  • clear_cache (68-72)
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (2)
  • populate (180-184)
  • clear_cache (187-191)
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (2)
  • populate (76-80)
  • clear_cache (83-87)
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (1)
  • clear_cache (120-122)
lib/segment/src/vector_storage/chunked_mmap_vectors.rs (1)
  • clear_cache (434-440)
lib/segment/src/payload_storage/payload_storage_enum.rs (3)
lib/segment/src/index/struct_payload_index.rs (2)
  • populate (445-452)
  • clear_cache (454-461)
lib/segment/src/payload_storage/mmap_payload_storage.rs (2)
  • populate (65-68)
  • clear_cache (71-74)
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (2)
  • populate (180-184)
  • clear_cache (187-191)
lib/common/memory/src/mmap_type.rs (12)
lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mmap_postings.rs (1)
  • populate (240-242)
lib/segment/src/vector_storage/dense/dynamic_mmap_flags.rs (2)
  • populate (290-293)
  • std (46-46)
lib/common/memory/src/chunked_utils.rs (1)
  • populate (45-47)
lib/gridstore/src/bitmask/gaps.rs (1)
  • populate (259-262)
lib/gridstore/src/gridstore.rs (1)
  • populate (533-539)
lib/segment/src/vector_storage/chunked_vector_storage.rs (1)
  • populate (66-66)
lib/gridstore/src/page.rs (1)
  • populate (118-120)
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (1)
  • populate (116-118)
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (1)
  • populate (61-65)
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (1)
  • populate (180-184)
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (2)
  • populate (76-80)
  • std (144-144)
lib/segment/src/vector_storage/vector_storage_base.rs (2)
  • populate (433-460)
  • std (141-141)
lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs (4)
lib/common/memory/src/madvise.rs (1)
  • clear_disk_cache (177-198)
lib/segment/src/index/field_index/full_text_index/mmap_text_index.rs (4)
  • open (26-34)
  • is_on_disk (64-66)
  • populate (70-73)
  • clear_cache (76-79)
lib/common/common/src/mmap_hashmap.rs (2)
  • open (197-223)
  • populate (342-345)
lib/segment/src/types.rs (3)
  • is_on_disk (552-557)
  • is_on_disk (1187-1189)
  • is_on_disk (1324-1329)
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (4)
lib/common/memory/src/madvise.rs (6)
  • madvise (94-96)
  • madvise (102-102)
  • madvise (108-114)
  • madvise (132-138)
  • clear_disk_cache (177-198)
  • populate (104-104)
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (1)
  • populate (247-250)
lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs (1)
  • populate (13-15)
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (1)
  • clear_cache (963-968)
lib/segment/src/index/field_index/map_index/mod.rs (7)
lib/segment/src/index/field_index/bool_index/mod.rs (1)
  • populate (95-101)
lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs (1)
  • populate (146-151)
lib/segment/src/index/field_index/full_text_index/mmap_text_index.rs (1)
  • populate (70-73)
lib/segment/src/index/field_index/map_index/mmap_map_index.rs (1)
  • populate (343-347)
lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs (1)
  • populate (365-369)
lib/segment/src/index/field_index/geo_index/mod.rs (1)
  • populate (356-363)
lib/common/common/src/mmap_hashmap.rs (1)
  • populate (342-345)
lib/gridstore/src/gridstore.rs (2)
lib/gridstore/src/page.rs (2)
  • populate (118-120)
  • clear_cache (123-126)
lib/gridstore/src/bitmask/mod.rs (2)
  • populate (506-510)
  • clear_cache (513-517)
lib/segment/src/index/hnsw_index/graph_links.rs (11)
lib/common/memory/src/madvise.rs (7)
  • madvise (94-96)
  • madvise (102-102)
  • madvise (108-114)
  • madvise (132-138)
  • populate (104-104)
  • populate (116-128)
  • populate (140-152)
lib/segment/src/index/hnsw_index/hnsw.rs (1)
  • populate (1137-1139)
lib/segment/src/index/hnsw_index/graph_layers.rs (1)
  • populate (344-347)
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (1)
  • populate (247-250)
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (1)
  • populate (45-50)
lib/common/common/src/mmap_hashmap.rs (1)
  • populate (342-345)
lib/segment/src/index/field_index/mmap_point_to_values.rs (1)
  • populate (373-375)
lib/segment/src/vector_storage/chunked_mmap_vectors.rs (1)
  • populate (427-432)
lib/segment/src/index/vector_index_base.rs (1)
  • populate (119-134)
lib/sparse/src/index/inverted_index/inverted_index_mmap.rs (1)
  • populate (282-285)
lib/sparse/src/index/inverted_index/inverted_index_compressed_mmap.rs (1)
  • populate (352-355)
lib/gridstore/src/bitmask/gaps.rs (5)
lib/common/memory/src/madvise.rs (1)
  • clear_disk_cache (177-198)
lib/common/memory/src/chunked_utils.rs (1)
  • populate (45-47)
lib/gridstore/src/gridstore.rs (2)
  • populate (533-539)
  • clear_cache (542-548)
lib/gridstore/src/page.rs (2)
  • populate (118-120)
  • clear_cache (123-126)
lib/gridstore/src/bitmask/mod.rs (2)
  • populate (506-510)
  • clear_cache (513-517)
lib/sparse/src/index/inverted_index/inverted_index_compressed_mmap.rs (6)
lib/common/memory/src/madvise.rs (8)
  • madvise (94-96)
  • madvise (102-102)
  • madvise (108-114)
  • madvise (132-138)
  • clear_disk_cache (177-198)
  • populate (104-104)
  • populate (116-128)
  • populate (140-152)
lib/segment/src/index/hnsw_index/hnsw.rs (2)
  • populate (1137-1139)
  • clear_cache (1142-1147)
lib/segment/src/index/hnsw_index/graph_layers.rs (1)
  • populate (344-347)
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (1)
  • populate (247-250)
lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs (1)
  • populate (13-15)
lib/sparse/src/index/inverted_index/inverted_index_mmap.rs (2)
  • populate (282-285)
  • clear_cache (288-290)
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (12)
lib/common/memory/src/madvise.rs (7)
  • madvise (94-96)
  • madvise (102-102)
  • madvise (108-114)
  • madvise (132-138)
  • populate (104-104)
  • populate (116-128)
  • populate (140-152)
lib/segment/src/index/hnsw_index/hnsw.rs (1)
  • populate (1137-1139)
lib/segment/src/index/hnsw_index/graph_links.rs (1)
  • populate (149-155)
lib/segment/src/index/hnsw_index/graph_layers.rs (1)
  • populate (344-347)
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (1)
  • populate (45-50)
lib/segment/src/vector_storage/chunked_vector_storage.rs (1)
  • populate (66-66)
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (1)
  • populate (116-118)
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (1)
  • populate (61-65)
lib/segment/src/vector_storage/chunked_mmap_vectors.rs (1)
  • populate (427-432)
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (1)
  • populate (76-80)
lib/segment/src/index/vector_index_base.rs (1)
  • populate (119-134)
lib/segment/src/vector_storage/vector_storage_base.rs (1)
  • populate (433-460)
lib/segment/src/index/field_index/full_text_index/text_index.rs (1)
lib/segment/src/index/field_index/full_text_index/mmap_text_index.rs (3)
  • is_on_disk (64-66)
  • populate (70-73)
  • clear_cache (76-79)
lib/segment/src/types.rs (10)
lib/segment/src/index/field_index/bool_index/mod.rs (1)
  • is_on_disk (86-91)
lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs (1)
  • is_on_disk (140-142)
lib/segment/src/index/hnsw_index/hnsw.rs (1)
  • is_on_disk (178-180)
lib/segment/src/index/field_index/map_index/mod.rs (1)
  • is_on_disk (438-444)
lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs (1)
  • is_on_disk (401-403)
lib/segment/src/index/field_index/full_text_index/text_index.rs (1)
  • is_on_disk (264-270)
lib/segment/src/index/field_index/map_index/mmap_map_index.rs (1)
  • is_on_disk (337-339)
lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs (1)
  • is_on_disk (359-361)
lib/segment/src/index/field_index/geo_index/mod.rs (1)
  • is_on_disk (346-352)
lib/segment/src/index/field_index/numeric_index/mod.rs (1)
  • is_on_disk (425-431)
lib/segment/src/index/hnsw_index/hnsw.rs (5)
lib/common/memory/src/madvise.rs (8)
  • madvise (94-96)
  • madvise (102-102)
  • madvise (108-114)
  • madvise (132-138)
  • clear_disk_cache (177-198)
  • populate (104-104)
  • populate (116-128)
  • populate (140-152)
lib/segment/src/vector_storage/vector_storage_base.rs (4)
  • is_on_disk (42-42)
  • is_on_disk (551-577)
  • populate (433-460)
  • clear_cache (462-489)
lib/segment/src/index/hnsw_index/graph_layers.rs (2)
  • load (272-286)
  • populate (344-347)
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (1)
  • populate (247-250)
lib/sparse/src/index/inverted_index/inverted_index_compressed_mmap.rs (1)
  • clear_cache (358-360)
lib/segment/src/index/field_index/bool_index/mod.rs (2)
lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs (4)
  • is_on_disk (284-286)
  • index (556-556)
  • populate (290-295)
  • clear_cache (298-303)
lib/segment/src/index/field_index/full_text_index/mmap_text_index.rs (3)
  • is_on_disk (64-66)
  • populate (70-73)
  • clear_cache (76-79)
lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs (3)
lib/common/memory/src/madvise.rs (1)
  • clear_disk_cache (177-198)
lib/segment/src/index/field_index/map_index/mmap_map_index.rs (3)
  • is_on_disk (337-339)
  • populate (343-347)
  • clear_cache (350-359)
lib/segment/src/index/field_index/numeric_index/mod.rs (3)
  • is_on_disk (425-431)
  • populate (435-442)
  • clear_cache (445-452)
lib/common/common/src/mmap_hashmap.rs (10)
lib/common/memory/src/mmap_ops.rs (1)
  • open_read_mmap (44-62)
lib/segment/src/index/field_index/bool_index/mod.rs (1)
  • populate (95-101)
lib/segment/src/index/field_index/map_index/mod.rs (1)
  • populate (448-455)
lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs (1)
  • populate (407-413)
lib/segment/src/index/field_index/full_text_index/text_index.rs (1)
  • populate (274-281)
lib/segment/src/index/field_index/map_index/mmap_map_index.rs (1)
  • populate (343-347)
lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs (1)
  • populate (365-369)
lib/segment/src/index/field_index/geo_index/mod.rs (1)
  • populate (356-363)
lib/segment/src/index/field_index/numeric_index/mod.rs (1)
  • populate (435-442)
lib/segment/src/index/hnsw_index/graph_links.rs (1)
  • populate (149-155)
lib/segment/src/payload_storage/mmap_payload_storage.rs (7)
lib/segment/src/index/struct_payload_index.rs (2)
  • populate (445-452)
  • clear_cache (454-461)
lib/segment/src/payload_storage/payload_storage_enum.rs (2)
  • populate (213-222)
  • clear_cache (225-234)
lib/gridstore/src/bitmask/gaps.rs (2)
  • populate (259-262)
  • clear_cache (265-268)
lib/gridstore/src/gridstore.rs (2)
  • populate (533-539)
  • clear_cache (542-548)
lib/gridstore/src/page.rs (2)
  • populate (118-120)
  • clear_cache (123-126)
lib/gridstore/src/bitmask/mod.rs (2)
  • populate (506-510)
  • clear_cache (513-517)
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (2)
  • populate (180-184)
  • clear_cache (187-191)
lib/segment/src/index/field_index/numeric_index/mod.rs (5)
lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs (4)
  • is_on_disk (284-286)
  • index (556-556)
  • populate (290-295)
  • clear_cache (298-303)
lib/segment/src/index/field_index/full_text_index/mmap_text_index.rs (3)
  • is_on_disk (64-66)
  • populate (70-73)
  • clear_cache (76-79)
lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs (3)
  • is_on_disk (401-403)
  • populate (407-413)
  • clear_cache (416-430)
lib/segment/src/index/field_index/map_index/mmap_map_index.rs (3)
  • is_on_disk (337-339)
  • populate (343-347)
  • clear_cache (350-359)
lib/segment/src/index/field_index/null_index/mmap_null_index.rs (3)
  • is_on_disk (181-183)
  • populate (187-191)
  • clear_cache (194-199)
lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs (3)
lib/common/memory/src/madvise.rs (1)
  • clear_disk_cache (177-198)
lib/segment/src/index/field_index/map_index/mmap_map_index.rs (3)
  • is_on_disk (337-339)
  • populate (343-347)
  • clear_cache (350-359)
lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs (3)
  • is_on_disk (359-361)
  • populate (365-369)
  • clear_cache (372-382)
lib/segment/src/index/field_index/field_index_base.rs (5)
lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs (4)
  • is_on_disk (284-286)
  • index (556-556)
  • populate (290-295)
  • clear_cache (298-303)
lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs (3)
  • is_on_disk (140-142)
  • populate (146-151)
  • clear_cache (154-161)
lib/segment/src/index/field_index/null_index/mmap_null_index.rs (3)
  • is_on_disk (181-183)
  • populate (187-191)
  • clear_cache (194-199)
lib/segment/src/index/field_index/geo_index/mod.rs (3)
  • is_on_disk (346-352)
  • populate (356-363)
  • clear_cache (366-373)
lib/segment/src/index/field_index/full_text_index/mmap_text_index.rs (2)
  • populate (70-73)
  • clear_cache (76-79)
lib/segment/src/segment_constructor/segment_builder.rs (1)
lib/segment/src/segment_constructor/segment_constructor_base.rs (3)
  • build_vector_index (418-448)
  • get_vector_index_path (94-96)
  • create_sparse_vector_index (457-505)
lib/segment/src/index/field_index/null_index/mmap_null_index.rs (8)
lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs (2)
  • populate (290-295)
  • clear_cache (298-303)
lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs (2)
  • populate (146-151)
  • clear_cache (154-161)
lib/segment/src/index/field_index/full_text_index/mmap_text_index.rs (2)
  • populate (70-73)
  • clear_cache (76-79)
lib/segment/src/index/field_index/map_index/mmap_map_index.rs (2)
  • populate (343-347)
  • clear_cache (350-359)
lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs (2)
  • populate (365-369)
  • clear_cache (372-382)
lib/segment/src/index/hnsw_index/hnsw.rs (2)
  • populate (1137-1139)
  • clear_cache (1142-1147)
lib/common/common/src/mmap_hashmap.rs (1)
  • populate (342-345)
lib/segment/src/index/field_index/mmap_point_to_values.rs (1)
  • clear_cache (378-381)
lib/segment/src/index/field_index/geo_index/mod.rs (1)
lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs (3)
  • is_on_disk (401-403)
  • populate (407-413)
  • clear_cache (416-430)
lib/segment/src/index/field_index/map_index/mmap_map_index.rs (4)
lib/common/memory/src/madvise.rs (1)
  • clear_disk_cache (177-198)
lib/segment/src/index/field_index/map_index/mod.rs (3)
  • is_on_disk (438-444)
  • populate (448-455)
  • clear_cache (458-465)
lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs (3)
  • is_on_disk (359-361)
  • populate (365-369)
  • clear_cache (372-382)
lib/segment/src/types.rs (3)
  • is_on_disk (552-557)
  • is_on_disk (1187-1189)
  • is_on_disk (1324-1329)
lib/common/memory/src/madvise.rs (3)
lib/segment/src/vector_storage/chunked_mmap_vectors.rs (2)
  • std (63-63)
  • std (118-118)
lib/segment/src/vector_storage/vector_storage_base.rs (1)
  • std (141-141)
lib/gridstore/src/bitmask/gaps.rs (1)
  • file_path (84-86)
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test-consensus-compose
  • GitHub Check: test (macos-latest)
  • GitHub Check: test-consensus
  • GitHub Check: test (windows-latest)
  • GitHub Check: test (ubuntu-latest)
🔇 Additional comments (145)
.dockerignore (1)

4-4: Appropriate addition to dockerignore

Adding qdrant-storage/ to the .dockerignore file is a good practice as it prevents potentially large storage directories from being included in Docker images, which aligns with the PR's focus on disk cache management and hygiene.

lib/common/memory/Cargo.toml (1)

22-22: Adding libc dependency for memory management

The addition of the libc dependency is appropriate for implementing low-level memory management functions needed for the disk cache hygiene improvements. This will enable system calls related to memory mapping and cache control.

Cargo.toml (1)

224-224: Consistent libc dependency in workspace

Adding libc = "0.2" to the workspace dependencies ensures consistent versioning across the project. This complements the memory management changes being implemented across the codebase.

lib/common/common/Cargo.toml (1)

39-39: Good architectural change for memory management

Adding the memory dependency to the common crate indicates an architectural improvement where memory-related functionality is being centralized. This supports the PR's goal of standardizing memory mapping and disk cache management across components.

lib/quantization/src/encoded_vectors_binary.rs (1)

170-172: Good addition for consistent storage access

This new accessor method aligns with the PR's goal of enhancing disk cache management by providing explicit access to the underlying storage.

lib/segment/benches/multi_vector_search.rs (1)

132-132: Great addition for explicit cache population

This line explicitly populates the HNSW index's memory-mapped pages into memory, which aligns perfectly with the PR objective to replace delayed mmap population with explicit functions. This ensures the benchmark will have consistent memory usage patterns.

lib/quantization/src/encoded_vectors_u8.rs (1)

37-39: Consistent storage access pattern

This addition follows the same pattern implemented in other encoded vector types, providing a clean way to access the underlying storage for cache management operations.

lib/quantization/src/encoded_vectors_pq.rs (1)

45-47: Maintains API consistency across vector implementations

This storage accessor completes the pattern across all encoded vector implementations, providing uniform access to the underlying storage components for cache management operations.

lib/common/memory/src/chunked_utils.rs (2)

2-2: Import added for new io methods.

This import of io module supports the return type of the newly added populate method.


45-47: Adds a method to populate the mmap cache.

The new populate method allows explicitly loading memory pages into the cache, which aligns with the PR objectives of improving disk cache hygiene. The function correctly delegates to the underlying mmap.populate() and propagates any I/O errors to the caller.

lib/common/common/benches/mmap_hashmap.rs (1)

21-21: Updated benchmark to use the new API signature.

The benchmark has been modified to pass a new parameter (true) to the MmapHashMap::open method, indicating immediate population of the memory mapping. This ensures the benchmark reflects the updated API and aligns with the PR objectives of explicit cache population.

lib/segment/src/vector_storage/chunked_mmap_vectors.rs (3)

11-11: Import added for clear_disk_cache function.

The import from memory::madvise has been updated to include the clear_disk_cache function needed for the new cache management capabilities.


427-432: Added populate method to ChunkedVectorStorage trait implementation.

This implementation correctly iterates through all chunks, calling populate() on each one and properly handling potential errors. The method aligns with the PR objective of providing explicit functions for populating the mmap cache across components.


434-440: Added clear_cache method to ChunkedVectorStorage trait implementation.

This method implements cache clearing functionality by iterating through all chunk file paths and calling clear_disk_cache on each. This directly supports the PR objective of introducing cache clearing functionality for dropped segments.

lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (2)

116-118: Implementation correctly delegates population to the underlying storage

The implementation follows the trait requirement pattern established in the codebase, properly delegating to the underlying mmap_storage implementation.


120-122: Implementation correctly delegates cache clearing to the underlying storage

The implementation follows the same pattern as other caching-related methods in the codebase, maintaining separation of concerns by delegating to the underlying storage.

lib/segment/src/payload_storage/mmap_payload_storage.rs (2)

63-68: Well-documented and properly implemented population method

The implementation is clear and follows the established pattern for memory management across the codebase. The documentation effectively explains that this method populates all pages and will block until complete. The implementation safely accesses the storage through a read lock.


70-74: Well-documented and properly implemented cache clearing method

The method follows the consistent pattern used throughout the codebase for clearing disk caches. The implementation correctly uses a read lock to access the storage, which is appropriate since clearing the cache doesn't modify the data structure.

lib/segment/src/index/sparse_index/sparse_vector_index.rs (1)

243-245: Appropriate getter method for API completeness

This getter method provides access to the internal inverted_index field, which is necessary for the public API. The method appears to have been relocated within the file (previously around lines 74-78 according to the summary) but maintains the same functionality.

lib/gridstore/src/page.rs (2)

4-4: Import updated to include necessary functions

The import now includes Madviseable and clear_disk_cache which are needed for the new methods.


122-126: Cache clearing implementation follows established pattern

The implementation properly handles errors from clear_disk_cache and follows the consistent pattern used throughout the codebase.

lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (2)

180-184: Well-implemented populate method

The populate method correctly calls populate on both the deleted flags and the storage components, ensuring all memory-mapped pages are loaded into memory. The implementation follows the same pattern used throughout the codebase and properly propagates errors.


187-191: Well-implemented clear_cache method

The clear_cache method correctly calls clear_cache on both the deleted flags and the storage components, ensuring the disk cache is properly cleared. The implementation follows the same pattern used throughout the codebase and properly propagates errors.

lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mmap_postings.rs (1)

10-10: Updated import to include Madviseable

The import statement has been updated to include Madviseable from the memory::madvise module, which is necessary for memory page management functionality.

lib/segment/src/index/hnsw_index/graph_layers.rs (1)

344-347: Well-implemented populate method

The populate method correctly calls the populate method on the underlying links structure and properly propagates any errors. This implementation is consistent with other similar methods in the codebase.

lib/segment/src/vector_storage/chunked_vector_storage.rs (1)

64-69: Well-defined trait methods for cache management

Adding these two methods to the ChunkedVectorStorage trait establishes a consistent interface for memory management across all implementors. The method declarations are well-documented and follow the standard pattern for error handling in the codebase.

lib/segment/src/types.rs (1)

1385-1395: Good implementation for explicit disk state checking

The added is_on_disk() method correctly identifies both storage types as disk-based, while maintaining explicit checks for future extensibility. This approach ensures that if new storage types are added in the future, a developer would need to consciously decide their disk status rather than relying on a default.

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (2)

59-65: Appropriate addition of explicit memory mapping control

The populate() method provides a clear way to load all memory-mapped data into RAM upfront, which aligns with the PR's goal of improving disk cache hygiene. This implementation correctly delegates to both the deleted and vectors components of the storage.


67-72: Complementary cache clearing functionality

The clear_cache() method provides the necessary counterpart to populate(), allowing explicit control over memory management. This implementation ensures both components of the storage have their caches cleared, maintaining consistency.

lib/segment/src/segment/segment_ops.rs (2)

5-5: Simplified import for thread handling

Removal of the self qualifier in the import improves readability without changing functionality. This is a minor but positive change to the code style.


1-738: Removal of prefaulting mechanism

The previous prefault_mmap_pages method has been removed from this file, which aligns with the PR's goal of removing the delayed mmap population feature in favor of the more explicit populate/clear_cache methods added elsewhere. This change helps centralize and standardize memory management across the codebase.

lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (2)

74-80: Implementation correctly populates mmap pages

Good implementation of the populate method that ensures both vectors and offsets are populated, following the PR's goal of introducing explicit functions for cache management.


82-87: Implementation cleanly clears disk cache

The clear_cache implementation properly delegates to both underlying storage components, ensuring complete cache management for this storage type.

lib/gridstore/src/bitmask/mod.rs (3)

9-9: Import updated to include clear_disk_cache function

The import statement has been correctly updated to include the new clear_disk_cache function.


504-510: Implementation correctly populates mmap pages

The populate method follows the pattern implemented in other components, ensuring both the bitslice and regions_gaps are properly populated, which aligns with the PR's goal of improving cache management.


512-517: Implementation cleanly drops the disk cache

The clear_cache method correctly uses the clear_disk_cache function for the main bitmask file and calls clear_cache on the regions_gaps, ensuring complete cache management for this component.

lib/segment/src/vector_storage/dense/dynamic_mmap_flags.rs (3)

11-11: Import updated to include clear_disk_cache function

The import statement has been appropriately updated to include the new clear_disk_cache function.


288-293: Implementation correctly populates mmap pages

The populate method properly calls the populate method on the flags field, consistent with the PR's objective of introducing explicit functions for managing the memory-mapped cache.


295-300: Implementation properly clears the disk cache

The clear_cache method correctly constructs the path to the flags file and calls the clear_disk_cache function with it, aligning with the PR's goal of enhancing cache management.

lib/common/memory/src/madvise.rs (1)

7-7: Added Path import for the new clear_disk_cache function

Added the required import for the new function.

lib/segment/src/vector_storage/quantized/quantized_vectors.rs (1)

963-968: Effective implementation for cache clearing

The clear_cache method takes a file-based approach by iterating over all files and calling clear_disk_cache on each one. This is a clean implementation that properly handles error propagation.

lib/segment/src/segment/mod.rs (1)

101-131: Well-structured cache clearing during segment drop

The implementation effectively clears caches for all components of a segment when it's being dropped, which aligns with the PR's goal of improving disk cache hygiene. The error handling approach (logging errors but continuing) is appropriate for a drop method where errors can't be propagated.

The structured approach of handling each component (payload storage, payload index, and vector data components) ensures thorough cache clearing, preventing potential cache pollution as noted in your comment.

lib/segment/src/index/vector_index_base.rs (3)

101-117: Well-documented and implemented storage location check

The is_on_disk method provides a clear way to determine if the underlying storage is disk-based rather than RAM-based. The method is properly documented and delegates to the appropriate underlying methods for each index type.


119-134: Comprehensive population handling across index types

The populate method correctly handles different index types, calling the appropriate populate methods for disk-based indices while doing nothing for RAM-based ones. The error propagation with the ? operator is properly implemented.


136-151: Consistent cache clearing implementation

The clear_cache method mirrors the structure of the populate method, applying cache clearing operations to appropriate index types and propagating errors properly. The consistency between these two methods makes the code easier to understand and maintain.

lib/gridstore/src/gridstore.rs (2)

542-548: Thorough cache clearing implementation

The clear_cache method properly clears caches for all pages and the bitmask, with consistent error handling and propagation. The combination of this method with the populate method provides complete control over the memory-mapped cache.


533-539:

❓ Verification inconclusive

Inconsistent error handling in page population

The populate method calls page.populate() for each page but doesn't handle potential errors from these calls, although it does properly handle errors from bitmask.read().populate().

Looking at the Page::populate implementation in the provided context, it appears that page.populate() doesn't return a result, which explains this approach. However, this creates inconsistency with the error handling pattern used in the clear_cache method.

Check if the current implementation matches the design intent:


🏁 Script executed:

#!/bin/bash
# Look for the Page::populate implementation to verify if it returns a Result
rg -A 3 "fn populate" --type rust lib/gridstore/src/page.rs

Length of output: 127


Clarify Error Handling in Page Population

After verification, it’s confirmed that the implementation of Page::populate in lib/gridstore/src/page.rs intentionally does not return a Result—it simply calls self.mmap.populate(). As a result, any errors from populating individual pages aren’t propagated, which is why no error handling is applied for those calls in gridstore::populate. This approach contrasts with the handling of errors for bitmask.read().populate() and differs from the error propagation pattern observed in the clear_cache method.

  • Action Items:
    • Confirm whether the divergence in error handling (i.e., no error propagation for page population) is intentional.
    • If so, consider adding a clarifying comment or documentation note explaining why Page::populate is designed this way.
    • Alternatively, to improve consistency, evaluate if updating Page::populate to return a Result (and adjusting its callers accordingly) would be beneficial.
lib/gridstore/src/bitmask/gaps.rs (3)

5-5: Import added for new cache functionality.

The addition of clear_disk_cache to the imports is necessary for the new cache clearing functionality.


257-262: New method for explicitly populating the mmap cache.

This method provides a way to explicitly load all pages into memory, which aligns with the PR objective of improving cache hygiene. The implementation correctly propagates any I/O errors that might occur during the population process.


264-268: New method for explicitly clearing the disk cache.

This method provides a way to explicitly clear the disk cache, which is important for memory management in memory-constrained environments. The implementation correctly propagates any I/O errors that might occur during the cache clearing process.

lib/sparse/src/index/inverted_index/inverted_index_compressed_mmap.rs (3)

13-13: Import expanded for new cache functionality.

The addition of Madviseable and clear_disk_cache to the imports is necessary for the new memory management features.


350-355: New method for explicitly populating the mmap cache.

This method enhances memory management by providing explicit control over when pages are loaded into memory. The implementation is consistent with similar methods added throughout the codebase, calling populate on the mmap field and properly handling errors.


357-360: New method for explicitly clearing the disk cache.

This method complements the populate method by providing a way to release memory when needed. The direct call to clear_disk_cache with the path is correct and consistent with other implementations in the codebase.

lib/segment/src/index/hnsw_index/graph_links.rs (2)

6-6: Import expanded for new cache functionality.

The addition of Advice, AdviceSetting, and Madviseable to the imports provides the necessary components for the memory management enhancements.


147-155: Replaced prefaulting with explicit population method.

This change removes the delayed mmap population feature (identified as a source of cache pollution in the PR objectives) and replaces it with an explicit populate method. The implementation correctly handles both RAM and memory-mapped storage variants, providing a consistent interface for cache management.

The method now returns an OperationResult<()> instead of an Option<mmap_ops::PrefaultMmapPages>, which improves error propagation and follows the error handling pattern used throughout the codebase.

lib/segment/src/payload_storage/payload_storage_enum.rs (2)

210-222: New method for explicitly populating the mmap cache in payload storage.

This method extends the cache management capabilities to payload storage, ensuring consistent memory management across all components. The implementation correctly handles all storage variants, only performing population for the memory-mapped variant.


224-235: New method for explicitly clearing the disk cache in payload storage.

This method complements the populate method by providing a way to release memory when needed. The implementation correctly handles all storage variants, only performing cache clearing for the memory-mapped variant.

lib/common/memory/src/mmap_ops.rs (1)

3-4: Import simplification aligns with removing PrefaultMmapPages

The simplified import statement removes PathBuf and Arc which were likely used by the removed PrefaultMmapPages struct and related functionality. This change supports the PR's goal of removing the delayed mmap population feature that was identified as a source of cache pollution.

lib/segment/src/index/field_index/null_index/mmap_null_index.rs (3)

181-183: Logical and consistent implementation of is_on_disk

The implementation correctly returns the inverted value of POPULATE_NULL_INDEX constant, which is consistent with the semantics - when data isn't populated in-memory, it means it remains on disk. This method helps downstream code make decisions about cache state.


185-191: Appropriate implementation of populate method

This method correctly populates both internal slices (is_null_slice and has_values_slice) and handles error propagation appropriately. The implementation aligns with similar methods across the codebase and supports the PR objective of providing explicit functions for memory-mapped cache population.


193-199: Clean implementation of clear_cache method

The method properly clears the disk cache for both underlying slices and includes appropriate error handling with the ? operator. This implementation is consistent with the cache clearing functionality added to other components in the PR.

lib/sparse/src/index/inverted_index/inverted_index_mmap.rs (3)

11-11: Updated import to support new cache functionality

The import statement has been expanded to include Madviseable and clear_disk_cache, which are necessary for the new cache management methods implemented in this file. This change supports the PR's goal of enhancing disk cache management.


280-285: Well-implemented populate method

This method follows the established pattern seen in other components, calling populate() on the memory-mapped resource and returning a proper result type. The method documentation clearly describes its purpose and blocking behavior, which is important for users to understand its performance characteristics.


287-290: Effective implementation of clear_cache method

The method correctly uses the clear_disk_cache function with the file path, maintaining consistency with the cache clearing approach across the codebase. This implementation supports the PR's goal of introducing explicit cache clearing functionality.

lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (2)

12-12: Simplified import statement for madvise functionality

The import has been updated to focus on the necessary components from the madvise module. This is consistent with the removal of the prefaulting mechanism and the focus on explicit cache management.


247-250: Simple and effective populate method implementation

The method logically calls populate() on the underlying mmap resource and returns an appropriate result type. This approach is consistent with similar implementations across the codebase and properly replaces the removed prefaulting mechanism with a more explicit and controlled approach to cache population.

lib/segment/src/index/field_index/bool_index/mod.rs (3)

86-91: Consistent method for determining on-disk state.

This new method provides a clear way to check if the index is stored on disk, delegating to MmapBoolIndex::is_on_disk() when appropriate. The method follows the pattern used in other index types for determining storage status.


93-101: Good mmap population implementation.

The populate method handles both variants correctly - doing nothing for the simple in-memory variant and delegating to the mmap variant's implementation. The code includes proper documentation explaining the blocking behavior.


103-110: Effective cache clearing mechanism.

This method appropriately implements cache clearing functionality across both index variants, with clear documentation. It's part of the disk cache hygiene initiative, allowing explicit control over memory usage.

lib/collection/src/shards/local_shard/mod.rs (1)

34-36: Removed automatic mmap prefaulting imports.

The removal of std::mem::size_of and Mem imports is correct as part of removing the delayed mmap population feature that was causing cache pollution according to the PR objectives.

lib/segment/src/index/field_index/full_text_index/mmap_text_index.rs (3)

64-66: Added method to check disk storage status.

The new is_on_disk() method delegates to the underlying inverted index implementation, maintaining consistency in how storage status is determined across different index types.


68-73: Explicit population control for mmap.

This new method provides a clear way to populate memory pages from disk, blocking until complete. It properly delegates to the underlying inverted index and includes good documentation about its behavior.


75-79: Added cache clearing functionality.

The clear_cache() method completes the trio of disk cache management methods, properly delegating to the underlying implementation. This helps with memory management across the system.

lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs (3)

284-286: Correct implementation of on-disk check.

The is_on_disk() method returns the negation of self.populated, which follows the pattern used elsewhere in the codebase. This returns true when the data is primarily on disk and not fully loaded into memory.


288-295: Complete mmap population implementation.

This method ensures all memory pages are populated by calling populate() on both trues_slice and falses_slice. The implementation correctly handles error propagation and includes proper documentation.


297-303: Thorough cache clearing functionality.

The clear_cache() method properly drops the disk cache for both slices, with appropriate error handling. This method completes the disk cache management interface, allowing explicit control over memory usage.

lib/segment/src/vector_storage/vector_storage_base.rs (3)

433-460: Consistent implementation of the populate method for VectorStorageEnum

The implementation correctly handles all vector storage variants by calling populate only on memory-mapped variants while skipping non-memory-mapped ones. This is well-aligned with the PR objective of introducing explicit functions for populating the memory-mapped cache.


462-489: Well-structured clear_cache implementation

This method follows the same pattern as populate, applying cache clearing operations only to memory-mapped storage variants. The consistent error handling via the ? operator ensures proper error propagation throughout the call chain.


549-551: Helpful documentation for the is_on_disk method

The added documentation clearly explains the behavior of the is_on_disk method, making it easier to understand the distinction between RAM storage with disk persistence versus on-disk storage without forced RAM presence.

lib/segment/src/index/field_index/map_index/mmap_map_index.rs (4)

54-58: Good improvement to conditional population on load

The addition of the do_populate flag based on the is_on_disk parameter properly controls whether memory-mapped files should be populated on load. This supports the PR's goal of improving cache management by making population explicit rather than automatic.


337-339: Simple accessor for on-disk status

This straightforward implementation provides consistent access to the on-disk status flag, which is essential for cache management decisions throughout the codebase.


341-347: Well-documented populate method

The clear documentation explaining the purpose of the method and its blocking behavior is helpful. The implementation correctly calls populate on both the hash map and point-to-values components.


349-359: Comprehensive cache clearing implementation

The method properly handles clearing the cache for all components of the mmap index, both by directly clearing file paths and by delegating to the point-to-values cache clearing method.

lib/common/memory/src/mmap_type.rs (4)

33-34: Updated import for Madviseable trait

The updated import includes the Madviseable trait which is necessary for the mmap population functionality being added throughout this PR.


186-189: Foundational implementation of populate for MmapType

This is the core implementation that other memory-mapped types will build upon. It correctly calls populate on the underlying mmap and returns an appropriate result.


302-307: Consistent populate implementation for MmapSlice

The implementation properly delegates to the underlying MmapType's populate method and handles error propagation correctly.


408-413: Well-documented populate method for MmapBitSlice

The clear documentation specifies that the method blocks until all pages are populated, which is important information for users of this API. The implementation correctly propagates errors from the underlying mmap population.

lib/segment/src/index/struct_payload_index.rs (3)

445-452: Efficient population of all field indexes

This implementation correctly iterates through all field indexes and calls populate on each one, ensuring that all memory-mapped data is properly loaded into memory when needed.


454-461: Comprehensive cache clearing for field indexes

The implementation properly iterates through all field indexes and clears their caches, which aligns with the PR objective of enhancing disk cache management across all components.


463-472: Selective cache clearing optimization

This method efficiently checks if each index is on disk before clearing its cache, which prevents unnecessary operations on in-memory indexes. This is a good performance optimization.

lib/segment/src/index/field_index/field_index_base.rs (3)

423-437: LGTM - Adds a comprehensive is_on_disk method

The implementation correctly delegates the disk status check to the respective index implementations for each variant of the FieldIndex enum, following a consistent pattern across all index types.


441-455: LGTM - Implements populate method for explicit cache control

This method allows for explicit control over memory map page population, which aligns well with the PR's objective of enhancing disk cache management. The implementation properly delegates to the appropriate method for each index type and includes clear documentation.


458-472: LGTM - Implements clear_cache method for explicit cache management

This method provides a standardized way to clear disk caches across all index types, which is essential for the "disk cache hygiene" improvements in this PR. The implementation correctly delegates to each index type's implementation and handles error propagation.

lib/segment/src/index/field_index/mmap_point_to_values.rs (4)

6-6: LGTM - Adds import for cache management functions

The import adds clear_disk_cache which is necessary for implementing the new cache management functionality.


262-265: LGTM - Enhanced open method with explicit populate parameter

The modification to the open method adds a new parameter populate that allows for explicit control over whether the memory-mapped file should be populated immediately upon opening. This change supports the PR's goal of improving cache management by providing more fine-grained control.


373-375: LGTM - Implements populate method

This method allows for explicitly populating all pages in the memory-mapped file, which aligns with the PR objectives of providing better cache control mechanisms.


378-381: LGTM - Implements clear_cache method

This method properly implements cache clearing functionality by using the clear_disk_cache function on the file. The implementation correctly handles error propagation.

lib/segment/src/index/field_index/map_index/mod.rs (3)

438-444: LGTM - Adds is_on_disk method with appropriate variant handling

The implementation correctly identifies that Mutable and Immutable variants are not stored on disk and only delegates to the contained index for the Mmap variant. This properly supports the disk cache hygiene objectives.


448-455: LGTM - Implements populate method with clear variant handling

This method properly handles the different variants of the MapIndex enum - taking no action for memory-only indices and delegating to the mmap implementation when appropriate. The comments explaining why no action is taken for certain variants are helpful for maintainability.


458-465: LGTM - Implements clear_cache method with appropriate variant handling

Similar to the populate method, this method correctly handles the different variants of the MapIndex enum and only applies cache clearing operations to memory-mapped indices. The implementation includes helpful comments and properly propagates any errors.

lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs (5)

8-8: LGTM - Adds import for cache management functions

The import adds clear_disk_cache from the memory madvise module, which is necessary for implementing the cache clearing functionality.


140-142: LGTM - Adds is_on_disk method

Simple and straightforward implementation that returns the cached value of whether the index is on disk.


146-151: LGTM - Implements populate method for explicit cache control

This method properly populates all the memory-mapped components of the inverted index: postings, vocabulary, and point-to-tokens count. Error propagation is correctly handled.


154-161: LGTM - Implements comprehensive clear_cache method

This method clears the disk cache for all files associated with the inverted index. The implementation iterates through all files and applies the cache clearing operation to each, with proper error handling.


90-90:

❓ Verification inconclusive

Verify vocabulary population behavior change

The change to the vocab initialization now explicitly sets the second argument to false instead of using the populate parameter. This suggests that vocabulary is now never populated immediately upon opening, which is a behavior change.

Is this intentional as part of removing the delayed mmap population feature mentioned in the PR objectives?


🏁 Script executed:

#!/bin/bash
# Look for other places in the codebase that use MmapHashMap::open
# to understand if this is a consistent pattern

# Search for MmapHashMap::open usage
rg "MmapHashMap::open" -A 2 -B 2

Length of output: 528


Confirm intentional removal of delayed mmap population for vocabulary

It appears that in lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs the vocabulary is opened with population explicitly disabled (false), whereas in lib/segment/src/index/field_index/map_index/mmap_map_index.rs the MmapHashMap::open call conditionally enables population using a flag (do_populate). Please verify that the decision to always disable population in the inverted index is intentional and aligned with the PR objectives (i.e., removing the delayed mmap population feature).

  • Action Required: Confirm whether the hardcoded false is the desired behavior or if the vocabulary should be conditionally populated (similar to the map index).
  • Documentation: If this change is intentional, please consider adding a clear comment or documentation note explaining the removal of the delayed population feature.
lib/segment/src/index/field_index/geo_index/mod.rs (3)

346-352: Public API enhancement for disk state check

The is_on_disk method provides a clean way to check whether the geo index is stored on disk, by properly delegating to the underlying implementation for the Mmap variant while returning false for in-memory variants.


356-363: Memory management improvement with explicit page population

The populate method is well-implemented to handle all index variants appropriately, only taking action for the Mmap variant that actually uses memory-mapped files. This method aligns with the PR's goal of enhancing cache management.


366-373: Cache hygiene improvement with explicit cache clearing

The clear_cache method complements the populate method by providing a way to explicitly drop the disk cache. The implementation correctly handles each index variant and propagates errors properly.

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (2)

12-12: Appropriate import for new functionality

Added import for clear_disk_cache function to support the new cache management functionality.


42-58: Improved memory and cache management for vector storage

These new methods provide explicit control over memory-mapped pages and disk cache management:

  1. populate: Ensures all memory-mapped pages are loaded, optimizing for performance when data needs to be accessed.
  2. clear_cache: Allows dropping the disk cache, potentially freeing memory when vectors aren't needed.

Both methods handle nullable cases appropriately and provide clear error propagation.

lib/segment/src/index/hnsw_index/hnsw.rs (10)

17-17: Adding necessary import for cache management

Added clear_disk_cache import to support the new cache clearing functionality.


84-84: New field to track disk persistence state

Added is_on_disk field to store whether the HNSW index is on disk, making the information directly accessible within the struct.


161-162: Initializing new disk state field

Properly initializes the is_on_disk field from the configuration, using a reasonable default (false) when the config value is not provided.


163-163: Updated method call to use new field

Modified GraphLayers::load to use the newly created is_on_disk variable instead of directly accessing the config value, improving consistency.


174-175: Initializing new field in struct constructor

Added the is_on_disk field to the struct initialization, maintaining consistency with the rest of the code.


178-180: Added accessor for disk state information

New is_on_disk method provides a clean way to access the disk state of the index without exposing internal details.


487-491: Performance optimization for graph construction

Setting is_on_disk to true during build ensures the graph is not loaded into RAM, which is appropriate since it will be discarded after build. This comment is clearly explained in the preceding lines.


522-523: Consistent struct initialization

Updated the HNSWIndex struct initialization to include the new is_on_disk field.


1137-1140: Added explicit page population method

The populate method provides a way to read all underlying data from disk into the disk cache, improving performance for subsequent operations by ensuring data is already loaded.


1142-1147: Added cache clearing functionality

The clear_cache method provides a way to drop the disk cache, freeing memory when the index data isn't needed. It appropriately handles all files associated with the graph.

lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs (5)

9-9: Updated import to include cache management function

Added clear_disk_cache to the import list to support the new cache management functionality.


227-227: Updated method call for enhanced functionality

Changed MmapPointToValues::open call to include a second parameter set to true, which likely affects how the data is handled upon opening. This aligns with the PR's goals of improving memory management.


401-403: Added disk state accessor

The is_on_disk method provides a clean way to access the disk state of the index, consistent with similar methods in other components.


407-413: Added explicit page population

The populate method ensures all memory-mapped components are loaded into memory, optimizing for performance in subsequent operations. It properly handles all relevant mmap components.


416-430: Added comprehensive cache clearing

The clear_cache method provides a thorough implementation that clears the cache for all relevant files and components, freeing memory when the index data isn't needed. The implementation is consistent with similar methods in other components.

lib/segment/src/index/field_index/full_text_index/text_index.rs (3)

263-270: Well-implemented disk status check method.

This method correctly implements an interface check to determine whether the index is on disk, properly handling all enum variants.


272-281: Clear implementation of memory page population.

The populate method provides explicit control for memory mapping, correctly delegating to the underlying implementation for the Mmap variant while handling other variants appropriately.


283-291: Well-structured cache clearing implementation.

The clear_cache method follows the same pattern as the other new methods, providing consistent API across different index variants and properly propagating any errors that might occur.

lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs (5)

11-11: Appropriate imports for cache management.

This import correctly includes the necessary functions for memory advice settings and cache clearing.


171-171: Pass population flag to point-to-values loader.

The load method now correctly passes a population flag to the MmapPointToValues::open method based on the disk state, aligning with the disk cache hygiene objectives.


359-361: Simple and effective disk status check.

The method appropriately returns the stored state indicating whether the index data resides on disk.


363-369: Thorough memory page population.

This method ensures all components of the numeric index are properly populated in memory by calling the appropriate populate methods for both pairs and point-to-values.


371-382: Comprehensive cache clearing implementation.

The method handles cache clearing for all components of the numeric index, including both the underlying storage files and the point-to-values structure.

lib/common/common/src/mmap_hashmap.rs (2)

197-199: API enhancement for explicit population control.

The open method now accepts an explicit populate parameter that's passed to the underlying memory mapping function, allowing for fine-grained control over cache population.


340-345: Well-implemented memory page population.

This method provides a clear way to ensure all memory pages are populated, with appropriate documentation explaining its purpose and behavior.

lib/segment/src/index/field_index/numeric_index/mod.rs (4)

425-431: Consistent disk status check implementation.

This method follows the same pattern as other index implementations, correctly handling all enum variants and delegating to the appropriate inner implementation when needed.


433-442: Clean implementation of memory page population.

The method provides explicit control over memory mapping, with clear comments indicating the behavior for each variant.


444-452: Well-structured cache clearing implementation.

This method follows the established pattern for handling cache operations across different variants, with proper documentation.


527-529: Complete API exposure through delegation.

The delegate macro has been appropriately updated to include the new methods, ensuring they're accessible through the NumericIndex public interface.

lib/segment/src/segment_constructor/segment_builder.rs (6)

537-537: Good addition of clone() to preserve the reference

This change ensures that payload_storage_arc is properly cloned when passed to StructPayloadIndex::open, allowing the original reference to remain intact for later use in cache management.


567-575: Improved vector storage handling for cache management

The code now properly extracts and stores the vector storage in a local variable before using it, then clones it when passing to build_vector_index. This change enables the subsequent cache clearing operations after the index is built.


588-592: Good addition of cache clearing for on-disk vector storage

This new check prevents cache pollution by explicitly clearing the vector storage cache when it's no longer needed, but only if the storage is on-disk. This aligns with the PR's goal of improving disk cache hygiene.


614-622: Appropriate cache clearing for sparse vector components

The implementation now properly checks and clears caches for both sparse vector storage and sparse vector index when they're on-disk. This is a consistent application of the cache hygiene pattern throughout different storage types.


625-629: Well-implemented payload storage cache clearing

Adding this check ensures that payload storage caches are also cleared when they're on-disk, completing the comprehensive approach to cache management across all storage types.


631-632: Good addition of payload index cache clearing

The call to clear_cache_if_on_disk() for the payload index completes the implementation of proper cache hygiene across all components. This is a consistent and thorough approach.

@generall generall force-pushed the disk-cache-hygiene branch from 6f0ea90 to 01eef9d Compare April 7, 2025 23:28
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (1)

941-961: ⚠️ Potential issue

Error propagation missing in populate method

The method correctly handles different storage types, but fails to propagate errors from the underlying populate() calls, which could lead to silent failures when population errors occur.

Apply this fix to properly propagate errors from the storage population operations:

 pub fn populate(&self) -> OperationResult<()> {
     match &self.storage_impl {
         QuantizedVectorStorage::ScalarRam(_) => {} // not mmap
-        QuantizedVectorStorage::ScalarMmap(storage) => storage.storage().populate(),
+        QuantizedVectorStorage::ScalarMmap(storage) => storage.storage().populate()?,
         QuantizedVectorStorage::PQRam(_) => {}
-        QuantizedVectorStorage::PQMmap(storage) => storage.storage().populate(),
+        QuantizedVectorStorage::PQMmap(storage) => storage.storage().populate()?,
         QuantizedVectorStorage::BinaryRam(_) => {}
-        QuantizedVectorStorage::BinaryMmap(storage) => storage.storage().populate(),
+        QuantizedVectorStorage::BinaryMmap(storage) => storage.storage().populate()?,
         QuantizedVectorStorage::ScalarRamMulti(_) => {}
         QuantizedVectorStorage::ScalarMmapMulti(storage) => {
-            storage.storage().storage().populate()
+            storage.storage().storage().populate()?
         }
         QuantizedVectorStorage::PQRamMulti(_) => {}
-        QuantizedVectorStorage::PQMmapMulti(storage) => storage.storage().storage().populate(),
+        QuantizedVectorStorage::PQMmapMulti(storage) => storage.storage().storage().populate()?,
         QuantizedVectorStorage::BinaryRamMulti(_) => {}
         QuantizedVectorStorage::BinaryMmapMulti(storage) => {
-            storage.storage().storage().populate()
+            storage.storage().storage().populate()?
         }
     }
     Ok(())
 }
🧹 Nitpick comments (10)
lib/segment/src/vector_storage/vector_storage_base.rs (3)

433-460: Consider adding documentation to explain the purpose of this method

While the implementation is correct, adding a documentation comment would help explain the method's purpose, when it should be used, and its effect on system performance and memory usage.

+    /// Populate memory pages for memory-mapped storage variants
+    /// 
+    /// For memory-mapped storage variants, this method ensures vector data
+    /// is loaded into memory for faster access. This can be useful to
+    /// improve performance at the cost of increased memory usage.
+    /// For in-memory storage variants, this method does nothing.
     pub fn populate(&self) -> OperationResult<()> {
         match self {
             VectorStorageEnum::DenseSimple(_) => {} // Can't populate as it is not mmap

462-489: Add documentation for the clear_cache method

Similar to the suggestion for populate(), adding documentation would make the purpose and effect of this method clearer.

+    /// Clear memory pages from cache for memory-mapped storage variants
+    /// 
+    /// For memory-mapped storage variants, this method releases cached memory
+    /// pages, potentially reducing memory usage. This can be useful when
+    /// memory pressure is high or when the vectors are not likely to be 
+    /// accessed soon.
+    /// For in-memory storage variants, this method does nothing.
     pub fn clear_cache(&self) -> OperationResult<()> {
         match self {
             VectorStorageEnum::DenseSimple(_) => {} // Can't populate as it is not mmap

464-476: Update comments to mention "clearing" instead of "populating"

The comments for non-memory-mapped variants in the clear_cache method incorrectly mention "Can't populate" when they should mention "Can't clear cache".

     pub fn clear_cache(&self) -> OperationResult<()> {
         match self {
-            VectorStorageEnum::DenseSimple(_) => {} // Can't populate as it is not mmap
-            VectorStorageEnum::DenseSimpleByte(_) => {} // Can't populate as it is not mmap
-            VectorStorageEnum::DenseSimpleHalf(_) => {} // Can't populate as it is not mmap
+            VectorStorageEnum::DenseSimple(_) => {} // Can't clear cache as it is not mmap
+            VectorStorageEnum::DenseSimpleByte(_) => {} // Can't clear cache as it is not mmap
+            VectorStorageEnum::DenseSimpleHalf(_) => {} // Can't clear cache as it is not mmap
lib/segment/src/index/hnsw_index/graph_layers.rs (1)

344-347: Add documentation comments for consistency

The populate method is correctly implemented by calling the method on the links component. However, unlike similar methods in other files, this one is missing documentation comments explaining its purpose and behavior.

+    /// Populate all pages in the mmap.
+    /// Block until all pages are populated.
     pub fn populate(&self) -> OperationResult<()> {
         self.links.populate()?;
         Ok(())
     }
lib/segment/src/index/field_index/mmap_point_to_values.rs (1)

370-381: Validate new cache management methods.

The newly added populate and clear_cache methods are clear and consistent. Keep in mind:

  • populate() forces a potentially large synchronous read, which should be invoked only in performance-critical or pre-warm scenarios.
  • clear_cache() might not be supported on all platforms and could need error handling for partial failures.
lib/segment/src/index/field_index/geo_index/mod.rs (1)

354-373: Ensure uniform cache invocation patterns across variants.

These populate and clear_cache methods route properly to the Mmap variant. Consider unifying the approach if future improvements allow Mutable or Immutable variants to optionally live on disk later.

lib/segment/src/index/hnsw_index/hnsw.rs (2)

84-85: Track is_on_disk consistently.

Introducing is_on_disk into the HNSWIndex struct is a good step. Ensure that any toggling or runtime transitions from disk-based to memory-based are appropriately handled, if needed in the future.


487-492: Forced on-disk mode during build phase.

Setting is_on_disk = true could override user preferences if on_disk was intended to be false. Confirm that forcing on-disk indexing is truly desired for all scenarios.

lib/common/common/src/mmap_hashmap.rs (2)

197-198: Updated method signature lacks documentation.

The open method now accepts a populate parameter to control whether memory pages are loaded immediately, but there's no documentation explaining this parameter and its implications.

Consider adding a doc comment explaining what the populate parameter does:

/// Load the hash map from file.
+/// 
+/// If `populate` is true, all memory pages will be loaded immediately.
+/// Otherwise, they will be loaded on-demand when accessed.
pub fn open(path: &Path, populate: bool) -> io::Result<Self> {

340-345: Error handling inconsistency in the populate method.

The populate method returns an io::Result<()> but doesn't actually propagate any errors - self.mmap.populate() doesn't return a result that's being checked.

There are two options to improve this:

  1. If errors can't occur, change the return type to just ():
-pub fn populate(&self) -> io::Result<()> {
+pub fn populate(&self) {
    self.mmap.populate();
-    Ok(())
}
  1. If populate() is meant to propagate errors but doesn't currently, add the necessary error handling.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6f0ea90 and 01eef9d.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (62)
  • .dockerignore (1 hunks)
  • Cargo.toml (1 hunks)
  • lib/collection/src/collection_manager/optimizers/segment_optimizer.rs (0 hunks)
  • lib/collection/src/shards/local_shard/mod.rs (1 hunks)
  • lib/common/common/Cargo.toml (1 hunks)
  • lib/common/common/benches/mmap_hashmap.rs (1 hunks)
  • lib/common/common/src/mmap_hashmap.rs (5 hunks)
  • lib/common/memory/Cargo.toml (1 hunks)
  • lib/common/memory/src/chunked_utils.rs (2 hunks)
  • lib/common/memory/src/madvise.rs (2 hunks)
  • lib/common/memory/src/mmap_ops.rs (1 hunks)
  • lib/common/memory/src/mmap_type.rs (4 hunks)
  • lib/gridstore/src/bitmask/gaps.rs (2 hunks)
  • lib/gridstore/src/bitmask/mod.rs (2 hunks)
  • lib/gridstore/src/gridstore.rs (1 hunks)
  • lib/gridstore/src/page.rs (2 hunks)
  • lib/quantization/src/encoded_vectors_binary.rs (1 hunks)
  • lib/quantization/src/encoded_vectors_pq.rs (1 hunks)
  • lib/quantization/src/encoded_vectors_u8.rs (1 hunks)
  • lib/segment/benches/multi_vector_search.rs (1 hunks)
  • lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs (1 hunks)
  • lib/segment/src/index/field_index/bool_index/mod.rs (2 hunks)
  • lib/segment/src/index/field_index/field_index_base.rs (1 hunks)
  • lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mmap_postings.rs (2 hunks)
  • lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs (3 hunks)
  • lib/segment/src/index/field_index/full_text_index/mmap_text_index.rs (1 hunks)
  • lib/segment/src/index/field_index/full_text_index/text_index.rs (1 hunks)
  • lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs (3 hunks)
  • lib/segment/src/index/field_index/geo_index/mod.rs (1 hunks)
  • lib/segment/src/index/field_index/map_index/mmap_map_index.rs (3 hunks)
  • lib/segment/src/index/field_index/map_index/mod.rs (1 hunks)
  • lib/segment/src/index/field_index/mmap_point_to_values.rs (5 hunks)
  • lib/segment/src/index/field_index/null_index/mmap_null_index.rs (1 hunks)
  • lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs (3 hunks)
  • lib/segment/src/index/field_index/numeric_index/mod.rs (2 hunks)
  • lib/segment/src/index/hnsw_index/graph_layers.rs (1 hunks)
  • lib/segment/src/index/hnsw_index/graph_links.rs (2 hunks)
  • lib/segment/src/index/hnsw_index/hnsw.rs (7 hunks)
  • lib/segment/src/index/sparse_index/sparse_vector_index.rs (1 hunks)
  • lib/segment/src/index/struct_payload_index.rs (1 hunks)
  • lib/segment/src/index/vector_index_base.rs (2 hunks)
  • lib/segment/src/payload_storage/mmap_payload_storage.rs (1 hunks)
  • lib/segment/src/payload_storage/payload_storage_enum.rs (1 hunks)
  • lib/segment/src/segment/mod.rs (1 hunks)
  • lib/segment/src/segment/segment_ops.rs (1 hunks)
  • lib/segment/src/segment_constructor/segment_builder.rs (4 hunks)
  • lib/segment/src/types.rs (2 hunks)
  • lib/segment/src/vector_storage/chunked_mmap_vectors.rs (2 hunks)
  • lib/segment/src/vector_storage/chunked_vector_storage.rs (1 hunks)
  • lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (1 hunks)
  • lib/segment/src/vector_storage/dense/dynamic_mmap_flags.rs (2 hunks)
  • lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (2 hunks)
  • lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (2 hunks)
  • lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (1 hunks)
  • lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (1 hunks)
  • lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs (1 hunks)
  • lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs (1 hunks)
  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs (2 hunks)
  • lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (1 hunks)
  • lib/segment/src/vector_storage/vector_storage_base.rs (2 hunks)
  • lib/sparse/src/index/inverted_index/inverted_index_compressed_mmap.rs (2 hunks)
  • lib/sparse/src/index/inverted_index/inverted_index_mmap.rs (2 hunks)
💤 Files with no reviewable changes (1)
  • lib/collection/src/collection_manager/optimizers/segment_optimizer.rs
🚧 Files skipped from review as they are similar to previous changes (36)
  • lib/common/common/Cargo.toml
  • lib/common/memory/Cargo.toml
  • .dockerignore
  • lib/quantization/src/encoded_vectors_binary.rs
  • lib/quantization/src/encoded_vectors_pq.rs
  • lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mmap_postings.rs
  • lib/segment/benches/multi_vector_search.rs
  • Cargo.toml
  • lib/quantization/src/encoded_vectors_u8.rs
  • lib/gridstore/src/page.rs
  • lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs
  • lib/segment/src/payload_storage/payload_storage_enum.rs
  • lib/segment/src/vector_storage/chunked_vector_storage.rs
  • lib/segment/src/index/field_index/full_text_index/mmap_text_index.rs
  • lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs
  • lib/segment/src/index/field_index/null_index/mmap_null_index.rs
  • lib/segment/src/index/field_index/map_index/mmap_map_index.rs
  • lib/segment/src/index/struct_payload_index.rs
  • lib/segment/src/index/sparse_index/sparse_vector_index.rs
  • lib/common/common/benches/mmap_hashmap.rs
  • lib/gridstore/src/bitmask/gaps.rs
  • lib/sparse/src/index/inverted_index/inverted_index_mmap.rs
  • lib/common/memory/src/madvise.rs
  • lib/sparse/src/index/inverted_index/inverted_index_compressed_mmap.rs
  • lib/segment/src/segment_constructor/segment_builder.rs
  • lib/segment/src/index/field_index/field_index_base.rs
  • lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs
  • lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs
  • lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs
  • lib/segment/src/payload_storage/mmap_payload_storage.rs
  • lib/segment/src/index/field_index/numeric_index/mod.rs
  • lib/segment/src/index/field_index/full_text_index/text_index.rs
  • lib/segment/src/index/field_index/map_index/mod.rs
  • lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs
  • lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
  • lib/segment/src/index/field_index/bool_index/mod.rs
🧰 Additional context used
🧬 Code Definitions (16)
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (4)
lib/common/memory/src/madvise.rs (1)
  • clear_disk_cache (177-198)
lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs (1)
  • populate (13-15)
lib/segment/src/index/vector_index_base.rs (2)
  • populate (119-134)
  • clear_cache (136-151)
lib/segment/src/vector_storage/vector_storage_base.rs (2)
  • populate (433-460)
  • clear_cache (462-489)
lib/segment/src/index/vector_index_base.rs (7)
lib/segment/src/index/sparse_index/sparse_vector_index.rs (1)
  • inverted_index (243-245)
lib/segment/src/index/hnsw_index/hnsw.rs (3)
  • is_on_disk (178-180)
  • populate (1137-1139)
  • clear_cache (1142-1147)
lib/segment/src/vector_storage/vector_storage_base.rs (4)
  • is_on_disk (42-42)
  • is_on_disk (551-577)
  • populate (433-460)
  • clear_cache (462-489)
lib/sparse/src/index/inverted_index/mod.rs (1)
  • is_on_disk (31-31)
lib/segment/src/index/hnsw_index/graph_layers.rs (1)
  • populate (344-347)
lib/segment/src/index/hnsw_index/graph_links.rs (1)
  • populate (149-155)
lib/segment/src/vector_storage/chunked_vector_storage.rs (1)
  • clear_cache (69-69)
lib/segment/src/vector_storage/dense/dynamic_mmap_flags.rs (5)
lib/common/memory/src/madvise.rs (1)
  • clear_disk_cache (177-198)
lib/common/memory/src/chunked_utils.rs (1)
  • populate (45-47)
lib/segment/src/vector_storage/chunked_vector_storage.rs (2)
  • populate (66-66)
  • clear_cache (69-69)
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (2)
  • populate (61-65)
  • clear_cache (68-72)
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (1)
  • clear_cache (120-122)
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (6)
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (2)
  • populate (45-50)
  • clear_cache (53-57)
lib/common/memory/src/chunked_utils.rs (1)
  • populate (45-47)
lib/segment/src/vector_storage/chunked_vector_storage.rs (2)
  • populate (66-66)
  • clear_cache (69-69)
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (2)
  • populate (116-118)
  • clear_cache (120-122)
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (2)
  • populate (180-184)
  • clear_cache (187-191)
lib/segment/src/vector_storage/vector_storage_base.rs (2)
  • populate (433-460)
  • clear_cache (462-489)
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (6)
lib/segment/src/payload_storage/mmap_payload_storage.rs (2)
  • populate (65-68)
  • clear_cache (71-74)
lib/segment/src/vector_storage/chunked_vector_storage.rs (2)
  • populate (66-66)
  • clear_cache (69-69)
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (2)
  • populate (116-118)
  • clear_cache (120-122)
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (2)
  • populate (61-65)
  • clear_cache (68-72)
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (2)
  • populate (76-80)
  • clear_cache (83-87)
lib/segment/src/vector_storage/vector_storage_base.rs (2)
  • populate (433-460)
  • clear_cache (462-489)
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (5)
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (2)
  • populate (45-50)
  • clear_cache (53-57)
lib/common/memory/src/chunked_utils.rs (1)
  • populate (45-47)
lib/segment/src/vector_storage/chunked_vector_storage.rs (2)
  • populate (66-66)
  • clear_cache (69-69)
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (2)
  • populate (61-65)
  • clear_cache (68-72)
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (2)
  • populate (76-80)
  • clear_cache (83-87)
lib/common/memory/src/mmap_type.rs (9)
lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mmap_postings.rs (1)
  • populate (240-242)
lib/common/memory/src/chunked_utils.rs (1)
  • populate (45-47)
lib/gridstore/src/bitmask/gaps.rs (1)
  • populate (259-262)
lib/gridstore/src/bitmask/mod.rs (1)
  • populate (506-510)
lib/segment/src/vector_storage/chunked_vector_storage.rs (1)
  • populate (66-66)
lib/gridstore/src/page.rs (1)
  • populate (118-120)
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (1)
  • populate (116-118)
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (1)
  • populate (180-184)
lib/segment/src/vector_storage/vector_storage_base.rs (2)
  • populate (433-460)
  • std (141-141)
lib/gridstore/src/bitmask/mod.rs (5)
lib/common/memory/src/madvise.rs (1)
  • clear_disk_cache (177-198)
lib/common/memory/src/chunked_utils.rs (1)
  • populate (45-47)
lib/gridstore/src/bitmask/gaps.rs (2)
  • populate (259-262)
  • clear_cache (265-268)
lib/gridstore/src/page.rs (2)
  • populate (118-120)
  • clear_cache (123-126)
lib/gridstore/src/gridstore.rs (2)
  • populate (533-539)
  • clear_cache (542-548)
lib/segment/src/index/hnsw_index/graph_links.rs (7)
lib/segment/src/index/hnsw_index/hnsw.rs (1)
  • populate (1137-1139)
lib/segment/src/index/hnsw_index/graph_layers.rs (1)
  • populate (344-347)
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (1)
  • populate (247-250)
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (1)
  • populate (45-50)
lib/sparse/src/index/inverted_index/inverted_index_mmap.rs (1)
  • populate (282-285)
lib/sparse/src/index/inverted_index/inverted_index_compressed_mmap.rs (1)
  • populate (352-355)
lib/segment/src/vector_storage/vector_storage_base.rs (1)
  • populate (433-460)
lib/segment/src/vector_storage/chunked_mmap_vectors.rs (9)
lib/common/memory/src/madvise.rs (5)
  • madvise (94-96)
  • madvise (102-102)
  • madvise (108-114)
  • madvise (132-138)
  • clear_disk_cache (177-198)
lib/segment/src/index/hnsw_index/hnsw.rs (2)
  • populate (1137-1139)
  • clear_cache (1142-1147)
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (1)
  • populate (247-250)
lib/segment/src/vector_storage/chunked_vector_storage.rs (2)
  • populate (66-66)
  • clear_cache (69-69)
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (2)
  • populate (116-118)
  • clear_cache (120-122)
lib/gridstore/src/bitmask/mod.rs (1)
  • clear_cache (513-517)
lib/gridstore/src/page.rs (1)
  • clear_cache (123-126)
lib/sparse/src/index/inverted_index/inverted_index_mmap.rs (1)
  • clear_cache (288-290)
lib/sparse/src/index/inverted_index/inverted_index_compressed_mmap.rs (1)
  • clear_cache (358-360)
lib/segment/src/index/hnsw_index/hnsw.rs (4)
lib/common/memory/src/madvise.rs (8)
  • madvise (94-96)
  • madvise (102-102)
  • madvise (108-114)
  • madvise (132-138)
  • clear_disk_cache (177-198)
  • populate (104-104)
  • populate (116-128)
  • populate (140-152)
lib/segment/src/vector_storage/vector_storage_base.rs (3)
  • is_on_disk (42-42)
  • is_on_disk (551-577)
  • populate (433-460)
lib/segment/src/index/hnsw_index/graph_layers.rs (2)
  • load (272-286)
  • populate (344-347)
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (1)
  • populate (247-250)
lib/segment/src/vector_storage/vector_storage_base.rs (8)
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (1)
  • populate (247-250)
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (2)
  • populate (45-50)
  • clear_cache (53-57)
lib/segment/src/vector_storage/chunked_vector_storage.rs (2)
  • populate (66-66)
  • clear_cache (69-69)
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (2)
  • populate (116-118)
  • clear_cache (120-122)
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (2)
  • populate (61-65)
  • clear_cache (68-72)
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (2)
  • populate (180-184)
  • clear_cache (187-191)
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (2)
  • populate (76-80)
  • clear_cache (83-87)
lib/segment/src/index/vector_index_base.rs (2)
  • populate (119-134)
  • clear_cache (136-151)
lib/segment/src/index/field_index/geo_index/mod.rs (3)
lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs (3)
  • is_on_disk (401-403)
  • populate (407-413)
  • clear_cache (416-430)
lib/segment/src/index/field_index/map_index/mmap_map_index.rs (3)
  • is_on_disk (337-339)
  • populate (343-347)
  • clear_cache (350-359)
lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs (3)
  • is_on_disk (359-361)
  • populate (365-369)
  • clear_cache (372-382)
lib/segment/src/types.rs (11)
lib/segment/src/index/field_index/bool_index/mod.rs (1)
  • is_on_disk (86-91)
lib/segment/src/index/hnsw_index/hnsw.rs (1)
  • is_on_disk (178-180)
lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs (1)
  • is_on_disk (140-142)
lib/segment/src/index/field_index/map_index/mod.rs (1)
  • is_on_disk (438-444)
lib/segment/src/index/field_index/full_text_index/text_index.rs (1)
  • is_on_disk (264-270)
lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs (1)
  • is_on_disk (401-403)
lib/segment/src/index/field_index/map_index/mmap_map_index.rs (1)
  • is_on_disk (337-339)
lib/segment/src/index/field_index/null_index/mmap_null_index.rs (1)
  • is_on_disk (181-183)
lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs (1)
  • is_on_disk (359-361)
lib/segment/src/index/field_index/geo_index/mod.rs (1)
  • is_on_disk (346-352)
lib/segment/src/index/field_index/numeric_index/mod.rs (1)
  • is_on_disk (425-431)
lib/common/common/src/mmap_hashmap.rs (9)
lib/common/memory/src/mmap_ops.rs (1)
  • open_read_mmap (44-62)
lib/segment/src/index/field_index/bool_index/mod.rs (1)
  • populate (95-101)
lib/segment/src/index/field_index/map_index/mod.rs (1)
  • populate (448-455)
lib/segment/src/index/field_index/full_text_index/text_index.rs (1)
  • populate (274-281)
lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs (1)
  • populate (407-413)
lib/segment/src/index/field_index/map_index/mmap_map_index.rs (1)
  • populate (343-347)
lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs (1)
  • populate (365-369)
lib/segment/src/index/field_index/geo_index/mod.rs (1)
  • populate (356-363)
lib/segment/src/index/field_index/numeric_index/mod.rs (1)
  • populate (435-442)
lib/gridstore/src/gridstore.rs (5)
lib/common/memory/src/chunked_utils.rs (1)
  • populate (45-47)
lib/gridstore/src/bitmask/gaps.rs (2)
  • populate (259-262)
  • clear_cache (265-268)
lib/gridstore/src/bitmask/mod.rs (2)
  • populate (506-510)
  • clear_cache (513-517)
lib/segment/src/vector_storage/chunked_vector_storage.rs (2)
  • populate (66-66)
  • clear_cache (69-69)
lib/gridstore/src/page.rs (2)
  • populate (118-120)
  • clear_cache (123-126)
⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-consistency
  • GitHub Check: test (macos-latest)
  • GitHub Check: test-consensus
  • GitHub Check: test (windows-latest)
  • GitHub Check: lint
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test (ubuntu-latest)
🔇 Additional comments (49)
lib/segment/src/types.rs (2)

552-557: Clean implementation for checking on-disk status in Indexes

The new is_on_disk method provides a convenient way to check if an index is stored on disk, which is important for disk cache management. The implementation correctly handles the Plain variant (always returning false) and the Hnsw variant (checking the configuration's on_disk property with a safe default).


1385-1394: Explicit on-disk status handling for sparse vector storage

This implementation clearly indicates that both storage variants (OnDisk and Mmap) are considered to be disk-based. The comment explaining why both return true is helpful for future maintainers who might otherwise be confused about why an explicit check is needed when the result is always true.

lib/segment/src/vector_storage/vector_storage_base.rs (3)

433-460: Implementation of memory management method looks good

This method properly implements explicit memory management by calling populate() on all memory-mapped variants and doing nothing for in-memory storage types. This aligns with the PR objective of enhancing disk cache management.


462-489: Clear cache implementation looks correct

The implementation follows the same pattern as the populate() method, appropriately handling all variants of the enum and propagating errors. The code structure is consistent with other methods in this file.


549-551: Good documentation addition

The added documentation comment provides clear explanation of the is_on_disk() method's return value semantics, which improves code maintainability and developer understanding.

lib/common/memory/src/chunked_utils.rs (1)

44-47: Well-implemented populate method

The implementation correctly delegates to the underlying mmap field's populate method, which is a clean approach for providing cache management functionality.

lib/segment/src/vector_storage/chunked_mmap_vectors.rs (2)

427-432: Good implementation of populate method

This implementation properly iterates through all chunks and calls their respective populate methods, ensuring that all memory-mapped pages are populated. Error handling is correctly implemented with the ? operator to propagate any errors.


434-440: Effective cache clearing implementation

The implementation correctly identifies all chunk files and clears their disk cache individually. This approach aligns with the PR objective of enhancing disk cache management.

lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (2)

116-118: Clean delegation to underlying storage

The method properly delegates to the underlying mmap_storage.populate() method, keeping the code DRY.


120-122: Appropriate cache clearing delegation

The method correctly delegates to the underlying mmap_storage.clear_cache() method, consistent with the design pattern used throughout this implementation.

lib/segment/src/vector_storage/quantized/quantized_vectors.rs (1)

963-968: Effective implementation of clear_cache

The method correctly iterates through all files associated with the QuantizedVectors instance and clears their disk cache. Error handling is properly implemented with the ? operator to propagate any errors that may occur during the process.

lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (2)

177-184: Implementation looks good for the populate method

The method correctly populates all mmap pages by calling populate() on both the deleted flags and storage components. This matches similar implementations in other storage types and aligns with the PR goal of improving disk cache management.


186-191: Implementation looks good for the clear_cache method

The method properly clears the disk cache by calling clear_cache() on both components. This is consistent with the PR objective of enhancing cache hygiene across the system.

lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (2)

74-80: Implementation looks good for the populate method

The method correctly populates all mmap pages by calling populate() on both the vectors and offsets components. This is consistent with the implementation pattern used across storage types and aligns with the PR objective of improving cache management.


82-87: Implementation looks good for the clear_cache method

The method properly clears the disk cache by calling clear_cache() on both the vectors and offsets components. This implementation follows the pattern established in other storage types and supports the PR goal of enhancing disk cache hygiene.

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (3)

59-65: Implementation looks good for the populate method

The method correctly populates all mmap pages by calling populate() on both the deleted flags and vectors components. This implementation is consistent with other storage types and supports the PR objective of improving memory-mapped cache management.


67-72: Implementation looks good for the clear_cache method

The method properly clears the disk cache by calling clear_cache() on both components. This implementation matches the pattern used in other storage types and aligns with the PR goal of enhancing disk cache hygiene.


244-245: Note that populate is initially set to false

The populate parameter is set to false when opening the vector storage, which means the memory-mapped files won't be automatically populated. This is aligned with the PR's goal of removing delayed mmap population and replacing it with explicit population through the new populate() method.

lib/gridstore/src/bitmask/mod.rs (2)

504-510: Good addition of the populate method to preload data into memory!

This method ensures all pages of the memory-mapped files are populated, which can improve performance by preventing page faults during critical operations. The implementation correctly calls populate on both the bitslice and regions_gaps components.


512-517: Effective cache clearing implementation

The implementation correctly clears the disk cache for both the bitmask file and the regions gaps. This helps prevent cache pollution when the bitmask is no longer needed, which is consistent with the PR objectives for improving disk cache hygiene.

lib/segment/src/vector_storage/dense/dynamic_mmap_flags.rs (3)

10-11: Correctly updated import to include clear_disk_cache

Added the necessary import for the new cache clearing functionality.


288-293: Well-implemented populate method for memory preloading

This method correctly calls populate on the flags field to ensure all memory-mapped pages are loaded into memory. This prevents page faults during critical operations, improving performance.


295-300: Good implementation of disk cache clearing

The method correctly constructs the path to the flags file and calls clear_disk_cache on it. This helps prevent cache pollution when these flags are no longer needed, consistent with the PR objectives.

lib/segment/src/segment/mod.rs (1)

101-132: Excellent implementation of cache clearing on segment drop

This change adds comprehensive cache clearing when a segment is dropped, which directly addresses the PR objective of improving disk cache hygiene. The implementation:

  1. Systematically clears caches for all components (payload_storage, payload_index, and vector data)
  2. Properly handles all vector data components (index, storage, and quantized vectors)
  3. Includes appropriate error handling with detailed logging

This will help prevent disk cache pollution when segments are dropped, especially during operations like segment optimization or collection updates.

lib/segment/src/index/vector_index_base.rs (3)

101-117: Well-implemented disk storage detection method.

This is_on_disk() method properly delegates to the appropriate variant-specific implementation, providing a clean interface for checking if an index uses disk-based storage.


119-134: Good implementation of memory page population.

The populate() method correctly handles all variants of the enum, calling the appropriate implementation when available and properly propagating errors. This is a nice addition to explicitly control memory mapping behavior.


136-151: Well-structured cache clearing functionality.

The clear_cache() method is implemented consistently with the populate method, providing a clean way to explicitly drop disk caches. The error handling is appropriate, and the method handles all enum variants correctly.

lib/gridstore/src/gridstore.rs (2)

531-539: Well-implemented page population method.

This method properly iterates through all pages and calls populate() on each one, then handles the bitmask as well. Clear documentation and correct error handling make this a solid implementation.


541-548: Clean implementation of cache clearing.

The clear_cache() method follows the same pattern as populate(), providing a consistent interface for cache management. Proper error propagation ensures that any issues are correctly reported up the call stack.

lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (1)

247-250: Well-implemented memory page population method.

This populate() method provides a clean way to explicitly load all memory-mapped pages into memory. The implementation correctly propagates errors and aligns with the standardized approach used across the codebase.

lib/common/memory/src/mmap_type.rs (3)

186-189: Good implementation of memory page population for MmapType.

This method provides a consistent interface for populating memory-mapped pages, properly calling the underlying mmap's populate() method.


302-307: Well-documented MmapSlice population method.

Clear documentation and proper error handling make this implementation solid. The method correctly delegates to the underlying mmap's implementation while maintaining a consistent interface across different mmap types.


408-413: Clear and consistent BitSlice population implementation.

This method follows the same pattern as the other populate() methods, providing a uniform interface across different mmap implementations. The documentation clearly explains the purpose and behavior of the method.

lib/common/memory/src/mmap_ops.rs (1)

3-4: Removed unused imports and references
By removing PathBuf and other unused imports here, the code cleanly relies on Path and no longer includes logic related to the old pre-faulting mechanism. This aligns well with the newer approach to managing memory without refaulting logic.

lib/segment/src/index/hnsw_index/graph_links.rs (2)

6-6: Added extra import for Advice
Including Advice in the import statement allows more granular setting of memory advice options for the mapped file. This change is consistent with the new pattern of explicitly populating and advising memory pages.


147-154: Introduced populate method
This new method cleanly replaces the old pre-faulting approach and explicitly loads data into the disk cache if the underlying data is memory-mapped. The design using a blocking call is clear and well-documented. If future use cases require non-blocking or partial population, consider adding more granular controls or asynchronous operations.

lib/collection/src/shards/local_shard/mod.rs (1)

34-35: Refined imports for segmentation and snapshots
Adding Filter, SegmentType, SnapshotFormat, and others is consistent with the recent snapshot and segment data management features. Keep track of any imports that are no longer required to maintain clarity.

lib/segment/src/index/field_index/mmap_point_to_values.rs (1)

262-277:

✅ Verification successful

Confirm consistent usage of the populate parameter.

By introducing a populate: bool argument to the open method, the code now allows for optional page pre-loading. Please verify that all call sites correctly use this parameter to avoid unintentional memory overhead or unexpected delays.

Run the following script to locate all invocations of MmapPointToValues::open and review whether the populate argument is set as intended:


🏁 Script executed:

#!/bin/bash
rg -A3 'MmapPointToValues::open\('

Length of output: 1378


Verified Consistent Usage of the populate Parameter

I verified that all call sites for MmapPointToValues::open pass a boolean as intended:

  • In both lib/segment/src/index/field_index/map_index/mmap_map_index.rs and lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs, the code passes the do_populate variable.
  • In lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs, a literal true is passed.

If the hard-coded true for the geo index is intentional (to always preload pages), then the usage is consistent across the codebase. Otherwise, please consider aligning it with the dynamic approach used elsewhere.

lib/segment/src/index/field_index/geo_index/mod.rs (1)

345-352: Provide a consistent disk check for all index variants.

The is_on_disk() method correctly delegates to the underlying Mmap variant while returning false for Mutable and Immutable. This is clear and consistent. No issues.

lib/segment/src/index/hnsw_index/hnsw.rs (6)

17-17: Confirm usage of clear_disk_cache.

Importing clear_disk_cache is justified as it’s used below. No concerns here.


161-163: Ensure correct fallback for on_disk configuration.

Here you default to false when hnsw_config.on_disk is None. Confirm this aligns with the intended defaults and that no unexpected memory usage patterns arise for large indexes.


174-175: Assigning is_on_disk.

Storing is_on_disk from the configuration is straightforward and consistent. No issues here.


178-180: Expose on-disk status programmatically.

The is_on_disk() accessor is concise and might be useful for dynamic runtime checks or logging. Looks good.


1136-1140: Populate method for HNSWIndex.

Delegates population to graph.populate(). Straightforward. Ensure any concurrency constraints or large data loads are documented.


1141-1147: Cache clearing logic for HNSWIndex.

Iterating each file from graph.files to drop disk cache is consistent with the rest of the codebase’s approach. No issues.

lib/common/common/src/mmap_hashmap.rs (3)

11-12: New imports support expanded memory management functionality.

These imports enable better control over memory-mapped files, enabling the explicit cache population feature that's the focus of this PR.


521-521: Tests updated to use the new parameter.

The test has been correctly updated to pass the populate parameter (set to false) to the open method.


568-568: Test with populate=true included.

Good coverage - this test verifies functionality with populate set to true, ensuring both code paths work correctly.

lib/segment/src/segment/segment_ops.rs (1)

5-5: Import cleanup aligns with removal of prefault_mmap_pages.

The change in the import statement from likely using use std::thread::{self, JoinHandle}; to just use std::thread::JoinHandle; is consistent with the removal of the prefault_mmap_pages method (as mentioned in the AI summary). This method was previously responsible for creating a background thread to prefault memory-mapped pages, which explains why the direct thread module import is no longer needed.

This change aligns with the PR's objective of removing the delayed mmap population feature that was identified as a source of cache pollution, in favor of more explicit cache management functions.

@generall generall merged commit 0266508 into dev Apr 9, 2025
17 checks passed
@generall generall deleted the disk-cache-hygiene branch April 9, 2025 08:54
timvisee added a commit that referenced this pull request Apr 14, 2025
pull bot pushed a commit to kp-forks/qdrant that referenced this pull request Apr 21, 2025
* wip: implement explicit populate and clear_cache functions for all components

* fmt

* implement clear and populate for vector storages

* fmt

* implement clear and populate for payload storage

* wip: implement explicit populate and clear_cache functions payload indexes

* implement explicit populate and clear_cache functions payload indexes

* fix clippy on CI

* only compile posix_fadvise on linux

* only compile posix_fadvise on linux

* implement explicit populate and clear_cache functions for quantized vectors

* fmt

* remove post-load prefault

* fix typo

* implement is-on-disk for payload indexes, implement clear on drop for segment, implement clear after segment build

* fmt

* also evict quantized vectors after optimization

* re-use and replace advise_dontneed
@coderabbitai coderabbitai bot mentioned this pull request Jul 29, 2025
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants