-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Clear disk cache after read #6396
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
📝 WalkthroughWalkthroughThis set of changes introduces a new Suggested reviewers
Tip ⚡💬 Agentic Chat (Pro Plan, General Availability)
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (3)
⏰ Context from checks skipped due to timeout of 90000ms (13)
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
lib/segment/src/id_tracker/mutable_id_tracker.rs (1)
659-675
: Consider usingOneshotFile
for loading versions as well.The
load_versions
function still uses a standardFile
while theload_mappings
function was updated to useOneshotFile
. For consistency and to fully implement the cache optimization strategy, consider updating this function as well.- let file = File::open(versions_path)?; + let file = OneshotFile::open(versions_path)?; let file_len = file.metadata()?.len(); if file_len % VERSION_ELEMENT_SIZE != 0 { log::warn!( "Corrupted ID tracker versions storage, file size not a multiple of a version, assuming automatic recovery by WAL" ); } let version_count = file_len / VERSION_ELEMENT_SIZE; let mut reader = BufReader::new(file); - Ok((0..version_count) + let result = Ok((0..version_count) .map(|_| reader.read_u64::<FileEndianess>()) - .collect::<Result<_, _>>()?) + .collect::<Result<_, _>>()?); + + // Drop the cache after reading + let file = reader.into_inner(); + file.drop_cache()?; + + result
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lock
is excluded by!**/*.lock
📒 Files selected for processing (23)
lib/common/memory/Cargo.toml
(2 hunks)lib/common/memory/build.rs
(1 hunks)lib/common/memory/src/fadvise.rs
(1 hunks)lib/common/memory/src/lib.rs
(1 hunks)lib/common/memory/src/madvise.rs
(0 hunks)lib/gridstore/src/bitmask/gaps.rs
(1 hunks)lib/gridstore/src/bitmask/mod.rs
(1 hunks)lib/gridstore/src/page.rs
(1 hunks)lib/quantization/src/encoded_storage.rs
(2 hunks)lib/segment/src/id_tracker/mutable_id_tracker.rs
(3 hunks)lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs
(1 hunks)lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs
(1 hunks)lib/segment/src/index/field_index/map_index/mmap_map_index.rs
(1 hunks)lib/segment/src/index/field_index/mmap_point_to_values.rs
(1 hunks)lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs
(1 hunks)lib/segment/src/index/hnsw_index/hnsw.rs
(1 hunks)lib/segment/src/vector_storage/chunked_mmap_vectors.rs
(1 hunks)lib/segment/src/vector_storage/chunked_vectors.rs
(3 hunks)lib/segment/src/vector_storage/dense/dynamic_mmap_flags.rs
(1 hunks)lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
(1 hunks)lib/segment/src/vector_storage/quantized/quantized_vectors.rs
(1 hunks)lib/sparse/src/index/inverted_index/inverted_index_compressed_mmap.rs
(1 hunks)lib/sparse/src/index/inverted_index/inverted_index_mmap.rs
(1 hunks)
💤 Files with no reviewable changes (1)
- lib/common/memory/src/madvise.rs
🧰 Additional context used
🧬 Code Graph Analysis (15)
lib/gridstore/src/bitmask/mod.rs (2)
lib/common/memory/src/fadvise.rs (2)
fadvise
(11-13)clear_disk_cache
(18-26)lib/common/memory/src/madvise.rs (4)
madvise
(93-95)madvise
(101-101)madvise
(107-113)madvise
(131-137)
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (1)
lib/common/memory/src/fadvise.rs (2)
fadvise
(11-13)clear_disk_cache
(18-26)
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (1)
lib/common/memory/src/fadvise.rs (2)
fadvise
(11-13)clear_disk_cache
(18-26)
lib/gridstore/src/page.rs (2)
lib/common/memory/src/fadvise.rs (2)
fadvise
(11-13)clear_disk_cache
(18-26)lib/common/memory/src/madvise.rs (4)
madvise
(93-95)madvise
(101-101)madvise
(107-113)madvise
(131-137)
lib/segment/src/index/hnsw_index/hnsw.rs (1)
lib/common/memory/src/fadvise.rs (2)
fadvise
(11-13)clear_disk_cache
(18-26)
lib/segment/src/index/field_index/mmap_point_to_values.rs (2)
lib/common/memory/src/fadvise.rs (2)
fadvise
(11-13)clear_disk_cache
(18-26)lib/common/memory/src/madvise.rs (4)
madvise
(93-95)madvise
(101-101)madvise
(107-113)madvise
(131-137)
lib/segment/src/vector_storage/chunked_mmap_vectors.rs (2)
lib/common/memory/src/fadvise.rs (2)
fadvise
(11-13)clear_disk_cache
(18-26)lib/common/memory/src/madvise.rs (4)
madvise
(93-95)madvise
(101-101)madvise
(107-113)madvise
(131-137)
lib/sparse/src/index/inverted_index/inverted_index_compressed_mmap.rs (2)
lib/common/memory/src/fadvise.rs (2)
fadvise
(11-13)clear_disk_cache
(18-26)lib/common/memory/src/madvise.rs (4)
madvise
(93-95)madvise
(101-101)madvise
(107-113)madvise
(131-137)
lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs (2)
lib/common/memory/src/fadvise.rs (2)
fadvise
(11-13)clear_disk_cache
(18-26)lib/common/memory/src/madvise.rs (4)
madvise
(93-95)madvise
(101-101)madvise
(107-113)madvise
(131-137)
lib/segment/src/index/field_index/map_index/mmap_map_index.rs (2)
lib/common/memory/src/fadvise.rs (2)
fadvise
(11-13)clear_disk_cache
(18-26)lib/common/memory/src/madvise.rs (4)
madvise
(93-95)madvise
(101-101)madvise
(107-113)madvise
(131-137)
lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs (2)
lib/common/memory/src/fadvise.rs (2)
fadvise
(11-13)clear_disk_cache
(18-26)lib/common/memory/src/madvise.rs (4)
madvise
(93-95)madvise
(101-101)madvise
(107-113)madvise
(131-137)
lib/quantization/src/encoded_storage.rs (1)
lib/common/memory/src/fadvise.rs (1)
fadvise
(11-13)
lib/gridstore/src/bitmask/gaps.rs (2)
lib/common/memory/src/fadvise.rs (2)
fadvise
(11-13)clear_disk_cache
(18-26)lib/common/memory/src/madvise.rs (4)
madvise
(93-95)madvise
(101-101)madvise
(107-113)madvise
(131-137)
lib/common/memory/src/lib.rs (1)
lib/common/memory/src/fadvise.rs (1)
fadvise
(11-13)
lib/segment/src/id_tracker/mutable_id_tracker.rs (1)
lib/common/memory/src/fadvise.rs (1)
fadvise
(11-13)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: test-consensus-compose
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: integration-tests-consensus
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: integration-tests
- GitHub Check: lint
- GitHub Check: storage-compat-test
🔇 Additional comments (32)
lib/common/memory/Cargo.toml (2)
16-16
: Dependency addition is appropriate.Adding the
delegate
crate as a workspace dependency is justified for trait delegation in the newfadvise
module.
28-28
: Good formatting: newline at end of file.Adding a newline at EOF is a standard formatting best practice.
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (1)
12-12
: Import update is correct.Switching the import of
clear_disk_cache
tomemory::fadvise
is accurate and aligns with the new implementation.lib/segment/src/index/field_index/map_index/mmap_map_index.rs (1)
16-16
: Import update is correct.Switching the import of
clear_disk_cache
tomemory::fadvise
is accurate and consistent with the new implementation.lib/gridstore/src/page.rs (1)
4-4
: Import update is correct.Switching the import of
clear_disk_cache
tomemory::fadvise
is accurate and aligns with the new implementation.lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs (1)
9-9
: Import update is correct.Switching the import of
clear_disk_cache
tomemory::fadvise
is accurate and consistent with the new implementation.lib/segment/src/vector_storage/chunked_mmap_vectors.rs (1)
11-11
: Import path update is correct.The import of
clear_disk_cache
frommemory::fadvise
is accurate and aligns with the refactor. No further action needed.lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs (1)
11-11
: Import path update is correct.The import of
clear_disk_cache
frommemory::fadvise
is correct and matches the refactor. No further action needed.lib/gridstore/src/bitmask/gaps.rs (1)
5-5
: Import path update is correct.The import of
clear_disk_cache
frommemory::fadvise
is correct and matches the refactor. No further action needed.lib/segment/src/index/field_index/mmap_point_to_values.rs (1)
6-6
: Import path update is correct.The import of
clear_disk_cache
frommemory::fadvise
is correct and matches the refactor. No further action needed.lib/segment/src/vector_storage/dense/dynamic_mmap_flags.rs (1)
11-11
: Import path update is correct.The import of
clear_disk_cache
frommemory::fadvise
is correct and matches the refactor. No further action needed.lib/sparse/src/index/inverted_index/inverted_index_mmap.rs (1)
11-12
: Import source correctly changed forclear_disk_cache
The import was appropriately moved from
memory::madvise
tomemory::fadvise
, which aligns with the PR objective of providing explicit OS file cache management usingposix_fadvise
.lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs (1)
8-9
: Import source correctly changed forclear_disk_cache
The import was appropriately moved from
memory::madvise
tomemory::fadvise
, which aligns with the PR objective of providing explicit OS file cache management usingposix_fadvise
.lib/sparse/src/index/inverted_index/inverted_index_compressed_mmap.rs (1)
13-14
: Import source correctly changed forclear_disk_cache
The import was appropriately moved from
memory::madvise
tomemory::fadvise
, which aligns with the PR objective of providing explicit OS file cache management usingposix_fadvise
.lib/gridstore/src/bitmask/mod.rs (1)
9-10
: Import source correctly changed forclear_disk_cache
The import was appropriately moved from
memory::madvise
tomemory::fadvise
, which aligns with the PR objective of providing explicit OS file cache management usingposix_fadvise
.lib/segment/src/vector_storage/quantized/quantized_vectors.rs (1)
9-9
: Import updated to use the new fadvise module.The import of
clear_disk_cache
has been updated to use the new location frommemory::fadvise
instead ofmemory::madvise
. This is part of the refactoring described in the PR objectives, where cache management functionality has been consolidated in the new fadvise module.lib/segment/src/index/hnsw_index/hnsw.rs (1)
18-18
: Import updated to use the new fadvise module.The import of
clear_disk_cache
has been updated to use the new location frommemory::fadvise
instead ofmemory::madvise
. This matches the refactoring described in the PR objectives, moving the cache management functionality to the more appropriate fadvise module.lib/common/memory/src/lib.rs (1)
2-2
: Export of new fadvise module.Adding the public export of the new
fadvise
module makes its functionality available to the rest of the codebase. This module houses the implementation of theOneshotFile
struct and theclear_disk_cache
function that were previously in themadvise
module, providing more precise OS cache management through theposix_fadvise
system call.lib/segment/src/vector_storage/chunked_vectors.rs (3)
8-8
: Import OneshotFile from the new fadvise module.Added import for the
OneshotFile
struct from the newfadvise
module. This struct will be used to optimize file access patterns and explicitly manage OS cache.
193-193
: Use OneshotFile instead of standard File for improved cache management.Replaced standard
File::open
withOneshotFile::open
, which optimizes file access by applyingposix_fadvise
hints (POSIX_FADV_SEQUENTIAL
andPOSIX_FADV_NOREUSE
) for better sequential file reading performance.
204-204
: Explicitly drop OS file cache after reading.Added call to
drop_cache()
after reading the file to explicitly inform the OS that the cached pages for this file are no longer needed, which helps prevent unnecessary memory usage as described in the PR objectives.lib/quantization/src/encoded_storage.rs (2)
5-5
: LGTM: Good import for the newOneshotFile
struct.The import aligns with the PR's goal of optimizing OS file cache usage after reading files.
39-42
: Excellent change to improve file cache handling.Replacing
File::open
withOneshotFile::open
and addingdrop_cache()
call addresses the issue described in the PR objectives. This change helps prevent the OS file cache from being polluted with quantized vectors that are unlikely to be read again soon.lib/segment/src/id_tracker/mutable_id_tracker.rs (3)
11-11
: LGTM: Good import for the newOneshotFile
struct.The import aligns with the PR's goal of optimizing OS file cache usage.
409-417
: Clean implementation ofOneshotFile
usage.Using
OneshotFile
instead of standardFile
provides file access pattern hints to the OS. The change from dropping the reader to getting the underlying file withinto_inner()
is a necessary adjustment to preserve the file handle for truncation and cache dropping.
431-431
: Good addition of cache dropping after file operation.Explicitly dropping the file cache after truncation and syncing aligns with the PR objective of reducing OS file cache usage.
lib/common/memory/build.rs (1)
1-15
: Well-designed build script for conditional compilation.The build script correctly detects OS and environment conditions to enable
posix_fadvise
functionality on supported platforms. The comment referencing the actual implementation in thenix
crate provides good context for maintainers.lib/common/memory/src/fadvise.rs (5)
6-14
: Well-implemented conditional wrapper forposix_fadvise
.Good use of conditional compilation with
#[cfg(posix_fadvise_supported)]
and clean implementation of the wrapper function. The function appropriately uses the entire file (offset 0, length 0) for the advice.
15-26
: Robust implementation of publicclear_disk_cache
function.This function properly handles platforms where
posix_fadvise
is not supported by doing nothing silently. The documentation clearly states this behavior, which is important for cross-platform code.
28-59
: Excellent design for theOneshotFile
struct and its public API.The
OneshotFile
struct is well-designed for one-time sequential reading with appropriate file access hints. The implementation ofdrop_cache
with explicit error handling and theOption<File>
design to prevent double drops is particularly good.
62-87
: Good trait implementations for file operations.The implementations for
Deref
,Read
, andSeek
are clean and effectively delegate to the inner file. Thedelegate!
macro usage makes the code concise and maintainable.
89-97
: RobustDrop
implementation for automatic cache dropping.The
Drop
implementation correctly handles the case wheredrop_cache()
has already been called explicitly. It also ignores any errors that might occur during the advisory call, which is appropriate for destructors.
e5c5748
to
51e227b
Compare
51e227b
to
6414477
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR addresses the file cache pollution issue by replacing the previous disk cache clearing mechanism with a new fadvise‐based implementation using an OneshotFile wrapper. Key changes include:
- Replacing all usages of memory::madvise::clear_disk_cache with memory::fadvise::clear_disk_cache.
- Updating file read operations to use OneshotFile::open and invoking drop_cache() to clear the OS file cache after reading.
- Adding a new fadvise module along with necessary build script and dependency adjustments.
Reviewed Changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs | Updated clear_disk_cache import to use fadvise module. |
lib/segment/src/vector_storage/dense/dynamic_mmap_flags.rs | Updated ordering of import for clear_disk_cache to fadvise. |
lib/segment/src/vector_storage/chunked_vectors.rs | Replaced File::open with OneshotFile::open and added drop_cache call. |
lib/segment/src/vector_storage/chunked_mmap_vectors.rs | Updated clear_disk_cache import to use fadvise module. |
lib/segment/src/index/hnsw_index/hnsw.rs | Updated clear_disk_cache import to use fadvise module. |
lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs | Updated clear_disk_cache import to use fadvise module. |
lib/segment/src/index/field_index/mmap_point_to_values.rs | Updated clear_disk_cache import to use fadvise module. |
lib/segment/src/index/field_index/map_index/mmap_map_index.rs | Updated clear_disk_cache import to use fadvise module. |
lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs | Updated clear_disk_cache import to use fadvise module. |
lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs | Updated clear_disk_cache import to use fadvise module. |
lib/segment/src/id_tracker/mutable_id_tracker.rs | Replaced File::open with OneshotFile::open and added drop_cache call. |
lib/quantization/src/encoded_storage.rs | Replaced File::open with OneshotFile::open and added drop_cache call. |
lib/gridstore/src/page.rs | Updated clear_disk_cache import to use fadvise module. |
lib/gridstore/src/bitmask/mod.rs | Updated clear_disk_cache import to use fadvise module. |
lib/gridstore/src/bitmask/gaps.rs | Updated clear_disk_cache import to use fadvise module. |
lib/common/memory/src/madvise.rs | Removed clear_disk_cache implementation. |
lib/common/memory/src/lib.rs | Added export for new fadvise module. |
lib/common/memory/src/fadvise.rs | New module that implements clear_disk_cache and the OneshotFile wrapper. |
lib/common/memory/build.rs | New build script to enable posix_fadvise support conditionally. |
lib/common/memory/Cargo.toml | Added new dependency on delegate. |
* Move clear_disk_cache to memory::fadvise * Add memory::fadvise::OneshotFile * Use OneshotFile
Related: #6323.
Consider this simplified example:
The problem with this code is that after calling this function, the file contents is stored twice in the memory: once as
Vec<u8>
(as anonymous memory), and once implicitly as OS file cache, making it more likely to evict other files from the cache.Particularly, after reproducing #6323, I've found that quantized vectors and id tracker files are still in the OS file cache when the mmaped index cache gets evicted. Since we are unlikely to read quantized vectors again, we'd better to avoid polluting the OS file cache.
This PR adds a new wrapper around
File
,OneshotFile
, that callsposix_fadvise(…, POSIX_FADV_DONTNEED)
when the file is closed. Not ideal, but should improve the situation a bit.