-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Fix some wrong vector-read IO measurements #6365
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
lib/segment/src/vector_storage/chunked_mmap_vectors.rs
(1 hunks)lib/segment/src/vector_storage/chunked_vectors.rs
(1 hunks)lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
(1 hunks)lib/segment/src/vector_storage/quantized/quantized_custom_query_scorer.rs
(2 hunks)lib/segment/src/vector_storage/quantized/quantized_query_scorer.rs
(4 hunks)lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs
(1 hunks)lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs
(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
lib/segment/src/vector_storage/quantized/quantized_custom_query_scorer.rs (1)
lib/segment/src/data_types/query_context.rs (1)
hardware_counter
(187-189)
lib/segment/src/vector_storage/quantized/quantized_query_scorer.rs (1)
lib/segment/src/data_types/query_context.rs (1)
hardware_counter
(187-189)
🪛 GitHub Actions: Storage compatibility tests
lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs
[error] 45-45: no method named is_on_disk
found for reference &'a QuantizedVectorStorage
in the current scope. Consider implementing the required traits.
🪛 GitHub Actions: Rust tests
lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs
[error] 45-45: no method named is_on_disk
found for reference &'a QuantizedVectorStorage
in the current scope
⏰ Context from checks skipped due to timeout of 90000ms (6)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consensus-compose
- GitHub Check: lint
🔇 Additional comments (8)
lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (1)
181-183
: Fixed incorrect disk state reportingThe vector storage is now correctly reporting as in-memory storage. This aligns with the comment on line 29 that describes it as "In-memory vector storage with on-update persistence". The prior
true
return value was causing inaccurate I/O measurements.lib/segment/src/vector_storage/chunked_vectors.rs (1)
224-226
: Fixed incorrect disk state reporting for ChunkedVectorsThe change correctly identifies that
ChunkedVectors<u8>
is an in-memory structure, not an on-disk structure. While it has methods to load from and save to files, the primary storage is in memory, so returningfalse
foris_on_disk()
is appropriate.lib/segment/src/vector_storage/quantized/quantized_custom_query_scorer.rs (2)
75-77
: Fixed vector I/O read multiplier for on-disk storageChanged the multiplier from
size_of::<TElement>()
to a constant value of1
for better accuracy in I/O measurements when vectors are on disk. This ensures consistent measurement of vector reads regardless of the element type size.
131-133
: Fixed vector I/O read multiplier for on-disk storage (in new_multi method)Same fix as in the
new
method, ensuring consistent I/O measurement across both single and multi-vector operations.lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (1)
167-169
: Improved on-disk state determination by delegating to underlying vectorsRather than hardcoding
true
, the implementation now properly delegates to the underlying vector storage'sis_on_disk()
method. This creates a more accurate representation of the storage's actual state, which is essential for correct I/O measurements.lib/segment/src/vector_storage/chunked_mmap_vectors.rs (1)
423-424
: Improved accuracy for determining on-disk storage status.The change now correctly determines if vectors are stored on disk based on the
populate
configuration value instead of using a hardcoded response. This will lead to more accurate vector I/O measurements since vectors that are fully loaded into memory (populate
is true) are not considered on-disk anymore.lib/segment/src/vector_storage/quantized/quantized_query_scorer.rs (2)
37-38
: Improved hardware counter handling based on storage location.The code now properly makes the hardware counter mutable and adjusts the vector I/O read multiplier based on whether the vectors are on disk. This will ensure more accurate measurement of vector I/O operations.
Also applies to: 48-52
67-68
: Consistent implementation in the multi-vector version.The same logic is correctly applied to the
new_multi
method, maintaining consistency with the single vector implementation.Also applies to: 83-87
lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs
Outdated
Show resolved
Hide resolved
f79f2ac
to
920832a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (2)
lib/common/common/src/counter/counter_cell.rs (2)
162-170
: Consider the performance impact of backtrace collectionThe track method collects 9 levels of backtrace information for each tracked operation, which could impact performance if called frequently. Consider adding a conditional compilation flag or a configuration option to disable this in performance-critical deployments.
192-206
: Consider using a logging framework for debug outputThe debug method uses println! for output, which might not be ideal in a library codebase. Consider using a logging framework (like log, slog, or tracing) to allow consumers to control how and where the output appears.
pub fn debug(&self, name: &str) { let tracks = self.get_tracks(); if tracks.is_empty() { return; } let mut list: Vec<_> = tracks.into_iter().collect(); list.sort_by_key(|i| i.1); list.reverse(); - println!("\n\n{name}:"); + log::debug!("\n\n{name}:"); for (k, v) in list { - println!("\n\t{}", k.functions.clone().join("\n\t")); - println!("\t==> {} [{} stack(s)]", v.0, v.1); + log::debug!("\n\t{}", k.functions.clone().join("\n\t")); + log::debug!("\t==> {} [{} stack(s)]", v.0, v.1); } }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lock
is excluded by!**/*.lock
📒 Files selected for processing (19)
lib/collection/src/shards/local_shard/query.rs
(2 hunks)lib/common/common/Cargo.toml
(1 hunks)lib/common/common/src/counter/counter_cell.rs
(4 hunks)lib/common/common/src/counter/hardware_accumulator.rs
(6 hunks)lib/common/common/src/counter/hardware_counter.rs
(2 hunks)lib/common/common/src/lib.rs
(1 hunks)lib/segment/src/vector_storage/chunked_mmap_vectors.rs
(1 hunks)lib/segment/src/vector_storage/chunked_vectors.rs
(1 hunks)lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
(1 hunks)lib/segment/src/vector_storage/quantized/quantized_custom_query_scorer.rs
(2 hunks)lib/segment/src/vector_storage/quantized/quantized_query_scorer.rs
(4 hunks)lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs
(1 hunks)lib/segment/src/vector_storage/quantized/quantized_vectors.rs
(1 hunks)lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs
(1 hunks)lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs
(2 hunks)lib/segment/src/vector_storage/raw_scorer.rs
(1 hunks)lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs
(1 hunks)lib/segment/src/vector_storage/vector_storage_base.rs
(1 hunks)lib/storage/src/content_manager/toc/request_hw_counter.rs
(1 hunks)
✅ Files skipped from review due to trivial changes (3)
- lib/segment/src/vector_storage/raw_scorer.rs
- lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs
- lib/collection/src/shards/local_shard/query.rs
🚧 Files skipped from review as they are similar to previous changes (7)
- lib/segment/src/vector_storage/quantized/quantized_custom_query_scorer.rs
- lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs
- lib/segment/src/vector_storage/chunked_vectors.rs
- lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
- lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs
- lib/segment/src/vector_storage/chunked_mmap_vectors.rs
- lib/segment/src/vector_storage/quantized/quantized_query_scorer.rs
🧰 Additional context used
🧬 Code Graph Analysis (4)
lib/segment/src/vector_storage/vector_storage_base.rs (4)
lib/collection/src/collection_manager/optimizers/segment_optimizer.rs (1)
name
(51-51)lib/segment/src/types.rs (3)
name
(1690-1692)name
(1725-1727)name
(1794-1799)lib/collection/src/collection_manager/optimizers/indexing_optimizer.rs (1)
name
(239-241)lib/segment/src/id_tracker/id_tracker_base.rs (2)
name
(130-130)name
(372-379)
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (14)
lib/segment/src/vector_storage/chunked_vectors.rs (1)
is_on_disk
(224-226)lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (1)
is_on_disk
(167-169)lib/segment/src/vector_storage/chunked_mmap_vectors.rs (1)
is_on_disk
(423-425)lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (1)
is_on_disk
(181-183)lib/segment/src/vector_storage/vector_storage_base.rs (2)
is_on_disk
(42-42)is_on_disk
(583-609)lib/segment/src/index/hnsw_index/hnsw.rs (1)
is_on_disk
(177-179)lib/segment/src/fixtures/index_fixtures.rs (1)
is_on_disk
(61-63)lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (1)
is_on_disk
(174-176)lib/quantization/src/encoded_vectors_pq.rs (1)
is_on_disk
(478-480)lib/quantization/src/encoded_vectors_binary.rs (1)
is_on_disk
(309-311)lib/quantization/src/encoded_vectors_u8.rs (1)
is_on_disk
(321-323)lib/segment/src/index/vector_index_base.rs (1)
is_on_disk
(103-117)lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs (1)
is_on_disk
(257-259)lib/segment/src/types.rs (2)
is_on_disk
(552-557)is_on_disk
(1187-1189)
lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs (6)
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (1)
std
(144-144)lib/segment/src/vector_storage/chunked_mmap_vectors.rs (2)
std
(63-63)std
(118-118)lib/segment/src/vector_storage/vector_storage_base.rs (1)
std
(141-141)lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (1)
std
(326-326)src/common/inference/service.rs (1)
type_name
(65-71)lib/segment/src/data_types/query_context.rs (1)
hardware_counter
(187-189)
lib/common/common/src/counter/hardware_accumulator.rs (2)
lib/common/common/src/counter/hardware_counter.rs (2)
default
(209-211)vector_io_read
(148-150)lib/common/common/src/counter/counter_cell.rs (1)
default
(218-223)
🪛 GitHub Actions: Storage compatibility tests
lib/common/common/src/lib.rs
[error] 1-1: #![feature]
may not be used on the stable release channel.
[error] 3-3: file not found for module backtrace_tracker
. To create the module backtrace_tracker
, create file 'lib/common/common/src/backtrace_tracker.rs' or 'lib/common/common/src/backtrace_tracker/mod.rs'.
🪛 GitHub Actions: Rust tests
lib/common/common/src/lib.rs
[error] 1-1: #![feature]
may not be used on the stable release channel.
[error] 3-3: file not found for module backtrace_tracker
. To create the module backtrace_tracker
, create file 'lib/common/common/src/backtrace_tracker.rs' or 'lib/common/common/src/backtrace_tracker/mod.rs'.
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test-consensus-compose
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-consensus
- GitHub Check: test
- GitHub Check: lint
🔇 Additional comments (21)
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (1)
108-125
: Implementation looks good and matches the project's requirements.The addition of the
is_on_disk()
method forQuantizedVectorStorage
correctly differentiates between memory-based (RAM) storage variants that returnfalse
and memory-mapped (Mmap) storage variants that returntrue
. This implementation aligns with other similar implementations in the codebase and supports the PR's objective of fixing vector-read IO measurements.lib/segment/src/vector_storage/vector_storage_base.rs (1)
287-317
:❓ Verification inconclusive
Implementation of the name method looks good
The new
name
method provides a clean way to get a string representation of each vector storage variant. This is consistent with similar implementations in other parts of the codebase (like intypes.rs
andid_tracker_base.rs
).Can you verify how this name method is used in relation to the PR objective of fixing "wrong vector-read IO measurements"? Since the PR mentions adjusting multipliers for quantized scorers, it would be helpful to understand the connection between storage type identification and measurement corrections.
🏁 Script executed:
#!/bin/bash # Search for references to the new name method across the codebase rg -A 5 "vector_storage.*name\(\)" --type rustLength of output: 609
Review Verification: 'name' Method Usage and Its Relation to IO Measurements
The implementation of the
name()
method is correct and consistent with similar patterns in the codebase. Our search confirms that this method is used—for example, inlib/segment/src/vector_storage/raw_scorer.rs
—mainly for logging and runtime identification of the vector storage variants. However, there is no direct evidence that it actively influences the multipliers for quantized scorers or the vector-read IO measurements. Its role appears to be limited to providing a debug-friendly string representation.
- Usage Confirmation: The method is invoked in
raw_scorer.rs
(e.g., within aprintln!
statement) to log the vector storage type.- PR Objective Relation: While the PR aims at fixing IO measurement issues by adjusting multipliers for quantized scorers, this adjustment logic is not directly tied to the
name()
output. The storage identification here is more for traceability and debugging rather than affecting measurement computations.Please verify that the adjustments for the multipliers are correctly implemented where the storage type is utilized beyond this logging mechanism.
lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs (1)
1-1
: Add thetype_name
import for debugging.The
type_name
import is necessary for the new debug logging you're adding below to identify the vector storage type.lib/common/common/src/counter/hardware_counter.rs (3)
2-2
: Updated import to includeFullHwTracker
The updated import now brings in the
FullHwTracker
struct which is used in the newly implemented methods.
183-199
: Well-structured tracker handling implementationThe new
full_trackers
method effectively collects all trackers from the counter and applies the appropriate multipliers. This implementation enables more accurate tracking of hardware resource usage, particularly for vector I/O operations.
203-203
: Enhanced accumulator with tracker informationThe addition of tracker accumulation ensures that detailed tracking data is preserved along with the hardware metrics, which will help provide more accurate measurements for vector I/O reads as mentioned in the PR description.
lib/common/common/src/counter/hardware_accumulator.rs (8)
4-4
: Added import for CounterTrackerThe new import provides access to the CounterTracker type used in the FullHwTracker implementation.
108-108
: Added tracker field to HwMeasurementAccAdding the counter_tracker field to the HwMeasurementAcc struct enhances its capability to track detailed hardware metrics.
118-118
: Initialized counter_tracker in all constructorsProperly initializing the counter_tracker field in all constructors maintains consistency and prevents potential null reference issues.
Also applies to: 130-130, 148-148
152-154
: Added debug method for trackersThe debug_tracks method provides a convenient way to output the tracking information for debugging purposes.
162-186
: Implemented tracker accumulationThe accumulate_tracker method effectively merges tracker data from different sources, ensuring that hardware resource usage is accurately tracked and aggregated.
260-260
: Updated Clone implementationThe Clone implementation now properly includes the counter_tracker field, ensuring that all data is properly duplicated when cloning a HwMeasurementAcc instance.
265-274
: Well-structured FullHwTracker implementationThe FullHwTracker struct provides a comprehensive collection of all hardware tracking metrics, enabling more accurate monitoring of system resource usage.
277-285
: Useful debug method for FullHwTrackerThe debug method provides a convenient way to output detailed tracking information for each hardware component.
lib/common/common/src/counter/counter_cell.rs (7)
2-5
: Updated imports for CounterTracker functionalityThe added imports support the new CounterTracker functionality with HashMap for tracking data and synchronization primitives for thread safety.
10-13
: Enhanced CounterCell with tracking capabilityAdding the tracker field to CounterCell enables detailed tracking of counter operations, which helps in diagnosing and fixing the vector-read IO measurement issues mentioned in the PR.
26-26
: Properly initialized tracker in constructorsThe tracker is properly initialized with a new CounterTracker instance, ensuring it's ready for use.
30-32
: Added getter for accessing the trackerThe get_tracker method provides a way to access the tracking data, which is essential for accumulating and analyzing hardware metrics.
57-57
: Updated core methods to use the trackerCore counter methods now record their operations in the tracker, enabling detailed analysis of counter usage patterns.
Also applies to: 64-64
149-154
: Well-designed CounterTracker structThe CounterTracker implementation effectively combines backtrace tracking with thread-safe data storage via Arc.
209-224
: Well-implemented Debug and Default traitsThe Debug and Default implementations are clean and follow Rust best practices.
lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs
Outdated
Show resolved
Hide resolved
920832a
to
095ffde
Compare
lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs
Outdated
Show resolved
Hide resolved
lib/segment/src/vector_storage/quantized/quantized_query_scorer.rs
Outdated
Show resolved
Hide resolved
lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs
Outdated
Show resolved
Hide resolved
1726ff7
to
7d15377
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
lib/segment/src/vector_storage/quantized/quantized_query_scorer.rs
(4 hunks)lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- lib/segment/src/vector_storage/quantized/quantized_query_scorer.rs
🧰 Additional context used
🧬 Code Graph Analysis (1)
lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs (1)
lib/segment/src/data_types/query_context.rs (1)
hardware_counter
(187-189)
🪛 GitHub Actions: Formatter and linter
lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs
[error] 45-45: cannot find value quantized_data
in this scope
🪛 GitHub Actions: Rust tests
lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs
[error] 45-45: cannot find value quantized_data
in this scope
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consensus-compose
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-consistency
- GitHub Check: test
lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs
Outdated
Show resolved
Hide resolved
7d15377
to
5596032
Compare
📝 WalkthroughWalkthroughThe changes update how disk storage status is determined across several vector storage components. Multiple implementations of the Tip ⚡💬 Agentic Chat (Pro Plan, General Availability)
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (2)
🧰 Additional context used🧬 Code Graph Analysis (1)lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs (2)
⏰ Context from checks skipped due to timeout of 90000ms (13)
🔇 Additional comments (1)
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
5596032
to
375c472
Compare
375c472
to
270d4f1
Compare
* Fix some wrong vector_storage io measurements * Review remarks
Fixes some wrong measurements for vector storage IO reads.
Note: For quantized scorers, we already account for the units inside scoring and don't need it as multiplier.
For this reason, multiplier has been changed to
1
.