-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Payload Index Io measurements with Iterator #6132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ed264d0
to
6f3ecd4
Compare
a5a87ee
to
694f725
Compare
lib/segment/src/index/field_index/map_index/mutable_map_index.rs
Outdated
Show resolved
Hide resolved
I realized we could use the precomputed counts more in mmap bool index. Added the change in a75d952 |
83fb468
to
4f9a385
Compare
2a7e68a
to
3cb1eb9
Compare
3cb1eb9
to
61a74a6
Compare
📝 WalkthroughWalkthroughThis pull request implements significant updates to hardware counter integration and measurement across various modules. Notable changes include the removal of the Tip ⚡🧪 Multi-step agentic review comment chat (experimental)
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (1)
lib/segment/src/index/field_index/numeric_index/mod.rs (1)
387-395
: Validate log-based overhead calculation for I/O reads.Using
2 * ((self.total_unique_values_count() as f32).log2().ceil() as usize)
to estimate binary search overhead is reasonable, but please ensure it accurately reflects edge cases with very low or high unique values.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (22)
lib/common/common/src/counter/hardware_counter.rs
(6 hunks)lib/common/common/src/iterator_ext/mod.rs
(2 hunks)lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs
(6 hunks)lib/segment/src/index/field_index/bool_index/mod.rs
(2 hunks)lib/segment/src/index/field_index/bool_index/simple_bool_index.rs
(8 hunks)lib/segment/src/index/field_index/facet_index.rs
(3 hunks)lib/segment/src/index/field_index/geo_index/immutable_geo_index.rs
(2 hunks)lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs
(1 hunks)lib/segment/src/index/field_index/geo_index/mod.rs
(1 hunks)lib/segment/src/index/field_index/geo_index/mutable_geo_index.rs
(3 hunks)lib/segment/src/index/field_index/immutable_point_to_values.rs
(1 hunks)lib/segment/src/index/field_index/map_index/immutable_map_index.rs
(3 hunks)lib/segment/src/index/field_index/map_index/mmap_map_index.rs
(3 hunks)lib/segment/src/index/field_index/map_index/mod.rs
(2 hunks)lib/segment/src/index/field_index/map_index/mutable_map_index.rs
(3 hunks)lib/segment/src/index/field_index/null_index/mmap_null_index.rs
(3 hunks)lib/segment/src/index/field_index/numeric_index/immutable_numeric_index.rs
(1 hunks)lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs
(1 hunks)lib/segment/src/index/field_index/numeric_index/mod.rs
(3 hunks)lib/segment/src/index/field_index/numeric_index/mutable_numeric_index.rs
(3 hunks)lib/segment/src/index/hnsw_index/hnsw.rs
(1 hunks)lib/segment/src/segment/facet.rs
(1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Formatter and linter
lib/common/common/src/iterator_ext/mod.rs
[error] 108-108: this expression creates a reference which is immediately dereferenced by the compiler. Change this to: hw_cell
.
[error] 1-1: could not compile common
(lib) due to 1 previous error.
⏰ Context from checks skipped due to timeout of 90000ms (12)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test (macos-latest)
- GitHub Check: test-consensus-compose
- GitHub Check: test (windows-latest)
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test-consensus
- GitHub Check: test (ubuntu-latest)
🔇 Additional comments (57)
lib/segment/src/segment/facet.rs (1)
79-79
: Simplify hardware counter usage.Changed from
hw_counter.new_accumulator()
to directly passing thehw_counter
reference, which aligns with the new hardware counter integration approach across the codebase.lib/common/common/src/counter/hardware_counter.rs (2)
20-20
: LGTM: New disposable tracking field added.The new
disposable
boolean field allows tracking whether a hardware counter is disposable (doesn't report usage).
35-35
: Consistent implementation of disposable tracking.The
disposable
field is properly initialized and maintained throughout the implementation, including in the constructor, fork method, and field destructuring.Also applies to: 52-52, 65-66, 87-87, 170-171
lib/segment/src/index/field_index/immutable_point_to_values.rs (1)
36-36
: Improved flexibility with FnMut instead of Fn.Changed the
check_fn
parameter fromimpl Fn(&N) -> bool
toimpl FnMut(&N) -> bool
, allowing closures to mutate their captured environment. This is a backward-compatible improvement that enables more flexible usage patterns.lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs (1)
245-249
: Good addition of hardware counter measurement for deleted flag read operations.Adding hardware counter measurements for deleted flag reads properly tracks I/O operations in the
check_values_any
method, which will help with performance analysis and optimization.lib/segment/src/index/field_index/null_index/mmap_null_index.rs (3)
4-4
: Good addition of IteratorExt import.The addition of this import is necessary to support the hardware counter measurements in the filter method.
238-239
: Effective integration of hardware measurement for non-empty value filtering.Using
measure_hw_with_cell
properly integrates hardware counter measurements into iterator operations for non-empty values, which will provide more accurate metrics for performance analysis.
250-251
: Effective integration of hardware measurement for null value filtering.Similar to the non-empty value filtering, this properly integrates hardware counter measurements into iterator operations for null values, maintaining consistency across both filtering paths.
lib/segment/src/index/field_index/geo_index/immutable_geo_index.rs (2)
6-6
: Good addition of BUCKET_OFFSET_OVERHEAD import.This import is necessary for the calculation in the updated
check_values_any
method.
81-92
: Excellent optimization of hardware counter updates.The refactoring of the
check_values_any
method to accumulate checked values before incrementing the hardware counter is a significant optimization. This approach reduces the number of hardware counter updates, which can be expensive when called frequently.The calculation now properly includes both the total size of checked GeoPoints and the bucket overhead, providing a more accurate measurement of memory access costs.
lib/segment/src/index/field_index/geo_index/mod.rs (1)
189-189
: Good update to support hardware counter measurement for Mutable variant.This change ensures the
hw_counter
parameter is consistently passed to all variants of theGeoMapIndex
enum, maintaining uniform hardware counter tracking across all index types.lib/segment/src/index/field_index/numeric_index/mutable_numeric_index.rs (3)
7-7
: Added IteratorExt trait for improved hardware counter measurements.The addition of the
IteratorExt
trait import enables a more consistent approach to hardware counter measurements during iterations.
83-88
: Improved hardware counter measurement in check_values_any using IteratorExt.The implementation now uses
measure_hw_with_cell
from theIteratorExt
trait, which provides a more streamlined approach to hardware counter increments during iteration. This replaces what was likely a manual counter increment for each value check.
118-120
: Enhanced hardware counter measurement in values_range.The change replaces a likely
inspect
method call with the more consistentmeasure_hw_with_cell
method, improving the way hardware counter increments are handled during range iterations.lib/segment/src/index/field_index/map_index/mmap_map_index.rs (5)
9-9
: Added IteratorExt trait for improved hardware counter measurements.The addition of the
IteratorExt
trait import enables the use of specialized methods for hardware counter measurements during iterations.
165-169
: Added hardware counter measurement for deleted field access.The change explicitly measures the access to the
deleted
field, adding the size of a boolean to the hardware counter. This ensures the I/O cost of checking if a point is deleted is properly accounted for.
291-294
: Updated method signature to use HardwareCounterCell.The signature change from using
HwMeasurementAcc
toHardwareCounterCell
aligns with the approach used throughout the codebase, allowing for more consistent hardware counter tracking.
296-298
: Added hardware counter measurement for key bytes.This change appropriately tracks the I/O cost of accessing the key bytes by incrementing the hardware counter with the bytes written by the key.
305-308
: Enhanced hardware counter measurement during filtering.The implementation now uses
measure_hw_with_cell
to track the hardware usage for each point offset during the iteration, ensuring accurate measurement of performance impact.lib/segment/src/index/field_index/numeric_index/immutable_numeric_index.rs (1)
176-187
: Improved hardware counter measurement logic in check_values_any.The function now separates the counting of function calls from the execution of the check function. This approach:
- Counts the number of calls to the check function using a local variable
- Increments the hardware counter once after all checks are completed
- Returns the result of the check operation
This is more efficient than incrementing the counter for each value check and provides a clearer separation of concerns.
lib/segment/src/index/hnsw_index/hnsw.rs (2)
1223-1223
: Use hardware counter from query context.Instead of creating a disposable hardware counter, the code now correctly uses the hardware counter from the query context. This ensures that all measurements are properly tracked in a single counter for the entire query operation.
1226-1226
: Pass hardware counter by reference.The call to
estimate_cardinality
now correctly passes the hardware counter by reference, allowing the cardinality estimation to add its measurements to the same counter used throughout the query execution.lib/segment/src/index/field_index/facet_index.rs (2)
20-23
: Ensure proper lifetime consistency for hardware counter references.The change to accept a borrowed hardware counter with a lifetime
'a
is consistent with returning an iterator that borrows the counter. No immediate issues are found.
60-75
: Confirm thorough integration of hardware counter references.Forwarding the
hw_counter
reference toFacetIndex::iter_values_map(*index, hw_counter)
is aligned with the trait's signature. Ensure any downstream iteration leverages the counter as intended without missing increments.lib/segment/src/index/field_index/numeric_index/mod.rs (1)
698-709
: Use consistent hardware counter for estimate_paths.The updated signature ensures
hw_counter
usage inestimate_points
is propagated to this branch. Confirm that enough test coverage exists to validate these metrics thoroughly.lib/segment/src/index/field_index/map_index/mutable_map_index.rs (4)
7-7
: Import usage confirmed.Introducing
common::iterator_ext::IteratorExt
is consistent with the usage for measuring iteration overhead. No issues observed.
179-188
: Prefer calculating dynamic memory size using item count.Using
size_of_val(p)
only covers the set overhead, missing the per-item size. For aBTreeSet<PointOffsetType>
, considerp.len() * size_of::<PointOffsetType>()
for more accurate counting.
194-211
: Hardware counter usage in iterator mapping is well-structured.Incrementing the counter for both the key’s mmapped size and each item in the set provides granular I/O measurement. This is a sensible approach that integrates seamlessly with
measure_hw_with_cell
.
224-224
: Use per-item size for counting dynamic collection usage.As with the earlier comment,
size_of_val(ids)
will likely underestimate the real memory footprint of the collection. Usingids.len() * size_of::<PointOffsetType>()
would be more accurate.lib/segment/src/index/field_index/map_index/mod.rs (1)
210-213
: Signature updated to use hardware counter cell referenceThe method signature has been appropriately updated to use a reference to
HardwareCounterCell
instead of a disposable accumulator, with correct lifetime annotations to ensure the iterator's lifetime is bound to both the index and the hardware counter reference.lib/common/common/src/iterator_ext/mod.rs (3)
7-9
: New imports added for hardware counter supportThe necessary imports have been added to support the new hardware counter measurement functionality.
50-71
: New method added to measure hardware usage with accumulatorThis new method properly implements hardware usage measurement using an accumulator, correctly handling the iteration count and excluding the final
None
item.
79-93
: New method added to measure hardware usage with counter cell referenceThis method provides a cleaner way to measure hardware usage by directly accepting a counter cell reference, avoiding the need to create a disposable accumulator.
lib/segment/src/index/field_index/bool_index/mod.rs (2)
28-35
: Updated to use hardware counter cell reference in BoolIndexThe
iter_values_map
method has been correctly updated to accept a reference toHardwareCounterCell
with proper lifetime annotations, ensuring that the delegated calls to the underlying implementations also receive the counter.
170-176
: Updated FacetIndex implementation to match signature changesThe implementation for
FacetIndex
trait has been properly updated to match the signature changes inBoolIndex
, maintaining consistency throughout the codebase.lib/segment/src/index/field_index/geo_index/mutable_geo_index.rs (3)
7-8
: New imports for iterator extension and bucket overhead constantAdded necessary imports to support the new hardware counter measurement functionality in geo index.
182-182
: Delegate signature updated for hardware counter supportThe delegate method signature has been appropriately updated to include the hardware counter parameter.
214-235
: Enhanced check_values_any with hardware counter measurementsThe method now properly measures hardware operations during value checking:
- Records a read counter increment for the bucket overhead access
- Uses the new
measure_hw_with_cell
method to track individual geo point readsThis implementation correctly handles the hardware counter measurements while preserving the original functionality.
lib/segment/src/index/field_index/map_index/immutable_map_index.rs (3)
265-269
: Streamlined variable binding and hardware measurement
Removing the explicit type annotation inlet v = v.borrow();
simplifies the code, and callingmmapped_size
to track the payload IO overhead adheres to the standardized hardware counter usage.
295-303
: Ensuring consistent overhead tracking
Adding thehw_counter
parameter toget_count_for_value
and incrementing withBUCKET_OFFSET_OVERHEAD
plussize_of_val(entry)
offers fine-grained visibility into the access cost. Just verify thatBUCKET_OFFSET_OVERHEAD
is applied once per logical retrieval when intended, to avoid double counting.
312-315
: Hardware counter integration for value iteration
The updated signature now properly accepts and propagates the hardware counter, enabling iteration cost measurements. The approach of wrapping the iterator withcopied()
is reasonable for producingIdIter
.Also applies to: 319-319
lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs (5)
5-5
: Imported iterator extension for hardware measurements
Bringing incommon::iterator_ext::IteratorExt
is necessary to access.measure_hw_with_cell(...)
functions.
133-140
: Simplified boolean count retrieval
The newget_count_for
method neatly returns either thetrues_count
orfalses_count
. The approach is concise and aligns with existing naming.
214-226
: Incorporating hardware counter usage in iterator
Passinghw_counter
toiter_values_map
and measuring iteration overhead improves observability. The code snippet using.measure_hw_with_cell
cleanly integrates counters with the returned iterator.
400-407
: Measured iteration for boolean slice filtering
Using.measure_hw_with_cell_and_fraction(hw_counter, u8::BITS as usize, ...)
perfectly tracks bit-level overhead, ensuring accurate IO read counter increments for each set bit.
460-461
: Refined block creation arguments
Reusingmake_block(self.trues_count, true, key.clone())
andmake_block(self.falses_count, false, key)
maintains clarity while avoiding unnecessary duplication of the payload key.lib/segment/src/index/field_index/bool_index/simple_bool_index.rs (11)
5-5
: Extended iterator utilities import
This import is necessary to leverage measured iteration utilities, such as.measure_hw_with_cell(...)
.
26-28
: Hardware counter usage imports
IntroducingHardwareCounterCell
andIteratorExt
within thememory
module paves the way for bit iteration measurements, ensuring cohesive usage of the new hardware tracking APIs.
176-187
: Measured iteration for “true” bits
iter_has_true_measured
integrates the hardware counter for each selected bit usingu8::BITS
to reflect bit indexing overhead. This is consistent with the codebase’s measurement model.
192-203
: Measured iteration for “false” bits
Similarly,iter_has_false_measured
provides a parallel measured flow for points with false flags, ensuring we capture bit-level reads.
255-255
: Explicit boolean overhead measurement
Incrementing bysize_of::<bool>()
when checking values is an effective way to record read costs for boolean data.
282-285
: Adding hardware counter to values map iterator
Acceptinghw_counter
initer_values_map
ensures iteration over the two Boolean sets can now be measured directly, aligning with parallel changes in other index implementations.
291-293
: Size-based overhead tracking
Usingsize_of::<usize>()
within.measure_hw_with_cell
is coherent for measuring the fixed overhead of iterating over each item in the map array.
375-375
: Filter method instrumentation
Addinghw_counter
to thefilter
method signature paves the way for measured partial iteration in specialized boolean checks.
382-385
: Measured filtering for true/false values
Leveraging the newly introducediter_has_true_measured
anditer_has_false_measured
in thefilter
method ensures hardware usage data is recorded consistently for each flagged point.
394-395
: Pre-emptive cardinality measurement
Addinghw_counter
toestimate_cardinality
extends the measurement coverage to cardinality checks.
406-408
: Counting overhead
Incrementing the counter bycount
effectively captures the read overhead for all matching boolean entries. Consider verifying that largecount
values reflect the desired measurement.
lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs
Outdated
Show resolved
Hide resolved
61a74a6
to
7d7549b
Compare
lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs
Outdated
Show resolved
Hide resolved
lib/segment/src/index/field_index/map_index/mutable_map_index.rs
Outdated
Show resolved
Hide resolved
lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs
Outdated
Show resolved
Hide resolved
/// - `hw_acc`: accumulator holding a counter cell | ||
/// - `multiplier`: multiplies the number of iterations by this factor. | ||
/// - `f`: Closure to get the specific counter to increase from the cell inside the accumulator. | ||
fn measure_hw_with_acc<R>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't feel like Hw related functions should be in the pretty low-level IteratorExt
. I would prefer to have it on higher level of abstraction, similar to our discussion about MmapHashmap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put them into a separate Iterator extension into the counter
module directly to our Hardware measurement related objects.
{ | ||
OnFinalCount::new(self, move |i| { | ||
let hw_counter = hw_acc.get_counter_cell(); | ||
// Subtract 1 to not account for the latest `None` call. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it mean that i
in this context is not a number of values in the iterator?
If so, shouldn't it be fixed inside OnFinalCount
? Or if it is intended, what's a true meaning of i
? I would prefer a more explicit naming if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i
represents how many times we called .next()
on an iterator, which might be less than the actual amount of items in the iterator e.g. if we stopped early.
The call of next()
that returns None
is counted too in i
. This behavior is documented on the on_final_count
function of this trait. Because we don't want this measurement, we subtract 1
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
lib/segment/src/index/field_index/map_index/mmap_map_index.rs (1)
291-312
: Ensure consistent overhead measurement for keys and items.Accounting for the key size with
k.write_bytes()
is helpful. However, if your storage system reads data in larger chunks, consider verifying whether multiple smaller calls toincr_delta
align with your measurement goals. Also, the.filter()
and.measure_hw_with_cell(...)
usage looks correct for capturing the iteration overhead.Consider providing inline documentation on how these measurements relate to real-world I/O overhead or chunked reads.
lib/segment/src/index/field_index/bool_index/simple_bool_index.rs (2)
282-294
: Consider using the new measured iteratorsThe
iter_values_map
method is currently measuring the outer iterator but still using the unmeasured inner iterators (iter_has_false
anditer_has_true
). For consistency with other changes in this PR, consider using the newly added measured iterators.pub fn iter_values_map<'a>( &'a self, hw_counter: &'a HardwareCounterCell, ) -> impl Iterator<Item = (bool, IdIter<'a>)> + 'a { [ - (false, Box::new(self.memory.iter_has_false()) as IdIter), - (true, Box::new(self.memory.iter_has_true()) as IdIter), + (false, Box::new(self.memory.iter_has_false_measured(hw_counter)) as IdIter), + (true, Box::new(self.memory.iter_has_true_measured(hw_counter)) as IdIter), ] .into_iter() .measure_hw_with_cell(hw_counter, size_of::<usize>(), |i| { i.payload_index_io_read_counter() }) }
246-261
: Consider using measured iterators for consistencyThe
check_values_any
method manually increments the counter but still uses the unmeasured methods (values_has_true
andvalues_has_false
). For more consistent hardware measurement, consider refactoring this to use the measured iterators.pub fn check_values_any( &self, point_id: PointOffsetType, is_true: bool, hw_counter: &HardwareCounterCell, ) -> bool { - hw_counter - .payload_index_io_read_counter() - .incr_delta(size_of::<bool>()); - if is_true { - self.values_has_true(point_id) + // Use the measured iterator and check if point_id is in the result + self.memory.iter_has_true_measured(hw_counter).any(|id| id == point_id) } else { - self.values_has_false(point_id) + self.memory.iter_has_false_measured(hw_counter).any(|id| id == point_id) } }Note: This refactoring might be less efficient if the iterators scan through many points, so this optimization should be evaluated for performance impact before implementation.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
lib/common/common/src/counter/hardware_counter.rs
(0 hunks)lib/common/common/src/counter/iterator_hw_measurement.rs
(1 hunks)lib/common/common/src/counter/mod.rs
(1 hunks)lib/common/common/src/iterator_ext/mod.rs
(1 hunks)lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs
(6 hunks)lib/segment/src/index/field_index/bool_index/simple_bool_index.rs
(8 hunks)lib/segment/src/index/field_index/geo_index/mutable_geo_index.rs
(3 hunks)lib/segment/src/index/field_index/map_index/immutable_map_index.rs
(3 hunks)lib/segment/src/index/field_index/map_index/mmap_map_index.rs
(3 hunks)lib/segment/src/index/field_index/map_index/mod.rs
(2 hunks)lib/segment/src/index/field_index/map_index/mutable_map_index.rs
(2 hunks)lib/segment/src/index/field_index/null_index/mmap_null_index.rs
(3 hunks)lib/segment/src/index/field_index/numeric_index/mutable_numeric_index.rs
(3 hunks)
💤 Files with no reviewable changes (1)
- lib/common/common/src/counter/hardware_counter.rs
🚧 Files skipped from review as they are similar to previous changes (4)
- lib/segment/src/index/field_index/map_index/mutable_map_index.rs
- lib/segment/src/index/field_index/geo_index/mutable_geo_index.rs
- lib/segment/src/index/field_index/map_index/mod.rs
- lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test (macos-latest)
- GitHub Check: test-consensus
- GitHub Check: test (windows-latest)
- GitHub Check: test-consensus-compose
- GitHub Check: test
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (20)
lib/common/common/src/counter/mod.rs (1)
5-5
: New module added for iterator hardware measurementThe addition of this module is clean and follows the existing pattern in the codebase. This module will provide iterator extension capabilities for hardware performance measurements.
lib/common/common/src/iterator_ext/mod.rs (1)
7-8
: Visibility change for on_final_count moduleChanging the visibility of the
on_final_count
module from private topub(super)
allows access from the parent module while maintaining encapsulation from external modules. This change supports the integration of hardware measurement functionalities.lib/segment/src/index/field_index/null_index/mmap_null_index.rs (3)
4-4
: Import added for hardware measurement extension traitThe import of
HwMeasurementIteratorExt
correctly prepares the file for using hardware measurement functionality with iterators.
238-239
: Added hardware measurement for non-empty values filterThe implementation correctly uses the
measure_hw_with_cell
extension method to track hardware resource usage during iteration, with appropriate read counter tracking for payload index operations.
250-251
: Added hardware measurement for null values filterSimilar to the non-empty values filter, this implementation correctly measures hardware usage during iteration over null values, maintaining consistency in how hardware measurements are tracked.
lib/segment/src/index/field_index/numeric_index/mutable_numeric_index.rs (3)
7-7
: Import added for hardware measurement extension traitThe import of
HwMeasurementIteratorExt
correctly prepares the file for using hardware measurement functionality with iterators.
118-120
: Improved hardware measurement in values_range methodThe implementation now uses
measure_hw_with_cell
to track hardware resource usage during iteration over values in a range, with proper sizing based on the type T. This is a cleaner approach than the previous implementation.
83-89
:❓ Verification inconclusive
Improved hardware measurement in check_values_any method
The implementation now elegantly uses the
measure_hw_with_cell
extension method to track hardware resource usage during iteration, with proper sizing based on the type being iterated.However, there might be a missing import for
std::mem::size_of
that's used here. Verify that it's imported elsewhere in the file.
🏁 Script executed:
#!/bin/bash # Check if size_of is imported in the file rg "use std::mem::size_of" lib/segment/src/index/field_index/numeric_index/mutable_numeric_index.rsLength of output: 101
Attention: Verify the
std::mem::size_of
importThe updated
measure_hw_with_cell
implementation clearly improves hardware tracking by using type-based sizing. However, our verification did not reveal any explicit import forstd::mem::size_of
inlib/segment/src/index/field_index/numeric_index/mutable_numeric_index.rs
. Please confirm that this function is either imported indirectly (e.g., via a glob import fromstd::mem
) or add an explicit import (use std::mem::size_of;
) to ensure the code compiles reliably.
- File:
lib/segment/src/index/field_index/numeric_index/mutable_numeric_index.rs
- Relevant Lines: 83–89 (usage of
size_of::<T>()
)lib/segment/src/index/field_index/map_index/mmap_map_index.rs (2)
9-9
: Add import forHwMeasurementIteratorExt
.Good addition. Importing the extension trait is necessary to leverage hardware measurement for iterators.
165-168
: Confirm overhead measurement for bit-slice access.Using
size_of::<bool>()
may underrepresent the actual cost. SinceMmapBitSlice
manages data in a bit-level store, the real overhead might be different than reading a singlebool
.Would you like me to run a script to gather more evidence of how
self.deleted
is accessed elsewhere in the code?lib/common/common/src/counter/iterator_hw_measurement.rs (1)
1-70
: Solid extension trait for hardware usage measurement.Using
OnFinalCount::new
and subtracting one iteration to avoid double-counting the finalNone
is a neat approach. Ensure you won't miss counts when the iterator stops early or yieldsNone
in a non-standard pattern.Do you want me to run a script to check how these methods are invoked throughout the codebase to ensure they always handle the final iteration as intended?
lib/segment/src/index/field_index/map_index/immutable_map_index.rs (3)
264-265
: Simplify variable binding.Switching from explicit type annotation to
let v = v.borrow();
reduces verbosity. Measuring the size viammapped_size(v.as_referenced())
is valid as long asN
implementsMmapValue
in a way consistent with your hardware measurement usage.
294-300
: Validate container read overhead.
size_of_val(entry)
measures theContainerSegment
struct but not the underlying slice invalue_to_points_container
. If you intend to track the total read cost, consider also accounting for the length of the associated container segment.
309-312
: Iterate values map with hardware measurement.You uniformly switched from
HwMeasurementAcc
to&HardwareCounterCell
, bringing this method in line with the rest of the code. Returning a copied iterator for numeric indices is efficient.Also applies to: 316-316
lib/segment/src/index/field_index/bool_index/simple_bool_index.rs (6)
176-186
: Implementation looks good!Good use of
measure_hw_with_cell_and_fraction
with the bit size as fraction. This correctly accounts for the memory representation where each boolean value is stored as a single bit (1/8 of a byte).
192-202
: Implementation looks good!The implementation correctly mirrors the
iter_has_true_measured
method, providing hardware counter measurements for false values with the appropriate memory fraction.
375-376
: Good parameter renamingProperly renamed from
_hw_counter
tohw_counter
now that the parameter is actually being used.
382-385
: Good use of measured iteratorsThe implementation now correctly uses the hardware-measured iterators in the filter method.
394-395
: Good parameter renamingProperly renamed from
_hw_counter
tohw_counter
now that the parameter is actually being used.
406-408
: Good counter incrementThe implementation now correctly increments the hardware counter based on the count of values. The comment explains that this matches the behavior in the mmap index implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
lib/common/common/src/counter/iterator_hw_measurement.rs (1)
6-68
: Well-designed iterator extension trait for hardware measurementsThe
HwMeasurementIteratorExt
trait provides a clean and flexible API for measuring hardware usage during iteration. The three methods offer different levels of control and flexibility.However, there's a discrepancy between comments and implementation:
- // Subtract 1 to not account for the latest `None` call. + // Increment by the total count of iterations multiplied by the factorSimilar comments should be updated on lines 47 and 64 as well.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
lib/common/common/src/counter/iterator_hw_measurement.rs
(1 hunks)lib/common/common/src/iterator_ext/on_final_count.rs
(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-consensus-compose
- GitHub Check: test (macos-latest)
- GitHub Check: test
- GitHub Check: test (windows-latest)
- GitHub Check: test
- GitHub Check: test-consensus
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
🔇 Additional comments (4)
lib/common/common/src/counter/iterator_hw_measurement.rs (1)
70-70
: Good use of blanket implementationThe blanket implementation makes this trait available for all iterators, which is the right approach for this kind of extension trait.
lib/common/common/src/iterator_ext/on_final_count.rs (3)
39-43
: Improved counter semantics for accurate item countingThis change fixes a conceptual issue with the counter. The counter now increments only when an actual item is yielded (
Some
), rather than on every call tonext()
. This aligns with the comments in theHwMeasurementIteratorExt
trait about not counting the finalNone
call.
57-57
: Fixed test to match new counter behaviorThe test expectation has been correctly updated to reflect the new counter behavior.
60-92
: Good test coverage for partial iteration scenariosThese additional tests properly verify the counter behavior in different scenarios:
- When an iterator is partially consumed
- When an empty iterator is repeatedly called
This ensures the counter accurately reflects the number of yielded items regardless of how the iterator is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
lib/common/common/src/counter/iterator_hw_measurement.rs (1)
13-27
: Suggestion: Validate multiplier inputs.Although the code is functionally correct, consider validating that
multiplier
is not zero if that is a scenario to avoid (e.g., producing no increments). This helps ensure your usage logic remains consistent and can prevent accidental no-op increments.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
lib/common/common/src/counter/iterator_hw_measurement.rs
(1 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
lib/common/common/src/counter/iterator_hw_measurement.rs (2)
lib/common/common/src/iterator_ext/mod.rs (1) (1)
on_final_count
(39-45)lib/common/common/src/iterator_ext/on_final_count.rs (1) (1)
new
(14-20)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-consensus-compose
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-low-resources
- GitHub Check: test (macos-latest)
- GitHub Check: test-consistency
- GitHub Check: test (windows-latest)
- GitHub Check: test-consensus
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (3)
lib/common/common/src/counter/iterator_hw_measurement.rs (3)
6-12
: Good use of documentation for the trait.Your doc comments are clear and concise, making it easier for other developers to understand the purpose of these measurement methods.
36-48
: Ensure thread-safety if counters are shared.When
hw_cell
is shared across threads, it’s important to confirm thatHardwareCounterCell
is properly synchronized. If it’s not thread-safe, concurrent increments could lead to data races. Clarify usage or add documentation (and tests, if necessary) to ensure its safe usage in multi-threaded contexts.
67-67
: Efficient blanket implementation.Providing a blanket implementation for all iterators simplifies usage across the codebase. This design effectively generalizes hardware measurement to any iterator. Good job on thorough trait coverage.
* IteratorExt helper functions for Hw Measurements + Some impls * More index impls; Performance improvements; Use new iterator * Add missing measurement in hnsw search. Use disposable flag in HwCounterCell too * use precomputed counts more * Review nits * Fix measurement of map_index iter_values_map * Review remarks * fix OnFinalCount * remove comment --------- Co-authored-by: Luis Cossío <luis.cossio@outlook.com> Co-authored-by: Andrey Vasnetsov <andrey@vasnetsov.com>
Depends on #6117