-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Cardinality estimation IO measurements #6117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cardinality estimation IO measurements #6117
Conversation
6216639
to
570cfd0
Compare
c58bbae
to
00c3999
Compare
570cfd0
to
e5cd057
Compare
00c3999
to
57412ea
Compare
e5cd057
to
8de3c8a
Compare
57412ea
to
874eb0e
Compare
df104ed
to
ed264d0
Compare
874eb0e
to
d57c503
Compare
ed264d0
to
6f3ecd4
Compare
d57c503
to
eabc66d
Compare
6f3ecd4
to
6eb4428
Compare
let len = self.postings.get(token_id as usize).map(|p| p.len()); | ||
hw_counter | ||
.payload_index_io_read_counter() | ||
.incr_delta(size_of::<Option<CompressedPostingList>>() + len.unwrap_or(0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CompressedPostingList::len()
only reads the len of two vecs, does that still count toward io reads?
.map(|x| x.len()); | ||
hw_counter | ||
.payload_index_io_read_counter() | ||
.incr_delta(size_of::<Option<PostingList>>() + len.unwrap_or(0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, this len
only reads the len of 1 vec
3d2b048
to
c766539
Compare
📝 WalkthroughWalkthroughThe changes consistently add a new parameter—a reference to a hardware counter—to numerous method signatures across the codebase. Methods related to cardinality estimation, point counting, and filtering now require an extra argument of type Tip ⚡🧪 Multi-step agentic review comment chat (experimental)
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
⏰ Context from checks skipped due to timeout of 90000ms (13)
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🔭 Outside diff range comments (2)
lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs (1)
288-301
:⚠️ Potential issueHardware counter integration for values_of_hash method.
Similar to the previous method, this correctly implements hardware counter tracking for IO operations.
The same issue with missing
size_of
import applies here.Make sure to add the import as suggested in the previous comment.
lib/segment/src/index/field_index/map_index/mmap_map_index.rs (1)
274-288
: 💡 Verification agent🧩 Analysis chain
Added placeholder for hardware measurement in iter_values_map
The TODO comment indicates this is a placeholder for future implementation. Consider defining a timeline or creating an issue to track the implementation of value collection in the hardware measurement.
🏁 Script executed:
#!/bin/bash # Check if there are other similar TODO comments echo "Searching for similar TODO comments..." rg "TODO\(io_measurement\): Collect values" --statsLength of output: 1027
Action Required: Establish Tracking for Hardware Measurement Implementation
The placeholder for hardware measurement in
lib/segment/src/index/field_index/map_index/mmap_map_index.rs
(lines 274–288) is consistently used as a temporary stub. Note that similarTODO(io_measurement): Collect values
markers exist in the following files:
lib/segment/src/index/field_index/null_index/mmap_null_index.rs
lib/segment/src/index/field_index/map_index/mutable_map_index.rs
lib/segment/src/index/field_index/map_index/immutable_map_index.rs
Please consider defining a timeline or creating a dedicated issue to track the implementation of value collection for hardware measurements across these modules.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (47)
lib/collection/src/collection_manager/holders/proxy_segment.rs
(1 hunks)lib/collection/src/collection_manager/segments_searcher.rs
(2 hunks)lib/collection/src/shards/forward_proxy_shard.rs
(2 hunks)lib/collection/src/shards/local_shard/mod.rs
(1 hunks)lib/collection/src/shards/local_shard/shard_ops.rs
(1 hunks)lib/collection/src/shards/proxy_shard.rs
(3 hunks)lib/collection/src/shards/queue_proxy_shard.rs
(2 hunks)lib/collection/src/shards/replica_set/update.rs
(1 hunks)lib/collection/src/shards/shard.rs
(2 hunks)lib/quantization/src/encoded_vectors.rs
(1 hunks)lib/segment/benches/conditional_search.rs
(1 hunks)lib/segment/src/entry/entry_point.rs
(1 hunks)lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs
(1 hunks)lib/segment/src/index/field_index/bool_index/mod.rs
(4 hunks)lib/segment/src/index/field_index/bool_index/simple_bool_index.rs
(1 hunks)lib/segment/src/index/field_index/facet_index.rs
(3 hunks)lib/segment/src/index/field_index/field_index_base.rs
(2 hunks)lib/segment/src/index/field_index/full_text_index/immutable_inverted_index.rs
(1 hunks)lib/segment/src/index/field_index/full_text_index/inverted_index.rs
(3 hunks)lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs
(1 hunks)lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs
(1 hunks)lib/segment/src/index/field_index/full_text_index/text_index.rs
(2 hunks)lib/segment/src/index/field_index/geo_index/immutable_geo_index.rs
(1 hunks)lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs
(1 hunks)lib/segment/src/index/field_index/geo_index/mod.rs
(16 hunks)lib/segment/src/index/field_index/geo_index/mutable_geo_index.rs
(3 hunks)lib/segment/src/index/field_index/map_index/immutable_map_index.rs
(3 hunks)lib/segment/src/index/field_index/map_index/mmap_map_index.rs
(3 hunks)lib/segment/src/index/field_index/map_index/mod.rs
(23 hunks)lib/segment/src/index/field_index/map_index/mutable_map_index.rs
(2 hunks)lib/segment/src/index/field_index/null_index/mmap_null_index.rs
(3 hunks)lib/segment/src/index/field_index/numeric_index/mod.rs
(1 hunks)lib/segment/src/index/hnsw_index/hnsw.rs
(2 hunks)lib/segment/src/index/payload_index_base.rs
(1 hunks)lib/segment/src/index/plain_payload_index.rs
(2 hunks)lib/segment/src/index/plain_vector_index.rs
(3 hunks)lib/segment/src/index/query_optimization/optimizer.rs
(1 hunks)lib/segment/src/index/sparse_index/sparse_vector_index.rs
(2 hunks)lib/segment/src/index/struct_payload_index.rs
(6 hunks)lib/segment/src/segment/entry.rs
(5 hunks)lib/segment/src/segment/facet.rs
(3 hunks)lib/segment/src/segment/order_by.rs
(1 hunks)lib/segment/src/segment/sampling.rs
(1 hunks)lib/segment/src/segment/scroll.rs
(2 hunks)lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs
(1 hunks)lib/segment/tests/integration/payload_index_test.rs
(9 hunks)src/actix/api/update_api.rs
(0 hunks)
💤 Files with no reviewable changes (1)
- src/actix/api/update_api.rs
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test (macos-latest)
- GitHub Check: test-consensus
- GitHub Check: test (windows-latest)
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
- GitHub Check: test-consensus-compose
- GitHub Check: test
🔇 Additional comments (119)
lib/segment/src/segment/sampling.rs (1)
22-22
: Changes maintain functionality while integrating hardware counterThe modification correctly updates the call to
estimate_cardinality
by passing thehw_counter
parameter, ensuring hardware resource usage is tracked during cardinality estimation without altering the core logic.lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (1)
281-281
: Now measuring deletions with disposable hardware counterThe code now properly passes a disposable hardware counter to the
update_stored
method when deleting a vector, which enables gathering metrics for deletion operations.lib/segment/src/index/field_index/full_text_index/immutable_inverted_index.rs (1)
85-94
: Hardware counter integration properly tracks posting list I/OThe implementation correctly measures I/O operations by incrementing the payload index counter with both the size of the option type and the actual size of the posting list content.
CompressedPostingList::len()
only reads the length of vectors, but this still counts toward I/O reads since accessing metadata involves memory operations that should be measured.lib/collection/src/shards/proxy_shard.rs (2)
149-155
: Method signature correctly updated to include hardware counterThe
estimate_cardinality
method now properly accepts and forwards the hardware counter parameter to the wrapped shard, ensuring consistent measurement across the delegation chain.
180-181
: Hardware counter correctly passed from accumulatorThe code now properly obtains the hardware counter from the measurement accumulator and passes it to the estimate_cardinality method, ensuring metrics are collected during filter-based cardinality estimation.
lib/collection/src/shards/forward_proxy_shard.rs (2)
7-7
: Adding hardware counter import for tracking resource usageThe addition of the hardware counter import is necessary to support the new parameter in the method signature below.
236-242
: Added hardware counter parameter to estimate_cardinality methodThis change consistently adds the hardware counter parameter to the
estimate_cardinality
method and passes it to the wrapped shard implementation. This aligns with the broader pattern of changes in this PR to track hardware resource usage during cardinality estimation operations.lib/collection/src/collection_manager/segments_searcher.rs (5)
775-775
: Adding hardware counter import for test codeThe import is needed to support the usage of hardware counter in the test module.
804-805
: Added hardware counter instantiation for testsCreating a new hardware counter instance to be used in the test cases.
808-809
: Updated method call to include hardware counterThis change adds the hardware counter parameter to the method call, ensuring that tests reflect the updated API for performance tracking.
811-813
: Updated second method call to include hardware counterSame pattern of change - adding the hardware counter to the method call to ensure test compliance with updated API.
819-824
: Updated third method call to include hardware counterThis change completes the pattern of updating all calls to the
is_small_enough_for_unindexed_search
method to include the hardware counter parameter, ensuring all test cases properly reflect the updated API.lib/segment/src/segment/order_by.rs (1)
35-35
: Updated cardinality estimation to use hardware counterThe change properly passes the hardware counter to the
estimate_cardinality
method, ensuring resource usage is tracked during this operation. This is consistent with the overall PR objective of adding hardware tracking to cardinality estimation functions.lib/segment/src/index/query_optimization/optimizer.rs (1)
129-129
: Updated condition_cardinality method call to include hardware counterThis change ensures the hardware counter is passed through to the
condition_cardinality
method when converting conditions in query optimization. The update is consistent with the overall pattern of adding hardware tracking to cardinality-related functions across the codebase.lib/segment/src/entry/entry_point.rs (1)
206-210
: Method signature change looks goodThe addition of the
hw_counter
parameter to theestimate_point_count
method is consistent with the project's goal of tracking hardware metrics during cardinality estimation operations.lib/segment/benches/conditional_search.rs (1)
145-145
: Updated call correctly uses the hardware counterThe
hw_counter
parameter has been correctly added to theestimate_cardinality
method call, using the variable that was already defined at line 136.lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs (1)
164-179
: Potential issue with hardware counter calculationThe implementation correctly adds the
hw_counter
parameter, but there may be an issue with how the counter is being updated.The calculation
size_of::<Option<PostingList>>() + len.unwrap_or(0) * size_of::<u8>()
might not accurately represent the actual memory used by the posting list.This calculation only considers the length of one vector as mentioned in a past review, and it multiplies by
size_of::<u8>()
which may not be the correct size for each element in the posting list. Consider using the actual size of the elements stored in the posting list.lib/collection/src/shards/queue_proxy_shard.rs (1)
223-231
: Method signature and implementation updated correctlyThe
estimate_cardinality
method has been properly updated to include thehw_counter
parameter and correctly forwards it to the wrapped shard's implementation.lib/segment/src/index/sparse_index/sparse_vector_index.rs (3)
266-270
: Method signature updated to include hardware counter parameter.The method signature now includes a
hw_counter
parameter which allows tracking hardware performance metrics during cardinality estimation.
275-275
: Hardware counter correctly propagated to payload index.The
hw_counter
parameter is now passed to theestimate_cardinality
method of the payload index, ensuring consistent hardware performance tracking throughout the cardinality estimation process.
438-439
: Hardware counter correctly passed to cardinality estimation.The call to
get_query_cardinality
now includes the hardware counter from the vector query context, ensuring performance metrics are tracked throughout the query execution path.lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs (2)
204-209
: Method signature updated to include hardware counter parameter.The
get_posting_len
method now accepts and utilizes a hardware counter parameter, properly propagating it to the underlying postings implementation.
213-213
: Helpful comment added explaining hardware counter usage.The comment clarifies that hardware counter propagation is not needed in this specific context since the function is only used for building the HNSW index, which is helpful for future maintenance.
lib/collection/src/shards/local_shard/mod.rs (2)
990-990
: Method signature updated to include hardware counter parameter.Added
hw_counter
parameter to theestimate_cardinality
method to enable hardware performance tracking during cardinality estimation.
995-1000
: Hardware counter correctly propagated to segment estimations.The hardware counter is now properly passed to each segment's
estimate_point_count
method, ensuring consistent performance tracking across all segments during cardinality estimation operations.lib/quantization/src/encoded_vectors.rs (2)
34-34
: Fixed typo in parameter name.Corrected the parameter name from
hw_couter
tohw_counter
in thescore_point
method for consistency and correctness.
36-36
: Fixed typo in parameter name.Corrected the parameter name from
hw_couter
tohw_counter
in thescore_internal
method for consistency and correctness.lib/segment/src/index/field_index/bool_index/simple_bool_index.rs (1)
358-362
: New hardware counter parameter with TODO for future implementation.The method signature has been updated to include
_hw_counter: &HardwareCounterCell
but it's not used yet, as indicated by the underscore prefix and the TODO comment. This is consistent with the PR objectives which indicate that some areas have TODOs for future improvements.Make sure the hardware counter collection is implemented in a follow-up PR as noted in the comment.
Are there other places in this file where hardware counters could be utilized for measurements? For example, the
filter
method on line 339 has an_hw_acc
parameter that seems related.lib/segment/tests/integration/payload_index_test.rs (8)
467-473
: Hardware counter integration for cardinality estimation.The
HardwareCounterCell
is properly initialized and passed to theestimate_cardinality
method.
503-503
: Consistent use of hardware counter for estimation.The hardware counter is correctly passed to the second cardinality estimation call.
565-571
: Hardware counter for is_empty condition tests.The hardware counter is properly initialized and consistently used for both struct and plain segments.
Also applies to: 577-577
678-684
: Hardware counter for cardinality estimation tests.The hardware counter is properly initialized and passed to the cardinality estimation method.
724-729
: Hardware counter for nested array filter tests.The hardware counter is properly initialized and passed to the cardinality estimation method.
788-793
: Hardware counter for nested array nesting tests.The hardware counter is properly initialized and passed to the cardinality estimation method.
879-885
: Hardware counter for struct payload index tests.The hardware counter is properly initialized and passed to the cardinality estimation method.
1087-1092
: Hardware counter for nested fields tests.The hardware counter is properly initialized and passed to the cardinality estimation method.
lib/collection/src/shards/local_shard/shard_ops.rs (1)
216-220
: Add hardware counter to cardinality estimation.The change correctly passes the hardware counter to the
estimate_cardinality
method from the existinghw_measurement_acc
parameter, maintaining functionality while enabling performance monitoring.lib/segment/src/index/field_index/null_index/mmap_null_index.rs (3)
4-4
: Add import for HardwareCounterCell.Import added to support the new parameter in the
estimate_cardinality
method.
255-259
: New hardware counter parameter with TODO for future implementation.The method signature has been updated to include
_hw_counter: &HardwareCounterCell
but it's not used yet, as indicated by the underscore prefix and the TODO comment. This is consistent with the PR objectives which indicate that some areas have TODOs for future improvements.Make sure the hardware counter collection is implemented in a follow-up PR. Consider if similar measurement could be added to other methods like
filter
on line 209 which already has a related_hw_counter
parameter.
434-440
: Update test to use hardware counter.The test has been properly updated to create a
HardwareCounterCell
instance and pass it to theestimate_cardinality
calls.lib/segment/src/index/field_index/facet_index.rs (4)
1-1
: LGTM - Proper hardware counter importThe import for
HwMeasurementAcc
is correctly added to support the new parameter in the trait and implementation.
20-23
: LGTM - Trait interface updated correctlyThe
FacetIndex
trait'siter_values_map
method signature is properly updated to include the hardware measurement accumulator parameter.
60-63
: LGTM - Enum implementation updated consistentlyThe implementation of
iter_values_map
forFacetIndexEnum
is properly updated to include the hardware measurement accumulator parameter.
65-68
: LGTM - Parameter correctly propagatedThe hardware measurement accumulator parameter is correctly passed to all variant implementations, maintaining consistent behavior across all enum variants.
lib/segment/src/index/field_index/full_text_index/text_index.rs (4)
131-136
: LGTM - Method signature updated properlyThe
estimate_cardinality
method inFullTextIndex
correctly adds the hardware counter parameter.
138-146
: LGTM - Implementation consistently updatedThe hardware counter parameter is correctly passed to all the inverted index implementations in each match arm.
355-359
: LGTM - PayloadFieldIndex trait implementation updatedThe
estimate_cardinality
method implementation forPayloadFieldIndex
is properly updated to include the hardware counter parameter.
361-362
: LGTM - Hardware counter propagationThe hardware counter is correctly passed to the parse_query method and then used in the estimate_cardinality call.
lib/collection/src/shards/replica_set/update.rs (1)
548-551
: LGTM - Properly passes hardware counter to cardinality estimationThe hardware counter is correctly passed to the
estimate_request_cardinality
method, allowing for proper measurement of hardware usage during the cardinality estimation process.lib/segment/src/segment/entry.rs (5)
361-361
: LGTM - Hardware counter added to filtering checkThe
should_pre_filter
method call is properly updated to include the hardware counter parameter.
385-385
: LGTM - Consistent hardware counter usage in ordered filteringThe hardware counter is correctly propagated to the
should_pre_filter
method in the ordered filtering path.
412-412
: LGTM - Hardware counter for random filteringThe hardware counter parameter is consistently passed to the
should_pre_filter
method in the random filtering path.
456-460
: LGTM - Method signature updated for hardware trackingThe
estimate_point_count
method signature is properly updated to include the hardware counter parameter, enabling resource usage tracking during point count estimation.
473-473
: LGTM - Hardware counter correctly propagated to underlying estimationThe hardware counter is properly passed to the payload index's cardinality estimation method.
lib/segment/src/index/hnsw_index/hnsw.rs (2)
539-541
: Implementation of hardware counter for cardinality estimationThe disposable hardware counter is correctly created and passed to the
estimate_cardinality
method, ensuring that hardware performance monitoring is properly integrated for internal operations.
1223-1226
: TODO for hardware counter propagationThe code is currently using a disposable hardware counter that doesn't propagate measurements. The TODO comment correctly identifies this as an area for future improvement.
Based on the PR objectives, this is likely one of the TODOs mentioned as areas for improvement that will be addressed in a future PR (mentioned as depending on PR #5985).
lib/segment/src/index/field_index/numeric_index/mod.rs (1)
687-691
: Method signature updated with hardware counter parameterThe
estimate_cardinality
method has been updated to accept a hardware counter parameter, consistent with other similar method signature updates across the codebase. The TODO comment correctly indicates that actual hardware counter collection is planned for future implementation.lib/segment/src/index/plain_payload_index.rs (2)
112-116
: Added hardware counter parameter with helpful commentThe method signature has been properly updated to include the hardware counter parameter, with a clear comment indicating that no measurements are needed for this particular implementation.
126-134
: Updated nested cardinality estimation to pass hardware counterThe
estimate_nested_cardinality
method correctly forwards the hardware counter to theestimate_cardinality
method, maintaining consistency in the API.lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs (1)
397-414
: Implemented hardware counter tracking for boolean index operationsThis change properly integrates the hardware counter to track I/O operations during cardinality estimation. The code correctly increments the counter by the slice length before counting ones in the slice.
lib/segment/src/index/payload_index_base.rs (2)
61-65
: LGTM: Added hardware counter parameter to estimate_cardinality.The addition of the
hw_counter
parameter to the trait method signature is appropriate and consistent with the hardware counter integration pattern being implemented throughout the codebase.
68-73
: LGTM: Added hardware counter parameter to estimate_nested_cardinality.Consistent with the changes to
estimate_cardinality
, this change correctly adds the hardware counter parameter to the nested cardinality estimation method.lib/collection/src/collection_manager/holders/proxy_segment.rs (1)
974-994
: LGTM: Hardware counter parameter added to estimate_point_count method.The implementation properly passes the hardware counter to both the wrapped segment and write segment estimations, ensuring consistent performance tracking throughout the cardinality estimation process.
lib/collection/src/shards/shard.rs (1)
286-297
: LGTM: Hardware counter added to estimate_request_cardinality method.The implementation correctly passes the hardware counter to the underlying
estimate_cardinality
method when dealing with filter-based effect areas.lib/segment/src/index/field_index/field_index_base.rs (2)
56-62
: Integration of hardware counter for cardinality estimationThe
estimate_cardinality
method in thePayloadFieldIndex
trait has been updated to include a hardware counter parameter, which will help track performance metrics during cardinality estimation operations.
261-268
: Consistent parameter passing to underlying implementationThe
estimate_cardinality
implementation inFieldIndex
correctly passes the new hardware counter parameter to the payload field index, maintaining proper delegation.lib/segment/src/index/field_index/geo_index/mutable_geo_index.rs (4)
6-6
: Added hardware counter importAdded import for
HardwareCounterCell
to support the new parameter in method signatures.
183-184
: Updated delegate macro with hardware counter parameterThe delegate macro has been updated to include the hardware counter parameter in the method signatures for
points_of_hash
andvalues_of_hash
.
236-240
: Added hardware counter instrumentation for GeoHash readsThe method now increments the payload index IO read counter by the size of a GeoHash before performing the read operation, providing valuable metrics on IO usage.
243-247
: Added hardware counter instrumentation for GeoHash values readsSimilarly to
points_of_hash
, this method now tracks IO reads when retrieving values associated with a GeoHash.lib/segment/src/segment/facet.rs (3)
39-39
: Updated cardinality estimation with hardware counterAdded the hardware counter parameter to the
estimate_cardinality
call to track performance metrics during filter cardinality estimation.
79-79
: Added hardware counter accumulator for value iterationUpdated the
iter_values_map
call to include a new accumulator from the hardware counter, enabling performance tracking during value iteration.
127-127
: Consistent hardware counter usage for facet values filteringAdded the hardware counter parameter to another instance of
estimate_cardinality
, maintaining consistency in performance tracking across different code paths.lib/segment/src/segment/scroll.rs (3)
18-23
: Updated method signature to include hardware counterThe
should_pre_filter
method now accepts a hardware counter parameter, allowing for performance tracking during the pre-filtering decision process.
26-26
: Added hardware counter to cardinality estimation in filtering decisionUpdated the
estimate_cardinality
call to include the hardware counter parameter, enabling accurate tracking of performance metrics.
104-104
: Consistent hardware counter usage in filtered readsAdded the hardware counter parameter to the
estimate_cardinality
call in thefiltered_read_by_index
method, ensuring consistent performance tracking throughout the filtering process.lib/segment/src/index/field_index/geo_index/immutable_geo_index.rs (2)
104-117
: Good implementation of hardware counter for binary search complexityThe addition of the hardware counter parameter and the implementation of I/O read estimation based on binary search complexity is well-designed. The logarithmic calculation for the complexity is mathematically accurate and the multiplication by the size of
Counts
properly accounts for the memory footprint.
119-130
: Consistent implementation for values_of_hash methodThis implementation follows the same pattern as
points_of_hash
, properly applying the hardware counter to track I/O read operations. The only difference is that it doesn't multiply by the size ofCounts
, which is appropriate since we're only counting the complexity of the search itself, not the data access.lib/segment/src/index/field_index/map_index/mmap_map_index.rs (2)
8-9
: New import for hardware counter accumulatorThe addition of the hardware accumulator import supports the changes in the
iter_values_map
method.
210-227
: Updated get_count_for_value to use hardware counter for metricsThe method signature has been properly updated to accept a hardware counter parameter which is now passed to the
get
method. This allows for consistent tracking of I/O operations throughout the call chain.lib/segment/src/index/field_index/map_index/mutable_map_index.rs (3)
6-7
: New import for hardware counter accumulatorThe addition of the hardware accumulator import supports the changes in the
iter_values_map
method.
176-182
: Added hardware counter parameter with TODO commentThe method has been properly modified to accept a hardware counter parameter, although it's currently unused as indicated by the underscore prefix. The TODO comment clarifies the intention for future implementation.
188-195
: Added hardware measurement accumulator with TODO commentSimilar to the previous change, this method has been modified to accept a hardware measurement accumulator parameter, currently unused. The TODO comment indicates the same future implementation intention as in the
get_count_for_value
method.lib/segment/src/index/plain_vector_index.rs (3)
47-52
: Added hardware counter parameter to is_small_enough_for_unindexed_searchThe method signature has been properly updated to accept a hardware counter parameter which will be used for cardinality estimation.
62-63
: Updated estimate_cardinality call to pass hardware counterThe call to
estimate_cardinality
has been properly updated to pass the hardware counter parameter, ensuring consistent hardware metrics tracking throughout the cardinality estimation process.
86-91
: Updated method call to pass the hardware counter from query contextThe call to
is_small_enough_for_unindexed_search
now correctly passes the hardware counter from the query context, ensuring proper propagation of hardware measurement capabilities.lib/segment/src/index/field_index/bool_index/mod.rs (3)
131-139
: LGTM: Successfully updated method signature and implementationsThe
estimate_cardinality
method has been updated to include thehw_counter
parameter, and the corresponding implementations in the match arms correctly pass this parameter to the underlying index implementations.
168-171
: Implementation needed for TODO commentThe
iter_values_map
method signature has been updated to include hardware measurement accumulation, but the parameter is currently unused as indicated by the TODO comment.Consider implementing the hardware measurement functionality according to the TODO comment or provide a timeline for when this will be addressed.
472-481
: LGTM: Tests updated correctly with hardware counterThe test code has been properly updated to instantiate a
HardwareCounterCell
and pass it to theestimate_cardinality
method.lib/segment/src/index/field_index/map_index/immutable_map_index.rs (2)
293-301
: Implementation needed for TODO commentThe
get_count_for_value
method signature has been updated to include a_hw_counter
parameter, but it's currently unused as indicated by the TODO comment.Consider implementing the hardware measurement functionality according to the TODO comment. The underscore prefix indicates it's intentionally unused for now, but this should be addressed in the future.
309-314
: LGTM: Successfully updated method signature with hardware counterThe
iter_values_map
method has been correctly updated to use the hardware counter from thehw_acc
parameter, and it's being properly passed to theget_iterator
method.lib/segment/src/index/field_index/geo_index/mod.rs (6)
110-116
: LGTM: Successfully updated method signature and implementationsThe
points_of_hash
method has been updated to include thehw_counter
parameter, and the implementations in all three variants properly pass this parameter.
118-124
: LGTM: Successfully updated method signature and implementationsThe
values_of_hash
method has been updated to include thehw_counter
parameter, and the implementations in all three variants properly pass this parameter.
217-268
: LGTM: Successfully updated method signature and internal callsThe
match_cardinality
method has been updated to include thehw_counter
parameter, and all internal calls topoints_of_hash
andvalues_of_hash
now correctly pass this parameter along.
551-601
: LGTM: Successfully updated method signature and internal callsThe
estimate_cardinality
method has been updated to include thehw_counter
parameter, and all internal calls tomatch_cardinality
now correctly pass this parameter along, ensuring the hardware counter is used consistently throughout the cardinality estimation flow.
1055-1057
: LGTM: Tests updated correctly with hardware counterThe test code has been properly updated to instantiate a
HardwareCounterCell
and pass it to thepoints_of_hash
method.
947-950
: LGTM: Tests updated correctly with hardware counterThe
match_cardinality
test has been updated to instantiate aHardwareCounterCell
and pass it to the updated method calls correctly.lib/segment/src/index/field_index/full_text_index/inverted_index.rs (3)
96-97
: LGTM: Updated method signature with hardware counterThe
get_posting_len
method signature has been updated to include thehw_counter
parameter, aligning with the hardware counter integration pattern used throughout the codebase.
99-114
: LGTM: Successfully updated method signature and internal callsThe
estimate_cardinality
method has been updated to include thehw_counter
parameter, and the internal call toget_posting_len
now correctly passes this parameter along.
184-188
: Fixed parameter name typoCorrected the parameter name from
hw_coutner
tohw_counter
, ensuring consistent naming throughout the codebase.lib/segment/src/index/struct_payload_index.rs (5)
71-83
: Add hardware counter parameter toestimate_field_condition
The addition of
hw_counter: &HardwareCounterCell
into both the function signature (line 71) and the subsequent invocation ofindex.estimate_cardinality(&full_path_condition, hw_counter)
(line 83) looks consistent. This clearly propagates the hardware counter to underlying field indexes.
246-296
: Extendcondition_cardinality
withhw_counter
usageAll added parameter references (lines 246, 253, 259, 266, and 296) properly pass
hw_counter
into nested or field-level estimations. This consistently integrates hardware counter measurement for each condition type. The logic remains coherent, and there are no immediate concerns.
446-453
: Incorporatehw_counter
inestimate_cardinality
Lines 446–450 introduce a new parameter in the function signature, while lines 452–453 successfully reference the parameter in a closure. Passing
hw_counter
intoself.condition_cardinality
allows consistent tracking of hardware metrics.
461-466
: Nested cardinality estimation with hardware counterLines 461–466 similarly extend the nested cardinality estimation logic to accept and utilize
hw_counter
. This is consistent with the rest of the cardinality estimation methods.
473-476
: Propagatehw_counter
inquery_points
These lines (473, 476) ensure that
hw_counter
is provided when callingself.estimate_cardinality(...)
. Maintaining consistency with the rest of the index logic, this extension helps accurately measure hardware performance during queries.lib/segment/src/index/field_index/map_index/mod.rs (14)
179-184
: New parameter inget_count_for_value
Adding
hw_counter: &HardwareCounterCell
(line 179) and passing it down for Mutable, Immutable, and Mmap indexes (lines 181–184) is aligned with the broader hardware counter integration. This extension maintains uniform coverage across all index types.
211-219
: Hardware accumulator initer_values_map
Lines 211–219 introduce
hw_acc: HwMeasurementAcc
and forward it to specific index variants. This consistency ensures measurement coverage while iterating values.
234-240
: Extendmatch_cardinality
with hardware counterIn lines 234–240, the new parameter
hw_counter
aids cardinality estimation for matched values. The changes are minimal yet fully consistent with the existing design.
317-364
: Usehw_counter
inexcept_cardinality
Within lines 317 and 361–364, the function calculates how many points are excluded/retained, now also factoring in hardware counter calls with
self.get_count_for_value(...)
. This is a coherent extension of metric tracking to the “except” logic.
421-427
: IntegrateHardwareCounterCell
intoexcept_set
Adding
hw_acc
(line 421) and retrievinghw_counter
(line 427) ensures uniform measurement coverage for set-based exclusions. No issues spotted.
565-565
: Leverageself.except_set
withhw_acc
for string filteringLine 565 adds the hardware counter argument when calling
self.except_set(...)
for string-based exclusions. This keeps the filter logic consistent.
578-624
: Implementhw_counter
inestimate_cardinality
forMapIndex<str>
These additions unify cardinality estimation under the hardware counter approach (lines 578–582, 586, 599, 624). The pattern of retrieving and passing
hw_counter
tomatch_cardinality
orexcept_cardinality
is well-executed.
755-817
: ExtendMapIndex<UuidIntType>
cardinality with hardware counterWithin lines 755–759, 764, 784–785, and 816–817, the updated function signature and calls consistently pass the hardware counter. This parallels the other map index implementations, ensuring uniform coverage.
913-913
: Filter except set forMapIndex<IntPayloadType>
Line 913 references the
Match::Except
variant, with line 921 callingself.except_set(integers, hw_acc)
. This extension properly includes the hardware counter for integer-based filters.Also applies to: 921-921
927-983
: Integrate hardware counter inestimate_cardinality
forMapIndex<IntPayloadType>
From lines 927–931, 936–937, 957–958, and 982–983, hardware metrics are consistently passed for integer-based cardinality estimations. The approach matches the other map index types.
997-1001
: Add disposableHardwareCounterCell
inpayload_blocks
The code (lines 997–1001) calls
get_count_for_value(...)
with a disposable hardware counter, noting thatpayload_blocks
usage is not measured. This maintains consistency in the interface while avoiding unneeded overhead.
1262-1268
: Initialize and usehw_counter
in tests (int map index)Defining
let hw_counter = HardwareCounterCell::new();
(lines 1262–1263) and passing it to.except_cardinality(...)
(lines 1267–1268) ensures the test code fully exercises the new interface. Looks good.
1305-1311
: Initialize and passhw_counter
in tests (string map index)As with the int map index tests, lines 1305–1306 create the hardware counter, then lines 1310–1311 pass it to
.except_cardinality(...)
. The logic is consistent and verifies the updated code path.
1331-1332
: Verify empty index scenario withhw_counter
At lines 1331–1332, passing
hw_counter
to.except_cardinality(...)
confirms that an empty index yields zero cardinality. This ensures test coverage aligns with the new parameter.
c766539
to
260d8db
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs (1)
165-180
: The hardware counter logic could be simplifiedThe method calculates the length first and then uses it for both the hardware counter increment and the return value. This is good for code clarity, but there's a redundant
.as_ref()
call in the chain that could be removed.fn get_posting_len( &self, token_id: TokenId, hw_counter: &HardwareCounterCell, ) -> Option<usize> { let len = self .postings .get(token_id as usize) .and_then(|posting| posting.as_ref()) - .as_ref() .map(|x| x.len()); hw_counter .payload_index_io_read_counter() .incr_delta(size_of::<Option<PostingList>>() + len.unwrap_or(0) * size_of::<u8>()); len }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
lib/segment/src/index/field_index/full_text_index/immutable_inverted_index.rs
(2 hunks)lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- lib/segment/src/index/field_index/full_text_index/immutable_inverted_index.rs
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-low-resources
- GitHub Check: test-consensus
- GitHub Check: test (macos-latest)
- GitHub Check: test-consistency
- GitHub Check: test (windows-latest)
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
- GitHub Check: test-consensus-compose
🔇 Additional comments (1)
lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs (1)
175-179
: Consider whether counting onlysize_of::<u8>()
per element is accurateSimilar to the issue in the
filter
method, there's a concern about whethersize_of::<u8>()
accurately represents the memory footprint of each element in the posting list.
lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs
Outdated
Show resolved
Hide resolved
260d8db
to
83fb468
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs (1)
165-180
: Consider making hardware counter incrementation more consistent with the filter method.The implementation correctly calculates and increments the hardware counter based on the posting list access, but there's a minor inefficiency in first retrieving the length in a separate variable and then using it for the counter increment.
Consider refactoring to a more direct approach:
fn get_posting_len( &self, token_id: TokenId, hw_counter: &HardwareCounterCell, ) -> Option<usize> { - let len = self - .postings - .get(token_id as usize) - .and_then(|posting| posting.as_ref()) - .as_ref() - .map(|x| x.len()); - hw_counter - .payload_index_io_read_counter() - .incr_delta(size_of::<Option<PostingList>>() + len.unwrap_or(0) * size_of::<u32>()); - len + let postings = self + .postings + .get(token_id as usize) + .and_then(|posting| posting.as_ref()); + + let len = postings.map(|x| x.len()); + hw_counter + .payload_index_io_read_counter() + .incr_delta(size_of::<Option<PostingList>>() + len.unwrap_or(0) * size_of::<u32>()); + len }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
lib/segment/src/index/field_index/full_text_index/immutable_inverted_index.rs
(2 hunks)lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- lib/segment/src/index/field_index/full_text_index/immutable_inverted_index.rs
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test-consensus
- GitHub Check: test (macos-latest)
- GitHub Check: test
- GitHub Check: test (windows-latest)
- GitHub Check: test-consensus-compose
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (1)
lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs (1)
145-148
: The calculation correctly accounts for posting list element size.The hardware counter incrementation for posting list size is correctly calculated using
size_of::<u32>()
which matches thePointOffsetType
definition in the codebase.
83fb468
to
4f9a385
Compare
lib/segment/src/index/field_index/full_text_index/immutable_inverted_index.rs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
lib/segment/src/index/field_index/mmap_point_to_values.rs (1)
300-302
: Optimize the cloning operation for hardware counter calculation.The code clones the value to calculate its memory-mapped size and then passes the original value to the check function. This cloning operation could be unnecessary and adds overhead, especially for larger values.
Consider calculating the size without cloning by using a method that accepts a reference:
- let mmap_size = T::mmapped_size(value.clone()); - hw_acc.payload_index_io_read_counter().incr_delta(mmap_size); - if check_fn(value) { + let mmap_size = T::mmapped_size(value); + hw_acc.payload_index_io_read_counter().incr_delta(mmap_size); + if check_fn(value) {This would require updating the
MmapValue
trait to ensuremmapped_size
can work with referenced values directly, potentially improving performance for large values.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
lib/segment/src/index/field_index/mmap_point_to_values.rs
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test (macos-latest)
- GitHub Check: test-consensus-compose
- GitHub Check: test (windows-latest)
- GitHub Check: test
- GitHub Check: test-consensus
- GitHub Check: test
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
🔇 Additional comments (2)
lib/segment/src/index/field_index/mmap_point_to_values.rs (2)
305-305
: Good optimization: reusing calculated value.Reusing the pre-calculated
mmap_size
variable is a good optimization that avoids recalculating the value's size.
287-288
: Well-implemented hardware counter parameter.The addition of the
hw_acc
parameter aligns with the PR objective of adding hardware counter measurements for more accurate I/O tracking during cardinality estimation operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
lib/segment/src/index/field_index/map_index/mod.rs (1)
314-416
: Consider splitting this large function for clarity.
The logic inexcept_cardinality
is correct and makes good use of saturating arithmetic to handle edge cases. However, it’s quite lengthy. Refactoring into smaller helper methods (e.g., to compute minimal or maximal scenarios separately) could improve readability and maintainability.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (24)
lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs
(1 hunks)lib/segment/src/index/field_index/bool_index/mod.rs
(7 hunks)lib/segment/src/index/field_index/bool_index/simple_bool_index.rs
(2 hunks)lib/segment/src/index/field_index/field_index_base.rs
(2 hunks)lib/segment/src/index/field_index/full_text_index/compressed_posting/compressed_chunks_reader.rs
(6 hunks)lib/segment/src/index/field_index/full_text_index/compressed_posting/compressed_posting_list.rs
(3 hunks)lib/segment/src/index/field_index/full_text_index/compressed_posting/compressed_posting_visitor.rs
(4 hunks)lib/segment/src/index/field_index/full_text_index/immutable_inverted_index.rs
(4 hunks)lib/segment/src/index/field_index/full_text_index/inverted_index.rs
(10 hunks)lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mmap_postings.rs
(2 hunks)lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs
(4 hunks)lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs
(3 hunks)lib/segment/src/index/field_index/full_text_index/mutable_text_index.rs
(5 hunks)lib/segment/src/index/field_index/full_text_index/postings_iterator.rs
(2 hunks)lib/segment/src/index/field_index/full_text_index/text_index.rs
(4 hunks)lib/segment/src/index/field_index/geo_index/mod.rs
(24 hunks)lib/segment/src/index/field_index/map_index/mod.rs
(29 hunks)lib/segment/src/index/field_index/null_index/mmap_null_index.rs
(10 hunks)lib/segment/src/index/field_index/numeric_index/immutable_numeric_index.rs
(1 hunks)lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs
(1 hunks)lib/segment/src/index/field_index/numeric_index/mod.rs
(4 hunks)lib/segment/src/index/field_index/numeric_index/mutable_numeric_index.rs
(2 hunks)lib/segment/src/index/field_index/numeric_index/tests.rs
(4 hunks)lib/segment/src/index/struct_payload_index.rs
(8 hunks)
🚧 Files skipped from review as they are similar to previous changes (10)
- lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs
- lib/segment/src/index/field_index/bool_index/simple_bool_index.rs
- lib/segment/src/index/struct_payload_index.rs
- lib/segment/src/index/field_index/null_index/mmap_null_index.rs
- lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs
- lib/segment/src/index/field_index/field_index_base.rs
- lib/segment/src/index/field_index/full_text_index/inverted_index.rs
- lib/segment/src/index/field_index/bool_index/mod.rs
- lib/segment/src/index/field_index/full_text_index/text_index.rs
- lib/segment/src/index/field_index/numeric_index/mod.rs
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test-consensus-compose
- GitHub Check: test (macos-latest)
- GitHub Check: test-consensus
- GitHub Check: test (windows-latest)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (76)
lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mmap_postings.rs (2)
95-99
: Explicit lifetime annotations improve borrowing relationshipsThe added lifetime parameter
'a
properly ties the returnedChunkReader
's lifetime to the references it depends on (self
,header
, andhw_counter
). This change enhances the clarity and safety of the borrowing relationships, ensuring theChunkReader
can't outlive any of its source data.
128-132
: Consistent lifetime annotations for method chainThe lifetime annotations here are consistent with those added to
get_reader
, ensuring proper lifetime propagation through the method chain. This is particularly important sinceget
callsget_reader
on line 139. This change maintains memory safety while making the borrowing relationships explicit.lib/segment/src/index/field_index/numeric_index/immutable_numeric_index.rs (2)
209-214
: Good implementation of hardware counter trackingThe updated
values_range
method now properly tracks read operations using the hardware counter. The lifetime parameter'a
ensures that both&self
and&hw_counter
share the same lifetime, which is correctly propagated to the returned iterator.
217-221
:✅ Verification successful
Ensure proper size calculation in counter increment
The code increments the payload IO read counter with
size_of::<Point<T>>()
which measures the size of thePoint<T>
type. This looks correct, but verify that this matches the actual byte count read during this operation.
🏁 Script executed:
#!/bin/bash # Verify consistent memory size approach across the codebase rg -A 2 "size_of::<Point<" --type rustLength of output: 1080
Ensure Correct Byte Calculation in Payload IO
Our search confirms that the usage of
size_of::<Point<T>>()
is consistent across bothimmutable_numeric_index.rs
andmmap_numeric_index.rs
. The counter increment accurately reflects the byte size of aPoint<T>
. Please double-check that this computed size fully represents the actual bytes read during the operation and that no additional overhead needs to be accounted for.lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs (1)
244-257
:❓ Verification inconclusive
Good implementation with correct counter usage
The updated
values_range
method correctly integrates the hardware counter with proper lifetime annotations. Nice use ofinspect
to track the IO operations without modifying the data flow.Note that this implementation increments the
payload_index_io_read_counter
while the implementation inimmutable_numeric_index.rs
increments thepayload_io_read_counter
. Ensure this difference is intentional.
🏁 Script executed:
#!/bin/bash # Compare counter usage between implementations echo "Checking counter type differences between implementations:" rg -A 3 "hw_counter\s+.*\.(payload_.*_read_counter)" --type rustLength of output: 655
Minor Verification Required: Confirm Hardware Counter Differentiation
The implementation in
mmap_numeric_index.rs
uses proper lifetime annotations and integrates the hardware counter correctly via theinspect
method. However, as noted, it increments thepayload_index_io_read_counter
—while, according to earlier context, the implementation inimmutable_numeric_index.rs
(and similarly seen in other modules likemutable_inverted_index.rs
) uses a different counter (payload_io_read_counter
). Please verify that this discrepancy in counter usage is intentional and that the design rationale for using different counters across these modules is well documented.
- File:
lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs
– Usespayload_index_io_read_counter()
- File:
lib/segment/src/index/field_index/numeric_index/immutable_numeric_index.rs
– Confirm that the use ofpayload_io_read_counter()
is deliberatelib/segment/src/index/field_index/numeric_index/tests.rs (4)
1-1
: Good addition of necessary importThe import of
HwMeasurementAcc
is necessary for the changes in this file.
101-102
: Properly extracting the counter cell from the accumulatorGood approach to get the counter cell from the accumulator. This improves the design by allowing the counter to be shared across multiple operations.
111-111
: Consistent passing of hardware counter referenceThis change correctly passes the hardware counter reference to the filter method, maintaining consistency with other related changes.
439-441
: Good test update for hardware counter usageThe test has been correctly updated to create a hardware counter and pass it to the filter method, ensuring consistent test coverage with the new parameter requirements.
lib/segment/src/index/field_index/numeric_index/mutable_numeric_index.rs (2)
108-122
: Properly updated method signature and implementationThe
values_range
method has been correctly updated with appropriate lifetime parameters and to use the hardware counter.
291-343
: Good refactoring from macro to inline methodsThe change from a delegate macro to explicit inline method implementations improves code readability and maintainability. Each method clearly delegates to the corresponding method in
in_memory_index
, and the#[inline]
annotations help maintain performance.lib/segment/src/index/field_index/geo_index/mod.rs (21)
109-115
: Ensure consistent usage of hardware counter inpoints_of_hash
These changes correctly forwardhw_counter
across all index variants. This appears consistent and maintains clarity.
117-123
: Mirror thehw_counter
approach invalues_of_hash
Similar topoints_of_hash
, this method correctly delegates the hardware counter to underlying index variants.
216-220
: Extended signature formatch_cardinality
Addinghw_counter
aligns this function’s signature with the rest of the codebase’s performance tracking strategy.
230-231
: Properly invokepoints_of_hash
&values_of_hash
withhw_counter
This ensures that I/O readings are tracked when retrievingcommon_hash
statistics.
235-235
: Inline call topoints_of_hash
in map
Passinghw_counter
here maintains instrumentation consistency.
509-513
: Updatedfilter
method signature
Introducing the lifetime parameter andhw_counter
reference is consistent with the pattern used elsewhere, enabling fine-grained performance monitoring.
518-518
: Usehw_counter
incheck_values_any
Continuing the consistent hardware counter pass-through helps with uniform tracing of I/O.
528-528
: Repeatedcheck_values_any
usage
Same comment as above: passinghw_counter
is properly aligned with the instrumentation approach.
538-538
: Anothercheck_values_any
call
No issues; consistent with the rest of the code.
547-551
: Expanded signature forestimate_cardinality
Welcomed consistency in providinghw_counter
for each cardinality estimation method.
554-554
: Leveragematch_cardinality
withhw_counter
Seamless alignment with the new signature.
573-579
: Polygon interior and exterior cardinality
Calls tomatch_cardinality
now incorporate the counter parameter, ensuring measurement is captured.
803-804
: Test usage ofhw_counter
Creating and usingHardwareCounterCell
in tests ensures coverage of the new parameter.
944-944
: Checking cardinality in tests
Usingfield_index.estimate_cardinality(&field_condition, &hw_counter)
here confirms consistent test coverage.
996-1013
: Extended test flow with hardware counter
From counter creation to the filter calls, these lines confirm that the new signature is well-tested for standard filtering logic.
1053-1076
:hw_counter
in block-based test logic
This chunk ensures block-based filtering leverages consistent I/O measurement.
1100-1161
: Multi-location cardinality checks
All references tohw_counter
in this block—Berlin, NYC, Tokyo—confirm test coverage for multiple geographic scenarios.
1182-1205
:hw_counter
usage for close geo payload
Retains performance measurement logic across close proximity tests (Berlin <-> Potsdam).
1245-1263
:hw_counter
in disk-load test
Validates that loading from disk maintains the same instrumentation approach.
1367-1404
: Empty index cardinality checking withhw_counter
Ensures hardware counter usage in edge cases with zero or partial data.
1406-1465
: Boundaries across the antimeridian
Again, passinghw_counter
through bounding box queries demonstrates well-rounded test coverage.lib/segment/src/index/field_index/full_text_index/postings_iterator.rs (3)
58-59
: ImportedHardwareCounterCell
in tests
Great addition to measure read operations while iterating compressed postings.
90-91
: Initializehw_counter
for intersection tests
Allows verifying that I/O reads are tracked correctly when merging postings.
96-98
: Passinghw_counter
to compressed posting readers
Integrating instrumentation into each posting list ensures consistent performance metrics.lib/segment/src/index/field_index/full_text_index/mutable_text_index.rs (3)
151-153
: Introducehw_acc
andhw_counter
Clear approach to obtain the counter cell from the hardware accumulator, avoiding duplicated counters.
156-174
: Applyhw_counter
in test filters
These lines confirm that full-text indexing logic with match queries now includes hardware metrics.
211-246
: Persist and re-checkhw_counter
usage
All subsequent filter calls maintain a consistent pattern of passing the same counter cell, ensuring reliable instrumentation data.lib/segment/src/index/field_index/full_text_index/immutable_inverted_index.rs (4)
47-51
: Refined lifetime signature andhw_counter
Adoptingfn filter<'a>(…, hw_counter: &'a HardwareCounterCell) -> …
clarifies ownership and ensures safe reference usage.
71-74
: Passhw_counter
toposting.reader
Good approach to accumulate read metrics for each token’s postings.
83-93
: Introducehw_counter
inget_posting_len
Incrementingpayload_index_io_read_counter()
upon retrieving the posting length is a logical place for measuring I/O overhead.
123-124
: Usehw_counter
incheck_match
Ensures posting lookups also contribute to read metrics. No concurrency problems apparent.lib/segment/src/index/field_index/full_text_index/compressed_posting/compressed_posting_visitor.rs (5)
96-96
: Improved encapsulation by using a dedicated methodThe change from directly calling
remainder_postings.binary_search(&val).is_ok()
to using the newsearch_in_remainder
method improves encapsulation and allows for proper hardware counter tracking.
99-99
: Improved encapsulation by using accessor methodUsing
get_chunk_index
instead of direct array access provides better encapsulation and allows for tracking hardware metrics when accessing chunk data.
125-129
: Improved encapsulation with dedicated accessorsUsing the new
chunks_len()
method andget_remainder_posting
improves encapsulation and enables hardware counter tracking, which is essential for IO monitoring.
150-158
: Added hardware counter support to test codeProperly updated test code to include hardware counter initialization and usage, ensuring that tests accurately reflect the production code behavior with IO measurements.
173-175
: Added hardware counter in second testCorrectly added hardware counter initialization and usage in the second test, maintaining consistency across all test cases.
lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs (3)
376-380
: Updated filter method signature to use HardwareCounterCellChanged from using
HwMeasurementAcc
to&'a HardwareCounterCell
for hardware counter tracking, which aligns with the broader changes across the codebase for consistent IO measurement.
388-388
: Fixed bit calculation for hardware counter incrementUpdated counter increment to properly account for bits by dividing slice length by
u8::BITS
, which provides more accurate measurement of IO operations.
395-412
: Refactored estimate_cardinality to include hardware counterThe implementation now correctly tracks IO operations by incrementing the hardware counter after retrieving and processing the slice. The code is cleaner and more modular with better separation of concerns.
lib/segment/src/index/field_index/full_text_index/compressed_posting/compressed_chunks_reader.rs (6)
2-2
: Added import for HardwareCounterCellAdded the necessary import for the hardware counter functionality.
10-15
: Improved encapsulation and added hardware counterChanged field visibility from public to private and added a new
hw_counter
field to track hardware metrics, enhancing the class's encapsulation and measurement capabilities.
18-31
: Updated constructor to include hardware counterModified the constructor to accept and store the hardware counter, maintaining consistency with the new field added to the struct.
100-105
: Added hardware counter tracking for chunk decompressionAdded IO tracking with hardware counter when decompressing chunks, providing visibility into the performance characteristics of this operation.
105-105
: Used u8::BITS instead of magic number for bit calculationReplaced hardcoded value with the more semantic
u8::BITS
, improving code readability and maintainability.
118-141
: Added accessor methods with hardware counter trackingImplemented new helper methods for accessing chunks, remainder postings, and performing searches, each with appropriate hardware counter tracking. This improves encapsulation and enables detailed IO measurement.
lib/segment/src/index/field_index/full_text_index/compressed_posting/compressed_posting_list.rs (4)
2-2
: Added import for HardwareCounterCellAdded the necessary import for hardware counter functionality.
38-46
: Updated reader method to accept hardware counterModified the
reader
method signature to include a hardware counter parameter and pass it to theChunkReader
constructor, enabling IO tracking in the reader.
53-61
: Updated iter method to use hardware counterModified the
iter
method to accept and pass the hardware counter to the reader, ensuring consistent hardware metric tracking throughout the iteration process.
97-105
: Updated test to use hardware counterProperly updated the test code to initialize a hardware counter and pass it to the reader, ensuring that tests reflect the production code behavior with IO measurements.
lib/segment/src/index/field_index/map_index/mod.rs (16)
179-185
: Implementation looks correct.
This update toget_count_for_value
consistently passes thehw_counter
parameter to specialized indices, ensuring hardware metrics are properly accounted for.
211-219
: No issues with the revised signature.
Theiter_values_map
method now takes anHwMeasurementAcc
and cleanly propagates it to each variant implementation.
234-242
: Accurate pass-through of the hardware counter.
Thematch_cardinality
method correctly usesget_count_for_value
to compute the exact cardinality, properly reusing thehw_counter
.
421-433
: Consistent hardware counter usage.
Inexcept_set
, the hardware counter is correctly passed toget_iterator
, ensuring performance metrics can be tallied for exclusion logic.
530-569
: Filter updates look solid.
The refactoredfilter
method for string-based payloads correctly invokesget_iterator
orexcept_set
withhw_counter
. The logic aligns with the updated signatures.
571-631
: Clean integration ofhw_counter
into cardinality estimation.
Withinestimate_cardinality
, the additional parameter is properly utilized in calls tomatch_cardinality
andexcept_cardinality
. Design remains consistent.
639-653
: Correct usage ofHardwareCounterCell::disposable()
.
The approach of passing a disposable counter inpayload_blocks
is valid, especially since measurement is not needed for HNSW construction.
680-742
: Filter method changes appear proper.
ForUuidIntType
, the updated lines consistently passhw_counter
toget_iterator
, preserving metrics collection.
744-819
: Proper integration with cardinality estimation for UUID-based payloads.
All calls tomatch_cardinality
andexcept_cardinality
reflect the newhw_counter
parameter. Implementation aligns well with the enhancements.
821-844
: Payload blocks logic is consistent.
Using a disposable hardware counter inpayload_blocks
avoids unnecessary measurements while enabling a unified interface. Implementation is coherent.
871-905
: Improved filter logic for integer payloads.
Passinghw_counter
intoget_iterator
andexcept_set
ensures consistent performance tracking for filtering operations.
910-970
: Cardinality estimation flows well with the new parameter.
For integer payloads, the expandedestimate_cardinality
method handles all matching variants and passeshw_counter
as needed.
980-992
: Accurate block condition building.
Within integer-basedpayload_blocks
, the reuse ofHardwareCounterCell::disposable()
maintains a cohesive approach to measuring or ignoring costs as appropriate.
1245-1253
: Unit test updates confirm new behavior.
The creation and usage ofhw_counter
forexcept_cardinality
verify that the code handles empty exclusions properly.
1288-1296
: Additional test coverage for string-based map index.
Similarly verifies thathw_counter
is well-integrated for empty sets.
1309-1317
: Ensures correct edge-case handling for an empty index.
Passing an empty iterator and checking that the result yields zero cardinality demonstrates robust test coverage with the new parameter.
) -> Option<ChunkReader<'_>> { | ||
fn get_reader<'a>( | ||
&'a self, | ||
header: &'a PostingListHeader, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
header
doesn't need a lifetime here, since it is not part of the returned value.
) -> Option<Box<dyn Iterator<Item = PointOffsetType> + '_>> { | ||
fn filter<'a>( | ||
&'a self, | ||
condition: &'a FieldCondition, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No lifetime needed for condition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be concerned about double-counting, where we count some operation twice during the cardinality estimation and also while doing the operation itself?
I did some basic test with this, but definitely was not able to test all possible paths. Since this covers so much, it's very hard to keep track of what branches there are, and which we do or do not count.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made test measurements on keyword and text indexes, after some fixes they now match my expectations. Also, found a potential place for optimization - we can try to exclude primary condition from second stage filtering.
I propose to merge this and proceed with further steps: create a checklist and some docs, which would allow external people to understand what to expect from measurement and how to test them.
* Cardinality estimation measurements * Apply hw measurements to latest changes from dev * Clippy * Also measure cardinality estimation for geo index * Make measured units 'bytes' * Use PointOffsetType instead of u32 for size calculation * fix memory cost for check_values_any in mmap index * fix double counting for value reading in mmap, remove hw_counter from mmap hashmap * fmt * fix hw measurement for text index * Remove non necessary lifetime annotations --------- Co-authored-by: generall <andrey@vasnetsov.com>
Depends on #5954.
There are places where we would benefit from #5985 which have been left out and marked with a TODO.