-
Notifications
You must be signed in to change notification settings - Fork 1.8k
IO read measurements for most Payload indices #5951
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -490,12 +493,14 @@ impl PayloadFieldIndex for GeoMapIndex { | |||
fn filter( | |||
&self, | |||
condition: &FieldCondition, | |||
hw_acc: HwMeasurementAcc, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid lifetime issues and complexity, we pass an owned accumulator here and move it into the iterator we return from this function.
This approach is used for the same reason in other functions as well.
// TODO(io_measurement): implement measurement | ||
index.check_values_any(point_id, is_true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we count just 1 for a boolean check?
In the description I see that you're handling this in a separate PR, fine by me. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory it is more than 1 byte though because we load a full page into memory, even if we only check one byte.
Can we maybe go probabilistic here and say 99% of the time we access the same page so we use
0.01 * PAGE_SIZE
(4kb) as constant measurement.
@generall wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it's already implemented with 1 byte for boolean here: https://github.com/qdrant/qdrant/pull/5954/files#diff-2bee7d830875853065cb32b763bc13bc85fca02f0dbab25f7018d251fd192d20R186
I think it's still worth reconsidering to use > 1 byte.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if we should consider entire slices when we iterate only a few of their values.
Just in case, I just made this iterator adapter (#5985) which should make it easier (and faster considering it is not needed to have any smart pointers in the hot path) to measure the actual amount of iterations used.
let v: &N = v.borrow(); | ||
let size = <N as MmapValue>::mmapped_size((*v).as_referenced()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we get this size outside the hot path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really. If N
is a str
, it may always have a different size so either we do it here or iterate over the values twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any idea how expensive this could get?
Looking around I could not find a bench for the immutable map index unfortunately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't measured it directly but the size calculation is as I said unavoidable since N
can be of type str
.
We probably can improve performance a bit by using an integer for collecting the total size and accumulate this value into the HwCounter afterwards but I'd suggest to address this in a separate PR and focus on merging the PRs for now.
Edit: Fixed in #6137
778b532
to
8e0ffeb
Compare
f15ea01
to
e7e4c15
Compare
8263c73
to
04d84c3
Compare
dd5ae10
to
bd123ba
Compare
5cef69a
to
2b7ec5f
Compare
bd123ba
to
f10c290
Compare
f10c290
to
ca799aa
Compare
📝 WalkthroughWalkthroughThe changes introduce a new field named ✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (13)
lib/common/common/src/counter/hardware_data.rs (1)
9-9
: Appropriate addition of new I/O tracking field.The addition of
payload_index_io_read
properly extends theHardwareData
struct to track I/O read operations specifically for payload indices. This field follows the existing naming convention and type pattern used for other I/O metrics in the struct.Consider adding a brief comment explaining what this field specifically tracks, either above this field or in the struct's documentation comment, to help future developers understand its purpose more clearly.
docs/grpc/docs.md (1)
3037-3037
: New Field Added:payload_index_io_read
The new field is added to the HardwareUsage table with type
[uint64](#uint64)
, and its placement between the existing payload IO read/write metrics and vector IO metrics clearly documents the additional measurement for payload index read operations. Ensure that its semantics are clearly explained in accompanying documentation or API change logs so that users understand the distinction from the other IO metrics.lib/segment/src/index/field_index/mmap_point_to_values.rs (1)
176-177
: Good constant definition for measuring overhead.The
MMAP_PTV_ACCESS_OVERHEAD
constant appropriately represents the memory and I/O overhead associated with accessing the memory-mapped index. Using the size ofMmapRange
provides a consistent measurement of this overhead.However, I notice you're using
size_of
without importing it. You should add:+use std::mem::size_of;
at the top of the file.
lib/segment/src/index/field_index/geo_index/immutable_geo_index.rs (1)
74-86
: Hardware counter integration for I/O measurementThe
check_values_any
method now accepts a hardware counter parameter and uses it to measure payload index I/O reads by tracking the size of each value accessed.There appears to be a missing import for
size_of_val
which should come fromstd::mem
.+use std::mem::size_of_val; use std::collections::{BTreeMap, HashSet}; use std::path::PathBuf; use std::sync::Arc;
lib/segment/src/index/field_index/null_index/mmap_null_index.rs (1)
208-212
: Added hardware counter parameter with TODO commentThe
filter
method now accepts aHwMeasurementAcc
parameter, though it's not being utilized yet (indicated by the underscore prefix). The TODO comment suggests future implementation to collect values.Would you like me to propose an implementation for the TODO that would track I/O reads in this filter method?
lib/segment/src/index/struct_payload_index.rs (1)
520-520
: Consider implementing IO measurements for index operationsThere's a TODO comment about adding measurements to the index during payload operations. Since you're already implementing I/O tracking for payload indices, this would be a good opportunity to address this TODO in a future PR.
Would you like suggestions on how to implement I/O measurements for index operations in a consistent way with the current approach?
lib/segment/src/index/field_index/map_index/immutable_map_index.rs (1)
304-312
: Consider propagating the hardware counter throughout the call stackThere's a TODO comment about potentially propagating the hardware counter. Creating a disposable counter here means measurements at this level won't be aggregated with parent operations.
Consider modifying the method signature to accept a hardware counter parameter and propagate it through the method call stack. This would allow for more accurate hierarchical measurement of I/O operations.
-pub fn iter_values_map(&self) -> impl Iterator<Item = (&N, IdIter<'_>)> + '_ { - let hw_counter = HardwareCounterCell::disposable(); // TODO(io_measurements): Propagate? +pub fn iter_values_map(&self, hw_counter: &HardwareCounterCell) -> impl Iterator<Item = (&N, IdIter<'_>)> + '_ {lib/segment/src/index/field_index/numeric_index/mutable_numeric_index.rs (1)
110-125
: Consider alternatives to.inspect()
for side-effect measurement.
.inspect()
is acceptable, but for production telemetry, an explicit loop or aggregator could provide more intentional side-effect behavior.lib/segment/src/index/field_index/map_index/mod.rs (1)
407-420
: Reminder to address the disposable hardware counter.
HardwareCounterCell::disposable()
inexcept_set
might leave out aggregated metrics. Consider propagatingHwMeasurementAcc
from higher-level calls.lib/segment/src/index/field_index/map_index/mmap_map_index.rs (1)
210-211
: Propagate the hardware counter instead of using a disposable one.
The TODO indicates that a local disposable counter is used. For consistent I/O measurement across the call stack, pass an existing hardware counter from the caller.- let hw_counter = HardwareCounterCell::disposable(); // TODO(io_measurement): Propagate. + // Example of using a propagated hardware counter rather than a disposable one: + // let hw_counter = incoming_hw_counter.fork();lib/segment/src/index/query_optimization/condition_converter/match_converter.rs (2)
29-55
: Implement measurement for Boolean values.
For theBoolIndex
branch, measurements are still marked with a TODO. This might skew overall statistics when boolean checks occur frequently.Would you like me to help integrate the hardware counter here, similar to the other index types?
92-127
: Consider refactoring repeated pattern logic.
The code for handlinglist.len() < INDEXSET_ITER_THRESHOLD
vs. bigger lists is repeated. You could unify it into a helper function or macro for improved maintainability.lib/segment/src/index/query_optimization/condition_converter.rs (1)
329-333
: Unify float and datetime range logic if desired.
The repeated pattern for range checks between floats and datetimes could be consolidated to reduce duplication.Also applies to: 335-336, 343-345, 350-355, 376-378
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (38)
docs/grpc/docs.md
(1 hunks)docs/redoc/master/openapi.json
(2 hunks)lib/api/src/grpc/conversions.rs
(3 hunks)lib/api/src/grpc/proto/points.proto
(1 hunks)lib/api/src/grpc/qdrant.rs
(1 hunks)lib/api/src/rest/models.rs
(2 hunks)lib/common/common/benches/mmap_hashmap.rs
(2 hunks)lib/common/common/src/counter/hardware_accumulator.rs
(7 hunks)lib/common/common/src/counter/hardware_counter.rs
(10 hunks)lib/common/common/src/counter/hardware_data.rs
(1 hunks)lib/common/common/src/mmap_hashmap.rs
(7 hunks)lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs
(2 hunks)lib/segment/src/index/field_index/bool_index/mod.rs
(6 hunks)lib/segment/src/index/field_index/bool_index/simple_bool_index.rs
(2 hunks)lib/segment/src/index/field_index/field_index_base.rs
(3 hunks)lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs
(2 hunks)lib/segment/src/index/field_index/full_text_index/mutable_text_index.rs
(3 hunks)lib/segment/src/index/field_index/full_text_index/text_index.rs
(2 hunks)lib/segment/src/index/field_index/geo_index/immutable_geo_index.rs
(2 hunks)lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs
(2 hunks)lib/segment/src/index/field_index/geo_index/mod.rs
(10 hunks)lib/segment/src/index/field_index/map_index/immutable_map_index.rs
(4 hunks)lib/segment/src/index/field_index/map_index/mmap_map_index.rs
(4 hunks)lib/segment/src/index/field_index/map_index/mod.rs
(10 hunks)lib/segment/src/index/field_index/map_index/mutable_map_index.rs
(4 hunks)lib/segment/src/index/field_index/mmap_point_to_values.rs
(3 hunks)lib/segment/src/index/field_index/null_index/mmap_null_index.rs
(3 hunks)lib/segment/src/index/field_index/numeric_index/immutable_numeric_index.rs
(2 hunks)lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs
(2 hunks)lib/segment/src/index/field_index/numeric_index/mod.rs
(7 hunks)lib/segment/src/index/field_index/numeric_index/mutable_numeric_index.rs
(4 hunks)lib/segment/src/index/field_index/numeric_index/tests.rs
(10 hunks)lib/segment/src/index/query_optimization/condition_converter.rs
(8 hunks)lib/segment/src/index/query_optimization/condition_converter/match_converter.rs
(7 hunks)lib/segment/src/index/struct_payload_index.rs
(3 hunks)lib/storage/src/content_manager/toc/mod.rs
(1 hunks)lib/storage/src/content_manager/toc/request_hw_counter.rs
(2 hunks)src/common/metrics.rs
(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test-consensus-compose
- GitHub Check: test (macos-latest)
- GitHub Check: test
- GitHub Check: test (windows-latest)
- GitHub Check: test
- GitHub Check: test-consensus
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
🔇 Additional comments (101)
docs/redoc/master/openapi.json (1)
12640-12666
: Field addition for payload index I/O measurement looks good.The new field
payload_index_io_read
has been correctly added to theHardwareUsage
object, properly defined as an unsigned integer with a minimum value of 0. This addition aligns with the PR objective of implementing I/O read measurements for payload indices.The field is appropriately placed between other payload-related fields and vector-related fields, maintaining a logical organization. It's also correctly included in the required fields array, ensuring this metric will always be present in hardware usage reports.
This change enables tracking the amount of I/O read operations specifically performed on payload indices during request processing, enhancing the system's observability.
lib/api/src/rest/models.rs (2)
58-58
: New hardware metric for payload index IO reads.The addition of this field enhances the monitoring capabilities for tracking I/O operations specific to payload indexes.
101-101
: Consistent implementation in anonymize method.The new field is properly included in the anonymize method, maintaining consistency with the other hardware usage metrics.
lib/segment/src/index/field_index/numeric_index/immutable_numeric_index.rs (2)
6-6
: Import added for hardware counter.Appropriate import for the HardwareCounterCell type that will be used to track IO operations.
181-192
:❓ Verification inconclusive
Added hardware counter for IO read tracking.
The method signature has been updated to include a hardware counter parameter, and the implementation properly increments the payload index IO read counter based on the size of each value being checked.
🏁 Script executed:
#!/bin/bash # Check if this API change is consistently applied throughout the codebase rg -A 2 -B 2 "check_values_any\s*\(\s*.*\s*,\s*.*\s*,\s*&Hardware" --type rustLength of output: 80
API Change Reviewed: Hardware Counter Integration for IO Reads
The updated
check_values_any
method inlib/segment/src/index/field_index/numeric_index/immutable_numeric_index.rs
now accepts a hardware counter parameter (hw_counter: &HardwareCounterCell
) and correctly increments the payload IO read counter based on the size of each value processed. While this change appears correctly implemented in the shown snippet, please manually verify that all call sites ofcheck_values_any
across the codebase have been updated to pass the appropriate hardware counter.lib/api/src/grpc/conversions.rs (2)
2728-2728
: Added payload_index_io_read to HardwareUsage conversion.The new field is correctly included in the conversion from HwMeasurementAcc to HardwareUsage, ensuring the metric is properly propagated.
2741-2741
: Added payload_index_io_read to HardwareData conversion.The new field is consistently included in the conversion from HardwareUsage to HardwareData, maintaining the flow of the metric through different data structures.
Also applies to: 2750-2750
lib/segment/src/index/field_index/full_text_index/mutable_text_index.rs (3)
106-106
: Added hardware counter import for tests.Import of HwMeasurementAcc for use in the test cases.
151-152
: Created hardware counter for test cases.Hardware counter instances are created for use in the test cases to comply with the updated filter method signature.
Also applies to: 212-212
154-157
: Updated filter method calls to include hardware counter.All calls to the filter method have been updated to pass the hardware counter parameter, ensuring the tests work with the updated API.
Also applies to: 161-164, 168-171, 179-184, 215-218, 222-225, 231-234, 240-243
lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs (2)
5-5
: Import looks good.The addition of the hardware counter import is appropriate for the functionality being implemented.
201-216
: Implementation of hardware counter for IO measurement looks correct.The
hw_counter
parameter has been properly added to thecheck_values_any
method and is correctly passed to the innerpoint_to_values.check_values_any
call when checking values. This change enables tracking I/O read operations related to payload indices.src/common/metrics.rs (2)
334-334
: Hardware usage field addition looks good.The
payload_index_io_read
field has been properly added to the HardwareUsage struct destructuring pattern.
353-358
:✅ Verification successful
Verify metric type for consistency.
The new
payload_index_io_read
metric is defined as aGAUGE
while other IO metrics likepayload_io_read
andvector_io_read
are defined asCOUNTER
. This might be intentional, but it's worth verifying this choice for consistency.
🏁 Script executed:
#!/bin/bash # Check how other IO metrics are defined in the codebase grep -r "metric_.*_io_" --include="*.rs" . | grep -E "GAUGE|COUNTER"Length of output: 70
Please run the above script to verify how the different IO metrics are declared (i.e. whether they use GAUGE or COUNTER) and share the results.
🏁 Script executed:
#!/bin/bash # Search for occurrences of the IO read metrics in the repository and display surrounding context to inspect their metric types. rg -i -C 3 "(payload_index_io_read|payload_io_read|vector_io_read)" .Length of output: 49349
Metric Types Verified: No Inconsistency Found
After reviewing the surrounding code, we can confirm that the use of a GAUGE for the
payload_index_io_read
metric is intentional. The corresponding IO metrics (payload_io_read
andvector_io_read
) are defined with MetricType::COUNTER, and this difference is reflected consistently across the codebase. No modifications are needed here unless there is further documentation or design rationale that suggests the discrepancy should be revisited.
- Verified in
src/common/metrics.rs
that:
"collection_hardware_metric_payload_io_read"
and"collection_hardware_metric_vector_io_read"
useMetricType::COUNTER
"collection_hardware_metric_payload_index_io_read"
usesMetricType::GAUGE
lib/storage/src/content_manager/toc/mod.rs (1)
683-683
: Hardware metrics collection properly updated.The new
payload_index_io_read
field has been correctly added to theHardwareUsage
struct in theall_hw_metrics
method, retrieving the value from the hardware metrics collector using the appropriate getter method.lib/api/src/grpc/proto/points.proto (1)
1157-1159
: Verify field number changes won't break backward compatibility.The field numbers have been reordered, with field 4 changing from
vector_io_read
topayload_index_io_read
, andvector_io_read
moved to field 5. While there is a previous comment indicating this should be safe since clients aren't using it yet, it's worth double-checking if any existing code depends on these field numbers.Protocol Buffer field number changes can break backward compatibility if clients are using the old message format definitions.
lib/segment/src/index/field_index/bool_index/simple_bool_index.rs (2)
4-4
: LGTM: Clean import addition.The HwMeasurementAcc import is correctly added to support the new hardware measurement parameter in the filter method.
329-333
:❓ Verification inconclusive
Implement the measurement tracking in a follow-up PR.
The
_hw_acc
parameter has been added to the filter method signature, but it's currently unused with a TODO comment. This aligns with the PR description which mentioned that certain indices (like the boolean index) would be fully implemented in a future PR.
🏁 Script executed:
#!/bin/bash # Check how many other filter methods have been updated with the hw_acc parameter # and whether they have the same TODO comment echo "Checking filter methods that have been updated with hw_acc parameter:" rg -A 2 "fn filter.*hw_acc: HwMeasurementAcc" --type rustLength of output: 208
Measurement Tracking Deferred – Confirm Future Implementation
The
_hw_acc
parameter in the filter method oflib/segment/src/index/field_index/bool_index/simple_bool_index.rs
is intentionally left unused and marked with a TODO, which aligns with the PR description stating that measurement tracking will be implemented in a follow-up PR.
- Action: In the follow-up PR, ensure that the measurement tracking is fully integrated and that its implementation is consistent with other indices (if applicable).
- Verification Note: Our repository search for similar usages of
_hw_acc
returned no additional matches. Please manually verify that this deferred approach fits with the overall design of measurement tracking across the codebase.lib/common/common/benches/mmap_hashmap.rs (3)
1-1
: LGTM: Required import for hardware counter.The HardwareCounterCell import is correctly added to support hardware metrics tracking in benchmarks.
25-25
: LGTM: Hardware counter initialization.Properly initializes the counter object that will track I/O operations during benchmarking.
27-32
: LGTM: Updated API call with hardware counter parameter.The benchmark correctly passes the hardware counter to the mmap.get() method, allowing I/O metrics to be captured during benchmark execution.
lib/storage/src/content_manager/toc/request_hw_counter.rs (2)
40-40
: LGTM: Added new I/O metric to REST API response.The payload_index_io_read field is correctly added to the REST API HardwareUsage model, ensuring the new metric is exposed via the API.
55-55
: LGTM: Added new I/O metric to gRPC API response.The payload_index_io_read field is correctly added to the gRPC API HardwareUsage message with proper type casting to u64, maintaining consistency with other hardware metrics.
lib/segment/src/index/field_index/full_text_index/text_index.rs (2)
5-5
: LGTM: Clean import addition.The HwMeasurementAcc import is correctly added to support the new hardware measurement parameter in the filter method.
318-318
:❓ Verification inconclusive
Implement the measurement tracking in a follow-up PR.
The
_hw_acc
parameter has been added to the filter method signature, but it's currently unused with a TODO comment. This aligns with the PR description which mentioned that certain indices would be fully implemented in a future PR.
🏁 Script executed:
#!/bin/bash # Verify if there are any other files in the full_text_index module # that have implemented the hw_acc parameter usage echo "Checking full text index implementations that use hw_acc parameter:" rg -A 3 "\bfilter.*hw_acc.*{" --type rust --glob "lib/segment/src/index/field_index/full_text_index/*.rs" | grep -v "TODO"Length of output: 369
Below is a revised shell script using an escaped curly brace in the regex. Please run the script to verify if any implementations using the
_hw_acc
parameter (other than the documented TODO intext_index.rs
) exist in the full text index module:If the output confirms that no additional usage of
_hw_acc
exists (aside from the indicated TODO), then the review comment remains valid. That is, the parameter is intentionally left unutilized pending future implementation of measurement tracking.Actionable Directive:
Please verify that the_hw_acc
parameter is not being used elsewhere in the full text index module beyond its declaration inlib/segment/src/index/field_index/full_text_index/text_index.rs
(line 318). Once confirmed, proceed with the follow-up PR for implementing measurement tracking.
🏁 Script executed:
#!/bin/bash echo "Searching for 'filter' method implementations using hw_acc parameter with corrected regex:" rg -A 3 "\bfilter.*hw_acc.*\\{" --type rust --glob "lib/segment/src/index/field_index/full_text_index/*.rs" | grep -v "TODO"Length of output: 442
Measurement Tracking Pending Implementation
The_hw_acc
parameter remains unused (aside from the documented TODO) in the filter method, consistent with the PR description. Our search in the full text index module produced no additional usage of this parameter. Please manually verify that_hw_acc
is not being used elsewhere beyond line 318 inlib/segment/src/index/field_index/full_text_index/text_index.rs
before proceeding to merge. A follow-up PR should implement the measurement tracking as intended.lib/segment/src/index/field_index/mmap_point_to_values.rs (3)
3-3
: Good addition of hardware counter import.The addition of the hardware counter import allows for tracking I/O operations during memory-mapped file access.
283-293
: Well-implemented I/O measurement for index access.The addition of hardware counter incrementation before calling
self.get_range(point_id)
properly tracks the overhead of accessing the memory-mapped structure. This is a good practice for accurate performance measurement.
299-301
: Good implementation of payload read measurement.Incrementing the counter with the length of bytes read captures the I/O cost of accessing values from the memory-mapped file. This provides valuable metrics for performance analysis.
lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs (3)
4-4
: Good addition of hardware accumulator import.The import of
HwMeasurementAcc
is necessary for the modifications to track I/O operations in the boolean index.
368-372
: Good modification of filter method signature.Adding the
hw_acc
parameter to thefilter
method aligns with the changes to thePayloadFieldIndex
trait, ensuring consistent implementation across all index types.
373-373
: Good initialization of hardware counter cell.Creating a counter cell from the accumulator follows the proper pattern for hardware measurement.
lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs (3)
4-4
: Good addition of hardware counter import.The import of
HardwareCounterCell
is necessary for the modifications to track I/O operations in the geo index.
239-244
: Good modification of check_values_any signature.Adding the
hw_counter
parameter to the method signature allows for propagating hardware measurement throughout the call chain.
248-251
: Good hardware counter propagation.Properly passing the hardware counter to the underlying
point_to_values.check_values_any
method ensures that I/O measurements are captured at the appropriate level of abstraction.lib/segment/src/index/field_index/field_index_base.rs (3)
4-4
: Good addition of hardware accumulator import.The import of
HwMeasurementAcc
is necessary for the modifications to track I/O operations across field indices.
49-53
: Good modification of PayloadFieldIndex trait.Updating the
filter
method signature in thePayloadFieldIndex
trait to include thehw_counter
parameter ensures consistent hardware measurement across all implementations.
247-252
: Good propagation of hardware measurement in FieldIndex.The implementation correctly passes the hardware accumulator to the underlying field index, ensuring that I/O measurements are captured properly regardless of the specific index type.
lib/segment/src/index/field_index/geo_index/immutable_geo_index.rs (1)
5-5
: Added import for hardware counter supportThe addition of the
HardwareCounterCell
import enables payload index I/O read measurements in this module.lib/segment/src/index/field_index/null_index/mmap_null_index.rs (2)
3-3
: Added import for hardware measurementsThe addition of the
HwMeasurementAcc
import enables payload index I/O measurement capability in this module.
375-382
: Updated test code to include hardware counterTests have been properly updated to pass a hardware measurement accumulator to the filter method.
lib/segment/src/index/field_index/bool_index/mod.rs (4)
1-1
: Added import for hardware measurement accumulatorThe addition of the
HwMeasurementAcc
import enables payload index I/O measurement capability in this module.
114-122
: Added hardware counter parameter to filter methodThe
filter
method in thePayloadFieldIndex
implementation forBoolIndex
now accepts anhw_acc
parameter and correctly passes it to the underlying implementations. This maintains the enum pattern while adding hardware measurement capabilities.
210-210
: Updated test code to include hardware counterTest code has been properly updated to import and use
HwMeasurementAcc
. The counter is instantiated and passed to the filter method, ensuring the tests remain functional with the new parameter.Also applies to: 285-290
346-357
: Consistently updated all test casesAll test cases that call the
filter
method have been systematically updated to include the hardware measurement accumulator. This ensures comprehensive test coverage for the modified API.Also applies to: 378-396
lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs (2)
5-5
: Added import for hardware counterThe addition of the
HardwareCounterCell
import enables payload index I/O read measurements in this module.
259-267
:❓ Verification inconclusive
Added disposable hardware counter with propagation TODO
The
get_token_id
method now creates a disposable hardware counter to measure I/O reads when accessing the vocabulary. The TODO comment indicates a consideration to propagate the counter from the caller in the future.Consider whether propagating the hardware counter from the caller might be more consistent with other implementations in the codebase:
🏁 Script executed:
#!/bin/bash # Check how hardware counters are propagated in other similar methods rg -A 2 -B 2 "fn get_token_id.*HardwareCounter|fn get_.*HardwareCounter" --type rustLength of output: 2069
Action: Revisit hardware counter propagation for consistency
The
get_token_id
method inlib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs
currently creates a disposable hardware counter with:let hw_counter = HardwareCounterCell::disposable(); // TODO(io_measurement): Maybe propagate?However, as our verification shows, similar methods—such as the
get_iterator
implementations in both immutable and mutable map indices—propagate a caller-supplied hardware counter rather than constructing one on the spot. This discrepancy could lead to subtle differences in I/O measurement behavior.• Location:
lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs
, lines 259–267
• Observation: Other indices (e.g., inmap_index
modules) pass a hardware counter received from the caller, ensuring consistent accumulation of I/O measurements.
• Suggestion: Consider refactoringget_token_id
to accept a hardware counter from the caller or, at minimum, update the TODO to clarify why a disposable counter is used here compared to other implementations.lib/common/common/src/counter/hardware_counter.rs (5)
16-16
: LGTM: Adding payload_index_io_read_counter fieldThe new field for tracking I/O operations related to payload indices is well-placed within the
HardwareCounterCell
struct, consistent with other counter fields.
30-30
: LGTM: Consistent initialization in constructorsThe initialization of the new counter field in all constructor methods follows the existing pattern and ensures the field is properly initialized in all instances.
Also applies to: 46-47, 67-68, 80-81, 100-101
87-89
: LGTM: Added new_accumulator helper methodThis method provides a convenient way to create a new accumulator from the existing one, which will be useful for propagating measurement capabilities.
140-143
: LGTM: Added accessor for the new counterThe getter method for the new counter follows the established pattern in the codebase.
182-183
: LGTM: Updated get_hw_data methodThe method correctly incorporates the new field into the hardware data structure when retrieving measurements.
Also applies to: 192-193
lib/common/common/src/counter/hardware_accumulator.rs (6)
15-15
: LGTM: Added new counter field to HwSharedDrainThe new field is correctly added to the struct, maintaining consistency with other counter fields.
33-35
: LGTM: Added accessor methodThe new getter method follows the established pattern for accessing counter values.
49-50
: LGTM: Updated constructor parametersConstructor parameters properly updated to include the new counter field.
Also applies to: 57-58
69-70
: LGTM: Updated accumulation logicThe accumulate_from_hw_data method correctly handles the new counter field.
Also applies to: 77-78
89-90
: LGTM: Updated Clone and Default implementationsClone and Default implementations are properly updated to handle the new field.
Also applies to: 101-102
182-184
: LGTM: Added payload_index_io_read accessor to HwMeasurementAccThe method provides access to the counter value from the HwMeasurementAcc struct, maintaining consistency with other counter fields.
lib/common/common/src/mmap_hashmap.rs (5)
19-19
: LGTM: Added import for HardwareCounterCellThe import is correctly added to support hardware counter functionality.
61-61
: Constant value for bucket offset overheadUsing a constant for
size_of::<BucketOffset>()
provides a single source of truth that can be reused across the codebase.
247-247
: Updated method signature to include hardware counterThe
get
method now accepts a hardware counter parameter to track I/O operations.
252-255
: LGTM: Added instrumentation to track I/O readsThe code correctly increments the payload index I/O read counter at two key points:
- After retrieving the hash value, incrementing by the overhead size
- After retrieving the entry data, incrementing by the actual entry size
This provides accurate tracking of I/O read operations related to payload indices.
Also applies to: 258-261
522-523
: LGTM: Updated test cases to pass hardware counterTest cases have been properly updated to pass the hardware counter parameter to the
get
method.Also applies to: 544-545, 572-573, 577-577
lib/segment/src/index/struct_payload_index.rs (4)
7-7
: LGTM: Added import for HwMeasurementAccThe import is correctly added to support hardware measurement accumulation.
86-90
: LGTM: Updated query_field method signatureThe method now accepts a hardware counter parameter to track I/O operations during field queries.
97-97
: LGTM: Updated filter call to pass hardware counterThe hardware counter is now correctly passed to the filter method, allowing I/O tracking during filtering operations.
355-361
: LGTM: Added hardware counter initialization and usageThe code now properly creates a new accumulator and passes it to the query_field method, enabling I/O tracking during point iteration.
lib/segment/src/index/field_index/map_index/immutable_map_index.rs (3)
8-9
: Good addition of necessary imports for the hardware countersThe imports for
HardwareCounterCell
andBUCKET_OFFSET_OVERHEAD
are properly added to support the I/O measurement functionality being implemented.
253-270
: Hardware counter implementation for value checksThe implementation properly measures both fixed overhead and per-value size costs during checks. This follows the established pattern for I/O measurements.
Regarding the past discussion about line 265-266: calculating the size inside the hot path is a reasonable trade-off here. For string values with variable sizes, avoiding double iteration is worth the potential performance impact. Consider benchmarking this approach if performance becomes a concern.
314-325
: Proper implementation of hardware counters in get_iteratorThe implementation correctly measures both the fixed bucket offset overhead and the variable size costs based on the range length.
lib/segment/src/index/field_index/numeric_index/tests.rs (4)
95-99
: Added hardware measurement parameter to cardinality_requestThe function signature has been properly updated to include the hardware measurement accumulator as needed for the implementation.
106-110
: Updated filter method call to include hardware counterThe modifications properly pass the hardware counter to the filter method, enabling I/O measurements during filtering operations.
154-155
: Consistent usage of hardware counters across all test casesThe test code has been consistently updated to provide hardware measurement accumulators at all call sites.
Also applies to: 164-165, 176-177, 186-187, 197-198, 208-209
436-440
: Added hardware counter to test_cond functionThe test helper function has been properly updated to use hardware counters and pass them to the filter method.
lib/segment/src/index/field_index/map_index/mutable_map_index.rs (4)
6-7
: Added hardware counter importsNecessary imports for hardware counter functionality have been properly added.
16-16
: Updated import for MmapValueThe additional import includes the necessary constants and types for hardware counter functionality.
126-148
: Properly implemented hardware counters in check_values_anyThe implementation correctly tracks both fixed overhead and per-value size measurements, consistent with the immutable implementation.
189-203
: Hardware counter implementation in get_iteratorThe implementation properly tracks both the bucket offset overhead and the size based on the number of IDs.
lib/segment/src/index/field_index/geo_index/mod.rs (5)
6-7
: Added hardware counter importsThe necessary imports for hardware counter functionality have been properly added.
506-510
: Using owned accumulator for filter methodAs noted in the past review comment, using an owned accumulator here avoids lifetime issues and complexity, which is a reasonable approach.
514-519
: Consistent hardware counter usage in geo query filtersThe implementation correctly retrieves the counter cell from the accumulator and passes it to the check_values_any method consistently across all geo query types.
Also applies to: 525-530, 536-541
988-996
: Updated test code to use hardware countersThe test code has been properly updated to create and use a hardware counter cell.
1001-1005
: Consistent hardware counter usage in test functionsAll test functions have been updated to use hardware measurement accumulators consistently.
Also applies to: 1058-1063, 1230-1245, 1438-1443
lib/segment/src/index/field_index/numeric_index/mutable_numeric_index.rs (2)
6-7
: No concerns with new hardware counter imports.
75-92
: Clarify short-circuit read counting incheck_values_any
.
When using.any(...)
, we short-circuit upon the first match, so subsequent values are never "read" from an I/O perspective. Please verify if partial iteration is intended for I/O metrics.lib/segment/src/index/field_index/map_index/mod.rs (6)
10-11
: New hardware counter imports look good.
119-129
: Confirm short-circuit measurement incheck_values_any
.
Similar to the numeric index usage,.any()
terminates on the first match, meaning subsequent entries are not tallied. Validate whether this partial read is intended.
187-193
:get_iterator
addition with hardware counter param is consistent.
Implementation aligns with the rest of the code; no issues found.
516-543
: Filter function updated withhw_acc
param looks correct.
The approach to capturing I/O reads inAnyVariants::Strings
paths is consistent with the rest of the code.
642-731
: Filter logic inMapIndex<UuidIntType>
now integrates hardware counter calls.
Updates reflect the same methodology as for string-based conditions. All looks good.
848-894
: Extended filter logic forMapIndex<IntPayloadType>
withhw_acc
.
The changes follow the established pattern for capturing read counters. Looks fine.lib/segment/src/index/field_index/numeric_index/mod.rs (4)
17-18
: Hardware counter imports introduced without issues.
244-257
: Check short-circuit read accounting incheck_values_any
.
.any(...)
stops early on the first match, so not all potential reads are measured. Verify if that behavior suits your I/O measurement goals.
369-383
:point_ids_by_value
method includeshw_acc
for mutable index range reading.
The approach is properly aligned with the rest of the I/O metric integration.
645-685
:filter
method upgraded to accepthw_acc
.
Change is consistent with how numeric indexes handle hardware metrics. No concerns here.lib/segment/src/index/field_index/map_index/mmap_map_index.rs (2)
8-8
: No issues with the import.
158-174
: Looks solid. Ensure test coverage for edge cases.
The function correctly integrates the hardware counter while filtering out deleted entries. Consider adding or updating tests to ensure coverage for corner cases (e.g., whenidx
is out of range or whencheck_fn
matches no values).lib/segment/src/index/query_optimization/condition_converter/match_converter.rs (2)
13-22
: Additionalhw_acc
parameter appears consistent.
These lines cleanly introduce theHwMeasurementAcc
parameter. Implementation flow looks correct.
165-247
: No major concerns.
The “except” checker logic follows the same pattern as “any”. Looks consistent with the other changes.lib/segment/src/index/query_optimization/condition_converter.rs (3)
3-3
: Importing HwMeasurementAcc is fine.
No issues here.
39-42
: Hardware accumulator usage is logically sound.
Creating a new accumulator for each index withinfind_map
is a clean approach, ensuring that sub-checkers remain isolated.
200-206
: Match, range, and geo checkers properly receive hardware counters.
These changes consistently pass the measurement accumulator to relevant checkers, enhancing I/O tracking coverage.Also applies to: 209-210, 213-215, 217-220, 223-225
* Add payload index filtering IO measurements for some indices * Add payload index metric to api and telemetry * Also account for index access overhead * Review remarks * Anonymize new HardwareUsage field * Fix tests
Depends on #5949
Basic Payload IO measurement implementation.
Some conditions and indices (eg. bool index) are not fully implemented yet. Those will be addressed in a separate PR.