Skip to content

Conversation

JojiiOfficial
Copy link
Contributor

Depends on #5949

Basic Payload IO measurement implementation.
Some conditions and indices (eg. bool index) are not fully implemented yet. Those will be addressed in a separate PR.

@@ -490,12 +493,14 @@ impl PayloadFieldIndex for GeoMapIndex {
fn filter(
&self,
condition: &FieldCondition,
hw_acc: HwMeasurementAcc,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid lifetime issues and complexity, we pass an owned accumulator here and move it into the iterator we return from this function.
This approach is used for the same reason in other functions as well.

Comment on lines +53 to 54
// TODO(io_measurement): implement measurement
index.check_values_any(point_id, is_true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we count just 1 for a boolean check?

In the description I see that you're handling this in a separate PR, fine by me. 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory it is more than 1 byte though because we load a full page into memory, even if we only check one byte.
Can we maybe go probabilistic here and say 99% of the time we access the same page so we use
0.01 * PAGE_SIZE (4kb) as constant measurement.
@generall wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it's already implemented with 1 byte for boolean here: https://github.com/qdrant/qdrant/pull/5954/files#diff-2bee7d830875853065cb32b763bc13bc85fca02f0dbab25f7018d251fd192d20R186

I think it's still worth reconsidering to use > 1 byte.

Copy link
Contributor

@coszio coszio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if we should consider entire slices when we iterate only a few of their values.

Just in case, I just made this iterator adapter (#5985) which should make it easier (and faster considering it is not needed to have any smart pointers in the hot path) to measure the actual amount of iterations used.

Comment on lines +265 to +266
let v: &N = v.borrow();
let size = <N as MmapValue>::mmapped_size((*v).as_referenced());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we get this size outside the hot path?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. If N is a str, it may always have a different size so either we do it here or iterate over the values twice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea how expensive this could get?
Looking around I could not find a bench for the immutable map index unfortunately

Copy link
Contributor Author

@JojiiOfficial JojiiOfficial Mar 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't measured it directly but the size calculation is as I said unavoidable since N can be of type str.

We probably can improve performance a bit by using an integer for collecting the total size and accumulate this value into the HwCounter afterwards but I'd suggest to address this in a separate PR and focus on merging the PRs for now.

Edit: Fixed in #6137

@JojiiOfficial JojiiOfficial force-pushed the io_measurement_cleanup_plus_tests branch from 778b532 to 8e0ffeb Compare March 5, 2025 11:14
@JojiiOfficial JojiiOfficial force-pushed the payload_index_io_read_measurement branch from f15ea01 to e7e4c15 Compare March 5, 2025 11:27
@JojiiOfficial JojiiOfficial force-pushed the io_measurement_cleanup_plus_tests branch from 8263c73 to 04d84c3 Compare March 6, 2025 14:15
@JojiiOfficial JojiiOfficial force-pushed the payload_index_io_read_measurement branch 2 times, most recently from dd5ae10 to bd123ba Compare March 7, 2025 12:14
@JojiiOfficial JojiiOfficial requested a review from timvisee March 10, 2025 14:06
@JojiiOfficial JojiiOfficial force-pushed the io_measurement_cleanup_plus_tests branch from 5cef69a to 2b7ec5f Compare March 10, 2025 14:29
@JojiiOfficial JojiiOfficial force-pushed the payload_index_io_read_measurement branch from bd123ba to f10c290 Compare March 10, 2025 14:31
Base automatically changed from io_measurement_cleanup_plus_tests to dev March 12, 2025 08:59
@JojiiOfficial JojiiOfficial force-pushed the payload_index_io_read_measurement branch from f10c290 to ca799aa Compare March 12, 2025 09:03
Copy link
Contributor

coderabbitai bot commented Mar 12, 2025

📝 Walkthrough

Walkthrough

The changes introduce a new field named payload_index_io_read to track payload index input/output read operations. This field has been added to hardware usage data structures and documentation, affecting both API definitions and protocol buffers. In addition, data conversion routines and structs used for hardware metrics have been updated to include the new field, ensuring that the value is propagated through conversion layers. Multiple index components—including those handling boolean, geo, numeric, map, and full text indexing—have been modified to accept and use new hardware counter parameters. These adjustments involve changes in method signatures, the incorporation of hardware counter accumulators in logic flows, and updates in test cases to pass the newly required parameters. Furthermore, telemetry and benchmarking modules have been extended to capture and report the additional I/O metric through hardware counters.

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (13)
lib/common/common/src/counter/hardware_data.rs (1)

9-9: Appropriate addition of new I/O tracking field.

The addition of payload_index_io_read properly extends the HardwareData struct to track I/O read operations specifically for payload indices. This field follows the existing naming convention and type pattern used for other I/O metrics in the struct.

Consider adding a brief comment explaining what this field specifically tracks, either above this field or in the struct's documentation comment, to help future developers understand its purpose more clearly.

docs/grpc/docs.md (1)

3037-3037: New Field Added: payload_index_io_read

The new field is added to the HardwareUsage table with type [uint64](#uint64), and its placement between the existing payload IO read/write metrics and vector IO metrics clearly documents the additional measurement for payload index read operations. Ensure that its semantics are clearly explained in accompanying documentation or API change logs so that users understand the distinction from the other IO metrics.

lib/segment/src/index/field_index/mmap_point_to_values.rs (1)

176-177: Good constant definition for measuring overhead.

The MMAP_PTV_ACCESS_OVERHEAD constant appropriately represents the memory and I/O overhead associated with accessing the memory-mapped index. Using the size of MmapRange provides a consistent measurement of this overhead.

However, I notice you're using size_of without importing it. You should add:

+use std::mem::size_of;

at the top of the file.

lib/segment/src/index/field_index/geo_index/immutable_geo_index.rs (1)

74-86: Hardware counter integration for I/O measurement

The check_values_any method now accepts a hardware counter parameter and uses it to measure payload index I/O reads by tracking the size of each value accessed.

There appears to be a missing import for size_of_val which should come from std::mem.

+use std::mem::size_of_val;
 use std::collections::{BTreeMap, HashSet};
 use std::path::PathBuf;
 use std::sync::Arc;
lib/segment/src/index/field_index/null_index/mmap_null_index.rs (1)

208-212: Added hardware counter parameter with TODO comment

The filter method now accepts a HwMeasurementAcc parameter, though it's not being utilized yet (indicated by the underscore prefix). The TODO comment suggests future implementation to collect values.

Would you like me to propose an implementation for the TODO that would track I/O reads in this filter method?

lib/segment/src/index/struct_payload_index.rs (1)

520-520: Consider implementing IO measurements for index operations

There's a TODO comment about adding measurements to the index during payload operations. Since you're already implementing I/O tracking for payload indices, this would be a good opportunity to address this TODO in a future PR.

Would you like suggestions on how to implement I/O measurements for index operations in a consistent way with the current approach?

lib/segment/src/index/field_index/map_index/immutable_map_index.rs (1)

304-312: Consider propagating the hardware counter throughout the call stack

There's a TODO comment about potentially propagating the hardware counter. Creating a disposable counter here means measurements at this level won't be aggregated with parent operations.

Consider modifying the method signature to accept a hardware counter parameter and propagate it through the method call stack. This would allow for more accurate hierarchical measurement of I/O operations.

-pub fn iter_values_map(&self) -> impl Iterator<Item = (&N, IdIter<'_>)> + '_ {
-    let hw_counter = HardwareCounterCell::disposable(); // TODO(io_measurements): Propagate?
+pub fn iter_values_map(&self, hw_counter: &HardwareCounterCell) -> impl Iterator<Item = (&N, IdIter<'_>)> + '_ {
lib/segment/src/index/field_index/numeric_index/mutable_numeric_index.rs (1)

110-125: Consider alternatives to .inspect() for side-effect measurement.
.inspect() is acceptable, but for production telemetry, an explicit loop or aggregator could provide more intentional side-effect behavior.

lib/segment/src/index/field_index/map_index/mod.rs (1)

407-420: Reminder to address the disposable hardware counter.
HardwareCounterCell::disposable() in except_set might leave out aggregated metrics. Consider propagating HwMeasurementAcc from higher-level calls.

lib/segment/src/index/field_index/map_index/mmap_map_index.rs (1)

210-211: Propagate the hardware counter instead of using a disposable one.
The TODO indicates that a local disposable counter is used. For consistent I/O measurement across the call stack, pass an existing hardware counter from the caller.

- let hw_counter = HardwareCounterCell::disposable(); // TODO(io_measurement): Propagate.
+ // Example of using a propagated hardware counter rather than a disposable one:
+ // let hw_counter = incoming_hw_counter.fork();
lib/segment/src/index/query_optimization/condition_converter/match_converter.rs (2)

29-55: Implement measurement for Boolean values.
For the BoolIndex branch, measurements are still marked with a TODO. This might skew overall statistics when boolean checks occur frequently.

Would you like me to help integrate the hardware counter here, similar to the other index types?


92-127: Consider refactoring repeated pattern logic.
The code for handling list.len() < INDEXSET_ITER_THRESHOLD vs. bigger lists is repeated. You could unify it into a helper function or macro for improved maintainability.

lib/segment/src/index/query_optimization/condition_converter.rs (1)

329-333: Unify float and datetime range logic if desired.
The repeated pattern for range checks between floats and datetimes could be consolidated to reduce duplication.

Also applies to: 335-336, 343-345, 350-355, 376-378

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c9a6a96 and ca799aa.

📒 Files selected for processing (38)
  • docs/grpc/docs.md (1 hunks)
  • docs/redoc/master/openapi.json (2 hunks)
  • lib/api/src/grpc/conversions.rs (3 hunks)
  • lib/api/src/grpc/proto/points.proto (1 hunks)
  • lib/api/src/grpc/qdrant.rs (1 hunks)
  • lib/api/src/rest/models.rs (2 hunks)
  • lib/common/common/benches/mmap_hashmap.rs (2 hunks)
  • lib/common/common/src/counter/hardware_accumulator.rs (7 hunks)
  • lib/common/common/src/counter/hardware_counter.rs (10 hunks)
  • lib/common/common/src/counter/hardware_data.rs (1 hunks)
  • lib/common/common/src/mmap_hashmap.rs (7 hunks)
  • lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs (2 hunks)
  • lib/segment/src/index/field_index/bool_index/mod.rs (6 hunks)
  • lib/segment/src/index/field_index/bool_index/simple_bool_index.rs (2 hunks)
  • lib/segment/src/index/field_index/field_index_base.rs (3 hunks)
  • lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs (2 hunks)
  • lib/segment/src/index/field_index/full_text_index/mutable_text_index.rs (3 hunks)
  • lib/segment/src/index/field_index/full_text_index/text_index.rs (2 hunks)
  • lib/segment/src/index/field_index/geo_index/immutable_geo_index.rs (2 hunks)
  • lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs (2 hunks)
  • lib/segment/src/index/field_index/geo_index/mod.rs (10 hunks)
  • lib/segment/src/index/field_index/map_index/immutable_map_index.rs (4 hunks)
  • lib/segment/src/index/field_index/map_index/mmap_map_index.rs (4 hunks)
  • lib/segment/src/index/field_index/map_index/mod.rs (10 hunks)
  • lib/segment/src/index/field_index/map_index/mutable_map_index.rs (4 hunks)
  • lib/segment/src/index/field_index/mmap_point_to_values.rs (3 hunks)
  • lib/segment/src/index/field_index/null_index/mmap_null_index.rs (3 hunks)
  • lib/segment/src/index/field_index/numeric_index/immutable_numeric_index.rs (2 hunks)
  • lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs (2 hunks)
  • lib/segment/src/index/field_index/numeric_index/mod.rs (7 hunks)
  • lib/segment/src/index/field_index/numeric_index/mutable_numeric_index.rs (4 hunks)
  • lib/segment/src/index/field_index/numeric_index/tests.rs (10 hunks)
  • lib/segment/src/index/query_optimization/condition_converter.rs (8 hunks)
  • lib/segment/src/index/query_optimization/condition_converter/match_converter.rs (7 hunks)
  • lib/segment/src/index/struct_payload_index.rs (3 hunks)
  • lib/storage/src/content_manager/toc/mod.rs (1 hunks)
  • lib/storage/src/content_manager/toc/request_hw_counter.rs (2 hunks)
  • src/common/metrics.rs (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-low-resources
  • GitHub Check: test-consistency
  • GitHub Check: test-consensus-compose
  • GitHub Check: test (macos-latest)
  • GitHub Check: test
  • GitHub Check: test (windows-latest)
  • GitHub Check: test
  • GitHub Check: test-consensus
  • GitHub Check: test (ubuntu-latest)
  • GitHub Check: test
🔇 Additional comments (101)
docs/redoc/master/openapi.json (1)

12640-12666: Field addition for payload index I/O measurement looks good.

The new field payload_index_io_read has been correctly added to the HardwareUsage object, properly defined as an unsigned integer with a minimum value of 0. This addition aligns with the PR objective of implementing I/O read measurements for payload indices.

The field is appropriately placed between other payload-related fields and vector-related fields, maintaining a logical organization. It's also correctly included in the required fields array, ensuring this metric will always be present in hardware usage reports.

This change enables tracking the amount of I/O read operations specifically performed on payload indices during request processing, enhancing the system's observability.

lib/api/src/rest/models.rs (2)

58-58: New hardware metric for payload index IO reads.

The addition of this field enhances the monitoring capabilities for tracking I/O operations specific to payload indexes.


101-101: Consistent implementation in anonymize method.

The new field is properly included in the anonymize method, maintaining consistency with the other hardware usage metrics.

lib/segment/src/index/field_index/numeric_index/immutable_numeric_index.rs (2)

6-6: Import added for hardware counter.

Appropriate import for the HardwareCounterCell type that will be used to track IO operations.


181-192:

❓ Verification inconclusive

Added hardware counter for IO read tracking.

The method signature has been updated to include a hardware counter parameter, and the implementation properly increments the payload index IO read counter based on the size of each value being checked.


🏁 Script executed:

#!/bin/bash
# Check if this API change is consistently applied throughout the codebase
rg -A 2 -B 2 "check_values_any\s*\(\s*.*\s*,\s*.*\s*,\s*&Hardware" --type rust

Length of output: 80


API Change Reviewed: Hardware Counter Integration for IO Reads

The updated check_values_any method in lib/segment/src/index/field_index/numeric_index/immutable_numeric_index.rs now accepts a hardware counter parameter (hw_counter: &HardwareCounterCell) and correctly increments the payload IO read counter based on the size of each value processed. While this change appears correctly implemented in the shown snippet, please manually verify that all call sites of check_values_any across the codebase have been updated to pass the appropriate hardware counter.

lib/api/src/grpc/conversions.rs (2)

2728-2728: Added payload_index_io_read to HardwareUsage conversion.

The new field is correctly included in the conversion from HwMeasurementAcc to HardwareUsage, ensuring the metric is properly propagated.


2741-2741: Added payload_index_io_read to HardwareData conversion.

The new field is consistently included in the conversion from HardwareUsage to HardwareData, maintaining the flow of the metric through different data structures.

Also applies to: 2750-2750

lib/segment/src/index/field_index/full_text_index/mutable_text_index.rs (3)

106-106: Added hardware counter import for tests.

Import of HwMeasurementAcc for use in the test cases.


151-152: Created hardware counter for test cases.

Hardware counter instances are created for use in the test cases to comply with the updated filter method signature.

Also applies to: 212-212


154-157: Updated filter method calls to include hardware counter.

All calls to the filter method have been updated to pass the hardware counter parameter, ensuring the tests work with the updated API.

Also applies to: 161-164, 168-171, 179-184, 215-218, 222-225, 231-234, 240-243

lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs (2)

5-5: Import looks good.

The addition of the hardware counter import is appropriate for the functionality being implemented.


201-216: Implementation of hardware counter for IO measurement looks correct.

The hw_counter parameter has been properly added to the check_values_any method and is correctly passed to the inner point_to_values.check_values_any call when checking values. This change enables tracking I/O read operations related to payload indices.

src/common/metrics.rs (2)

334-334: Hardware usage field addition looks good.

The payload_index_io_read field has been properly added to the HardwareUsage struct destructuring pattern.


353-358:

✅ Verification successful

Verify metric type for consistency.

The new payload_index_io_read metric is defined as a GAUGE while other IO metrics like payload_io_read and vector_io_read are defined as COUNTER. This might be intentional, but it's worth verifying this choice for consistency.


🏁 Script executed:

#!/bin/bash
# Check how other IO metrics are defined in the codebase
grep -r "metric_.*_io_" --include="*.rs" . | grep -E "GAUGE|COUNTER"

Length of output: 70


Please run the above script to verify how the different IO metrics are declared (i.e. whether they use GAUGE or COUNTER) and share the results.


🏁 Script executed:

#!/bin/bash
# Search for occurrences of the IO read metrics in the repository and display surrounding context to inspect their metric types.
rg -i -C 3 "(payload_index_io_read|payload_io_read|vector_io_read)" .

Length of output: 49349


Metric Types Verified: No Inconsistency Found

After reviewing the surrounding code, we can confirm that the use of a GAUGE for the payload_index_io_read metric is intentional. The corresponding IO metrics (payload_io_read and vector_io_read) are defined with MetricType::COUNTER, and this difference is reflected consistently across the codebase. No modifications are needed here unless there is further documentation or design rationale that suggests the discrepancy should be revisited.

  • Verified in src/common/metrics.rs that:
    • "collection_hardware_metric_payload_io_read" and "collection_hardware_metric_vector_io_read" use MetricType::COUNTER
    • "collection_hardware_metric_payload_index_io_read" uses MetricType::GAUGE
lib/storage/src/content_manager/toc/mod.rs (1)

683-683: Hardware metrics collection properly updated.

The new payload_index_io_read field has been correctly added to the HardwareUsage struct in the all_hw_metrics method, retrieving the value from the hardware metrics collector using the appropriate getter method.

lib/api/src/grpc/proto/points.proto (1)

1157-1159: Verify field number changes won't break backward compatibility.

The field numbers have been reordered, with field 4 changing from vector_io_read to payload_index_io_read, and vector_io_read moved to field 5. While there is a previous comment indicating this should be safe since clients aren't using it yet, it's worth double-checking if any existing code depends on these field numbers.

Protocol Buffer field number changes can break backward compatibility if clients are using the old message format definitions.

lib/segment/src/index/field_index/bool_index/simple_bool_index.rs (2)

4-4: LGTM: Clean import addition.

The HwMeasurementAcc import is correctly added to support the new hardware measurement parameter in the filter method.


329-333:

❓ Verification inconclusive

Implement the measurement tracking in a follow-up PR.

The _hw_acc parameter has been added to the filter method signature, but it's currently unused with a TODO comment. This aligns with the PR description which mentioned that certain indices (like the boolean index) would be fully implemented in a future PR.


🏁 Script executed:

#!/bin/bash
# Check how many other filter methods have been updated with the hw_acc parameter
# and whether they have the same TODO comment

echo "Checking filter methods that have been updated with hw_acc parameter:"
rg -A 2 "fn filter.*hw_acc: HwMeasurementAcc" --type rust

Length of output: 208


Measurement Tracking Deferred – Confirm Future Implementation

The _hw_acc parameter in the filter method of lib/segment/src/index/field_index/bool_index/simple_bool_index.rs is intentionally left unused and marked with a TODO, which aligns with the PR description stating that measurement tracking will be implemented in a follow-up PR.

  • Action: In the follow-up PR, ensure that the measurement tracking is fully integrated and that its implementation is consistent with other indices (if applicable).
  • Verification Note: Our repository search for similar usages of _hw_acc returned no additional matches. Please manually verify that this deferred approach fits with the overall design of measurement tracking across the codebase.
lib/common/common/benches/mmap_hashmap.rs (3)

1-1: LGTM: Required import for hardware counter.

The HardwareCounterCell import is correctly added to support hardware metrics tracking in benchmarks.


25-25: LGTM: Hardware counter initialization.

Properly initializes the counter object that will track I/O operations during benchmarking.


27-32: LGTM: Updated API call with hardware counter parameter.

The benchmark correctly passes the hardware counter to the mmap.get() method, allowing I/O metrics to be captured during benchmark execution.

lib/storage/src/content_manager/toc/request_hw_counter.rs (2)

40-40: LGTM: Added new I/O metric to REST API response.

The payload_index_io_read field is correctly added to the REST API HardwareUsage model, ensuring the new metric is exposed via the API.


55-55: LGTM: Added new I/O metric to gRPC API response.

The payload_index_io_read field is correctly added to the gRPC API HardwareUsage message with proper type casting to u64, maintaining consistency with other hardware metrics.

lib/segment/src/index/field_index/full_text_index/text_index.rs (2)

5-5: LGTM: Clean import addition.

The HwMeasurementAcc import is correctly added to support the new hardware measurement parameter in the filter method.


318-318:

❓ Verification inconclusive

Implement the measurement tracking in a follow-up PR.

The _hw_acc parameter has been added to the filter method signature, but it's currently unused with a TODO comment. This aligns with the PR description which mentioned that certain indices would be fully implemented in a future PR.


🏁 Script executed:

#!/bin/bash
# Verify if there are any other files in the full_text_index module 
# that have implemented the hw_acc parameter usage

echo "Checking full text index implementations that use hw_acc parameter:"
rg -A 3 "\bfilter.*hw_acc.*{" --type rust --glob "lib/segment/src/index/field_index/full_text_index/*.rs" | grep -v "TODO"

Length of output: 369


Below is a revised shell script using an escaped curly brace in the regex. Please run the script to verify if any implementations using the _hw_acc parameter (other than the documented TODO in text_index.rs) exist in the full text index module:

If the output confirms that no additional usage of _hw_acc exists (aside from the indicated TODO), then the review comment remains valid. That is, the parameter is intentionally left unutilized pending future implementation of measurement tracking.

Actionable Directive:
Please verify that the _hw_acc parameter is not being used elsewhere in the full text index module beyond its declaration in lib/segment/src/index/field_index/full_text_index/text_index.rs (line 318). Once confirmed, proceed with the follow-up PR for implementing measurement tracking.


🏁 Script executed:

#!/bin/bash
echo "Searching for 'filter' method implementations using hw_acc parameter with corrected regex:"
rg -A 3 "\bfilter.*hw_acc.*\\{" --type rust --glob "lib/segment/src/index/field_index/full_text_index/*.rs" | grep -v "TODO"

Length of output: 442


Measurement Tracking Pending Implementation
The _hw_acc parameter remains unused (aside from the documented TODO) in the filter method, consistent with the PR description. Our search in the full text index module produced no additional usage of this parameter. Please manually verify that _hw_acc is not being used elsewhere beyond line 318 in lib/segment/src/index/field_index/full_text_index/text_index.rs before proceeding to merge. A follow-up PR should implement the measurement tracking as intended.

lib/segment/src/index/field_index/mmap_point_to_values.rs (3)

3-3: Good addition of hardware counter import.

The addition of the hardware counter import allows for tracking I/O operations during memory-mapped file access.


283-293: Well-implemented I/O measurement for index access.

The addition of hardware counter incrementation before calling self.get_range(point_id) properly tracks the overhead of accessing the memory-mapped structure. This is a good practice for accurate performance measurement.


299-301: Good implementation of payload read measurement.

Incrementing the counter with the length of bytes read captures the I/O cost of accessing values from the memory-mapped file. This provides valuable metrics for performance analysis.

lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs (3)

4-4: Good addition of hardware accumulator import.

The import of HwMeasurementAcc is necessary for the modifications to track I/O operations in the boolean index.


368-372: Good modification of filter method signature.

Adding the hw_acc parameter to the filter method aligns with the changes to the PayloadFieldIndex trait, ensuring consistent implementation across all index types.


373-373: Good initialization of hardware counter cell.

Creating a counter cell from the accumulator follows the proper pattern for hardware measurement.

lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs (3)

4-4: Good addition of hardware counter import.

The import of HardwareCounterCell is necessary for the modifications to track I/O operations in the geo index.


239-244: Good modification of check_values_any signature.

Adding the hw_counter parameter to the method signature allows for propagating hardware measurement throughout the call chain.


248-251: Good hardware counter propagation.

Properly passing the hardware counter to the underlying point_to_values.check_values_any method ensures that I/O measurements are captured at the appropriate level of abstraction.

lib/segment/src/index/field_index/field_index_base.rs (3)

4-4: Good addition of hardware accumulator import.

The import of HwMeasurementAcc is necessary for the modifications to track I/O operations across field indices.


49-53: Good modification of PayloadFieldIndex trait.

Updating the filter method signature in the PayloadFieldIndex trait to include the hw_counter parameter ensures consistent hardware measurement across all implementations.


247-252: Good propagation of hardware measurement in FieldIndex.

The implementation correctly passes the hardware accumulator to the underlying field index, ensuring that I/O measurements are captured properly regardless of the specific index type.

lib/segment/src/index/field_index/geo_index/immutable_geo_index.rs (1)

5-5: Added import for hardware counter support

The addition of the HardwareCounterCell import enables payload index I/O read measurements in this module.

lib/segment/src/index/field_index/null_index/mmap_null_index.rs (2)

3-3: Added import for hardware measurements

The addition of the HwMeasurementAcc import enables payload index I/O measurement capability in this module.


375-382: Updated test code to include hardware counter

Tests have been properly updated to pass a hardware measurement accumulator to the filter method.

lib/segment/src/index/field_index/bool_index/mod.rs (4)

1-1: Added import for hardware measurement accumulator

The addition of the HwMeasurementAcc import enables payload index I/O measurement capability in this module.


114-122: Added hardware counter parameter to filter method

The filter method in the PayloadFieldIndex implementation for BoolIndex now accepts an hw_acc parameter and correctly passes it to the underlying implementations. This maintains the enum pattern while adding hardware measurement capabilities.


210-210: Updated test code to include hardware counter

Test code has been properly updated to import and use HwMeasurementAcc. The counter is instantiated and passed to the filter method, ensuring the tests remain functional with the new parameter.

Also applies to: 285-290


346-357: Consistently updated all test cases

All test cases that call the filter method have been systematically updated to include the hardware measurement accumulator. This ensures comprehensive test coverage for the modified API.

Also applies to: 378-396

lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs (2)

5-5: Added import for hardware counter

The addition of the HardwareCounterCell import enables payload index I/O read measurements in this module.


259-267:

❓ Verification inconclusive

Added disposable hardware counter with propagation TODO

The get_token_id method now creates a disposable hardware counter to measure I/O reads when accessing the vocabulary. The TODO comment indicates a consideration to propagate the counter from the caller in the future.

Consider whether propagating the hardware counter from the caller might be more consistent with other implementations in the codebase:


🏁 Script executed:

#!/bin/bash
# Check how hardware counters are propagated in other similar methods
rg -A 2 -B 2 "fn get_token_id.*HardwareCounter|fn get_.*HardwareCounter" --type rust

Length of output: 2069


Action: Revisit hardware counter propagation for consistency

The get_token_id method in lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs currently creates a disposable hardware counter with:

let hw_counter = HardwareCounterCell::disposable(); // TODO(io_measurement): Maybe propagate?

However, as our verification shows, similar methods—such as the get_iterator implementations in both immutable and mutable map indices—propagate a caller-supplied hardware counter rather than constructing one on the spot. This discrepancy could lead to subtle differences in I/O measurement behavior.

Location: lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs, lines 259–267
Observation: Other indices (e.g., in map_index modules) pass a hardware counter received from the caller, ensuring consistent accumulation of I/O measurements.
Suggestion: Consider refactoring get_token_id to accept a hardware counter from the caller or, at minimum, update the TODO to clarify why a disposable counter is used here compared to other implementations.

lib/common/common/src/counter/hardware_counter.rs (5)

16-16: LGTM: Adding payload_index_io_read_counter field

The new field for tracking I/O operations related to payload indices is well-placed within the HardwareCounterCell struct, consistent with other counter fields.


30-30: LGTM: Consistent initialization in constructors

The initialization of the new counter field in all constructor methods follows the existing pattern and ensures the field is properly initialized in all instances.

Also applies to: 46-47, 67-68, 80-81, 100-101


87-89: LGTM: Added new_accumulator helper method

This method provides a convenient way to create a new accumulator from the existing one, which will be useful for propagating measurement capabilities.


140-143: LGTM: Added accessor for the new counter

The getter method for the new counter follows the established pattern in the codebase.


182-183: LGTM: Updated get_hw_data method

The method correctly incorporates the new field into the hardware data structure when retrieving measurements.

Also applies to: 192-193

lib/common/common/src/counter/hardware_accumulator.rs (6)

15-15: LGTM: Added new counter field to HwSharedDrain

The new field is correctly added to the struct, maintaining consistency with other counter fields.


33-35: LGTM: Added accessor method

The new getter method follows the established pattern for accessing counter values.


49-50: LGTM: Updated constructor parameters

Constructor parameters properly updated to include the new counter field.

Also applies to: 57-58


69-70: LGTM: Updated accumulation logic

The accumulate_from_hw_data method correctly handles the new counter field.

Also applies to: 77-78


89-90: LGTM: Updated Clone and Default implementations

Clone and Default implementations are properly updated to handle the new field.

Also applies to: 101-102


182-184: LGTM: Added payload_index_io_read accessor to HwMeasurementAcc

The method provides access to the counter value from the HwMeasurementAcc struct, maintaining consistency with other counter fields.

lib/common/common/src/mmap_hashmap.rs (5)

19-19: LGTM: Added import for HardwareCounterCell

The import is correctly added to support hardware counter functionality.


61-61: Constant value for bucket offset overhead

Using a constant for size_of::<BucketOffset>() provides a single source of truth that can be reused across the codebase.


247-247: Updated method signature to include hardware counter

The get method now accepts a hardware counter parameter to track I/O operations.


252-255: LGTM: Added instrumentation to track I/O reads

The code correctly increments the payload index I/O read counter at two key points:

  1. After retrieving the hash value, incrementing by the overhead size
  2. After retrieving the entry data, incrementing by the actual entry size

This provides accurate tracking of I/O read operations related to payload indices.

Also applies to: 258-261


522-523: LGTM: Updated test cases to pass hardware counter

Test cases have been properly updated to pass the hardware counter parameter to the get method.

Also applies to: 544-545, 572-573, 577-577

lib/segment/src/index/struct_payload_index.rs (4)

7-7: LGTM: Added import for HwMeasurementAcc

The import is correctly added to support hardware measurement accumulation.


86-90: LGTM: Updated query_field method signature

The method now accepts a hardware counter parameter to track I/O operations during field queries.


97-97: LGTM: Updated filter call to pass hardware counter

The hardware counter is now correctly passed to the filter method, allowing I/O tracking during filtering operations.


355-361: LGTM: Added hardware counter initialization and usage

The code now properly creates a new accumulator and passes it to the query_field method, enabling I/O tracking during point iteration.

lib/segment/src/index/field_index/map_index/immutable_map_index.rs (3)

8-9: Good addition of necessary imports for the hardware counters

The imports for HardwareCounterCell and BUCKET_OFFSET_OVERHEAD are properly added to support the I/O measurement functionality being implemented.


253-270: Hardware counter implementation for value checks

The implementation properly measures both fixed overhead and per-value size costs during checks. This follows the established pattern for I/O measurements.

Regarding the past discussion about line 265-266: calculating the size inside the hot path is a reasonable trade-off here. For string values with variable sizes, avoiding double iteration is worth the potential performance impact. Consider benchmarking this approach if performance becomes a concern.


314-325: Proper implementation of hardware counters in get_iterator

The implementation correctly measures both the fixed bucket offset overhead and the variable size costs based on the range length.

lib/segment/src/index/field_index/numeric_index/tests.rs (4)

95-99: Added hardware measurement parameter to cardinality_request

The function signature has been properly updated to include the hardware measurement accumulator as needed for the implementation.


106-110: Updated filter method call to include hardware counter

The modifications properly pass the hardware counter to the filter method, enabling I/O measurements during filtering operations.


154-155: Consistent usage of hardware counters across all test cases

The test code has been consistently updated to provide hardware measurement accumulators at all call sites.

Also applies to: 164-165, 176-177, 186-187, 197-198, 208-209


436-440: Added hardware counter to test_cond function

The test helper function has been properly updated to use hardware counters and pass them to the filter method.

lib/segment/src/index/field_index/map_index/mutable_map_index.rs (4)

6-7: Added hardware counter imports

Necessary imports for hardware counter functionality have been properly added.


16-16: Updated import for MmapValue

The additional import includes the necessary constants and types for hardware counter functionality.


126-148: Properly implemented hardware counters in check_values_any

The implementation correctly tracks both fixed overhead and per-value size measurements, consistent with the immutable implementation.


189-203: Hardware counter implementation in get_iterator

The implementation properly tracks both the bucket offset overhead and the size based on the number of IDs.

lib/segment/src/index/field_index/geo_index/mod.rs (5)

6-7: Added hardware counter imports

The necessary imports for hardware counter functionality have been properly added.


506-510: Using owned accumulator for filter method

As noted in the past review comment, using an owned accumulator here avoids lifetime issues and complexity, which is a reasonable approach.


514-519: Consistent hardware counter usage in geo query filters

The implementation correctly retrieves the counter cell from the accumulator and passes it to the check_values_any method consistently across all geo query types.

Also applies to: 525-530, 536-541


988-996: Updated test code to use hardware counters

The test code has been properly updated to create and use a hardware counter cell.


1001-1005: Consistent hardware counter usage in test functions

All test functions have been updated to use hardware measurement accumulators consistently.

Also applies to: 1058-1063, 1230-1245, 1438-1443

lib/segment/src/index/field_index/numeric_index/mutable_numeric_index.rs (2)

6-7: No concerns with new hardware counter imports.


75-92: Clarify short-circuit read counting in check_values_any.
When using .any(...), we short-circuit upon the first match, so subsequent values are never "read" from an I/O perspective. Please verify if partial iteration is intended for I/O metrics.

lib/segment/src/index/field_index/map_index/mod.rs (6)

10-11: New hardware counter imports look good.


119-129: Confirm short-circuit measurement in check_values_any.
Similar to the numeric index usage, .any() terminates on the first match, meaning subsequent entries are not tallied. Validate whether this partial read is intended.


187-193: get_iterator addition with hardware counter param is consistent.
Implementation aligns with the rest of the code; no issues found.


516-543: Filter function updated with hw_acc param looks correct.
The approach to capturing I/O reads in AnyVariants::Strings paths is consistent with the rest of the code.


642-731: Filter logic in MapIndex<UuidIntType> now integrates hardware counter calls.
Updates reflect the same methodology as for string-based conditions. All looks good.


848-894: Extended filter logic for MapIndex<IntPayloadType> with hw_acc.
The changes follow the established pattern for capturing read counters. Looks fine.

lib/segment/src/index/field_index/numeric_index/mod.rs (4)

17-18: Hardware counter imports introduced without issues.


244-257: Check short-circuit read accounting in check_values_any.
.any(...) stops early on the first match, so not all potential reads are measured. Verify if that behavior suits your I/O measurement goals.


369-383: point_ids_by_value method includes hw_acc for mutable index range reading.
The approach is properly aligned with the rest of the I/O metric integration.


645-685: filter method upgraded to accept hw_acc.
Change is consistent with how numeric indexes handle hardware metrics. No concerns here.

lib/segment/src/index/field_index/map_index/mmap_map_index.rs (2)

8-8: No issues with the import.


158-174: Looks solid. Ensure test coverage for edge cases.
The function correctly integrates the hardware counter while filtering out deleted entries. Consider adding or updating tests to ensure coverage for corner cases (e.g., when idx is out of range or when check_fn matches no values).

lib/segment/src/index/query_optimization/condition_converter/match_converter.rs (2)

13-22: Additional hw_acc parameter appears consistent.
These lines cleanly introduce the HwMeasurementAcc parameter. Implementation flow looks correct.


165-247: No major concerns.
The “except” checker logic follows the same pattern as “any”. Looks consistent with the other changes.

lib/segment/src/index/query_optimization/condition_converter.rs (3)

3-3: Importing HwMeasurementAcc is fine.
No issues here.


39-42: Hardware accumulator usage is logically sound.
Creating a new accumulator for each index within find_map is a clean approach, ensuring that sub-checkers remain isolated.


200-206: Match, range, and geo checkers properly receive hardware counters.
These changes consistently pass the measurement accumulator to relevant checkers, enhancing I/O tracking coverage.

Also applies to: 209-210, 213-215, 217-220, 223-225

@JojiiOfficial JojiiOfficial merged commit faeba71 into dev Mar 12, 2025
17 checks passed
@JojiiOfficial JojiiOfficial deleted the payload_index_io_read_measurement branch March 12, 2025 13:31
timvisee pushed a commit that referenced this pull request Mar 21, 2025
* Add payload index filtering IO measurements for some indices

* Add payload index metric to api and telemetry

* Also account for index access overhead

* Review remarks

* Anonymize new HardwareUsage field

* Fix tests
This was referenced Aug 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants