Skip to content

Conversation

JojiiOfficial
Copy link
Contributor

Depends on #5954.

There are places where we would benefit from #5985 which have been left out and marked with a TODO.

@JojiiOfficial JojiiOfficial force-pushed the payload_index_estimate_cardinality_io_measurements branch from 6216639 to 570cfd0 Compare March 6, 2025 09:38
@JojiiOfficial JojiiOfficial force-pushed the payload_fulltext_index_io_read_measurement branch from c58bbae to 00c3999 Compare March 6, 2025 14:27
@JojiiOfficial JojiiOfficial force-pushed the payload_index_estimate_cardinality_io_measurements branch from 570cfd0 to e5cd057 Compare March 6, 2025 14:32
@JojiiOfficial JojiiOfficial force-pushed the payload_fulltext_index_io_read_measurement branch from 00c3999 to 57412ea Compare March 6, 2025 14:47
@JojiiOfficial JojiiOfficial force-pushed the payload_index_estimate_cardinality_io_measurements branch from e5cd057 to 8de3c8a Compare March 6, 2025 14:49
@JojiiOfficial JojiiOfficial force-pushed the payload_fulltext_index_io_read_measurement branch from 57412ea to 874eb0e Compare March 7, 2025 12:17
@JojiiOfficial JojiiOfficial force-pushed the payload_index_estimate_cardinality_io_measurements branch 2 times, most recently from df104ed to ed264d0 Compare March 7, 2025 14:37
@JojiiOfficial JojiiOfficial force-pushed the payload_fulltext_index_io_read_measurement branch from 874eb0e to d57c503 Compare March 10, 2025 14:37
@JojiiOfficial JojiiOfficial force-pushed the payload_index_estimate_cardinality_io_measurements branch from ed264d0 to 6f3ecd4 Compare March 10, 2025 14:38
@JojiiOfficial JojiiOfficial force-pushed the payload_fulltext_index_io_read_measurement branch from d57c503 to eabc66d Compare March 12, 2025 13:33
@JojiiOfficial JojiiOfficial force-pushed the payload_index_estimate_cardinality_io_measurements branch from 6f3ecd4 to 6eb4428 Compare March 12, 2025 13:51
@JojiiOfficial JojiiOfficial requested a review from coszio March 12, 2025 14:47
let len = self.postings.get(token_id as usize).map(|p| p.len());
hw_counter
.payload_index_io_read_counter()
.incr_delta(size_of::<Option<CompressedPostingList>>() + len.unwrap_or(0));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CompressedPostingList::len() only reads the len of two vecs, does that still count toward io reads?

.map(|x| x.len());
hw_counter
.payload_index_io_read_counter()
.incr_delta(size_of::<Option<PostingList>>() + len.unwrap_or(0));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, this len only reads the len of 1 vec

Base automatically changed from payload_fulltext_index_io_read_measurement to dev March 13, 2025 08:32
@JojiiOfficial JojiiOfficial force-pushed the payload_index_estimate_cardinality_io_measurements branch from 3d2b048 to c766539 Compare March 13, 2025 09:09
@JojiiOfficial JojiiOfficial requested a review from coszio March 13, 2025 09:10
Copy link
Contributor

coderabbitai bot commented Mar 13, 2025

📝 Walkthrough

Walkthrough

The changes consistently add a new parameter—a reference to a hardware counter—to numerous method signatures across the codebase. Methods related to cardinality estimation, point counting, and filtering now require an extra argument of type &HardwareCounterCell, and existing calls have been updated to pass this new parameter. The modifications span collection management, shard operations, various index implementations (including full-text, geo, numeric, and map indices), vector search functionalities, and even parts of the integration tests. Additionally, minor adjustments such as typographical corrections in parameter names and the removal of an obsolete comment have been made. Each update ensures that hardware metrics are incorporated into performance tracking or counting logic without altering the core business logic of the methods involved.

Tip

⚡🧪 Multi-step agentic review comment chat (experimental)
  • We're introducing multi-step agentic chat in review comments. This experimental feature enhances review discussions with the CodeRabbit agentic chat by enabling advanced interactions, including the ability to create pull requests directly from comments.
    - To enable this feature, set early_access to true under in the settings.

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 039f0ac and d6b5798.

📒 Files selected for processing (2)
  • lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mmap_postings.rs (2 hunks)
  • lib/segment/src/index/field_index/numeric_index/mod.rs (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • lib/segment/src/index/field_index/numeric_index/mod.rs
  • lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mmap_postings.rs
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test-consistency
  • GitHub Check: test-consensus-compose
  • GitHub Check: test (macos-latest)
  • GitHub Check: test-consensus
  • GitHub Check: test (windows-latest)
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test (ubuntu-latest)
  • GitHub Check: test

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🔭 Outside diff range comments (2)
lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs (1)

288-301: ⚠️ Potential issue

Hardware counter integration for values_of_hash method.

Similar to the previous method, this correctly implements hardware counter tracking for IO operations.

The same issue with missing size_of import applies here.

Make sure to add the import as suggested in the previous comment.

lib/segment/src/index/field_index/map_index/mmap_map_index.rs (1)

274-288: 💡 Verification agent

🧩 Analysis chain

Added placeholder for hardware measurement in iter_values_map

The TODO comment indicates this is a placeholder for future implementation. Consider defining a timeline or creating an issue to track the implementation of value collection in the hardware measurement.


🏁 Script executed:

#!/bin/bash
# Check if there are other similar TODO comments
echo "Searching for similar TODO comments..."
rg "TODO\(io_measurement\): Collect values" --stats

Length of output: 1027


Action Required: Establish Tracking for Hardware Measurement Implementation

The placeholder for hardware measurement in lib/segment/src/index/field_index/map_index/mmap_map_index.rs (lines 274–288) is consistently used as a temporary stub. Note that similar TODO(io_measurement): Collect values markers exist in the following files:

  • lib/segment/src/index/field_index/null_index/mmap_null_index.rs
  • lib/segment/src/index/field_index/map_index/mutable_map_index.rs
  • lib/segment/src/index/field_index/map_index/immutable_map_index.rs

Please consider defining a timeline or creating a dedicated issue to track the implementation of value collection for hardware measurements across these modules.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eeca054 and c766539.

📒 Files selected for processing (47)
  • lib/collection/src/collection_manager/holders/proxy_segment.rs (1 hunks)
  • lib/collection/src/collection_manager/segments_searcher.rs (2 hunks)
  • lib/collection/src/shards/forward_proxy_shard.rs (2 hunks)
  • lib/collection/src/shards/local_shard/mod.rs (1 hunks)
  • lib/collection/src/shards/local_shard/shard_ops.rs (1 hunks)
  • lib/collection/src/shards/proxy_shard.rs (3 hunks)
  • lib/collection/src/shards/queue_proxy_shard.rs (2 hunks)
  • lib/collection/src/shards/replica_set/update.rs (1 hunks)
  • lib/collection/src/shards/shard.rs (2 hunks)
  • lib/quantization/src/encoded_vectors.rs (1 hunks)
  • lib/segment/benches/conditional_search.rs (1 hunks)
  • lib/segment/src/entry/entry_point.rs (1 hunks)
  • lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs (1 hunks)
  • lib/segment/src/index/field_index/bool_index/mod.rs (4 hunks)
  • lib/segment/src/index/field_index/bool_index/simple_bool_index.rs (1 hunks)
  • lib/segment/src/index/field_index/facet_index.rs (3 hunks)
  • lib/segment/src/index/field_index/field_index_base.rs (2 hunks)
  • lib/segment/src/index/field_index/full_text_index/immutable_inverted_index.rs (1 hunks)
  • lib/segment/src/index/field_index/full_text_index/inverted_index.rs (3 hunks)
  • lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs (1 hunks)
  • lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs (1 hunks)
  • lib/segment/src/index/field_index/full_text_index/text_index.rs (2 hunks)
  • lib/segment/src/index/field_index/geo_index/immutable_geo_index.rs (1 hunks)
  • lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs (1 hunks)
  • lib/segment/src/index/field_index/geo_index/mod.rs (16 hunks)
  • lib/segment/src/index/field_index/geo_index/mutable_geo_index.rs (3 hunks)
  • lib/segment/src/index/field_index/map_index/immutable_map_index.rs (3 hunks)
  • lib/segment/src/index/field_index/map_index/mmap_map_index.rs (3 hunks)
  • lib/segment/src/index/field_index/map_index/mod.rs (23 hunks)
  • lib/segment/src/index/field_index/map_index/mutable_map_index.rs (2 hunks)
  • lib/segment/src/index/field_index/null_index/mmap_null_index.rs (3 hunks)
  • lib/segment/src/index/field_index/numeric_index/mod.rs (1 hunks)
  • lib/segment/src/index/hnsw_index/hnsw.rs (2 hunks)
  • lib/segment/src/index/payload_index_base.rs (1 hunks)
  • lib/segment/src/index/plain_payload_index.rs (2 hunks)
  • lib/segment/src/index/plain_vector_index.rs (3 hunks)
  • lib/segment/src/index/query_optimization/optimizer.rs (1 hunks)
  • lib/segment/src/index/sparse_index/sparse_vector_index.rs (2 hunks)
  • lib/segment/src/index/struct_payload_index.rs (6 hunks)
  • lib/segment/src/segment/entry.rs (5 hunks)
  • lib/segment/src/segment/facet.rs (3 hunks)
  • lib/segment/src/segment/order_by.rs (1 hunks)
  • lib/segment/src/segment/sampling.rs (1 hunks)
  • lib/segment/src/segment/scroll.rs (2 hunks)
  • lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (1 hunks)
  • lib/segment/tests/integration/payload_index_test.rs (9 hunks)
  • src/actix/api/update_api.rs (0 hunks)
💤 Files with no reviewable changes (1)
  • src/actix/api/update_api.rs
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test-consistency
  • GitHub Check: test
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test (macos-latest)
  • GitHub Check: test-consensus
  • GitHub Check: test (windows-latest)
  • GitHub Check: test (ubuntu-latest)
  • GitHub Check: test
  • GitHub Check: test-consensus-compose
  • GitHub Check: test
🔇 Additional comments (119)
lib/segment/src/segment/sampling.rs (1)

22-22: Changes maintain functionality while integrating hardware counter

The modification correctly updates the call to estimate_cardinality by passing the hw_counter parameter, ensuring hardware resource usage is tracked during cardinality estimation without altering the core logic.

lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (1)

281-281: Now measuring deletions with disposable hardware counter

The code now properly passes a disposable hardware counter to the update_stored method when deleting a vector, which enables gathering metrics for deletion operations.

lib/segment/src/index/field_index/full_text_index/immutable_inverted_index.rs (1)

85-94: Hardware counter integration properly tracks posting list I/O

The implementation correctly measures I/O operations by incrementing the payload index counter with both the size of the option type and the actual size of the posting list content.

CompressedPostingList::len() only reads the length of vectors, but this still counts toward I/O reads since accessing metadata involves memory operations that should be measured.

lib/collection/src/shards/proxy_shard.rs (2)

149-155: Method signature correctly updated to include hardware counter

The estimate_cardinality method now properly accepts and forwards the hardware counter parameter to the wrapped shard, ensuring consistent measurement across the delegation chain.


180-181: Hardware counter correctly passed from accumulator

The code now properly obtains the hardware counter from the measurement accumulator and passes it to the estimate_cardinality method, ensuring metrics are collected during filter-based cardinality estimation.

lib/collection/src/shards/forward_proxy_shard.rs (2)

7-7: Adding hardware counter import for tracking resource usage

The addition of the hardware counter import is necessary to support the new parameter in the method signature below.


236-242: Added hardware counter parameter to estimate_cardinality method

This change consistently adds the hardware counter parameter to the estimate_cardinality method and passes it to the wrapped shard implementation. This aligns with the broader pattern of changes in this PR to track hardware resource usage during cardinality estimation operations.

lib/collection/src/collection_manager/segments_searcher.rs (5)

775-775: Adding hardware counter import for test code

The import is needed to support the usage of hardware counter in the test module.


804-805: Added hardware counter instantiation for tests

Creating a new hardware counter instance to be used in the test cases.


808-809: Updated method call to include hardware counter

This change adds the hardware counter parameter to the method call, ensuring that tests reflect the updated API for performance tracking.


811-813: Updated second method call to include hardware counter

Same pattern of change - adding the hardware counter to the method call to ensure test compliance with updated API.


819-824: Updated third method call to include hardware counter

This change completes the pattern of updating all calls to the is_small_enough_for_unindexed_search method to include the hardware counter parameter, ensuring all test cases properly reflect the updated API.

lib/segment/src/segment/order_by.rs (1)

35-35: Updated cardinality estimation to use hardware counter

The change properly passes the hardware counter to the estimate_cardinality method, ensuring resource usage is tracked during this operation. This is consistent with the overall PR objective of adding hardware tracking to cardinality estimation functions.

lib/segment/src/index/query_optimization/optimizer.rs (1)

129-129: Updated condition_cardinality method call to include hardware counter

This change ensures the hardware counter is passed through to the condition_cardinality method when converting conditions in query optimization. The update is consistent with the overall pattern of adding hardware tracking to cardinality-related functions across the codebase.

lib/segment/src/entry/entry_point.rs (1)

206-210: Method signature change looks good

The addition of the hw_counter parameter to the estimate_point_count method is consistent with the project's goal of tracking hardware metrics during cardinality estimation operations.

lib/segment/benches/conditional_search.rs (1)

145-145: Updated call correctly uses the hardware counter

The hw_counter parameter has been correctly added to the estimate_cardinality method call, using the variable that was already defined at line 136.

lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs (1)

164-179: Potential issue with hardware counter calculation

The implementation correctly adds the hw_counter parameter, but there may be an issue with how the counter is being updated.

The calculation size_of::<Option<PostingList>>() + len.unwrap_or(0) * size_of::<u8>() might not accurately represent the actual memory used by the posting list.

This calculation only considers the length of one vector as mentioned in a past review, and it multiplies by size_of::<u8>() which may not be the correct size for each element in the posting list. Consider using the actual size of the elements stored in the posting list.

lib/collection/src/shards/queue_proxy_shard.rs (1)

223-231: Method signature and implementation updated correctly

The estimate_cardinality method has been properly updated to include the hw_counter parameter and correctly forwards it to the wrapped shard's implementation.

lib/segment/src/index/sparse_index/sparse_vector_index.rs (3)

266-270: Method signature updated to include hardware counter parameter.

The method signature now includes a hw_counter parameter which allows tracking hardware performance metrics during cardinality estimation.


275-275: Hardware counter correctly propagated to payload index.

The hw_counter parameter is now passed to the estimate_cardinality method of the payload index, ensuring consistent hardware performance tracking throughout the cardinality estimation process.


438-439: Hardware counter correctly passed to cardinality estimation.

The call to get_query_cardinality now includes the hardware counter from the vector query context, ensuring performance metrics are tracked throughout the query execution path.

lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs (2)

204-209: Method signature updated to include hardware counter parameter.

The get_posting_len method now accepts and utilizes a hardware counter parameter, properly propagating it to the underlying postings implementation.


213-213: Helpful comment added explaining hardware counter usage.

The comment clarifies that hardware counter propagation is not needed in this specific context since the function is only used for building the HNSW index, which is helpful for future maintenance.

lib/collection/src/shards/local_shard/mod.rs (2)

990-990: Method signature updated to include hardware counter parameter.

Added hw_counter parameter to the estimate_cardinality method to enable hardware performance tracking during cardinality estimation.


995-1000: Hardware counter correctly propagated to segment estimations.

The hardware counter is now properly passed to each segment's estimate_point_count method, ensuring consistent performance tracking across all segments during cardinality estimation operations.

lib/quantization/src/encoded_vectors.rs (2)

34-34: Fixed typo in parameter name.

Corrected the parameter name from hw_couter to hw_counter in the score_point method for consistency and correctness.


36-36: Fixed typo in parameter name.

Corrected the parameter name from hw_couter to hw_counter in the score_internal method for consistency and correctness.

lib/segment/src/index/field_index/bool_index/simple_bool_index.rs (1)

358-362: New hardware counter parameter with TODO for future implementation.

The method signature has been updated to include _hw_counter: &HardwareCounterCell but it's not used yet, as indicated by the underscore prefix and the TODO comment. This is consistent with the PR objectives which indicate that some areas have TODOs for future improvements.

Make sure the hardware counter collection is implemented in a follow-up PR as noted in the comment.

Are there other places in this file where hardware counters could be utilized for measurements? For example, the filter method on line 339 has an _hw_acc parameter that seems related.

lib/segment/tests/integration/payload_index_test.rs (8)

467-473: Hardware counter integration for cardinality estimation.

The HardwareCounterCell is properly initialized and passed to the estimate_cardinality method.


503-503: Consistent use of hardware counter for estimation.

The hardware counter is correctly passed to the second cardinality estimation call.


565-571: Hardware counter for is_empty condition tests.

The hardware counter is properly initialized and consistently used for both struct and plain segments.

Also applies to: 577-577


678-684: Hardware counter for cardinality estimation tests.

The hardware counter is properly initialized and passed to the cardinality estimation method.


724-729: Hardware counter for nested array filter tests.

The hardware counter is properly initialized and passed to the cardinality estimation method.


788-793: Hardware counter for nested array nesting tests.

The hardware counter is properly initialized and passed to the cardinality estimation method.


879-885: Hardware counter for struct payload index tests.

The hardware counter is properly initialized and passed to the cardinality estimation method.


1087-1092: Hardware counter for nested fields tests.

The hardware counter is properly initialized and passed to the cardinality estimation method.

lib/collection/src/shards/local_shard/shard_ops.rs (1)

216-220: Add hardware counter to cardinality estimation.

The change correctly passes the hardware counter to the estimate_cardinality method from the existing hw_measurement_acc parameter, maintaining functionality while enabling performance monitoring.

lib/segment/src/index/field_index/null_index/mmap_null_index.rs (3)

4-4: Add import for HardwareCounterCell.

Import added to support the new parameter in the estimate_cardinality method.


255-259: New hardware counter parameter with TODO for future implementation.

The method signature has been updated to include _hw_counter: &HardwareCounterCell but it's not used yet, as indicated by the underscore prefix and the TODO comment. This is consistent with the PR objectives which indicate that some areas have TODOs for future improvements.

Make sure the hardware counter collection is implemented in a follow-up PR. Consider if similar measurement could be added to other methods like filter on line 209 which already has a related _hw_counter parameter.


434-440: Update test to use hardware counter.

The test has been properly updated to create a HardwareCounterCell instance and pass it to the estimate_cardinality calls.

lib/segment/src/index/field_index/facet_index.rs (4)

1-1: LGTM - Proper hardware counter import

The import for HwMeasurementAcc is correctly added to support the new parameter in the trait and implementation.


20-23: LGTM - Trait interface updated correctly

The FacetIndex trait's iter_values_map method signature is properly updated to include the hardware measurement accumulator parameter.


60-63: LGTM - Enum implementation updated consistently

The implementation of iter_values_map for FacetIndexEnum is properly updated to include the hardware measurement accumulator parameter.


65-68: LGTM - Parameter correctly propagated

The hardware measurement accumulator parameter is correctly passed to all variant implementations, maintaining consistent behavior across all enum variants.

lib/segment/src/index/field_index/full_text_index/text_index.rs (4)

131-136: LGTM - Method signature updated properly

The estimate_cardinality method in FullTextIndex correctly adds the hardware counter parameter.


138-146: LGTM - Implementation consistently updated

The hardware counter parameter is correctly passed to all the inverted index implementations in each match arm.


355-359: LGTM - PayloadFieldIndex trait implementation updated

The estimate_cardinality method implementation for PayloadFieldIndex is properly updated to include the hardware counter parameter.


361-362: LGTM - Hardware counter propagation

The hardware counter is correctly passed to the parse_query method and then used in the estimate_cardinality call.

lib/collection/src/shards/replica_set/update.rs (1)

548-551: LGTM - Properly passes hardware counter to cardinality estimation

The hardware counter is correctly passed to the estimate_request_cardinality method, allowing for proper measurement of hardware usage during the cardinality estimation process.

lib/segment/src/segment/entry.rs (5)

361-361: LGTM - Hardware counter added to filtering check

The should_pre_filter method call is properly updated to include the hardware counter parameter.


385-385: LGTM - Consistent hardware counter usage in ordered filtering

The hardware counter is correctly propagated to the should_pre_filter method in the ordered filtering path.


412-412: LGTM - Hardware counter for random filtering

The hardware counter parameter is consistently passed to the should_pre_filter method in the random filtering path.


456-460: LGTM - Method signature updated for hardware tracking

The estimate_point_count method signature is properly updated to include the hardware counter parameter, enabling resource usage tracking during point count estimation.


473-473: LGTM - Hardware counter correctly propagated to underlying estimation

The hardware counter is properly passed to the payload index's cardinality estimation method.

lib/segment/src/index/hnsw_index/hnsw.rs (2)

539-541: Implementation of hardware counter for cardinality estimation

The disposable hardware counter is correctly created and passed to the estimate_cardinality method, ensuring that hardware performance monitoring is properly integrated for internal operations.


1223-1226: TODO for hardware counter propagation

The code is currently using a disposable hardware counter that doesn't propagate measurements. The TODO comment correctly identifies this as an area for future improvement.

Based on the PR objectives, this is likely one of the TODOs mentioned as areas for improvement that will be addressed in a future PR (mentioned as depending on PR #5985).

lib/segment/src/index/field_index/numeric_index/mod.rs (1)

687-691: Method signature updated with hardware counter parameter

The estimate_cardinality method has been updated to accept a hardware counter parameter, consistent with other similar method signature updates across the codebase. The TODO comment correctly indicates that actual hardware counter collection is planned for future implementation.

lib/segment/src/index/plain_payload_index.rs (2)

112-116: Added hardware counter parameter with helpful comment

The method signature has been properly updated to include the hardware counter parameter, with a clear comment indicating that no measurements are needed for this particular implementation.


126-134: Updated nested cardinality estimation to pass hardware counter

The estimate_nested_cardinality method correctly forwards the hardware counter to the estimate_cardinality method, maintaining consistency in the API.

lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs (1)

397-414: Implemented hardware counter tracking for boolean index operations

This change properly integrates the hardware counter to track I/O operations during cardinality estimation. The code correctly increments the counter by the slice length before counting ones in the slice.

lib/segment/src/index/payload_index_base.rs (2)

61-65: LGTM: Added hardware counter parameter to estimate_cardinality.

The addition of the hw_counter parameter to the trait method signature is appropriate and consistent with the hardware counter integration pattern being implemented throughout the codebase.


68-73: LGTM: Added hardware counter parameter to estimate_nested_cardinality.

Consistent with the changes to estimate_cardinality, this change correctly adds the hardware counter parameter to the nested cardinality estimation method.

lib/collection/src/collection_manager/holders/proxy_segment.rs (1)

974-994: LGTM: Hardware counter parameter added to estimate_point_count method.

The implementation properly passes the hardware counter to both the wrapped segment and write segment estimations, ensuring consistent performance tracking throughout the cardinality estimation process.

lib/collection/src/shards/shard.rs (1)

286-297: LGTM: Hardware counter added to estimate_request_cardinality method.

The implementation correctly passes the hardware counter to the underlying estimate_cardinality method when dealing with filter-based effect areas.

lib/segment/src/index/field_index/field_index_base.rs (2)

56-62: Integration of hardware counter for cardinality estimation

The estimate_cardinality method in the PayloadFieldIndex trait has been updated to include a hardware counter parameter, which will help track performance metrics during cardinality estimation operations.


261-268: Consistent parameter passing to underlying implementation

The estimate_cardinality implementation in FieldIndex correctly passes the new hardware counter parameter to the payload field index, maintaining proper delegation.

lib/segment/src/index/field_index/geo_index/mutable_geo_index.rs (4)

6-6: Added hardware counter import

Added import for HardwareCounterCell to support the new parameter in method signatures.


183-184: Updated delegate macro with hardware counter parameter

The delegate macro has been updated to include the hardware counter parameter in the method signatures for points_of_hash and values_of_hash.


236-240: Added hardware counter instrumentation for GeoHash reads

The method now increments the payload index IO read counter by the size of a GeoHash before performing the read operation, providing valuable metrics on IO usage.


243-247: Added hardware counter instrumentation for GeoHash values reads

Similarly to points_of_hash, this method now tracks IO reads when retrieving values associated with a GeoHash.

lib/segment/src/segment/facet.rs (3)

39-39: Updated cardinality estimation with hardware counter

Added the hardware counter parameter to the estimate_cardinality call to track performance metrics during filter cardinality estimation.


79-79: Added hardware counter accumulator for value iteration

Updated the iter_values_map call to include a new accumulator from the hardware counter, enabling performance tracking during value iteration.


127-127: Consistent hardware counter usage for facet values filtering

Added the hardware counter parameter to another instance of estimate_cardinality, maintaining consistency in performance tracking across different code paths.

lib/segment/src/segment/scroll.rs (3)

18-23: Updated method signature to include hardware counter

The should_pre_filter method now accepts a hardware counter parameter, allowing for performance tracking during the pre-filtering decision process.


26-26: Added hardware counter to cardinality estimation in filtering decision

Updated the estimate_cardinality call to include the hardware counter parameter, enabling accurate tracking of performance metrics.


104-104: Consistent hardware counter usage in filtered reads

Added the hardware counter parameter to the estimate_cardinality call in the filtered_read_by_index method, ensuring consistent performance tracking throughout the filtering process.

lib/segment/src/index/field_index/geo_index/immutable_geo_index.rs (2)

104-117: Good implementation of hardware counter for binary search complexity

The addition of the hardware counter parameter and the implementation of I/O read estimation based on binary search complexity is well-designed. The logarithmic calculation for the complexity is mathematically accurate and the multiplication by the size of Counts properly accounts for the memory footprint.


119-130: Consistent implementation for values_of_hash method

This implementation follows the same pattern as points_of_hash, properly applying the hardware counter to track I/O read operations. The only difference is that it doesn't multiply by the size of Counts, which is appropriate since we're only counting the complexity of the search itself, not the data access.

lib/segment/src/index/field_index/map_index/mmap_map_index.rs (2)

8-9: New import for hardware counter accumulator

The addition of the hardware accumulator import supports the changes in the iter_values_map method.


210-227: Updated get_count_for_value to use hardware counter for metrics

The method signature has been properly updated to accept a hardware counter parameter which is now passed to the get method. This allows for consistent tracking of I/O operations throughout the call chain.

lib/segment/src/index/field_index/map_index/mutable_map_index.rs (3)

6-7: New import for hardware counter accumulator

The addition of the hardware accumulator import supports the changes in the iter_values_map method.


176-182: Added hardware counter parameter with TODO comment

The method has been properly modified to accept a hardware counter parameter, although it's currently unused as indicated by the underscore prefix. The TODO comment clarifies the intention for future implementation.


188-195: Added hardware measurement accumulator with TODO comment

Similar to the previous change, this method has been modified to accept a hardware measurement accumulator parameter, currently unused. The TODO comment indicates the same future implementation intention as in the get_count_for_value method.

lib/segment/src/index/plain_vector_index.rs (3)

47-52: Added hardware counter parameter to is_small_enough_for_unindexed_search

The method signature has been properly updated to accept a hardware counter parameter which will be used for cardinality estimation.


62-63: Updated estimate_cardinality call to pass hardware counter

The call to estimate_cardinality has been properly updated to pass the hardware counter parameter, ensuring consistent hardware metrics tracking throughout the cardinality estimation process.


86-91: Updated method call to pass the hardware counter from query context

The call to is_small_enough_for_unindexed_search now correctly passes the hardware counter from the query context, ensuring proper propagation of hardware measurement capabilities.

lib/segment/src/index/field_index/bool_index/mod.rs (3)

131-139: LGTM: Successfully updated method signature and implementations

The estimate_cardinality method has been updated to include the hw_counter parameter, and the corresponding implementations in the match arms correctly pass this parameter to the underlying index implementations.


168-171: Implementation needed for TODO comment

The iter_values_map method signature has been updated to include hardware measurement accumulation, but the parameter is currently unused as indicated by the TODO comment.

Consider implementing the hardware measurement functionality according to the TODO comment or provide a timeline for when this will be addressed.


472-481: LGTM: Tests updated correctly with hardware counter

The test code has been properly updated to instantiate a HardwareCounterCell and pass it to the estimate_cardinality method.

lib/segment/src/index/field_index/map_index/immutable_map_index.rs (2)

293-301: Implementation needed for TODO comment

The get_count_for_value method signature has been updated to include a _hw_counter parameter, but it's currently unused as indicated by the TODO comment.

Consider implementing the hardware measurement functionality according to the TODO comment. The underscore prefix indicates it's intentionally unused for now, but this should be addressed in the future.


309-314: LGTM: Successfully updated method signature with hardware counter

The iter_values_map method has been correctly updated to use the hardware counter from the hw_acc parameter, and it's being properly passed to the get_iterator method.

lib/segment/src/index/field_index/geo_index/mod.rs (6)

110-116: LGTM: Successfully updated method signature and implementations

The points_of_hash method has been updated to include the hw_counter parameter, and the implementations in all three variants properly pass this parameter.


118-124: LGTM: Successfully updated method signature and implementations

The values_of_hash method has been updated to include the hw_counter parameter, and the implementations in all three variants properly pass this parameter.


217-268: LGTM: Successfully updated method signature and internal calls

The match_cardinality method has been updated to include the hw_counter parameter, and all internal calls to points_of_hash and values_of_hash now correctly pass this parameter along.


551-601: LGTM: Successfully updated method signature and internal calls

The estimate_cardinality method has been updated to include the hw_counter parameter, and all internal calls to match_cardinality now correctly pass this parameter along, ensuring the hardware counter is used consistently throughout the cardinality estimation flow.


1055-1057: LGTM: Tests updated correctly with hardware counter

The test code has been properly updated to instantiate a HardwareCounterCell and pass it to the points_of_hash method.


947-950: LGTM: Tests updated correctly with hardware counter

The match_cardinality test has been updated to instantiate a HardwareCounterCell and pass it to the updated method calls correctly.

lib/segment/src/index/field_index/full_text_index/inverted_index.rs (3)

96-97: LGTM: Updated method signature with hardware counter

The get_posting_len method signature has been updated to include the hw_counter parameter, aligning with the hardware counter integration pattern used throughout the codebase.


99-114: LGTM: Successfully updated method signature and internal calls

The estimate_cardinality method has been updated to include the hw_counter parameter, and the internal call to get_posting_len now correctly passes this parameter along.


184-188: Fixed parameter name typo

Corrected the parameter name from hw_coutner to hw_counter, ensuring consistent naming throughout the codebase.

lib/segment/src/index/struct_payload_index.rs (5)

71-83: Add hardware counter parameter to estimate_field_condition

The addition of hw_counter: &HardwareCounterCell into both the function signature (line 71) and the subsequent invocation of index.estimate_cardinality(&full_path_condition, hw_counter) (line 83) looks consistent. This clearly propagates the hardware counter to underlying field indexes.


246-296: Extend condition_cardinality with hw_counter usage

All added parameter references (lines 246, 253, 259, 266, and 296) properly pass hw_counter into nested or field-level estimations. This consistently integrates hardware counter measurement for each condition type. The logic remains coherent, and there are no immediate concerns.


446-453: Incorporate hw_counter in estimate_cardinality

Lines 446–450 introduce a new parameter in the function signature, while lines 452–453 successfully reference the parameter in a closure. Passing hw_counter into self.condition_cardinality allows consistent tracking of hardware metrics.


461-466: Nested cardinality estimation with hardware counter

Lines 461–466 similarly extend the nested cardinality estimation logic to accept and utilize hw_counter. This is consistent with the rest of the cardinality estimation methods.


473-476: Propagate hw_counter in query_points

These lines (473, 476) ensure that hw_counter is provided when calling self.estimate_cardinality(...). Maintaining consistency with the rest of the index logic, this extension helps accurately measure hardware performance during queries.

lib/segment/src/index/field_index/map_index/mod.rs (14)

179-184: New parameter in get_count_for_value

Adding hw_counter: &HardwareCounterCell (line 179) and passing it down for Mutable, Immutable, and Mmap indexes (lines 181–184) is aligned with the broader hardware counter integration. This extension maintains uniform coverage across all index types.


211-219: Hardware accumulator in iter_values_map

Lines 211–219 introduce hw_acc: HwMeasurementAcc and forward it to specific index variants. This consistency ensures measurement coverage while iterating values.


234-240: Extend match_cardinality with hardware counter

In lines 234–240, the new parameter hw_counter aids cardinality estimation for matched values. The changes are minimal yet fully consistent with the existing design.


317-364: Use hw_counter in except_cardinality

Within lines 317 and 361–364, the function calculates how many points are excluded/retained, now also factoring in hardware counter calls with self.get_count_for_value(...). This is a coherent extension of metric tracking to the “except” logic.


421-427: Integrate HardwareCounterCell into except_set

Adding hw_acc (line 421) and retrieving hw_counter (line 427) ensures uniform measurement coverage for set-based exclusions. No issues spotted.


565-565: Leverage self.except_set with hw_acc for string filtering

Line 565 adds the hardware counter argument when calling self.except_set(...) for string-based exclusions. This keeps the filter logic consistent.


578-624: Implement hw_counter in estimate_cardinality for MapIndex<str>

These additions unify cardinality estimation under the hardware counter approach (lines 578–582, 586, 599, 624). The pattern of retrieving and passing hw_counter to match_cardinality or except_cardinality is well-executed.


755-817: Extend MapIndex<UuidIntType> cardinality with hardware counter

Within lines 755–759, 764, 784–785, and 816–817, the updated function signature and calls consistently pass the hardware counter. This parallels the other map index implementations, ensuring uniform coverage.


913-913: Filter except set for MapIndex<IntPayloadType>

Line 913 references the Match::Except variant, with line 921 calling self.except_set(integers, hw_acc). This extension properly includes the hardware counter for integer-based filters.

Also applies to: 921-921


927-983: Integrate hardware counter in estimate_cardinality for MapIndex<IntPayloadType>

From lines 927–931, 936–937, 957–958, and 982–983, hardware metrics are consistently passed for integer-based cardinality estimations. The approach matches the other map index types.


997-1001: Add disposable HardwareCounterCell in payload_blocks

The code (lines 997–1001) calls get_count_for_value(...) with a disposable hardware counter, noting that payload_blocks usage is not measured. This maintains consistency in the interface while avoiding unneeded overhead.


1262-1268: Initialize and use hw_counter in tests (int map index)

Defining let hw_counter = HardwareCounterCell::new(); (lines 1262–1263) and passing it to .except_cardinality(...) (lines 1267–1268) ensures the test code fully exercises the new interface. Looks good.


1305-1311: Initialize and pass hw_counter in tests (string map index)

As with the int map index tests, lines 1305–1306 create the hardware counter, then lines 1310–1311 pass it to .except_cardinality(...). The logic is consistent and verifies the updated code path.


1331-1332: Verify empty index scenario with hw_counter

At lines 1331–1332, passing hw_counter to .except_cardinality(...) confirms that an empty index yields zero cardinality. This ensures test coverage aligns with the new parameter.

@JojiiOfficial JojiiOfficial force-pushed the payload_index_estimate_cardinality_io_measurements branch from c766539 to 260d8db Compare March 13, 2025 09:20
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs (1)

165-180: The hardware counter logic could be simplified

The method calculates the length first and then uses it for both the hardware counter increment and the return value. This is good for code clarity, but there's a redundant .as_ref() call in the chain that could be removed.

fn get_posting_len(
    &self,
    token_id: TokenId,
    hw_counter: &HardwareCounterCell,
) -> Option<usize> {
    let len = self
        .postings
        .get(token_id as usize)
        .and_then(|posting| posting.as_ref())
-       .as_ref()
        .map(|x| x.len());
    hw_counter
        .payload_index_io_read_counter()
        .incr_delta(size_of::<Option<PostingList>>() + len.unwrap_or(0) * size_of::<u8>());
    len
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c766539 and 260d8db.

📒 Files selected for processing (2)
  • lib/segment/src/index/field_index/full_text_index/immutable_inverted_index.rs (2 hunks)
  • lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • lib/segment/src/index/field_index/full_text_index/immutable_inverted_index.rs
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-low-resources
  • GitHub Check: test-consensus
  • GitHub Check: test (macos-latest)
  • GitHub Check: test-consistency
  • GitHub Check: test (windows-latest)
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test (ubuntu-latest)
  • GitHub Check: test
  • GitHub Check: test-consensus-compose
🔇 Additional comments (1)
lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs (1)

175-179: Consider whether counting only size_of::<u8>() per element is accurate

Similar to the issue in the filter method, there's a concern about whether size_of::<u8>() accurately represents the memory footprint of each element in the posting list.

@JojiiOfficial JojiiOfficial force-pushed the payload_index_estimate_cardinality_io_measurements branch from 260d8db to 83fb468 Compare March 13, 2025 09:29
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs (1)

165-180: Consider making hardware counter incrementation more consistent with the filter method.

The implementation correctly calculates and increments the hardware counter based on the posting list access, but there's a minor inefficiency in first retrieving the length in a separate variable and then using it for the counter increment.

Consider refactoring to a more direct approach:

    fn get_posting_len(
        &self,
        token_id: TokenId,
        hw_counter: &HardwareCounterCell,
    ) -> Option<usize> {
-        let len = self
-            .postings
-            .get(token_id as usize)
-            .and_then(|posting| posting.as_ref())
-            .as_ref()
-            .map(|x| x.len());
-        hw_counter
-            .payload_index_io_read_counter()
-            .incr_delta(size_of::<Option<PostingList>>() + len.unwrap_or(0) * size_of::<u32>());
-        len
+        let postings = self
+            .postings
+            .get(token_id as usize)
+            .and_then(|posting| posting.as_ref());
+
+        let len = postings.map(|x| x.len());
+        hw_counter
+            .payload_index_io_read_counter()
+            .incr_delta(size_of::<Option<PostingList>>() + len.unwrap_or(0) * size_of::<u32>());
+        len
    }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 260d8db and 83fb468.

📒 Files selected for processing (2)
  • lib/segment/src/index/field_index/full_text_index/immutable_inverted_index.rs (2 hunks)
  • lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • lib/segment/src/index/field_index/full_text_index/immutable_inverted_index.rs
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test-consistency
  • GitHub Check: test-consensus
  • GitHub Check: test (macos-latest)
  • GitHub Check: test
  • GitHub Check: test (windows-latest)
  • GitHub Check: test-consensus-compose
  • GitHub Check: test (ubuntu-latest)
  • GitHub Check: test
  • GitHub Check: test
🔇 Additional comments (1)
lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs (1)

145-148: The calculation correctly accounts for posting list element size.

The hardware counter incrementation for posting list size is correctly calculated using size_of::<u32>() which matches the PointOffsetType definition in the codebase.

@JojiiOfficial JojiiOfficial force-pushed the payload_index_estimate_cardinality_io_measurements branch from 83fb468 to 4f9a385 Compare March 13, 2025 09:34
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
lib/segment/src/index/field_index/mmap_point_to_values.rs (1)

300-302: Optimize the cloning operation for hardware counter calculation.

The code clones the value to calculate its memory-mapped size and then passes the original value to the check function. This cloning operation could be unnecessary and adds overhead, especially for larger values.

Consider calculating the size without cloning by using a method that accepts a reference:

-                    let mmap_size = T::mmapped_size(value.clone());
-                    hw_acc.payload_index_io_read_counter().incr_delta(mmap_size);
-                    if check_fn(value) {
+                    let mmap_size = T::mmapped_size(value);
+                    hw_acc.payload_index_io_read_counter().incr_delta(mmap_size);
+                    if check_fn(value) {

This would require updating the MmapValue trait to ensure mmapped_size can work with referenced values directly, potentially improving performance for large values.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8c66af2 and 94c04af.

📒 Files selected for processing (1)
  • lib/segment/src/index/field_index/mmap_point_to_values.rs (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test-consistency
  • GitHub Check: test (macos-latest)
  • GitHub Check: test-consensus-compose
  • GitHub Check: test (windows-latest)
  • GitHub Check: test
  • GitHub Check: test-consensus
  • GitHub Check: test
  • GitHub Check: test (ubuntu-latest)
  • GitHub Check: test
🔇 Additional comments (2)
lib/segment/src/index/field_index/mmap_point_to_values.rs (2)

305-305: Good optimization: reusing calculated value.

Reusing the pre-calculated mmap_size variable is a good optimization that avoids recalculating the value's size.


287-288: Well-implemented hardware counter parameter.

The addition of the hw_acc parameter aligns with the PR objective of adding hardware counter measurements for more accurate I/O tracking during cardinality estimation operations.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
lib/segment/src/index/field_index/map_index/mod.rs (1)

314-416: Consider splitting this large function for clarity.
The logic in except_cardinality is correct and makes good use of saturating arithmetic to handle edge cases. However, it’s quite lengthy. Refactoring into smaller helper methods (e.g., to compute minimal or maximal scenarios separately) could improve readability and maintainability.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3f17138 and 039f0ac.

📒 Files selected for processing (24)
  • lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs (1 hunks)
  • lib/segment/src/index/field_index/bool_index/mod.rs (7 hunks)
  • lib/segment/src/index/field_index/bool_index/simple_bool_index.rs (2 hunks)
  • lib/segment/src/index/field_index/field_index_base.rs (2 hunks)
  • lib/segment/src/index/field_index/full_text_index/compressed_posting/compressed_chunks_reader.rs (6 hunks)
  • lib/segment/src/index/field_index/full_text_index/compressed_posting/compressed_posting_list.rs (3 hunks)
  • lib/segment/src/index/field_index/full_text_index/compressed_posting/compressed_posting_visitor.rs (4 hunks)
  • lib/segment/src/index/field_index/full_text_index/immutable_inverted_index.rs (4 hunks)
  • lib/segment/src/index/field_index/full_text_index/inverted_index.rs (10 hunks)
  • lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mmap_postings.rs (2 hunks)
  • lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs (4 hunks)
  • lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs (3 hunks)
  • lib/segment/src/index/field_index/full_text_index/mutable_text_index.rs (5 hunks)
  • lib/segment/src/index/field_index/full_text_index/postings_iterator.rs (2 hunks)
  • lib/segment/src/index/field_index/full_text_index/text_index.rs (4 hunks)
  • lib/segment/src/index/field_index/geo_index/mod.rs (24 hunks)
  • lib/segment/src/index/field_index/map_index/mod.rs (29 hunks)
  • lib/segment/src/index/field_index/null_index/mmap_null_index.rs (10 hunks)
  • lib/segment/src/index/field_index/numeric_index/immutable_numeric_index.rs (1 hunks)
  • lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs (1 hunks)
  • lib/segment/src/index/field_index/numeric_index/mod.rs (4 hunks)
  • lib/segment/src/index/field_index/numeric_index/mutable_numeric_index.rs (2 hunks)
  • lib/segment/src/index/field_index/numeric_index/tests.rs (4 hunks)
  • lib/segment/src/index/struct_payload_index.rs (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (10)
  • lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs
  • lib/segment/src/index/field_index/bool_index/simple_bool_index.rs
  • lib/segment/src/index/struct_payload_index.rs
  • lib/segment/src/index/field_index/null_index/mmap_null_index.rs
  • lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs
  • lib/segment/src/index/field_index/field_index_base.rs
  • lib/segment/src/index/field_index/full_text_index/inverted_index.rs
  • lib/segment/src/index/field_index/bool_index/mod.rs
  • lib/segment/src/index/field_index/full_text_index/text_index.rs
  • lib/segment/src/index/field_index/numeric_index/mod.rs
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test-consistency
  • GitHub Check: test-consensus-compose
  • GitHub Check: test (macos-latest)
  • GitHub Check: test-consensus
  • GitHub Check: test (windows-latest)
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test
  • GitHub Check: test (ubuntu-latest)
  • GitHub Check: test
  • GitHub Check: test
🔇 Additional comments (76)
lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mmap_postings.rs (2)

95-99: Explicit lifetime annotations improve borrowing relationships

The added lifetime parameter 'a properly ties the returned ChunkReader's lifetime to the references it depends on (self, header, and hw_counter). This change enhances the clarity and safety of the borrowing relationships, ensuring the ChunkReader can't outlive any of its source data.


128-132: Consistent lifetime annotations for method chain

The lifetime annotations here are consistent with those added to get_reader, ensuring proper lifetime propagation through the method chain. This is particularly important since get calls get_reader on line 139. This change maintains memory safety while making the borrowing relationships explicit.

lib/segment/src/index/field_index/numeric_index/immutable_numeric_index.rs (2)

209-214: Good implementation of hardware counter tracking

The updated values_range method now properly tracks read operations using the hardware counter. The lifetime parameter 'a ensures that both &self and &hw_counter share the same lifetime, which is correctly propagated to the returned iterator.


217-221:

✅ Verification successful

Ensure proper size calculation in counter increment

The code increments the payload IO read counter with size_of::<Point<T>>() which measures the size of the Point<T> type. This looks correct, but verify that this matches the actual byte count read during this operation.


🏁 Script executed:

#!/bin/bash
# Verify consistent memory size approach across the codebase
rg -A 2 "size_of::<Point<" --type rust

Length of output: 1080


Ensure Correct Byte Calculation in Payload IO

Our search confirms that the usage of size_of::<Point<T>>() is consistent across both immutable_numeric_index.rs and mmap_numeric_index.rs. The counter increment accurately reflects the byte size of a Point<T>. Please double-check that this computed size fully represents the actual bytes read during the operation and that no additional overhead needs to be accounted for.

lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs (1)

244-257:

❓ Verification inconclusive

Good implementation with correct counter usage

The updated values_range method correctly integrates the hardware counter with proper lifetime annotations. Nice use of inspect to track the IO operations without modifying the data flow.

Note that this implementation increments the payload_index_io_read_counter while the implementation in immutable_numeric_index.rs increments the payload_io_read_counter. Ensure this difference is intentional.


🏁 Script executed:

#!/bin/bash
# Compare counter usage between implementations
echo "Checking counter type differences between implementations:"
rg -A 3 "hw_counter\s+.*\.(payload_.*_read_counter)" --type rust

Length of output: 655


Minor Verification Required: Confirm Hardware Counter Differentiation

The implementation in mmap_numeric_index.rs uses proper lifetime annotations and integrates the hardware counter correctly via the inspect method. However, as noted, it increments the payload_index_io_read_counter—while, according to earlier context, the implementation in immutable_numeric_index.rs (and similarly seen in other modules like mutable_inverted_index.rs) uses a different counter (payload_io_read_counter). Please verify that this discrepancy in counter usage is intentional and that the design rationale for using different counters across these modules is well documented.

  • File: lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs – Uses payload_index_io_read_counter()
  • File: lib/segment/src/index/field_index/numeric_index/immutable_numeric_index.rs – Confirm that the use of payload_io_read_counter() is deliberate
lib/segment/src/index/field_index/numeric_index/tests.rs (4)

1-1: Good addition of necessary import

The import of HwMeasurementAcc is necessary for the changes in this file.


101-102: Properly extracting the counter cell from the accumulator

Good approach to get the counter cell from the accumulator. This improves the design by allowing the counter to be shared across multiple operations.


111-111: Consistent passing of hardware counter reference

This change correctly passes the hardware counter reference to the filter method, maintaining consistency with other related changes.


439-441: Good test update for hardware counter usage

The test has been correctly updated to create a hardware counter and pass it to the filter method, ensuring consistent test coverage with the new parameter requirements.

lib/segment/src/index/field_index/numeric_index/mutable_numeric_index.rs (2)

108-122: Properly updated method signature and implementation

The values_range method has been correctly updated with appropriate lifetime parameters and to use the hardware counter.


291-343: Good refactoring from macro to inline methods

The change from a delegate macro to explicit inline method implementations improves code readability and maintainability. Each method clearly delegates to the corresponding method in in_memory_index, and the #[inline] annotations help maintain performance.

lib/segment/src/index/field_index/geo_index/mod.rs (21)

109-115: Ensure consistent usage of hardware counter in points_of_hash
These changes correctly forward hw_counter across all index variants. This appears consistent and maintains clarity.


117-123: Mirror the hw_counter approach in values_of_hash
Similar to points_of_hash, this method correctly delegates the hardware counter to underlying index variants.


216-220: Extended signature for match_cardinality
Adding hw_counter aligns this function’s signature with the rest of the codebase’s performance tracking strategy.


230-231: Properly invoke points_of_hash & values_of_hash with hw_counter
This ensures that I/O readings are tracked when retrieving common_hash statistics.


235-235: Inline call to points_of_hash in map
Passing hw_counter here maintains instrumentation consistency.


509-513: Updated filter method signature
Introducing the lifetime parameter and hw_counter reference is consistent with the pattern used elsewhere, enabling fine-grained performance monitoring.


518-518: Use hw_counter in check_values_any
Continuing the consistent hardware counter pass-through helps with uniform tracing of I/O.


528-528: Repeated check_values_any usage
Same comment as above: passing hw_counter is properly aligned with the instrumentation approach.


538-538: Another check_values_any call
No issues; consistent with the rest of the code.


547-551: Expanded signature for estimate_cardinality
Welcomed consistency in providing hw_counter for each cardinality estimation method.


554-554: Leverage match_cardinality with hw_counter
Seamless alignment with the new signature.


573-579: Polygon interior and exterior cardinality
Calls to match_cardinality now incorporate the counter parameter, ensuring measurement is captured.


803-804: Test usage of hw_counter
Creating and using HardwareCounterCell in tests ensures coverage of the new parameter.


944-944: Checking cardinality in tests
Using field_index.estimate_cardinality(&field_condition, &hw_counter) here confirms consistent test coverage.


996-1013: Extended test flow with hardware counter
From counter creation to the filter calls, these lines confirm that the new signature is well-tested for standard filtering logic.


1053-1076: hw_counter in block-based test logic
This chunk ensures block-based filtering leverages consistent I/O measurement.


1100-1161: Multi-location cardinality checks
All references to hw_counter in this block—Berlin, NYC, Tokyo—confirm test coverage for multiple geographic scenarios.


1182-1205: hw_counter usage for close geo payload
Retains performance measurement logic across close proximity tests (Berlin <-> Potsdam).


1245-1263: hw_counter in disk-load test
Validates that loading from disk maintains the same instrumentation approach.


1367-1404: Empty index cardinality checking with hw_counter
Ensures hardware counter usage in edge cases with zero or partial data.


1406-1465: Boundaries across the antimeridian
Again, passing hw_counter through bounding box queries demonstrates well-rounded test coverage.

lib/segment/src/index/field_index/full_text_index/postings_iterator.rs (3)

58-59: Imported HardwareCounterCell in tests
Great addition to measure read operations while iterating compressed postings.


90-91: Initialize hw_counter for intersection tests
Allows verifying that I/O reads are tracked correctly when merging postings.


96-98: Passing hw_counter to compressed posting readers
Integrating instrumentation into each posting list ensures consistent performance metrics.

lib/segment/src/index/field_index/full_text_index/mutable_text_index.rs (3)

151-153: Introduce hw_acc and hw_counter
Clear approach to obtain the counter cell from the hardware accumulator, avoiding duplicated counters.


156-174: Apply hw_counter in test filters
These lines confirm that full-text indexing logic with match queries now includes hardware metrics.


211-246: Persist and re-check hw_counter usage
All subsequent filter calls maintain a consistent pattern of passing the same counter cell, ensuring reliable instrumentation data.

lib/segment/src/index/field_index/full_text_index/immutable_inverted_index.rs (4)

47-51: Refined lifetime signature and hw_counter
Adopting fn filter<'a>(…, hw_counter: &'a HardwareCounterCell) -> … clarifies ownership and ensures safe reference usage.


71-74: Pass hw_counter to posting.reader
Good approach to accumulate read metrics for each token’s postings.


83-93: Introduce hw_counter in get_posting_len
Incrementing payload_index_io_read_counter() upon retrieving the posting length is a logical place for measuring I/O overhead.


123-124: Use hw_counter in check_match
Ensures posting lookups also contribute to read metrics. No concurrency problems apparent.

lib/segment/src/index/field_index/full_text_index/compressed_posting/compressed_posting_visitor.rs (5)

96-96: Improved encapsulation by using a dedicated method

The change from directly calling remainder_postings.binary_search(&val).is_ok() to using the new search_in_remainder method improves encapsulation and allows for proper hardware counter tracking.


99-99: Improved encapsulation by using accessor method

Using get_chunk_index instead of direct array access provides better encapsulation and allows for tracking hardware metrics when accessing chunk data.


125-129: Improved encapsulation with dedicated accessors

Using the new chunks_len() method and get_remainder_posting improves encapsulation and enables hardware counter tracking, which is essential for IO monitoring.


150-158: Added hardware counter support to test code

Properly updated test code to include hardware counter initialization and usage, ensuring that tests accurately reflect the production code behavior with IO measurements.


173-175: Added hardware counter in second test

Correctly added hardware counter initialization and usage in the second test, maintaining consistency across all test cases.

lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs (3)

376-380: Updated filter method signature to use HardwareCounterCell

Changed from using HwMeasurementAcc to &'a HardwareCounterCell for hardware counter tracking, which aligns with the broader changes across the codebase for consistent IO measurement.


388-388: Fixed bit calculation for hardware counter increment

Updated counter increment to properly account for bits by dividing slice length by u8::BITS, which provides more accurate measurement of IO operations.


395-412: Refactored estimate_cardinality to include hardware counter

The implementation now correctly tracks IO operations by incrementing the hardware counter after retrieving and processing the slice. The code is cleaner and more modular with better separation of concerns.

lib/segment/src/index/field_index/full_text_index/compressed_posting/compressed_chunks_reader.rs (6)

2-2: Added import for HardwareCounterCell

Added the necessary import for the hardware counter functionality.


10-15: Improved encapsulation and added hardware counter

Changed field visibility from public to private and added a new hw_counter field to track hardware metrics, enhancing the class's encapsulation and measurement capabilities.


18-31: Updated constructor to include hardware counter

Modified the constructor to accept and store the hardware counter, maintaining consistency with the new field added to the struct.


100-105: Added hardware counter tracking for chunk decompression

Added IO tracking with hardware counter when decompressing chunks, providing visibility into the performance characteristics of this operation.


105-105: Used u8::BITS instead of magic number for bit calculation

Replaced hardcoded value with the more semantic u8::BITS, improving code readability and maintainability.


118-141: Added accessor methods with hardware counter tracking

Implemented new helper methods for accessing chunks, remainder postings, and performing searches, each with appropriate hardware counter tracking. This improves encapsulation and enables detailed IO measurement.

lib/segment/src/index/field_index/full_text_index/compressed_posting/compressed_posting_list.rs (4)

2-2: Added import for HardwareCounterCell

Added the necessary import for hardware counter functionality.


38-46: Updated reader method to accept hardware counter

Modified the reader method signature to include a hardware counter parameter and pass it to the ChunkReader constructor, enabling IO tracking in the reader.


53-61: Updated iter method to use hardware counter

Modified the iter method to accept and pass the hardware counter to the reader, ensuring consistent hardware metric tracking throughout the iteration process.


97-105: Updated test to use hardware counter

Properly updated the test code to initialize a hardware counter and pass it to the reader, ensuring that tests reflect the production code behavior with IO measurements.

lib/segment/src/index/field_index/map_index/mod.rs (16)

179-185: Implementation looks correct.
This update to get_count_for_value consistently passes the hw_counter parameter to specialized indices, ensuring hardware metrics are properly accounted for.


211-219: No issues with the revised signature.
The iter_values_map method now takes an HwMeasurementAcc and cleanly propagates it to each variant implementation.


234-242: Accurate pass-through of the hardware counter.
The match_cardinality method correctly uses get_count_for_value to compute the exact cardinality, properly reusing the hw_counter.


421-433: Consistent hardware counter usage.
In except_set, the hardware counter is correctly passed to get_iterator, ensuring performance metrics can be tallied for exclusion logic.


530-569: Filter updates look solid.
The refactored filter method for string-based payloads correctly invokes get_iterator or except_set with hw_counter. The logic aligns with the updated signatures.


571-631: Clean integration of hw_counter into cardinality estimation.
Within estimate_cardinality, the additional parameter is properly utilized in calls to match_cardinality and except_cardinality. Design remains consistent.


639-653: Correct usage of HardwareCounterCell::disposable().
The approach of passing a disposable counter in payload_blocks is valid, especially since measurement is not needed for HNSW construction.


680-742: Filter method changes appear proper.
For UuidIntType, the updated lines consistently pass hw_counter to get_iterator, preserving metrics collection.


744-819: Proper integration with cardinality estimation for UUID-based payloads.
All calls to match_cardinality and except_cardinality reflect the new hw_counter parameter. Implementation aligns well with the enhancements.


821-844: Payload blocks logic is consistent.
Using a disposable hardware counter in payload_blocks avoids unnecessary measurements while enabling a unified interface. Implementation is coherent.


871-905: Improved filter logic for integer payloads.
Passing hw_counter into get_iterator and except_set ensures consistent performance tracking for filtering operations.


910-970: Cardinality estimation flows well with the new parameter.
For integer payloads, the expanded estimate_cardinality method handles all matching variants and passes hw_counter as needed.


980-992: Accurate block condition building.
Within integer-based payload_blocks, the reuse of HardwareCounterCell::disposable() maintains a cohesive approach to measuring or ignoring costs as appropriate.


1245-1253: Unit test updates confirm new behavior.
The creation and usage of hw_counter for except_cardinality verify that the code handles empty exclusions properly.


1288-1296: Additional test coverage for string-based map index.
Similarly verifies that hw_counter is well-integrated for empty sets.


1309-1317: Ensures correct edge-case handling for an empty index.
Passing an empty iterator and checking that the result yields zero cardinality demonstrates robust test coverage with the new parameter.

) -> Option<ChunkReader<'_>> {
fn get_reader<'a>(
&'a self,
header: &'a PostingListHeader,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

header doesn't need a lifetime here, since it is not part of the returned value.

) -> Option<Box<dyn Iterator<Item = PointOffsetType> + '_>> {
fn filter<'a>(
&'a self,
condition: &'a FieldCondition,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No lifetime needed for condition

Copy link
Member

@timvisee timvisee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be concerned about double-counting, where we count some operation twice during the cardinality estimation and also while doing the operation itself?

I did some basic test with this, but definitely was not able to test all possible paths. Since this covers so much, it's very hard to keep track of what branches there are, and which we do or do not count.

Copy link
Member

@generall generall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made test measurements on keyword and text indexes, after some fixes they now match my expectations. Also, found a potential place for optimization - we can try to exclude primary condition from second stage filtering.

I propose to merge this and proceed with further steps: create a checklist and some docs, which would allow external people to understand what to expect from measurement and how to test them.

@JojiiOfficial JojiiOfficial merged commit e957d6f into dev Mar 14, 2025
17 checks passed
@JojiiOfficial JojiiOfficial deleted the payload_index_estimate_cardinality_io_measurements branch March 14, 2025 10:05
timvisee pushed a commit that referenced this pull request Mar 21, 2025
* Cardinality estimation measurements

* Apply hw measurements to latest changes from dev

* Clippy

* Also measure cardinality estimation for geo index

* Make measured units 'bytes'

* Use PointOffsetType instead of u32 for size calculation

* fix memory cost for check_values_any in mmap index

* fix double counting for value reading in mmap, remove hw_counter from mmap hashmap

* fmt

* fix hw measurement for text index

* Remove non necessary lifetime annotations

---------

Co-authored-by: generall <andrey@vasnetsov.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants