Skip to content

Conversation

JojiiOfficial
Copy link
Contributor

@JojiiOfficial JojiiOfficial commented Feb 7, 2025

Depends on #5951

Adds IO read measurements for filtering over full text index and bool index.

@JojiiOfficial JojiiOfficial force-pushed the payload_index_io_read_measurement branch from f15ea01 to e7e4c15 Compare March 5, 2025 11:27
@JojiiOfficial JojiiOfficial force-pushed the payload_fulltext_index_io_read_measurement branch from 4f03a64 to c58bbae Compare March 5, 2025 11:43
@JojiiOfficial JojiiOfficial force-pushed the payload_index_io_read_measurement branch from 8264688 to e38752a Compare March 6, 2025 14:25
@JojiiOfficial JojiiOfficial force-pushed the payload_fulltext_index_io_read_measurement branch 2 times, most recently from 00c3999 to 57412ea Compare March 6, 2025 14:47
@JojiiOfficial JojiiOfficial force-pushed the payload_index_io_read_measurement branch from dd5ae10 to bd123ba Compare March 7, 2025 12:14
@JojiiOfficial JojiiOfficial force-pushed the payload_fulltext_index_io_read_measurement branch from 57412ea to 874eb0e Compare March 7, 2025 12:17
@JojiiOfficial JojiiOfficial force-pushed the payload_index_io_read_measurement branch from bd123ba to f10c290 Compare March 10, 2025 14:31
@JojiiOfficial JojiiOfficial force-pushed the payload_fulltext_index_io_read_measurement branch from 874eb0e to d57c503 Compare March 10, 2025 14:37
@JojiiOfficial JojiiOfficial force-pushed the payload_index_io_read_measurement branch from f10c290 to ca799aa Compare March 12, 2025 09:03
Base automatically changed from payload_index_io_read_measurement to dev March 12, 2025 13:31
@JojiiOfficial JojiiOfficial force-pushed the payload_fulltext_index_io_read_measurement branch from d57c503 to eabc66d Compare March 12, 2025 13:33
Copy link
Contributor

coderabbitai bot commented Mar 12, 2025

📝 Walkthrough

Walkthrough

The changes integrate a hardware performance counter into several components of the system. A new test function has been added to verify the correct behavior of the hardware counter accumulator. Method signatures across multiple modules, including various boolean and full-text index implementations, now accept a reference to a hardware counter. These modifications involve incrementing the hardware counter during read operations or when parsing queries and documents, ensuring that performance metrics are captured as operations are executed. Functions related to query optimization, payload storage, condition checking, and inverted index handling have been updated accordingly. The overall logic and structure of the affected operations remain unchanged, with the primary adjustment being the addition of the hardware counter for performance measurement purposes.

Suggested reviewers

  • generall
  • timvisee
  • agourlay
✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
lib/segment/src/index/field_index/bool_index/simple_bool_index.rs (1)

341-342: Consider implementing hardware counter for filter method

The TODO comment indicates that there's still work to be done to measure this index in the filter method. Consider adding the hardware counter implementation here as well for consistency with the rest of the changes.

lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs (1)

130-149: Avoid overshadowing the hardware counter variable to improve clarity.

In this block, the line let hw_counter = hw_counter.payload_index_io_read_counter(); reuses the same variable name for the new local reference, which may cause confusion. Consider renaming the local variable to avoid overshadowing.

 fn filter(
     &self,
     query: &ParsedQuery,
     hw_counter: &HardwareCounterCell,
 ) -> Box<dyn Iterator<Item = PointOffsetType> + '_> {
-    let hw_counter = hw_counter.payload_index_io_read_counter();
+    let local_hw_counter = hw_counter.payload_index_io_read_counter();

     ...
 }
lib/segment/src/index/field_index/full_text_index/inverted_index.rs (1)

181-186: Fix typo in parameter name

There's a typo in the parameter name: hw_coutner should be hw_counter.

-    fn check_match(
-        &self,
-        parsed_query: &ParsedQuery,
-        point_id: PointOffsetType,
-        hw_coutner: &HardwareCounterCell,
-    ) -> bool;
+    fn check_match(
+        &self,
+        parsed_query: &ParsedQuery,
+        point_id: PointOffsetType,
+        hw_counter: &HardwareCounterCell,
+    ) -> bool;
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between faeba71 and eabc66d.

📒 Files selected for processing (16)
  • lib/common/common/src/counter/hardware_counter.rs (1 hunks)
  • lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs (2 hunks)
  • lib/segment/src/index/field_index/bool_index/mod.rs (2 hunks)
  • lib/segment/src/index/field_index/bool_index/simple_bool_index.rs (2 hunks)
  • lib/segment/src/index/field_index/field_index_base.rs (3 hunks)
  • lib/segment/src/index/field_index/full_text_index/immutable_inverted_index.rs (5 hunks)
  • lib/segment/src/index/field_index/full_text_index/inverted_index.rs (7 hunks)
  • lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mmap_postings.rs (3 hunks)
  • lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs (4 hunks)
  • lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs (4 hunks)
  • lib/segment/src/index/field_index/full_text_index/tests/mod.rs (3 hunks)
  • lib/segment/src/index/field_index/full_text_index/text_index.rs (5 hunks)
  • lib/segment/src/index/query_optimization/condition_converter.rs (2 hunks)
  • lib/segment/src/index/query_optimization/condition_converter/match_converter.rs (2 hunks)
  • lib/segment/src/payload_storage/in_memory_payload_storage_impl.rs (1 hunks)
  • lib/segment/src/payload_storage/query_checker.rs (5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-low-resources
  • GitHub Check: test-consistency
  • GitHub Check: test (macos-latest)
  • GitHub Check: test (windows-latest)
  • GitHub Check: test-consensus-compose
  • GitHub Check: test (ubuntu-latest)
  • GitHub Check: test-consensus
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
🔇 Additional comments (62)
lib/segment/src/payload_storage/in_memory_payload_storage_impl.rs (1)

195-195: LGTM: Added hardware counter to check_payload function call

This addition properly integrates the hardware counter into the test function, which is consistent with the change in the function signature that now requires this parameter.

lib/common/common/src/counter/hardware_counter.rs (1)

232-251: LGTM: Well-structured test for hardware counter accumulator

This new test effectively validates that when a new accumulator is created from a counter cell, measurements are properly aggregated in the original accumulator. The test confirms that increments (42 + 27 = 69) are correctly tracked across different scopes.

lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs (2)

5-5: LGTM: Added necessary import for hardware counter

Properly added the import needed for the hardware counter integration.


186-195: LGTM: Properly integrated hardware counter into boolean check operation

The implementation correctly:

  1. Adds the hardware counter parameter to the function signature
  2. Increments the counter with the size of boolean value being read
  3. Preserves the original function behavior

This change aligns with the PR objective to measure IO reads during filtering operations.

lib/segment/src/index/field_index/bool_index/mod.rs (2)

2-2: LGTM: Added necessary import for hardware counter

Properly added the required import for the HardwareCounterCell type.


64-73: LGTM: Successfully propagated hardware counter to implementations

The implementation correctly:

  1. Updates the function signature to include the hardware counter parameter
  2. Passes the counter to the appropriate implementation based on the enum variant

This ensures consistent hardware counter usage throughout the boolean index operations.

lib/segment/src/index/query_optimization/condition_converter.rs (2)

50-50: Added performance measurement capability

The hardware counter is now passed to the check_field_condition function, enabling measurement of I/O operations during field condition checking. The implementation correctly integrates with the broader hardware counter functionality.


170-172: Proper hardware counter integration for nested conditions

The hardware counter is now correctly passed to the check_payload function when processing nested conditions, ensuring consistent I/O measurement throughout the query execution path.

lib/segment/src/index/field_index/field_index_base.rs (3)

5-5: Added hardware counter import

Added the necessary import for HardwareCounterCell to support I/O measurement functionality.


156-161: Enhanced method signature with hardware counter parameter

The special_check_condition method now accepts a hardware counter, allowing for I/O measurement during condition checking. This change is consistent with the PR's objective of measuring filtering operations.


172-176: Full text index operations now include I/O measurement

The full text index operations (parse_query, parse_document, check_match) are now properly instrumented with the hardware counter, enabling detailed measurement of I/O operations in text search functionality.

lib/segment/src/index/query_optimization/condition_converter/match_converter.rs (2)

52-55: Added hardware counter for boolean index operations

The check_values_any call now includes the hardware counter parameter, enabling measurement of I/O operations during boolean value checks.


252-260: Implemented I/O measurement for full text matching

Previously unused parameter _hw_acc is now properly utilized as hw_acc and the hardware counter is passed to both the parse_query and check_match functions for full text queries. This change completes the I/O measurement integration for text search operations.

lib/segment/src/index/field_index/bool_index/simple_bool_index.rs (1)

5-5: Added hardware counter import

Added the necessary import for HardwareCounterCell to support I/O measurement in the boolean index.

lib/segment/src/index/field_index/full_text_index/tests/mod.rs (3)

157-158: Clean import for hardware counter.

Good addition of the hardware counter import that will be used for measurement.


174-175: Hardware counter initialization looks good.

The counter is properly initialized before it's passed to the various methods.


189-190: Properly passing hardware counter to all relevant methods.

The code consistently passes the hardware counter to all query operations that need to track IO operations: query, parse_query, and check_match. This ensures consistent measurement throughout the query flow.

Also applies to: 191-192, 194-195, 199-200, 201-202, 204-205

lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mmap_postings.rs (5)

5-6: Good import addition for hardware counter.

The hardware counter is correctly imported along with the other dependencies.


95-99: Well-structured parameter addition for the get_reader method.

The hardware counter parameter is properly added to the method signature, and the counter is accessed at the beginning of the method for payload index IO read operations.

Also applies to: 100-101


103-104: Accurate measurement of memory-mapped data structure sizes.

The code correctly increments the counter with the size of the data structures being read:

  • Point offset type (line 103)
  • Compressed posting chunks index (line 107)
  • Remainder postings based on count (line 116)

This provides precise measurements of IO read operations.

Also applies to: 107-108, 116-117


128-136: Properly implemented counter for the get method.

The hardware counter is correctly added to the method signature and immediately used to track the PostingListHeader size before delegating to get_reader, ensuring all read operations are measured.


139-140: Correctly passing the hardware counter to the nested method call.

The hardware counter is properly forwarded to the get_reader method, maintaining the measurement chain.

lib/segment/src/payload_storage/query_checker.rs (6)

115-123: Well-structured integration of hardware counter in payload checking.

The function signature is extended to include the hardware counter parameter, which is needed for proper measurement of IO operations during payload filtering.


127-134: Properly passing hardware counter to field condition check.

The hardware counter is correctly forwarded to the field condition checker, ensuring IO operations are measured consistently across the call stack.


161-162: Hardware counter properly passed to nested payload checks.

The hardware counter is correctly forwarded to the nested check_payload call, ensuring that IO operations in nested objects are also measured.


187-192: Appropriate hardware counter parameter for field condition checking.

The hardware counter parameter is properly added to the method signature for consistent measurement throughout the condition checking process.


208-210: Properly using hardware counter in special condition check.

The code correctly passes the hardware counter to the special_check_condition method call, ensuring IO operations during condition checking are measured.


319-320: Creating a new hardware counter for testing.

Good approach for testing environments - creating a fresh hardware counter for each test case ensures clean measurements.

lib/segment/src/index/field_index/full_text_index/immutable_inverted_index.rs (6)

3-5: Good imports for hardware counter support.

The code properly imports both the hardware counter and the necessary constant for overhead calculation.


47-53: Well-structured integration of hardware counter in the filter method.

The hardware counter parameter is properly added to the method signature, and the counter for payload index IO reads is correctly extracted at the beginning of the method.


59-66: Accurate measurement of compressed posting list access.

The counter is incremented with the size of the optional compressed posting list plus its actual length when accessed. This provides precise measurement of memory read operations.

The approach of adding the size of the Option wrapper and the actual content size is appropriate for measuring the total memory access. This pattern has been reviewed previously and confirmed to be the correct approach.


97-102: Hardware counter properly integrated in check_match method.

The hardware counter parameter is correctly added to the method signature, and the payload index IO read counter is properly accessed.

Also applies to: 111-112


116-120: Accurate measurement of posting list checks.

The counter is incremented with the size of the CompressedPostingList plus its length, providing precise measurement of posting list access during match checks.


140-145: Proper integration of hardware counter in token ID lookup.

The code correctly adds the hardware counter parameter to the method and increments the counter with the appropriate overhead plus the size of TokenId when performing vocabulary lookups.

lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs (3)

3-4: Imports look fine.

These imports are valid and necessary for the subsequent hardware counter logic and bucket overhead usage.


182-190: Logic for delegated match checking looks good.

The code correctly delegates to parsed_query.check_match(doc, hw_counter). This is straightforward and appears correct for capturing performance metrics.


208-211: Structural usage of the hardware counter is consistent.

Incrementing the counter prior to looking up the token in the vocabulary is appropriate. No issues found here.

lib/segment/src/index/field_index/full_text_index/text_index.rs (8)

6-6: New hardware counter import.

This import is aligned with the rest of the PR’s instrumentation changes.


99-105: Expanded signature to include hardware counter.

Passing the HardwareCounterCell to the underlying index retrieval is consistent. No immediate concerns.


107-116: Filter method updated for performance measurement.

Forwarding the hw_counter to the inverted index is a clean approach to capturing metrics.


143-160: Match checking integration.

Propagating the hardware counter to each backend implementation ensures consistent performance measurement across mutable, immutable, and mmap indexes.


218-223: parse_query with hardware metrics.

Injecting the hardware counter into token retrieval is consistent with the rest of the code. No issues.


228-235: parse_document with hardware metrics.

Gathering tokens while incrementing the hardware counter is an effective approach to measure I/O overhead.


239-246: Test helper query function.

Wiring hw_counter through parse and filter calls is coherent with the overall design.


338-352: Filtering with a hardware measurement accumulator.

The approach of extracting hw_counter from hw_acc and using it in parsing and filtering is well structured.

lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs (5)

174-178: Filter method updated for reading metrics.

Fetching postings with the hw_counter ensures the I/O read measurement is tracked. This is consistent with the rest of the codebase.

Also applies to: 185-185


204-207: Disposable hardware counter usage.

A disposable counter is used here. Consider verifying whether we need to track these increments more permanently or propagate the existing counter for consistency across the call stack.


210-215: Vocab iteration with ephemeral hardware counter.

Similar to get_posting_len, we create a disposable counter. Ensure that ephemeral usage does not omit needed measurements.


219-224: Comprehensive match check instrumentation.

The read measurement is integrated properly by retrieving the postings with hw_counter.


271-278: Token lookup with hardware counter.

Passing the hardware counter into self.vocab.get(token, hw_counter) ensures consistent instrumentation.

lib/segment/src/index/field_index/full_text_index/inverted_index.rs (12)

3-3: LGTM - Adding HardwareCounterCell import

The addition of the hardware counter import is appropriate for implementing IO read measurements.


47-55: Nice implementation of hardware counter tracking

This is a good approach to measure IO operations by tracking the number of tokens being processed. The counter is incremented by the sum of query tokens and document tokens, which accurately represents the total number of token comparisons needed for matching.


90-94: LGTM - Updated filter method signature

The method signature has been correctly updated to include the hardware counter parameter.


194-194: LGTM - Updated get_token_id method signature

The method signature has been correctly updated to include the hardware counter parameter.


201-201: LGTM - Added hardware counter import to tests

Appropriate import added for test module.


314-315: LGTM - Hardware counter instantiation in tests

Creating a hardware counter instance for the tests ensures the new parameter requirements are satisfied.


320-320: LGTM - Updated get_token_id call with hardware counter

Test code properly updated to pass the hardware counter.


325-325: LGTM - Updated postings getter with hardware counter

Test code properly updated to pass the hardware counter when getting postings.


373-374: LGTM - Hardware counter instantiation for query tests

Creating a hardware counter instance for the query tests ensures consistency.


377-379: LGTM - Updated token ID lookup with hardware counter

Test code properly updated to pass the hardware counter when looking up token IDs.


383-386: LGTM - Updated filter calls with hardware counter

Test code properly updated to pass the hardware counter when filtering.


405-408: LGTM - Updated filter calls after deletion tests

Test code properly updated to pass the hardware counter when filtering after deletion operations.

@JojiiOfficial JojiiOfficial requested review from timvisee and removed request for timvisee March 13, 2025 08:07
@JojiiOfficial JojiiOfficial merged commit eeca054 into dev Mar 13, 2025
17 checks passed
@JojiiOfficial JojiiOfficial deleted the payload_fulltext_index_io_read_measurement branch March 13, 2025 08:32
timvisee pushed a commit that referenced this pull request Mar 21, 2025
* FullTextIndex filter measurements

* Clippy

* Add test for `new_accumulator`
@xzfc xzfc mentioned this pull request Aug 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants