-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Payload fulltext index IO read measurements #5954
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Payload fulltext index IO read measurements #5954
Conversation
lib/segment/src/index/field_index/full_text_index/immutable_inverted_index.rs
Show resolved
Hide resolved
f15ea01
to
e7e4c15
Compare
4f03a64
to
c58bbae
Compare
8264688
to
e38752a
Compare
00c3999
to
57412ea
Compare
dd5ae10
to
bd123ba
Compare
57412ea
to
874eb0e
Compare
bd123ba
to
f10c290
Compare
874eb0e
to
d57c503
Compare
f10c290
to
ca799aa
Compare
d57c503
to
eabc66d
Compare
📝 WalkthroughWalkthroughThe changes integrate a hardware performance counter into several components of the system. A new test function has been added to verify the correct behavior of the hardware counter accumulator. Method signatures across multiple modules, including various boolean and full-text index implementations, now accept a reference to a hardware counter. These modifications involve incrementing the hardware counter during read operations or when parsing queries and documents, ensuring that performance metrics are captured as operations are executed. Functions related to query optimization, payload storage, condition checking, and inverted index handling have been updated accordingly. The overall logic and structure of the affected operations remain unchanged, with the primary adjustment being the addition of the hardware counter for performance measurement purposes. Suggested reviewers
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
lib/segment/src/index/field_index/bool_index/simple_bool_index.rs (1)
341-342
: Consider implementing hardware counter for filter methodThe TODO comment indicates that there's still work to be done to measure this index in the
filter
method. Consider adding the hardware counter implementation here as well for consistency with the rest of the changes.lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs (1)
130-149
: Avoid overshadowing the hardware counter variable to improve clarity.In this block, the line
let hw_counter = hw_counter.payload_index_io_read_counter();
reuses the same variable name for the new local reference, which may cause confusion. Consider renaming the local variable to avoid overshadowing.fn filter( &self, query: &ParsedQuery, hw_counter: &HardwareCounterCell, ) -> Box<dyn Iterator<Item = PointOffsetType> + '_> { - let hw_counter = hw_counter.payload_index_io_read_counter(); + let local_hw_counter = hw_counter.payload_index_io_read_counter(); ... }lib/segment/src/index/field_index/full_text_index/inverted_index.rs (1)
181-186
: Fix typo in parameter nameThere's a typo in the parameter name:
hw_coutner
should behw_counter
.- fn check_match( - &self, - parsed_query: &ParsedQuery, - point_id: PointOffsetType, - hw_coutner: &HardwareCounterCell, - ) -> bool; + fn check_match( + &self, + parsed_query: &ParsedQuery, + point_id: PointOffsetType, + hw_counter: &HardwareCounterCell, + ) -> bool;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
lib/common/common/src/counter/hardware_counter.rs
(1 hunks)lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs
(2 hunks)lib/segment/src/index/field_index/bool_index/mod.rs
(2 hunks)lib/segment/src/index/field_index/bool_index/simple_bool_index.rs
(2 hunks)lib/segment/src/index/field_index/field_index_base.rs
(3 hunks)lib/segment/src/index/field_index/full_text_index/immutable_inverted_index.rs
(5 hunks)lib/segment/src/index/field_index/full_text_index/inverted_index.rs
(7 hunks)lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mmap_postings.rs
(3 hunks)lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs
(4 hunks)lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs
(4 hunks)lib/segment/src/index/field_index/full_text_index/tests/mod.rs
(3 hunks)lib/segment/src/index/field_index/full_text_index/text_index.rs
(5 hunks)lib/segment/src/index/query_optimization/condition_converter.rs
(2 hunks)lib/segment/src/index/query_optimization/condition_converter/match_converter.rs
(2 hunks)lib/segment/src/payload_storage/in_memory_payload_storage_impl.rs
(1 hunks)lib/segment/src/payload_storage/query_checker.rs
(5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test (macos-latest)
- GitHub Check: test (windows-latest)
- GitHub Check: test-consensus-compose
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test-consensus
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (62)
lib/segment/src/payload_storage/in_memory_payload_storage_impl.rs (1)
195-195
: LGTM: Added hardware counter tocheck_payload
function callThis addition properly integrates the hardware counter into the test function, which is consistent with the change in the function signature that now requires this parameter.
lib/common/common/src/counter/hardware_counter.rs (1)
232-251
: LGTM: Well-structured test for hardware counter accumulatorThis new test effectively validates that when a new accumulator is created from a counter cell, measurements are properly aggregated in the original accumulator. The test confirms that increments (42 + 27 = 69) are correctly tracked across different scopes.
lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs (2)
5-5
: LGTM: Added necessary import for hardware counterProperly added the import needed for the hardware counter integration.
186-195
: LGTM: Properly integrated hardware counter into boolean check operationThe implementation correctly:
- Adds the hardware counter parameter to the function signature
- Increments the counter with the size of boolean value being read
- Preserves the original function behavior
This change aligns with the PR objective to measure IO reads during filtering operations.
lib/segment/src/index/field_index/bool_index/mod.rs (2)
2-2
: LGTM: Added necessary import for hardware counterProperly added the required import for the
HardwareCounterCell
type.
64-73
: LGTM: Successfully propagated hardware counter to implementationsThe implementation correctly:
- Updates the function signature to include the hardware counter parameter
- Passes the counter to the appropriate implementation based on the enum variant
This ensures consistent hardware counter usage throughout the boolean index operations.
lib/segment/src/index/query_optimization/condition_converter.rs (2)
50-50
: Added performance measurement capabilityThe hardware counter is now passed to the
check_field_condition
function, enabling measurement of I/O operations during field condition checking. The implementation correctly integrates with the broader hardware counter functionality.
170-172
: Proper hardware counter integration for nested conditionsThe hardware counter is now correctly passed to the
check_payload
function when processing nested conditions, ensuring consistent I/O measurement throughout the query execution path.lib/segment/src/index/field_index/field_index_base.rs (3)
5-5
: Added hardware counter importAdded the necessary import for
HardwareCounterCell
to support I/O measurement functionality.
156-161
: Enhanced method signature with hardware counter parameterThe
special_check_condition
method now accepts a hardware counter, allowing for I/O measurement during condition checking. This change is consistent with the PR's objective of measuring filtering operations.
172-176
: Full text index operations now include I/O measurementThe full text index operations (parse_query, parse_document, check_match) are now properly instrumented with the hardware counter, enabling detailed measurement of I/O operations in text search functionality.
lib/segment/src/index/query_optimization/condition_converter/match_converter.rs (2)
52-55
: Added hardware counter for boolean index operationsThe
check_values_any
call now includes the hardware counter parameter, enabling measurement of I/O operations during boolean value checks.
252-260
: Implemented I/O measurement for full text matchingPreviously unused parameter
_hw_acc
is now properly utilized ashw_acc
and the hardware counter is passed to both theparse_query
andcheck_match
functions for full text queries. This change completes the I/O measurement integration for text search operations.lib/segment/src/index/field_index/bool_index/simple_bool_index.rs (1)
5-5
: Added hardware counter importAdded the necessary import for
HardwareCounterCell
to support I/O measurement in the boolean index.lib/segment/src/index/field_index/full_text_index/tests/mod.rs (3)
157-158
: Clean import for hardware counter.Good addition of the hardware counter import that will be used for measurement.
174-175
: Hardware counter initialization looks good.The counter is properly initialized before it's passed to the various methods.
189-190
: Properly passing hardware counter to all relevant methods.The code consistently passes the hardware counter to all query operations that need to track IO operations:
query
,parse_query
, andcheck_match
. This ensures consistent measurement throughout the query flow.Also applies to: 191-192, 194-195, 199-200, 201-202, 204-205
lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mmap_postings.rs (5)
5-6
: Good import addition for hardware counter.The hardware counter is correctly imported along with the other dependencies.
95-99
: Well-structured parameter addition for theget_reader
method.The hardware counter parameter is properly added to the method signature, and the counter is accessed at the beginning of the method for payload index IO read operations.
Also applies to: 100-101
103-104
: Accurate measurement of memory-mapped data structure sizes.The code correctly increments the counter with the size of the data structures being read:
- Point offset type (line 103)
- Compressed posting chunks index (line 107)
- Remainder postings based on count (line 116)
This provides precise measurements of IO read operations.
Also applies to: 107-108, 116-117
128-136
: Properly implemented counter for theget
method.The hardware counter is correctly added to the method signature and immediately used to track the
PostingListHeader
size before delegating toget_reader
, ensuring all read operations are measured.
139-140
: Correctly passing the hardware counter to the nested method call.The hardware counter is properly forwarded to the
get_reader
method, maintaining the measurement chain.lib/segment/src/payload_storage/query_checker.rs (6)
115-123
: Well-structured integration of hardware counter in payload checking.The function signature is extended to include the hardware counter parameter, which is needed for proper measurement of IO operations during payload filtering.
127-134
: Properly passing hardware counter to field condition check.The hardware counter is correctly forwarded to the field condition checker, ensuring IO operations are measured consistently across the call stack.
161-162
: Hardware counter properly passed to nested payload checks.The hardware counter is correctly forwarded to the nested
check_payload
call, ensuring that IO operations in nested objects are also measured.
187-192
: Appropriate hardware counter parameter for field condition checking.The hardware counter parameter is properly added to the method signature for consistent measurement throughout the condition checking process.
208-210
: Properly using hardware counter in special condition check.The code correctly passes the hardware counter to the
special_check_condition
method call, ensuring IO operations during condition checking are measured.
319-320
: Creating a new hardware counter for testing.Good approach for testing environments - creating a fresh hardware counter for each test case ensures clean measurements.
lib/segment/src/index/field_index/full_text_index/immutable_inverted_index.rs (6)
3-5
: Good imports for hardware counter support.The code properly imports both the hardware counter and the necessary constant for overhead calculation.
47-53
: Well-structured integration of hardware counter in the filter method.The hardware counter parameter is properly added to the method signature, and the counter for payload index IO reads is correctly extracted at the beginning of the method.
59-66
: Accurate measurement of compressed posting list access.The counter is incremented with the size of the optional compressed posting list plus its actual length when accessed. This provides precise measurement of memory read operations.
The approach of adding the size of the Option wrapper and the actual content size is appropriate for measuring the total memory access. This pattern has been reviewed previously and confirmed to be the correct approach.
97-102
: Hardware counter properly integrated in check_match method.The hardware counter parameter is correctly added to the method signature, and the payload index IO read counter is properly accessed.
Also applies to: 111-112
116-120
: Accurate measurement of posting list checks.The counter is incremented with the size of the CompressedPostingList plus its length, providing precise measurement of posting list access during match checks.
140-145
: Proper integration of hardware counter in token ID lookup.The code correctly adds the hardware counter parameter to the method and increments the counter with the appropriate overhead plus the size of TokenId when performing vocabulary lookups.
lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs (3)
3-4
: Imports look fine.These imports are valid and necessary for the subsequent hardware counter logic and bucket overhead usage.
182-190
: Logic for delegated match checking looks good.The code correctly delegates to
parsed_query.check_match(doc, hw_counter)
. This is straightforward and appears correct for capturing performance metrics.
208-211
: Structural usage of the hardware counter is consistent.Incrementing the counter prior to looking up the token in the vocabulary is appropriate. No issues found here.
lib/segment/src/index/field_index/full_text_index/text_index.rs (8)
6-6
: New hardware counter import.This import is aligned with the rest of the PR’s instrumentation changes.
99-105
: Expanded signature to include hardware counter.Passing the
HardwareCounterCell
to the underlying index retrieval is consistent. No immediate concerns.
107-116
: Filter method updated for performance measurement.Forwarding the
hw_counter
to the inverted index is a clean approach to capturing metrics.
143-160
: Match checking integration.Propagating the hardware counter to each backend implementation ensures consistent performance measurement across mutable, immutable, and mmap indexes.
218-223
: parse_query with hardware metrics.Injecting the hardware counter into token retrieval is consistent with the rest of the code. No issues.
228-235
: parse_document with hardware metrics.Gathering tokens while incrementing the hardware counter is an effective approach to measure I/O overhead.
239-246
: Test helper query function.Wiring
hw_counter
through parse and filter calls is coherent with the overall design.
338-352
: Filtering with a hardware measurement accumulator.The approach of extracting
hw_counter
fromhw_acc
and using it in parsing and filtering is well structured.lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs (5)
174-178
: Filter method updated for reading metrics.Fetching postings with the
hw_counter
ensures the I/O read measurement is tracked. This is consistent with the rest of the codebase.Also applies to: 185-185
204-207
: Disposable hardware counter usage.A disposable counter is used here. Consider verifying whether we need to track these increments more permanently or propagate the existing counter for consistency across the call stack.
210-215
: Vocab iteration with ephemeral hardware counter.Similar to
get_posting_len
, we create a disposable counter. Ensure that ephemeral usage does not omit needed measurements.
219-224
: Comprehensive match check instrumentation.The read measurement is integrated properly by retrieving the postings with
hw_counter
.
271-278
: Token lookup with hardware counter.Passing the hardware counter into
self.vocab.get(token, hw_counter)
ensures consistent instrumentation.lib/segment/src/index/field_index/full_text_index/inverted_index.rs (12)
3-3
: LGTM - Adding HardwareCounterCell importThe addition of the hardware counter import is appropriate for implementing IO read measurements.
47-55
: Nice implementation of hardware counter trackingThis is a good approach to measure IO operations by tracking the number of tokens being processed. The counter is incremented by the sum of query tokens and document tokens, which accurately represents the total number of token comparisons needed for matching.
90-94
: LGTM - Updated filter method signatureThe method signature has been correctly updated to include the hardware counter parameter.
194-194
: LGTM - Updated get_token_id method signatureThe method signature has been correctly updated to include the hardware counter parameter.
201-201
: LGTM - Added hardware counter import to testsAppropriate import added for test module.
314-315
: LGTM - Hardware counter instantiation in testsCreating a hardware counter instance for the tests ensures the new parameter requirements are satisfied.
320-320
: LGTM - Updated get_token_id call with hardware counterTest code properly updated to pass the hardware counter.
325-325
: LGTM - Updated postings getter with hardware counterTest code properly updated to pass the hardware counter when getting postings.
373-374
: LGTM - Hardware counter instantiation for query testsCreating a hardware counter instance for the query tests ensures consistency.
377-379
: LGTM - Updated token ID lookup with hardware counterTest code properly updated to pass the hardware counter when looking up token IDs.
383-386
: LGTM - Updated filter calls with hardware counterTest code properly updated to pass the hardware counter when filtering.
405-408
: LGTM - Updated filter calls after deletion testsTest code properly updated to pass the hardware counter when filtering after deletion operations.
* FullTextIndex filter measurements * Clippy * Add test for `new_accumulator`
Depends on #5951
Adds IO read measurements for filtering over full text index and bool index.