-
Notifications
You must be signed in to change notification settings - Fork 1.8k
IsEmpty/IsNull index #6088
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IsEmpty/IsNull index #6088
Conversation
generall
commented
Mar 2, 2025
- create initial strucutres
- clippy
- start field-query refactoring
- start field-query refactoring (2/N)
- start field-query refactoring (3/N): duplicate is_empty/null condiftions as field condition
…ons as field condition
…case new index is not built yet
📝 WalkthroughWalkthroughThis pull request introduces two new fields, Suggested reviewers
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
lib/segment/src/index/query_optimization/condition_converter.rs
Outdated
Show resolved
Hide resolved
lib/segment/src/index/field_index/null_index/mmap_null_index.rs
Outdated
Show resolved
Hide resolved
for value in payload { | ||
match value { | ||
Value::Null => { | ||
is_null = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_null = true; | |
has_values = true; | |
is_null = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, we don't consider null as a value
@@ -8397,6 +8397,16 @@ | |||
"nullable": true | |||
} | |||
] | |||
}, | |||
"is_empty": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both alternative syntaxes need an openapi integration test for sanity check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, found couple of problems with new tests
Co-authored-by: Tim Visée <tim+github@visee.me>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (12)
docs/grpc/docs.md (1)
2802-2803
: New Boolean Fields Added for Field Condition ChecksThe two optional boolean fields,
is_empty
andis_null
, have been introduced in theFieldCondition
schema. Their descriptions (“Check if field is empty” and “Check if field is null”) clearly correspond to the PR objectives of improving empty/null field handling. Please ensure that these additions are also reflected consistently in all related documentation and API specs (e.g., OpenAPI, proto files) so that users can accurately understand their intended usage.lib/segment/src/vector_storage/dense/dynamic_mmap_flags.rs (1)
217-238
: New convenience method for dynamic flag resizing.The
set_with_resize
method is a valuable addition that provides amortized growth when needed. The implementation handles edge cases well:
- Only resizes when necessary (when the index is beyond current length)
- Only allocates new space when setting a true value (since false is the default)
- Returns the previous value correctly
This will simplify client code that needs to work with dynamically sized flags.
Consider adding a unit test specifically for the
set_with_resize
method to ensure its behavior is verified, especially for edge cases like:
- Setting a value beyond current length
- Setting false beyond current length (should not resize)
- Verifying the return value is correct both for existing and new indices
#[test] fn test_set_with_resize() { let dir = Builder::new().prefix("storage_dir").tempdir().unwrap(); let mut dynamic_flags = DynamicMmapFlags::open(dir.path()).unwrap(); // Initialize with length 10 dynamic_flags.set_len(10).unwrap(); // Test: Set true beyond current length let prev = dynamic_flags.set_with_resize(20, true).unwrap(); assert_eq!(prev, false); // Previous value should be false assert!(dynamic_flags.len() >= 21); // Length should be extended assert!(dynamic_flags.get(20)); // New value should be true // Test: Set false beyond current length let current_len = dynamic_flags.len(); let prev = dynamic_flags.set_with_resize(current_len + 10, false).unwrap(); assert_eq!(prev, false); // Previous value should be false assert_eq!(dynamic_flags.len(), current_len); // Length should not change // Test: Toggle existing value dynamic_flags.set(5, true).unwrap(); let prev = dynamic_flags.set_with_resize(5, false).unwrap(); assert_eq!(prev, true); // Previous value should be true assert!(!dynamic_flags.get(5)); // New value should be false }lib/segment/tests/integration/payload_index_test.rs (1)
622-661
: Great refactoring to improve code clarity and maintainability.The refactored code improves readability by:
- Separating checks for each key (
INT_KEY
,INT_KEY_2
,INT_KEY_3
) into distinct blocks- Using iterator methods with
any()
to check for specific index types- Using clear assertions with meaningful error messages
This makes the test logic easier to understand and maintain, particularly when debugging failures.
lib/segment/src/index/field_index/index_selector.rs (4)
53-84
: Consider extracting repeated logic in the match arms
The expanded match arms now manage a mutableindexes
collection to push multiple index types, including a null index. While fully functional, this section is rather large. Extracting common logic into helper functions could improve maintainability.
86-88
: Fallback tonew_null_index
is coherent
This approach gracefully adds a null index when available. As seen in the comment, aToDo
remains for the RocksDb case. Logging a warning or delegating a partial index creation could guide future development.
239-246
:null_builder
is straightforward but incomplete for RocksDb
The function currently returnsNone
for RocksDb. This is acceptable as a first step, but keep in mind the “ToDo” comment and incorporate RocksDb support to align with other builder methods.
248-255
:new_null_index
mirrors the builder logic
This follows the same RocksDb/OnDisk distinction. The code is concise. Future expansions could add anappendable
approach for RocksDb.lib/segment/src/index/struct_payload_index.rs (1)
357-359
: Fallback to full ID iteration if no index is found
Relying on a full scan whenquery_field
returnsNone
might have performance implications for large segments. Consider adding trace logging or user notification if repeated usage of this fallback is expected.lib/segment/src/index/query_optimization/condition_converter.rs (2)
55-83
: Dedicated null index usage forCondition::IsEmpty
Leveraging a pre-built null index significantly boosts efficiency for empty checks. Fallback mechanisms for payload inspection are sensible. You might add a debug log when defaulting to the payload-based approach.
415-435
:get_is_empty_checker
remains limited toNullIndex
As implemented, this only returns a checker forNullIndex
and otherwise yieldsNone
. That’s appropriate for the current scope, but future expansions could incorporate boolean or other specialized indexes if needed.lib/segment/src/index/field_index/null_index/mmap_null_index.rs (1)
204-246
: Consider partial iteration for non-empty conditions.
When filtering for non-empty or non-null fields, the logic returnsNone
and expects external iteration. If many points exist, this approach might be less efficient. Consider providing an explicit iterator for all points not set, or discuss a fallback approach to reduce overhead.lib/segment/src/types.rs (1)
2230-2242
: Add coverage for newis_empty
andis_null
constructors.
The new builders are convenient but ensure they’re tested. Consider unit tests or integration tests verifying filters like{ "is_empty": true }
or{ "is_null": true }
to confirm query correctness.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (22)
docs/grpc/docs.md
(1 hunks)docs/redoc/master/openapi.json
(2 hunks)lib/api/src/grpc/conversions.rs
(4 hunks)lib/api/src/grpc/proto/points.proto
(1 hunks)lib/api/src/grpc/qdrant.rs
(1 hunks)lib/api/src/grpc/validate.rs
(2 hunks)lib/segment/src/index/field_index/field_index_base.rs
(18 hunks)lib/segment/src/index/field_index/index_selector.rs
(7 hunks)lib/segment/src/index/field_index/mod.rs
(2 hunks)lib/segment/src/index/field_index/null_index/mmap_null_index.rs
(1 hunks)lib/segment/src/index/field_index/null_index/mod.rs
(1 hunks)lib/segment/src/index/query_estimator.rs
(2 hunks)lib/segment/src/index/query_optimization/condition_converter.rs
(9 hunks)lib/segment/src/index/query_optimization/condition_converter/match_converter.rs
(6 hunks)lib/segment/src/index/query_optimization/rescore_formula/value_retriever.rs
(1 hunks)lib/segment/src/index/struct_payload_index.rs
(4 hunks)lib/segment/src/payload_storage/condition_checker.rs
(2 hunks)lib/segment/src/problems/unindexed_field.rs
(2 hunks)lib/segment/src/segment_constructor/segment_builder.rs
(1 hunks)lib/segment/src/types.rs
(10 hunks)lib/segment/src/vector_storage/dense/dynamic_mmap_flags.rs
(3 hunks)lib/segment/tests/integration/payload_index_test.rs
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- lib/segment/src/index/field_index/null_index/mod.rs
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test (macos-latest)
- GitHub Check: test-consensus-compose
- GitHub Check: test
- GitHub Check: test (windows-latest)
- GitHub Check: test-consensus
- GitHub Check: test
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
🔇 Additional comments (57)
lib/api/src/grpc/qdrant.rs (1)
6281-6286
: LGTM - Properly implemented new field condition checks.The addition of
is_empty
andis_null
optional boolean fields to theFieldCondition
message is implemented correctly with appropriate protobuf tags, types, and documentation. This enhancement aligns well with the PR objective of improving empty/null field handling in field queries.From the PR objectives and AI summary, it appears these fields are consistently added across the codebase to support checking for empty or null field values, which is a useful enhancement to the filtering capabilities.
lib/segment/src/segment_constructor/segment_builder.rs (1)
244-244
: LGTM! Correctly handles the new NullIndex case.The match arm
FieldIndex::NullIndex(_) => {}
properly follows the existing pattern for index types that don't contribute to ordering values. This addition is necessary to support the new null indexing capability and is implemented consistently with other similar index types.lib/api/src/grpc/proto/points.proto (1)
1048-1049
: Good addition of empty/null field checks to enhance filtering capabilities.These new optional boolean fields enable explicit filtering on empty and null field states, which is a valuable feature for queries. The implementation follows the protobuf best practice of using optional fields with descriptive comments.
lib/segment/src/index/query_optimization/rescore_formula/value_retriever.rs (1)
178-178
: LGTM! Properly handles the new NullIndex case.The handler correctly returns
None
forNullIndex
, which is the expected behavior since null values don't have retrievable content. The comment also provides valuable context that this index type is expected to be used alongside other indices for the same field.lib/segment/src/problems/unindexed_field.rs (2)
220-222
: LGTM! Successfully extracted the new fields from the FieldCondition struct.The addition of
is_empty
andis_null
to the destructuring pattern correctly extracts these new fields from theFieldCondition
struct for further processing.
259-262
: LGTM! Enhanced condition to support new field checks.The updated condition now correctly considers the new
is_empty
andis_null
fields when determining if indexes should be inferred. This ensures that queries using these new field checks will have the opportunity to benefit from appropriate indexing.lib/api/src/grpc/conversions.rs (4)
1376-1377
: Proper integration of new field conditions.The addition of
is_empty
andis_null
fields to theFieldCondition
struct is a good enhancement, allowing for more expressive query conditions to check for empty or null values.
1400-1401
: Consistent field handling in type conversion.The proper handling of
is_empty
andis_null
fields in thetry_from
implementation ensures field conditions are correctly transferred from gRPC representations to internal types.
1416-1417
: Appropriate field inclusion in struct destructuring.The inclusion of the new fields in struct destructuring ensures they're properly captured during conversion.
1435-1436
: Bidirectional conversion support for new fields.The proper mapping of
is_empty
andis_null
fields in the reverse conversion direction ensures consistency in data transformation between internal and API representations.lib/segment/src/index/field_index/mod.rs (2)
5-5
: Simplified type imports matching refactored condition types.The import statement has been updated to remove
IsEmptyCondition
andIsNullCondition
types, which aligns with the refactoring to integrate these condition types into theFieldCondition
struct.
27-32
:✅ Verification successful
Simplified PrimaryCondition enum with integrated condition handling.
The
PrimaryCondition
enum has been simplified by removing dedicated variants forIsEmpty
andIsNull
conditions. This is a good refactoring that reduces duplication and simplifies the condition handling by integrating these checks into theFieldCondition
struct.
🏁 Script executed:
#!/bin/bash # Find usages of PrimaryCondition to confirm the refactoring is applied consistently rg "PrimaryCondition" --type rustLength of output: 7045
PrimaryCondition Refactor Verified and Approved
The updated
PrimaryCondition
enum shows a consistent application of refactoring by removing the dedicatedIsEmpty
andIsNull
variants in favor of usingFieldCondition
for condition handling. The grep output confirms that all usages across tests and index implementations now rely on theCondition
variant (as well as theIds
andHasVector
variants), ensuring that the refactoring is applied uniformly without impacting the overall functionality.
- The simplified enum in
lib/segment/src/index/field_index/mod.rs
(lines 27-32) is correctly reflected.- All references in integration tests and various index modules consistently use the
PrimaryCondition::Condition
variant.- The integration of condition handling into
FieldCondition
reduces duplication and streamlines the logic.lib/api/src/grpc/validate.rs (2)
209-211
: Thorough field extraction in validation.The destructuring of the
FieldCondition
struct has been updated to include the newis_empty
andis_null
fields, ensuring they are properly accessible for validation.
219-221
: Complete validation of all field condition types.The
all_fields_none
check has been extended to include the new fields, ensuring that validation correctly identifies when no field conditions are specified. This maintains the requirement that at least one field condition must be specified.lib/segment/src/vector_storage/dense/dynamic_mmap_flags.rs (1)
148-150
: Enhanced documentation for set_len behavior.The addition of documentation clarifying the capacity behavior is helpful. It clearly communicates that capacity can be up to twice the current length, which is important for understanding performance implications.
lib/segment/src/index/field_index/field_index_base.rs (12)
22-22
: Appropriate import for the new null index types.This import correctly brings in the necessary types for supporting null indexing functionality.
124-124
: Good addition of NullIndex variant.Adding the NullIndex variant to the FieldIndex enum is a clean way to extend the system's capability to handle null values explicitly.
140-140
: Debug implementation for NullIndex properly added.The Debug implementation correctly follows the established pattern for other index types.
181-182
: Consistent implementation for special_check_condition.The implementation returns None for NullIndex, following the same pattern as most other index types.
197-198
: Correct delegation to payload_field_index in get_payload_field_index.The implementation correctly returns the payload_field_index for the NullIndex variant.
213-214
: Comprehensive implementation across all FieldIndex methods.All necessary methods for handling the NullIndex variant are properly implemented, including load, cleanup, add_point, remove_point, get_telemetry_data, values_count, and values_is_empty. Each follows the established pattern for other index types.
Also applies to: 229-230, 297-299, 315-316, 331-332, 347-348, 363-364
378-380
: Correct exclusion of NullIndex from numeric representation.The NullIndex variant is correctly excluded from having a numeric representation, similar to FullTextIndex and other non-numeric indices.
394-396
: Appropriate exclusion of NullIndex from facet index representation.The NullIndex is correctly excluded from having a facet index representation, following the pattern for other index types that don't support this functionality.
444-444
: Builder pattern correctly extended with NullIndex variant.The FieldIndexBuilder enum is properly extended with the NullIndex variant to support the creation of null indices.
470-471
: Complete implementation of builder methods for NullIndex.All necessary methods (init, add_point, finalize) for the NullIndex builder are properly implemented, maintaining consistency with other index types.
Also applies to: 496-497, 520-521
22-521
:✅ Verification successful
Verify integration with the rest of the system.
The implementation looks solid, but it's important to ensure that the system correctly uses this new index type for null value queries. Make sure that:
- The null index is created when needed
- Queries properly utilize the null index when filtering for null values
- The indexing mechanism correctly identifies and indexes null values
🏁 Script executed:
#!/bin/bash # Description: Verify how the NullIndex is used in the codebase # Search for where NullIndex is created and built echo "Checking for NullIndex creation:" rg -A 5 "NullIndex|null_index" --type rust # Check if there's code that decides when to create a NullIndex echo -e "\nChecking index selection logic:" rg -A 10 "index.*selector|select.*index" --type rust # Check if null fields are properly handled in queries echo -e "\nChecking query handling of null fields:" rg -A 5 "is_null|null.*field|field.*null" --type rustLength of output: 73216
Review Approved: NullIndex Integration Verified
The integration of the new NullIndex has been thoroughly verified across the codebase. The verification output confirms that:
- The NullIndex is created when needed (e.g. in the index selector using
MmapNullIndex::builder
).- Query logic correctly handles null fields by explicitly checking for
FieldIndex::NullIndex
in condition converters and payload estimations.- The indexing mechanism properly identifies and indexes null values (via
values_is_null
and related functions).No further action is required.
22-22
:❓ Verification inconclusive
Verify correctness of the null index implementation.
Let's make sure the imported MmapNullIndex and MmapNullIndexBuilder are properly implemented.
🏁 Script executed:
#!/bin/bash # Description: Examine the implementation of null index # Check the MmapNullIndex implementation echo "Examining MmapNullIndex implementation:" fd -e rs "null_index.*mmap_null_index\.rs" | xargs cat # Check if there are tests for the null index functionality echo -e "\nChecking for null index tests:" rg -A 5 "test.*null_index|null_index.*test" --type rustLength of output: 280
Action: Manually verify the null index implementation.
Our automated scans did not return any results for the definitions of
MmapNullIndex
andMmapNullIndexBuilder
. Please ensure that these components are implemented correctly in the codebase and that the import path inlib/segment/src/index/field_index/field_index_base.rs
is accurate. If the files exist under a different name or location, update the import accordingly.
🏁 Script executed:
#!/bin/bash # Search for definitions of MmapNullIndex and MmapNullIndexBuilder in the repository echo "Searching for 'struct MmapNullIndex' definition:" rg -n "struct\s\+MmapNullIndex" --type rust echo -e "\nSearching for 'struct MmapNullIndexBuilder' definition:" rg -n "struct\s\+MmapNullIndexBuilder" --type rust echo -e "\nSearching for file names containing 'mmap_null_index':" fd -i 'mmap_null_index'Length of output: 563
Action: Please manually verify the implementation.
Our automated checks confirm that the file exists at
lib/segment/src/index/field_index/null_index/mmap_null_index.rs
,
but we couldn’t locate explicit definitions forMmapNullIndex
orMmapNullIndexBuilder
using automated searches. It’s recommended to manually inspect the file to ensure that these types (or their equivalents) are properly defined and that their behavior matches the intended functionality. Also, verify that corresponding tests exist to validate their correctness.
- File to check:
lib/segment/src/index/field_index/null_index/mmap_null_index.rs
- Focus Areas:
- Ensure that
MmapNullIndex
andMmapNullIndexBuilder
are either explicitly defined or correctly re-exported.- Confirm that public interfaces and related tests cover the null index functionality.
docs/redoc/master/openapi.json (3)
8358-8358
: Documentation improvement for geo_bounding_box descriptionThis change correctly updates the terminology from "geo location" to "geolocation" for better consistency in the documentation.
8401-8405
: Addition ofis_empty
field checking capabilityThis new field provides an alternative syntax for checking if a specified field is empty, enhancing the filtering capabilities. The description is clear and the field is correctly marked as nullable since it's optional.
As per a previous comment by agourlay, integration tests should be added for this alternative syntax to ensure it works as expected. Please verify that integration tests have been added for this feature.
8406-8410
: Addition ofis_null
field checking capabilityThis new field adds the ability to check if a specified field is null using an alternative syntax. Similar to
is_empty
, this enhances the filtering capabilities and is well-documented.As with the
is_empty
field, please ensure that integration tests are added for this alternative syntax as recommended by agourlay's previous comment to verify correct behavior in various scenarios.lib/segment/src/index/query_estimator.rs (3)
286-287
: Nice addition of the new fields to support empty and null checks.These fields will allow for more flexible filtering capabilities in accordance with the PR objective of implementing IsEmpty/IsNull indexing.
331-333
: Good refactoring of the IsEmpty condition implementation.The change properly encapsulates the condition within a generic
PrimaryCondition::Condition
containing a boxedFieldCondition::new_is_empty()
. This approach aligns with the PR's objective of creating a more consistent structure for field conditions.
339-341
: Good refactoring of the IsNull condition implementation.Similar to the IsEmpty change, this properly encapsulates the condition using
PrimaryCondition::Condition
with a boxedFieldCondition::new_is_null()
, which maintains consistency in how conditions are represented.lib/segment/src/payload_storage/condition_checker.rs (3)
46-47
: Cleanly integrated the new fields in the struct destructuring.The new
is_empty
andis_null
fields are properly added to the destructuring pattern, ensuring they're available for use in the function body.
71-78
: Well-implemented matching logic for the is_empty condition.The implementation correctly handles different value types:
- For
Value::Null
, it returns theis_empty
value directly- For scalar types (Bool, Number, String, Object), it returns the negation of
is_empty
- For arrays, it checks if the array's emptiness matches the requested
is_empty
valueThis aligns with the documented behavior where "this condition will match all records where the field reports either does not exist, or has null or [] value."
79-86
: Well-implemented matching logic for the is_null condition.The implementation correctly handles different value types:
- For
Value::Null
, it returns theis_null
value directly- For scalar types (Bool, Number, String, Object), it returns the negation of
is_null
- For arrays, it checks if any element is null and compares that to the
is_null
valueThis provides a logical extension to the null checking capabilities.
lib/segment/src/index/query_optimization/condition_converter/match_converter.rs (4)
56-56
: Complete coverage of NullIndex in match_value_checker.The additions properly integrate the new
NullIndex
type into the match patterns, ensuring it's handled appropriately for all value variant types (Bool, Integer, String). This ensures that value matching will work correctly with null indexes.Also applies to: 66-66, 74-75
133-133
: Complete coverage of NullIndex in match_any_checker.The additions properly integrate the new
NullIndex
type into the match patterns for bothAnyVariants::Integers
andAnyVariants::Strings
, ensuring that "any" matching will work correctly with null indexes.Also applies to: 141-142
196-196
: Complete coverage of NullIndex in match_except_checker.The additions properly integrate the new
NullIndex
type into the match patterns for bothAnyVariants::Strings
andAnyVariants::Integers
, ensuring that "except" matching will work correctly with null indexes.Also applies to: 205-206
235-236
: Complete coverage of NullIndex in match_text_checker.The additions properly integrate both
UuidMapIndex
andNullIndex
types into the match patterns, ensuring that text matching will work correctly with these index types.lib/segment/src/index/field_index/index_selector.rs (2)
23-23
: Use ofMmapNullIndex
import is valid
This import is necessary for the newly introduced null index functionality.
358-360
:null_dir
extension is consistent
This function correctly follows the pattern used in other directory helpers such asmap_dir
andnumeric_dir
.lib/segment/src/index/struct_payload_index.rs (3)
40-41
: Expanded imports for payload schema
The inclusion ofPayloadContainer
and similar items at these lines aligns with new usage in the cardinality and condition code paths.
253-257
:IsEmpty
condition bridging toFieldCondition
This snippet transforms anIsEmptyCondition
into a fullFieldCondition
for estimation. The pattern is consistent with how other conditions are handled.
260-264
:IsNull
condition bridging toFieldCondition
Similarly,IsNullCondition
is converted into aFieldCondition
and carries through the same evaluation flow. This fosters uniformity in condition handling.lib/segment/src/index/query_optimization/condition_converter.rs (6)
9-9
: AddedMmapNullIndex
import
Required for indexing null/empty values. Import is correct.
85-107
: Custom logic forCondition::IsNull
This logic cleanly attempts a null-index-based check first, then gracefully defaults to payload inspection. The approach is consistent and reduces code duplication in repeated checks.
269-270
: AddedNullIndex
to the match arms returningNone
For geo or numeric checks,NullIndex
is correctly bypassed to the fallback because it doesn’t handle these operations. This explicit coverage prevents silent mismatches.Also applies to: 291-292, 312-313, 345-346, 369-370
373-389
:get_is_empty_indexes
provides a primary/fallback strategy
Splitting out null-index-based checks from any other available index is clear. The function nicely returns a two-part tuple to guide the next step.
391-413
:get_fallback_is_empty_checker
merges index checks with payload
Combining index results with a final payload reference is pragmatic to ensure consistency. This approach is particularly valuable if payload data can differ from partially built indexes.
437-453
:get_is_null_checker
parallelsget_is_empty_checker
The logic is nearly identical yet adapted for null checks. Implementation is consistent. Similar expansions could apply if other indexes begin tracking null states differently.lib/segment/src/index/field_index/null_index/mmap_null_index.rs (4)
41-52
: Ensure robust directory creation and error handling.
When creating new directories, consider race conditions under concurrent usage. For instance, if another thread/process is creating the same directory at the same time, handleAlreadyExists
gracefully rather than treating it as a terminal error.
95-130
: Validate correct handling for multi-valued payloads.
Withinadd_point
, ifpayload
contains multiple arrays or nested data, ensurehas_values
is set accurately (e.g., for non-empty arrays). Otherwise, legitimate values might be ignored.
112-116
: Revisit array emptiness logic and consider adding a dedicated test.
A previous comment suggested markinghas_values
astrue
for non-empty arrays. If the intention is to treat non-empty arrays as 'has values,' make sure to updatehas_values
. Also, adding a targeted unit test would help validate various array payload cases.
307-316
: Check concurrency safety for builder usage.
While the builder approach is straightforward, if multiple threads use the same builder, there may be data races on the internal slices. Document or enforce single-thread usage if concurrency is not intended.lib/segment/src/types.rs (2)
2122-2125
: Clarify semantics ofis_empty
vs.is_null
.
Having both options can lead to overlapping semantics (e.g., an array could be empty but not null). Provide guidance or checks to avoid confusion, especially if user expects “empty” to include or excludenull
.
2319-2332
: Streamline usage ofFrom<JsonPath>
for null/empty checks.
These conversions are handy, but verify that they work as intended with nested JSON paths. For instance, “key.subkey” for null vs. empty might behave differently ifkey
orsubkey
is missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
lib/segment/src/index/field_index/null_index/mmap_null_index.rs (2)
95-133
: Add additional defensive checks for payload handlingThe
add_point
implementation could benefit from some additional defensive programming to handle edge cases.Consider adding a check for empty payload:
pub fn add_point(&mut self, id: PointOffsetType, payload: &[&Value]) -> OperationResult<()> { let mut is_null = false; let mut has_values = false; + + // Empty payload is equivalent to no values and no nulls + if payload.is_empty() { + self.has_values_slice.set_with_resize(id, false)?; + self.is_null_slice.set_with_resize(id, false)?; + return Ok(()); + } for value in payload { // Rest of the implementation... } // ... }
99-124
: Early return optimization in add_point match logicIn the current implementation, we continue processing elements even after determining that a value is null.
Consider breaking out of the loop earlier when both
is_null
andhas_values
are true:for value in payload { match value { // existing implementation... } - if is_null { + if is_null && has_values { break; } }This optimization avoids unnecessary processing when we've already determined the complete state of the entry.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
lib/segment/src/index/field_index/null_index/mmap_null_index.rs
(1 hunks)lib/segment/src/payload_storage/condition_checker.rs
(4 hunks)tests/openapi/test_basic_retrieve_api.py
(5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test-consensus-compose
- GitHub Check: test (macos-latest)
- GitHub Check: test
- GitHub Check: test (windows-latest)
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test-consensus
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (10)
lib/segment/src/payload_storage/condition_checker.rs (4)
46-48
: Well-structured implementation of these new fieldsThe addition of
is_empty
andis_null
fields to theFieldCondition
struct is clean and follows the existing pattern of optional condition fields.
71-78
: Comprehensive implementation of the is_empty condition checkingThe implementation correctly handles different types of JSON values according to your documentation. For arrays, it checks if any non-null elements exist, which aligns with the expected behavior where an array is considered empty if it contains only null values or is an empty array.
According to the past discussion, this matches your documentation: "This condition will match all records where the field reports either does not exist, or has null or [] value."
79-86
: Logical implementation of the is_null condition checkingThe implementation for
is_null
properly handles all JSON value types. For arrays, it checks if any element in the array is null, which aligns with the expected behavior where an array is considered to have null if at least one element is null.
323-419
: Comprehensive test coverage for the new condition checksThe test suite thoroughly validates the behavior of the
is_empty
andis_null
conditions against various JSON value types, including edge cases like empty arrays, arrays with null values, and arrays with a mix of null and non-null values.tests/openapi/test_basic_retrieve_api.py (3)
183-205
: Additional alternative is_empty syntax test improves coverageThis test complements the existing
is_empty
condition test by verifying that the newer syntax (using"key": "city", "is_empty": true
) produces the same results as the older syntax (using"is_empty": {"key": "city"}
). This ensures backward compatibility while introducing the more intuitive syntax.
236-259
: Comprehensive test for is_null condition with consistent result verificationGood test implementation that verifies both syntaxes for the
is_null
condition produce identical results. The assertion of identical IDs between the two approaches ensures consistency in your API.
294-317
: Well-structured test for is_null negationThe test validates that the
must_not
/is_null
combination is equivalent to usingis_null: false
directly. This is important as it demonstrates both approaches produce consistent results and helps users understand the relationship between these query patterns.lib/segment/src/index/field_index/null_index/mmap_null_index.rs (3)
18-28
: Well-designed index structure for empty/null conditionsThe
MmapNullIndex
structure is well-designed with a clear separation of concerns between tracking values and nulls. The documentation clearly explains the purpose and expected usage of this specialized index.
207-250
: Clear and efficient filter implementationThe filter implementation correctly handles the different scenarios for
is_empty
andis_null
conditions. The implementation is efficient by leveraging the precomputed flags and the comments clearly explain the reasoning behind each branch.
334-428
: Comprehensive test coverage for null indexThe test for the null index is thorough and covers all important cases:
- Testing various combinations of values (null, array with null, empty, non-null)
- Validating both positive and negative conditions
- Verifying cardinality estimation
The test structure makes it easy to understand what's being tested and the expected outcomes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
lib/segment/src/payload_storage/condition_checker.rs (1)
40-49
: Clarify logic incheck_is_empty
.Using
array.iter().any(|x| !x.is_null()) != is_empty
is correct but slightly opaque. A short inline explanation or a more direct condition like.all(|x| x.is_null())
might improve clarity.- Value::Array(array) => array.iter().any(|x| !x.is_null()) != is_empty, + Value::Array(array) => { + let has_non_null = array.iter().any(|x| !x.is_null()); + has_non_null != is_empty + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
lib/segment/src/common/utils.rs
(1 hunks)lib/segment/src/payload_storage/condition_checker.rs
(5 hunks)lib/segment/src/payload_storage/query_checker.rs
(1 hunks)lib/segment/src/types.rs
(11 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- lib/segment/src/types.rs
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test (macos-latest)
- GitHub Check: test-consensus
- GitHub Check: test (windows-latest)
- GitHub Check: test-consensus-compose
- GitHub Check: test
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (8)
lib/segment/src/payload_storage/query_checker.rs (1)
193-195
: Verify the early return behavior for empty fields.When
field_values
is empty, the code returns immediately without applying potential field-index logic. If this behavior is intentional, no changes are required; otherwise, consider integrating a fallback check for indexes to ensure consistent results.lib/segment/src/common/utils.rs (1)
21-21
: Definition of empty arrays now includes all-null arrays.Returning true for arrays whose elements are all
null
aligns with the documented requirement that null values are treated as non-data. This is consistent with the new semantics of “empty” in the codebase.lib/segment/src/payload_storage/condition_checker.rs (6)
33-38
: Defaultcheck_empty
implementation may mask specialized behaviors.By always returning
false
, custom types that implementValueChecker
won't detect emptiness unless overridden. If this is the intended fallback, it’s correct. Otherwise, you might consider documenting that implementers must overridecheck_empty
where appropriate.
51-59
: Partial null checks in arrays.
array.iter().any(|x| x.is_null()) == is_null
means an array with at least one null satisfiesis_null = true
. This matches the stated requirement that “null is not a value” and is consistent with your documentation.
73-75
: Addingis_empty
andis_null
checks toFieldCondition
.Including these fields in the destructured logic and OR condition ensures that empty/null checks seamlessly integrate with existing match, range, and geo checks. The approach is sound.
Also applies to: 98-99
103-127
: Order of condition checks can skip multi-condition scenarios.When both
values_count
andis_empty
/is_null
are present, the function handlesvalues_count
first. If you intend to prioritizevalues_count
over empty/null checks, this ordering is correct. Otherwise, consider making the logic more explicit about combining conditions.
129-148
:check_empty
favoringis_empty
overis_null
.The code returns
is_empty
first if present; if not, it returns the inverse ofis_null
. This is logical but means specifying both fields simultaneously might lead to unexpected outcomes for users unless well-documented.
377-473
: Extensive test coverage for null and empty conditions.Good job adding thorough tests, including edge cases like
[null]
and partial-null arrays, ensuring robust verification of the new logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
lib/segment/src/index/field_index/null_index/mmap_null_index.rs (1)
95-127
: 🛠️ Refactor suggestionImprove array handling logic
The current implementation for handling arrays has a potential issue. According to best practices, an empty array should be considered as having no values.
Modify the array handling to properly distinguish between empty arrays and arrays with values:
Value::Array(array) => { - if array.iter().any(|v| v.is_null()) { + if !array.is_empty() { + if array.iter().any(|v| v.is_null()) { + is_null = true; + } + if array.iter().any(|v| !v.is_null()) { + has_values = true; + } + } - is_null = true; - } - if array.iter().any(|v| !v.is_null()) { - has_values = true; - } }This implementation considers empty arrays as neither having values nor nulls, which aligns with the expected behavior.
🧹 Nitpick comments (4)
lib/segment/src/index/field_index/null_index/mmap_null_index.rs (4)
141-143
: Consider renaming method for clarityThe method name
values_count
is potentially misleading since it returns either 0 or 1 based on the presence of values, not an actual count of values.- pub fn values_count(&self, id: PointOffsetType) -> usize { - usize::from(self.has_values_slice.get(id)) - } + pub fn has_values(&self, id: PointOffsetType) -> bool { + self.has_values_slice.get(id) + }This makes the API more intuitive and consistent with the other boolean methods like
values_is_empty
andvalues_is_null
.
145-151
: Consistent method naming patternThe method names
values_is_empty
andvalues_is_null
follow a consistent pattern, which is good. However, consider standardizing the naming to improve readability.- pub fn values_is_empty(&self, id: PointOffsetType) -> bool { + pub fn is_empty(&self, id: PointOffsetType) -> bool { !self.has_values_slice.get(id) } - pub fn values_is_null(&self, id: PointOffsetType) -> bool { + pub fn has_null(&self, id: PointOffsetType) -> bool { self.is_null_slice.get(id) }This makes the API more concise and aligns better with the filter condition naming.
163-176
: Consider adding validation for cleanupThe
cleanup
method directly callsremove_dir_all
without checking if the directory exists first. While this likely works fine in practice, it's a good practice to validate existence before removal.fn cleanup(self) -> OperationResult<()> { - std::fs::remove_dir_all(&self.base_dir)?; + if self.base_dir.exists() { + std::fs::remove_dir_all(&self.base_dir)?; + } Ok(()) }This prevents potential errors if the directory was already removed by another process.
185-192
: Optimize the flusher functionThe flusher function creates closures for both slices and then calls them one after another. For better error handling, consider the following approach:
fn flusher(&self) -> Flusher { let Self { base_dir: _, has_values_slice, is_null_slice, } = self; - let is_empty_flusher = has_values_slice.flusher(); - let is_null_flusher = is_null_slice.flusher(); + let has_values_slice_cloned = has_values_slice.clone(); + let is_null_slice_cloned = is_null_slice.clone(); Box::new(move || { - is_empty_flusher()?; - is_null_flusher()?; + has_values_slice_cloned.flusher()()?; + is_null_slice_cloned.flusher()()?; Ok(()) }) }This approach would only work if
DynamicMmapFlags
implementsClone
. If it doesn't, the current implementation is fine.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
lib/segment/src/index/field_index/null_index/mmap_null_index.rs
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consensus-compose
- GitHub Check: test-consensus
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test (macos-latest)
- GitHub Check: test-consistency
- GitHub Check: test (windows-latest)
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test (ubuntu-latest)
🔇 Additional comments (4)
lib/segment/src/index/field_index/null_index/mmap_null_index.rs (4)
18-28
: Well-designed specialized index structureThis implementation of a specialized index for null and empty values is a good approach to optimize query performance for these specific conditions. The design choice to keep separate flags for tracking values and nulls enables efficient filtering.
223-250
: Well-implemented filtering logicThe filter implementation correctly handles the different filtering conditions based on
is_empty
andis_null
. The approach to iterate only over known values when possible and returnNone
when external iteration is required is a good optimization.
304-325
: Builder pattern implementationThe builder pattern is properly implemented with clear initialization and finalization steps. The delegation to the underlying
MmapNullIndex
foradd_point
ensures consistency.
335-428
:❓ Verification inconclusive
Comprehensive test coverage
The test case thoroughly checks various scenarios including null values, null values in arrays, empty payloads, and non-null values. It validates the behavior of both filtering and cardinality estimation.
Consider adding a test for edge cases like large point IDs near the capacity of
DynamicMmapFlags
to ensure proper resizing behavior.How does
DynamicMmapFlags
handle resizing for large point IDs? Is there a risk of performance degradation or memory issues?
🏁 Script executed:
#!/bin/bash # Check how DynamicMmapFlags handles resizing rg -A 10 "set_with_resize" lib/segment/src/vector_storage/dense/dynamic_mmap_flags.rsLength of output: 511
Edge Case Test Recommendation
The current test case robustly checks filtering and cardinality logic for various payloads like null values, arrays with nulls, empty payloads, and non-null values. However, we should consider adding tests for edge cases that use very large point IDs.
Our verification shows that in
lib/segment/src/vector_storage/dense/dynamic_mmap_flags.rs
, theset_with_resize
function dynamically resizes the index by setting the new length tokey + 1
when a true value is provided, while it skips resizing for false values (given the default is false). Although this resizing strategy works as intended, it could lead to performance or memory issues if many high point IDs trigger frequent resizes.Please add tests to:
- Verify the resizing behavior for large point IDs.
- Assess potential performance or memory overhead when resizing is triggered at scale.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (5)
lib/segment/src/index/field_index/null_index/mmap_null_index.rs (4)
95-127
: Consider reordering the Value pattern matching for better performance.The current pattern matching implementation is correct, but you might get slight performance improvements by reordering the match arms to handle the most common cases first. Typically, strings, numbers and booleans are more common than arrays, objects and null values.
for value in payload { match value { + Value::String(_) => { + has_values = true; + } + Value::Number(_) => { + has_values = true; + } + Value::Bool(_) => { + has_values = true; + } Value::Null => { is_null = true; } - Value::Bool(_) => { - has_values = true; - } - Value::Number(_) => { - has_values = true; - } - Value::String(_) => { - has_values = true; - } Value::Array(array) => { if array.iter().any(|v| v.is_null()) { is_null = true; } if !array.is_empty() { has_values = true; } } Value::Object(_) => { has_values = true; } } if is_null && has_values { break; } }
207-250
: Add documentation explaining whenNone
is returned from the filter method.The method could benefit from more detailed documentation explaining when and why it returns
None
versus an iterator. Currently, a reader needs to carefully analyze the code to understand thatNone
is returned when external filtering is needed.fn filter<'a>( &'a self, condition: &'a FieldCondition, + // Returns: + // - Some(iter) when the condition can be efficiently filtered using this index + // - None when the condition requires external filtering (e.g., for negative conditions + // like !is_empty or !is_null where we don't have complete information) ) -> Option<Box<dyn Iterator<Item = PointOffsetType> + 'a>> {
252-292
: Consider refactoring shared logic between filter and estimate_cardinality.The logic in
estimate_cardinality
closely mirrors that infilter
. This duplication could potentially lead to maintenance issues if one is updated but not the other. Consider extracting the shared condition evaluation logic into a private helper method.For example:
// Private helper method to determine if the condition can be handled efficiently fn can_handle_condition(&self, condition: &FieldCondition) -> Option<(bool, bool)> { if let Some(is_empty) = condition.is_empty { if is_empty { None // External filtering needed } else { Some((true, false)) // has_values, is_non_empty_condition } } else if let Some(is_null) = condition.is_null { if is_null { Some((false, true)) // is_null_slice, is_null_condition } else { None // External filtering needed } } else { None } } // Then use in both methods: // if let Some((use_has_values, is_positive)) = self.can_handle_condition(condition) { // let slice = if use_has_values { &self.has_values_slice } else { &self.is_null_slice }; // ... // }
168-171
: Consider documenting why the load method is a no-op.The
load
method is implemented as a no-op but without explanation. It would be helpful to document why no loading is needed for this type of index, especially since it's implementing a trait where this method might be expected to do something.fn load(&mut self) -> OperationResult<bool> { - // Nothing needed + // No loading needed as the memory-mapped flags are loaded on construction Ok(true) }lib/segment/src/payload_storage/condition_checker.rs (1)
102-127
: Prioritization logic may need commentsThe
check
method establishes a clear priority between checks:values_count
first, thenis_empty
, thenis_null
, and finally the default check. This prioritization logic might be non-obvious to future maintainers.Consider adding a comment explaining the rationale behind this prioritization:
fn check(&self, payload: &Value) -> bool { let FieldCondition { r#match: _, range: _, geo_radius: _, geo_bounding_box: _, geo_polygon: _, values_count, key: _, is_empty, is_null, } = self; + // Priority order: values_count > is_empty > is_null > other conditions if values_count.is_some() { self.values_count .as_ref() .unwrap() .check_count_from(payload) } else if is_empty.is_some() { check_is_empty(is_empty.unwrap(), payload) } else if is_null.is_some() { check_is_null(is_null.unwrap(), payload) } else { self._check(payload) } }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
lib/segment/src/index/field_index/null_index/mmap_null_index.rs
(1 hunks)lib/segment/src/payload_storage/condition_checker.rs
(5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test (macos-latest)
- GitHub Check: test
- GitHub Check: test (windows-latest)
- GitHub Check: test-consensus-compose
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test-consensus
🔇 Additional comments (11)
lib/segment/src/index/field_index/null_index/mmap_null_index.rs (3)
18-28
: Well-designed index struct for optimizing null and empty value queries.The
MmapNullIndex
struct is well-designed for its specialized purpose of optimizing IsNull and IsEmpty conditions. The two separate flags for tracking whether values exist (has_values_slice
) and whether null values exist (is_null_slice
) allow for efficient querying of both conditions independently.
112-119
: The array handling logic correctly implements the design decision.The logic for handling arrays correctly implements the design decision that null values don't count as values (based on previous review comments). The code correctly checks for nulls and non-empty arrays separately.
334-428
: Comprehensive test with good coverage of different cases.The test thoroughly verifies the behavior of the null index with various combinations of values (null, null in array, empty, non-null). It validates both the filtering functionality and cardinality estimation.
lib/segment/src/payload_storage/condition_checker.rs (8)
33-37
: Strong addition of empty value handling to the ValueChecker traitThe addition of the
check_empty
method to the ValueChecker trait provides a clear way to determine if a condition should match when a field is empty or missing from a payload. The default implementation returningfalse
is appropriate for most condition types.
51-59
: Well-structured null checking logicThe implementation of
check_is_null
is clear and properly handles different value types. The approach for arrays (checking if any array element is null) is particularly useful for JSON data processing.
63-99
: Effective integration of is_empty and is_null conditions into field checkingThe updated
check_match
method properly integrates the new conditions into the existing logical structure, maintaining the clarity of the code while expanding its functionality.
129-148
: Clarify relationship between empty and nullIn the
check_empty
method, whenis_empty
is not specified butis_null
is, the method returns!*is_null
. This implies that "empty" is the opposite of "null", which might not be immediately intuitive.Is this relationship (empty being the opposite of null) aligned with the intended semantics of these conditions? This is especially important since the documentation seems to treat null and empty arrays as similar concepts.
296-298
: Elegant implementation of check_empty for ValuesCountUsing
check_count(0)
to determine emptiness for ValuesCount is a clean and elegant approach that leverages existing functionality.
377-473
: Comprehensive test coverage for new functionalityThe test suite thoroughly covers all combinations of value types and conditions, providing excellent verification of the new functionality. The tests clearly demonstrate the expected behavior for each combination of payload value and condition.
The test cases confirm that:
- Empty arrays ([]) match when is_empty=true
- Arrays with null values ([null]) match when is_null=true, not when is_empty=true
- Arrays with both values and nulls match is_null=true if they contain at least one null
- Non-array values behave consistently with their expected emptiness/nullness
456-464
:❓ Verification inconclusive
Verify array null behavior matches requirements
The test shows that an array containing both a value and null (
[true, null]
) matchesis_null: true
. This means any array containing at least one null element is treated as "null" for filtering purposes.Is this behavior (matching arrays with any null as opposed to only nulls) aligned with user expectations? If a user filters with
is_null: true
, they might expect to get only fields that are exclusively null, not fields that contain some non-null values.
🏁 Script executed:
#!/bin/bash # Check for any documentation or comments about how is_null should behave with arrays rg -A 5 -B 5 "is_null.*?array" --type rust rg -A 5 -B 5 "array.*?is_null" --type rustLength of output: 3398
Action Required: Confirm the Intended “is_null” Behavior for Arrays
The current implementation determines an array as null if it contains any null element (via
array.iter().any(|x| x.is_null()) == is_null
). The tests inlib/segment/src/payload_storage/condition_checker.rs
and similar logic in the null index module confirm this behavior. However, note that a user filtering withis_null: true
might expect to retrieve only arrays made up exclusively of null values. Please verify that this “any null” matching is indeed the intended design for filtering.
- File Checked:
lib/segment/src/payload_storage/condition_checker.rs
- Key Logic:
Value::Array(array) => array.iter().any(|x| x.is_null()) == is_null
- Implication: Arrays containing both null and non-null values (e.g.
[true, null]
) are considered null.
40-49
:❓ Verification inconclusive
Verify if array with null values is considered empty
The
check_is_empty
function treats array emptiness based on length, but there's a distinction in the implementation compared to the documentation reference: an array with null values ([null]) is not considered empty here.According to the past review comment referencing your documentation:
This condition will match all records where the field reports either does not exist, or has null or [] value.
Is this intentional that an array containing only null ([null]) is not considered empty, while an empty array ([]) is?
🏁 Script executed:
#!/bin/bash # Looking for other usages of is_empty to verify consistency rg -A 3 -B 3 "is_empty.*?true" --type rustLength of output: 4521
Review: Clarify Handling of Arrays with Null Values
The current implementation in
lib/segment/src/payload_storage/condition_checker.rs
at lines 40–49 treats arrays by checking their length (array.is_empty() == is_empty
). This means an array like[null]
is not considered empty, even though the associated documentation states that a field with a null or empty array ([]
) should be treated as empty. We should verify whether this behavior is intentional. If the intention is to treat an array containing onlynull
as empty, the function might need to be updated to inspect array elements (or adjust the documentation if the current behavior is desired).
- File:
lib/segment/src/payload_storage/condition_checker.rs
(Lines 40–49)- Current behavior: For
Value::Array(array)
, emptiness is determined solely by the length of the array.- Documentation expectation: Fields with either a
null
value or an empty array should be considered empty, implying that[null]
might also qualify.Please confirm if this discrepancy is by design or if additional checks (e.g., treating arrays containing only
null
as empty) should be implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for adding the tests 👍
* create initial strucutres * clippy * start field-query refactoring * start field-query refactoring (2/N) * start field-query refactoring (3/N): duplicate is_empty/null condiftions as field condition * start field-query refactoring (4/N): re-instate is_empty fallback in case new index is not built yet * filter for is_empty/is_null * implement add/remove point * upd schema * open and create of null-index * create null-index * fix test * Update lib/segment/src/index/query_optimization/condition_converter.rs Co-authored-by: Tim Visée <tim+github@visee.me> * unit test for null-index * more unit tests * add openapi tests * fmt * fix for integartion tests * rabbit review fix * make [null] non-empty --------- Co-authored-by: Tim Visée <tim+github@visee.me>