-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Add mutable numeric payload index on top of Gridstore #6609
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
lib/segment/src/index/field_index/numeric_index/mutable_numeric_index.rs
Outdated
Show resolved
Hide resolved
ed8abf3
to
98705fc
Compare
I've done a basic end to end benchmark to validate the approach in this PR is acceptable. I measure upsert time, hot/cold load time, disk usage and memory consumption for RocksDB and Gridstore storage. Benchmark
Memory usageMemory usage when loading 1M points, each having 1 to 10 integer values in payload. I've measured RocksDB storage, and Gridstore with block size 8, 16, 64 and 128 bytes. bfb commandbfb -d 4 --int-payloads 100 --int-payloads-range --max-int-payloads 10 --indexing-threshold 0 -n 1000000 ConclusionLooking at Gridstore, the smallest possible block size for a value type behaves best. It has faster upsert/load times and less disk usage. It makes sense because data is more compact and therefore less IO is needed. When comparing RocksDB versus Gridstore (block size 8), the new Gridstore implementation performs better across the board though a cold load performs similar. The memory usage measurement during load shows both implementations are equal within margin of error, so there's no unexpected memory spikes. You can see each line shifted a bit because of slightly different load times. Based on these results I conclude that we can go through with Gridstore as backing storage. I'll pick the value size as block size (8 bytes for numbers, 16 bytes for UUID). Let me know if I should test more scenarios, and/or if I should plot the results. |
5d9120f
to
147eee9
Compare
This comment was marked as resolved.
This comment was marked as resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (1)
lib/segment/src/index/field_index/numeric_index/mutable_numeric_index.rs (1)
25-38
: Gridstore configuration looks optimal based on benchmarks.The 4MB page size and block size matching the value type size align with the PR's benchmark findings. The decision to disable compression for numeric values is appropriate.
🧹 Nitpick comments (5)
lib/segment/src/index/field_index/index_selector.rs (1)
186-190
: Track TODO items for completing Gridstore migration.The fallback to RocksDB for non-numeric indices is properly implemented with clear TODO comments. This aligns with the PR's phased approach.
Would you like me to create GitHub issues to track the remaining index types that need Gridstore implementation (map, geo, text, bool indices)?
Also applies to: 206-210, 265-269, 305-308, 329-332, 353-360, 380-386, 406-412
lib/segment/src/index/field_index/numeric_index/tests.rs (1)
375-378
: Consider error handling consistency.While the implementation is correct, consider making error handling consistent with other index types. The other cases don't use
unwrap()
on the result ofnew_*
methods.Consider returning the error instead of unwrapping:
- IndexType::MutableGridstore => { - NumericIndexInner::<FloatPayloadType>::new_gridstore(temp_dir.path().to_path_buf()) - .unwrap() - } + IndexType::MutableGridstore => { + NumericIndexInner::<FloatPayloadType>::new_gridstore(temp_dir.path().to_path_buf())? + }Note: This would require updating the function signature to return a
Result
.lib/segment/src/index/field_index/numeric_index/mutable_numeric_index.rs (2)
315-335
: Hardware counter usage in load_gridstore.The method creates a disposable hardware counter for loading, which might not track actual I/O operations correctly.
Consider accepting a hardware counter parameter for consistent tracking:
- fn load_gridstore(&mut self) -> bool { + fn load_gridstore(&mut self, hw_counter: &HardwareCounterCell) -> bool { let Storage::Gridstore(store) = &self.storage else { return false; }; - let hw_counter = HardwareCounterCell::disposable(); - let hw_counter_ref = hw_counter.ref_payload_io_write_counter(); + let hw_counter_ref = hw_counter.ref_payload_io_read_counter();Note: This would also require updating the
load()
method to pass the counter through.
419-420
: Potential inaccuracy in I/O counter delta calculation.The delta calculation uses
size_of::<T>()
which returns the size of the type itself, but for variable-sized types or when T contains pointers, this might not reflect the actual serialized size.Consider using the actual serialized size if available from Gridstore's put operation, or document this limitation if exact I/O tracking is not critical for numeric indices.
lib/segment/src/index/field_index/numeric_index/mod.rs (1)
729-732
: Builder initialization pattern could be simplified.The current pattern uses
Option<NumericIndex>
which requires runtime checks. Consider initializing the index in the constructor.Consider this alternative pattern:
pub struct NumericIndexGridstoreBuilder< T: Encodable + Numericable + MmapValue + Blob + Send + Sync + Default, P, > where NumericIndex<T, P>: ValueIndexer<ValueType = P>, { - dir: PathBuf, - index: Option<NumericIndex<T, P>>, + index: NumericIndex<T, P>, } impl<T: Encodable + Numericable + MmapValue + Blob + Send + Sync + Default, P> NumericIndexGridstoreBuilder<T, P> where NumericIndex<T, P>: ValueIndexer<ValueType = P>, { - fn new(dir: PathBuf) -> Self { - Self { dir, index: None } + fn new(dir: PathBuf) -> OperationResult<Self> { + Ok(Self { + index: NumericIndex::new_gridstore(dir)?, + }) } }This would eliminate the need for
Option
checks inadd_point
andfinalize
.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
lib/common/common/src/flags.rs
(4 hunks)lib/gridstore/src/blob.rs
(1 hunks)lib/gridstore/src/gridstore.rs
(3 hunks)lib/segment/src/index/field_index/field_index_base.rs
(5 hunks)lib/segment/src/index/field_index/index_selector.rs
(15 hunks)lib/segment/src/index/field_index/numeric_index/immutable_numeric_index.rs
(4 hunks)lib/segment/src/index/field_index/numeric_index/mod.rs
(16 hunks)lib/segment/src/index/field_index/numeric_index/mutable_numeric_index.rs
(7 hunks)lib/segment/src/index/field_index/numeric_index/tests.rs
(12 hunks)lib/segment/src/index/struct_payload_index.rs
(2 hunks)lib/segment/src/payload_storage/mmap_payload_storage.rs
(3 hunks)lib/segment/src/payload_storage/payload_storage_enum.rs
(1 hunks)lib/segment/src/payload_storage/tests.rs
(1 hunks)lib/segment/src/segment_constructor/segment_constructor_base.rs
(1 hunks)lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs
(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
lib/segment/src/segment_constructor/segment_constructor_base.rs (3)
lib/segment/src/payload_storage/payload_storage_enum.rs (4)
from
(29-31)from
(35-37)from
(41-43)from
(47-49)lib/segment/src/payload_storage/mmap_payload_storage.rs (1)
open_or_create
(35-46)lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (1)
open_or_create
(42-50)
lib/segment/src/index/struct_payload_index.rs (1)
lib/common/common/src/flags.rs (1)
feature_flags
(87-95)
lib/segment/src/index/field_index/index_selector.rs (11)
lib/segment/src/index/field_index/numeric_index/mod.rs (8)
new_rocksdb
(170-176)new_rocksdb
(484-489)builder_rocksdb
(506-511)new_gridstore
(192-196)new_gridstore
(499-504)builder_gridstore
(540-545)is_on_disk
(443-449)new
(729-731)lib/segment/src/index/field_index/full_text_index/text_index.rs (4)
new_rocksdb
(37-53)builder_rocksdb
(86-92)config
(106-112)is_on_disk
(285-291)lib/segment/src/index/field_index/map_index/mod.rs (3)
new_rocksdb
(86-92)builder_rocksdb
(108-110)is_on_disk
(451-457)lib/segment/src/index/field_index/geo_index/mod.rs (2)
new_memory
(48-55)builder
(68-70)lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs (2)
builder
(37-41)open_or_create
(48-60)lib/segment/src/index/field_index/bool_index/simple_bool_index.rs (1)
builder
(202-204)lib/segment/src/index/field_index/field_index_base.rs (1)
is_on_disk
(426-440)lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs (1)
is_on_disk
(395-397)lib/segment/src/index/field_index/map_index/mmap_map_index.rs (1)
is_on_disk
(338-340)lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs (1)
is_on_disk
(360-362)lib/segment/src/index/field_index/bool_index/mod.rs (1)
is_on_disk
(86-91)
lib/gridstore/src/blob.rs (4)
lib/segment/src/payload_storage/mmap_payload_storage.rs (2)
to_bytes
(20-22)from_bytes
(24-26)lib/sparse/src/common/sparse_vector.rs (2)
to_bytes
(246-248)from_bytes
(250-252)lib/gridstore/src/fixtures.rs (2)
to_bytes
(20-22)from_bytes
(24-26)lib/segment/src/vector_storage/sparse/stored_sparse_vectors.rs (2)
to_bytes
(67-69)from_bytes
(71-73)
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: lint
- GitHub Check: integration-tests
- GitHub Check: integration-tests-consensus
- GitHub Check: storage-compat-test
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: rust-tests (windows-latest)
🔇 Additional comments (27)
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (1)
301-301
: LGTM! Clean code simplification.The removal of explicit
Ok(())
return is good - the implicit return aftermap_err
is more idiomatic Rust and maintains the same functionality.lib/segment/src/payload_storage/payload_storage_enum.rs (1)
285-288
: Correct alignment with method signature change.The conversion to
to_path_buf()
properly aligns with the updatedMmapPayloadStorage::open_or_create
signature that now accepts an ownedPathBuf
instead of a borrowed&Path
reference.lib/segment/src/payload_storage/tests.rs (1)
139-139
: Consistent with API changes.The
to_path_buf()
conversion correctly matches the updated method signature and maintains consistency with similar changes across the codebase.lib/segment/src/segment_constructor/segment_constructor_base.rs (1)
336-338
:✅ Verification successful
Verify consistent API usage across the codebase.
The
to_path_buf()
conversion correctly aligns with the updatedMmapPayloadStorage::open_or_create
signature. This change maintains functionality while conforming to the new ownership model.Let me verify that all calls to
MmapPayloadStorage::open_or_create
have been consistently updated across the codebase:
🏁 Script executed:
#!/bin/bash # Description: Search for any remaining calls to open_or_create that might need updating # Expected: All calls should pass PathBuf (owned) rather than &Path (borrowed) # Search for open_or_create calls echo "=== Searching for open_or_create method calls ===" rg -A 2 -B 2 "open_or_create" --type rust echo "=== Searching for MmapPayloadStorage usage ===" rg -A 3 -B 1 "MmapPayloadStorage::" --type rustLength of output: 13383
All
MmapPayloadStorage::open_or_create
calls now usePathBuf
A code‐wide search confirms no remaining invocations pass a&Path
. This aligns with the updated API and preserves functionality—merging as is.lib/segment/src/payload_storage/mmap_payload_storage.rs (3)
1-1
: LGTM - Good import cleanupRemoving the unused
Path
import aligns with the method signature change below and improves code cleanliness.
35-35
: LGTM - Improved ownership semanticsThe change from
&Path
toPathBuf
provides clearer ownership semantics and aligns with the broader codebase pattern mentioned in the AI summary.
226-226
: LGTM - Cleaner closure implementationRemoving the explicit
Ok(())
is a good stylistic improvement since themap_err
operation already returns the appropriateResult
type.lib/gridstore/src/gridstore.rs (2)
154-179
: LGTM - Improved parameter naming for clarityRenaming
path
tobase_path
provides better clarity about the parameter's purpose and aligns with its usage as the base directory for storage files. All internal references have been updated consistently.
1068-1068
: LGTM - Removed unnecessary cloningSimplifying the call to avoid
path.clone()
is a good optimization since the path is not used after this point in the test.lib/common/common/src/flags.rs (3)
22-24
: LGTM - Well-structured feature flag additionThe new feature flag follows the established pattern and includes clear documentation. The TODO comment indicating future default enablement shows good rollout planning.
44-44
: LGTM - Conservative default for new featureSetting the default to
false
is appropriate for a new feature, allowing for controlled rollout as indicated by the TODO comment.
65-65
: LGTM - Complete feature flag integrationThe new flag is properly integrated into the initialization logic, including both the destructuring pattern and the "enable all" behavior, maintaining consistency with other feature flags.
Also applies to: 74-74
lib/segment/src/index/struct_payload_index.rs (2)
20-22
: LGTM - Clean import additionThe addition of
IndexSelectorGridstore
import is well-organized and follows the existing pattern for selector imports.
431-443
:✅ Verification successful
LGTM - Well-structured conditional backend selection
The conditional logic properly uses the feature flag to select between Gridstore and RocksDB backends for appendable storage. The implementation ensures backward compatibility with appropriate fallback behavior.
Let me verify that the feature flag is properly accessible and that this conditional logic aligns with the broader Gridstore integration:
🏁 Script executed:
#!/bin/bash # Description: Verify feature flag usage and Gridstore integration patterns # Expected: Find consistent usage of the feature flag and proper IndexSelector variants # Check feature flag usage across the codebase echo "=== Feature flag usage ===" rg "payload_index_skip_mutable_rocksdb" -A 2 -B 2 # Check IndexSelectorGridstore usage and implementation echo "=== IndexSelectorGridstore usage ===" rg "IndexSelectorGridstore" -A 5 -B 2 # Verify IndexSelector enum variants echo "=== IndexSelector enum definition ===" ast-grep --pattern 'enum IndexSelector { $$$ }'Length of output: 12432
🏁 Script executed:
#!/bin/bash # Verify that the `IndexSelector` enum includes the `Gridstore` variant echo "=== Enum IndexSelector Definition ===" rg "enum IndexSelector" -n lib/segment/src/index/field_index/index_selector.rs -A 20Length of output: 791
LGTM – Feature flag usage and Gridstore integration verified
The
payload_index_skip_mutable_rocksdb
flag is properly declared and defaulted inlib/common/common/src/flags.rs
, and is consistently applied in the conditional at lines 431–443 oflib/segment/src/index/struct_payload_index.rs
. TheIndexSelector
enum inlib/segment/src/index/field_index/index_selector.rs
includes theGridstore
variant, and existing Gridstore fallback patterns confirm correct integration.• Flag declaration: lib/common/common/src/flags.rs
• Conditional selector logic: lib/segment/src/index/struct_payload_index.rs (lines 431–443)
• Enum variant inclusion: lib/segment/src/index/field_index/index_selector.rs (lines 32–39)No further changes required.
lib/segment/src/index/field_index/numeric_index/immutable_numeric_index.rs (2)
155-157
: LGTM! Trait bounds correctly extended for Gridstore support.The addition of
Blob + Send + Sync
trait bounds is appropriate for Gridstore integration, ensuring serialization capability and thread safety.
158-161
: Good documentation fix.Corrected the documentation from "immutable map index" to "immutable numeric index".
lib/segment/src/index/field_index/field_index_base.rs (3)
16-19
: LGTM! Import correctly updated for Gridstore builder.The addition of
NumericIndexGridstoreBuilder
to the imports aligns with the new Gridstore-backed numeric index builder.
511-514
: LGTM! New builder variants properly added.The three new
FieldIndexBuilder
variants (IntGridstoreIndex
,DatetimeGridstoreIndex
,FloatGridstoreIndex
) correctly mirror the existing builder structure for Gridstore support.Also applies to: 521-522
540-550
: LGTM! Trait implementation correctly handles new variants.All new Gridstore builder variants are properly integrated into the
FieldIndexBuilderTrait
implementation with correct delegation to underlying builders ininit()
,add_point()
, andfinalize()
methods.Also applies to: 572-582, 601-611
lib/segment/src/index/field_index/index_selector.rs (3)
33-39
: LGTM! Well-documented enum variants.The addition of the
Gridstore
variant with clear documentation for each storage type improves code clarity.
53-61
: LGTM! Struct properly designed for migration phase.The
IndexSelectorGridstore
struct correctly includes a temporarydb
field for indices that don't yet support Gridstore, with a clear TODO comment indicating this is temporary.
213-228
: LGTM! Numeric index correctly implements Gridstore support.The
numeric_new
method properly handles all three storage backends, with Gridstore creating indices using the newnew_gridstore
method. The trait bounds are correctly updated to includeBlob + Send + Sync
.lib/segment/src/index/field_index/numeric_index/tests.rs (2)
66-71
: LGTM! Builder instantiation is correct.The Gridstore builder is properly instantiated with a temporary directory path, which is appropriate for tests.
550-561
: Good trait bounds update for Gridstore support.The additional trait bounds (
Blob + Send + Sync
) are necessary for Gridstore compatibility.lib/segment/src/index/field_index/numeric_index/mutable_numeric_index.rs (1)
406-408
: Empty value handling for Gridstore is correct.The decision to delete entries with empty values in Gridstore (rather than storing empty vectors) is a good design choice that saves storage space.
lib/segment/src/index/field_index/numeric_index/mod.rs (2)
163-163
: Trait bounds are appropriately extended for Gridstore.The addition of
Blob + Send + Sync
trait bounds is necessary for Gridstore's requirements and properly propagated throughout the type hierarchy.Also applies to: 169-169, 474-474, 483-483
765-773
: Good error handling in finalize method.The error messages are descriptive and the method properly validates the index state before finalizing.
lib/segment/src/index/field_index/numeric_index/mutable_numeric_index.rs
Outdated
Show resolved
Hide resolved
b6adba7
to
7bfa4c2
Compare
2317280
to
da5d9a7
Compare
lib/segment/src/index/field_index/numeric_index/mutable_numeric_index.rs
Show resolved
Hide resolved
@@ -35,6 +41,7 @@ impl Default for FeatureFlags { | |||
FeatureFlags { | |||
all: false, | |||
payload_index_skip_rocksdb: true, | |||
payload_index_skip_mutable_rocksdb: false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
payload_index_skip_mutable_rocksdb: false,
Have you tried setting this to true
and run all tests on our CI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC it is true in tests already, but I can validate that to be sure.
Either way, I did already run all tests locally with the flag explicitly enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC it is true in tests already,
Do you mean that all unit tests run with all the features enabled?
What about integration tests?
I believe you told me once that the CI benchmark uses all feature flag, right?
lib/segment/src/index/field_index/numeric_index/mutable_numeric_index.rs
Show resolved
Hide resolved
lib/segment/src/index/field_index/numeric_index/mutable_numeric_index.rs
Outdated
Show resolved
Hide resolved
Co-authored-by: Luis Cossío <luis.cossio@qdrant.com>
067cbf5
to
c7d0b2b
Compare
* Extend Gridstore blob trait, support basic primitives and vector of them * Implement mutable numeric index on top of Gridstore * Wrap Gridstore in rwlock, flush asynchronously * Set custom Gridstore options tuned to numeric index * Fix compilation errors * Destruct RocksDB selector everywhere * Add Gridstore payload index selector, build numeric, fallback others * Add feature flag to Gridstore mutable payload index * Don't store empty values in gridstore, it's not supported * Reformat * Initialize Gridstore during init of index builder * Use owned and referenced paths correctly around Gridstore * Change Gridstore page size in numeric index as suggested * Make Gridstore block size dependent on size of numeric type * Add tests * Add separate fixed size blob trait for vec, prevent footgun * Improve page size calculation Co-authored-by: Luis Cossío <luis.cossio@qdrant.com> * Simplify Blob requirement for numeric payload index * Add missing test cases * Replace ref mut with &mut * Use correct hardware counter * Implement clear cache for numeric payload index on Gridstore * The block size is dynamic * Error if trying to load index from storage that doesn't match backend * fix counter * Add comment on why we change page size --------- Co-authored-by: Luis Cossío <luis.cossio@qdrant.com> Co-authored-by: Andrey Vasnetsov <andrey@vasnetsov.com>
Depends on #6614
As part of our bigger effort to remove RocksDB, we need a replacement for our mutable payload indices. All of them currently still rely on RocksDB.
This PR introduces a mutable numeric payload index on top of Gridstore. If benchmark results are satisfactory, the other payload index types will follow in separate PRs.
It adds the
payload_index_skip_mutable_rocksdb
feature flag skip using RocksDB for mutable payload indices and start using Gridstore. The flag is disabled by default for now. Note that this PR only implements the numeric index, and all the other index types will still fallback to RocksDB.Tasks
dev
Explicitly drop Gridstore caches after load?All Submissions:
dev
branch. Did you create your branch fromdev
?New Feature Submissions:
cargo +nightly fmt --all
command prior to submission?cargo clippy --all --all-features
command?Changes to Core Features: