Add in-memory payload indices on mmap storage (numeric, map) #6444
Merged
42de216 to 585d2dd
a80f044 to e37a6f8
coszio reviewed on May 6, 2025
lib/segment/src/index/field_index/map_index/immutable_map_index.rs
lib/segment/src/index/field_index/numeric_index/immutable_numeric_index.rs
Comment on lines 16 to +19:
    Mutable,
    Immutable,
    Mmap,
    RamMmap,
I know this is only for tests, but we now have multiple storage types that can be used under the same settings.
We should document this somewhere in the code:
| mutable | on_disk | legacy storage | new storage |
|---|---|---|---|
| ✅ | ✅ | Mutable | Mutable |
| ✅ | ❌ | Mutable | Mutable |
| ❌ | ✅ | Mmap | Mmap |
| ❌ | ❌ | Immutable | RamMmap |
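
One way to document the table in code is a small selection function whose match arms mirror the rows. This is only a sketch: the enum name `IndexStorage`, the function `select_storage`, and the `legacy` flag are hypothetical names chosen for illustration and are not taken from the PR.

```rust
/// Storage variants named in the review comment.
/// (`IndexStorage` is a hypothetical name, not the one used in the PR.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IndexStorage {
    Mutable,
    Immutable,
    Mmap,
    RamMmap,
}

/// Hypothetical helper that encodes the table above.
///
/// | mutable | on_disk | legacy storage | new storage |
/// |---------|---------|----------------|-------------|
/// | yes     | any     | Mutable        | Mutable     |
/// | no      | yes     | Mmap           | Mmap        |
/// | no      | no      | Immutable      | RamMmap     |
pub fn select_storage(mutable: bool, on_disk: bool, legacy: bool) -> IndexStorage {
    match (mutable, on_disk, legacy) {
        // Mutable indices ignore `on_disk` in both storage generations.
        (true, _, _) => IndexStorage::Mutable,
        // Immutable and on disk: mmap in both generations.
        (false, true, _) => IndexStorage::Mmap,
        // Immutable and in memory: the only row where the generations differ.
        (false, false, true) => IndexStorage::Immutable,
        (false, false, false) => IndexStorage::RamMmap,
    }
}
```

Keeping the table in the doc comment next to the match means the two can be checked against each other in review.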
9aead56 to c365ee9
coszio approved these changes on May 7, 2025
generall approved these changes on May 8, 2025
generall pushed a commit that referenced this pull request on May 22, 2025:

* Add RAM wrapper to mmap numeric index
* Add RAM wrapper to mmap map index
* Fix test compilation
* Merge RAM wrappers into existing immutable indices

  Do this to minimize code duplication since the majority of the code is exactly the same.
* Minor tweaks
* Remove unnecessary pub visibility, remove unused assertion
* Test immutable numeric index on mmap
* In numeric index test, remove some points
* Add map index test
* Restructure index opening and loading, make it consistent
* don't pre-collect intermediate mapping
* review nits
* Make init unreachable for immutable and mmap, clear must consume index

---------

Co-authored-by: Luis Cossío <luis.cossio@outlook.com>
Add a new payload index variant that functions as a RAM wrapper around mmap-based payload indices. The RAM wrapper acts as immutable payload storage that holds everything in memory, but uses the mmap payload index as backing storage.
Pulling everything into RAM is more efficient because there is less indirection. Sharing the mmap storage makes it possible to switch quickly between an in-memory and an on-disk index type.
In terms of implementation, the new index types are cloned from the immutable RocksDB variant. Their internals have been adjusted to switch from RocksDB backing storage to mmaps.
Index performance during search was measured before. This change repurposes the immutable index, so search performance will be the same.
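
The wrapper idea can be illustrated with a minimal sketch. All names here (`MmapMapIndex`, `RamMapIndex`, `open`, `get`, `into_storage`) are hypothetical, and the "mmap" storage is simulated with a plain `Vec` so the example stays self-contained. It is not the PR's implementation, only the general shape: load everything from the backing storage once, serve reads from RAM, and keep the storage so it can be reopened as an on-disk index.

```rust
use std::collections::HashMap;

/// Stand-in for an mmap-backed map index (hypothetical). A real one would
/// read from memory-mapped files; a Vec is used here for illustration only.
pub struct MmapMapIndex {
    entries: Vec<(String, Vec<u32>)>,
}

impl MmapMapIndex {
    pub fn new(entries: Vec<(String, Vec<u32>)>) -> Self {
        Self { entries }
    }

    /// Iterate over all (value, point ids) pairs in the backing storage.
    pub fn iter(&self) -> impl Iterator<Item = (&str, &[u32])> + '_ {
        self.entries.iter().map(|(k, ids)| (k.as_str(), ids.as_slice()))
    }
}

/// Immutable in-memory index that keeps the mmap index as backing storage.
pub struct RamMapIndex {
    value_to_points: HashMap<String, Vec<u32>>,
    storage: MmapMapIndex,
}

impl RamMapIndex {
    /// Load everything from the backing storage into RAM once, at open time.
    pub fn open(storage: MmapMapIndex) -> Self {
        let value_to_points: HashMap<String, Vec<u32>> = storage
            .iter()
            .map(|(value, ids)| (value.to_owned(), ids.to_vec()))
            .collect();
        Self { value_to_points, storage }
    }

    /// Lookups are served from RAM without touching the backing storage.
    pub fn get(&self, value: &str) -> &[u32] {
        self.value_to_points
            .get(value)
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }

    /// Hand back the backing storage, e.g. to reopen it as an on-disk index.
    pub fn into_storage(self) -> MmapMapIndex {
        self.storage
    }
}
```

In this sketch, switching from in-memory to on-disk amounts to calling `into_storage()` and using the returned storage directly, with no data conversion in between.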
Load time on startup using the new index does change. Here's a basic benchmark measuring load time:
bfb --int-payloads 100 -n 5000000 -d1 --indexing-threshold 100
Tasks
- Implement RAM wrapper for other payload index types (will be done in separate PR)

All Submissions:
- dev branch. Did you create your branch from dev?

New Feature Submissions:
- cargo +nightly fmt --all command prior to submission?
- cargo clippy --all --all-features command?

Changes to Core Features: