Add in-memory payload index on mmap storage (full text) #6495
Merged
generall approved these changes on May 9, 2025
generall added a commit that referenced this pull request on May 22, 2025
* Add storage enum to full text index
* Add compressed mmap posting list iterator
* Don't check second condition and return early if possible
* Use helper function for creating empty iterator
* Construct immutable inverted index from mmap inverted index
* Construct immutable text index from mmap text index
* Load immutable text index from mmap storage if on-disk is false
* Match implementation for mutable to immutable inverted index, drop empty
* Point tokens should be none if empty
* Fix chunk reader iterator not buffering remaining postings correctly
* Add immutable mmap full text index to congruence test
* Debug assert deleted points have no tokens
* Add another test
* Fix point deletion regression in immutable full text index
* Use consistent function naming
* Restructure full text index opening and loading, make it consistent
* When immutable full text index is loaded into memory, clear mmap cache
* Remove init from immutable text index
* Prevent potential panic
* avoid panic
* Fix custom iterator size reporting, add test for it

---------

Co-authored-by: generall <andrey@vasnetsov.com>
This was referenced Jun 11, 2025
Depends on #6444.
Add in-memory payload index for full text using mmap as storage.
Loading the in-memory full text payload index from mmap is 33x faster than from RocksDB, as shown in this simple benchmark:
bfb --text-payloads --text-payload-length 40 -n 500000 -d1 --indexing-threshold 100
bfb --text-payloads --text-payload-length 100 --text-payload-vocabulary 50000 -n 500000 -d1 --indexing-threshold 100
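As a rough illustration of the general idea (reading a flat byte layout sequentially and building an in-memory inverted index from it), here is a minimal sketch. It is not the implementation from this PR: the byte layout, the `serialize` and `load_in_memory` functions, and the use of a plain byte slice in place of a real memory-mapped file are all assumptions made for the example, and it does not use the compressed posting lists this PR works with.

```rust
use std::collections::HashMap;

// Hypothetical flat layout, standing in for the contents of a memory-mapped file.
// All integers are little-endian u32:
//   [token_count]
//   then, per token: [token_len][token bytes][posting_count][point ids...]

/// Serialize (token, postings) pairs into the hypothetical flat layout.
fn serialize(index: &[(&str, Vec<u32>)]) -> Vec<u8> {
    let mut buf = Vec::new();
    buf.extend_from_slice(&(index.len() as u32).to_le_bytes());
    for (token, postings) in index {
        buf.extend_from_slice(&(token.len() as u32).to_le_bytes());
        buf.extend_from_slice(token.as_bytes());
        buf.extend_from_slice(&(postings.len() as u32).to_le_bytes());
        for id in postings {
            buf.extend_from_slice(&id.to_le_bytes());
        }
    }
    buf
}

/// Read one little-endian u32 at `pos` and advance `pos`.
/// Returns None instead of panicking if the buffer is too short.
fn read_u32(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let end = pos.checked_add(4)?;
    let s = bytes.get(*pos..end)?;
    *pos = end;
    Some(u32::from_le_bytes([s[0], s[1], s[2], s[3]]))
}

/// Build an in-memory token -> point ids map in one sequential pass
/// over the flat bytes. Returns None on a truncated or malformed buffer.
fn load_in_memory(bytes: &[u8]) -> Option<HashMap<String, Vec<u32>>> {
    let mut pos = 0;
    let token_count = read_u32(bytes, &mut pos)?;
    let mut index = HashMap::new();
    for _ in 0..token_count {
        let len = read_u32(bytes, &mut pos)? as usize;
        let end = pos.checked_add(len)?;
        let token = std::str::from_utf8(bytes.get(pos..end)?).ok()?.to_owned();
        pos = end;
        let posting_count = read_u32(bytes, &mut pos)?;
        let mut postings = Vec::new();
        for _ in 0..posting_count {
            postings.push(read_u32(bytes, &mut pos)?);
        }
        index.insert(token, postings);
    }
    Some(index)
}
```

Calling `load_in_memory(&serialize(&[("mmap", vec![1, 4, 9])]))` yields a map in which `"mmap"` points to `[1, 4, 9]`, while a truncated buffer yields `None` rather than a panic. In a real setting the byte slice would presumably come from a memory-mapped file rather than a `Vec<u8>`.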
Tasks
* Switch point_to_tokens_count key from option to number (will be done separately)
All Submissions:
* Contributions should target the dev branch. Did you create your branch from dev?
New Feature Submissions:
* Have you formatted your code locally using the cargo +nightly fmt --all command prior to submission?
* Have you checked your code using the cargo clippy --all --all-features command?
Changes to Core Features: