timvisee
Member

@timvisee timvisee commented May 6, 2025

Depends on #6444.

Add in-memory payload index for full text using mmap as storage.
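To illustrate the general idea of building an in-memory inverted index from a flat byte buffer (such as the contents of a memory-mapped file), here is a minimal sketch. The record layout, `encode`, `read_u32`, and `load` are all hypothetical and made up for this example; this is not the storage format or API used in this PR, which works with compressed posting lists and a real mmap. A plain `&[u8]` stands in for the mapped region.

```rust
use std::collections::HashMap;

/// Hypothetical flat layout (NOT Qdrant's actual format), repeated per token:
/// [token_len: u32 LE][token bytes][count: u32 LE][count point ids: u32 LE each]
fn encode(index: &[(&str, Vec<u32>)]) -> Vec<u8> {
    let mut out = Vec::new();
    for (token, postings) in index {
        out.extend_from_slice(&(token.len() as u32).to_le_bytes());
        out.extend_from_slice(token.as_bytes());
        out.extend_from_slice(&(postings.len() as u32).to_le_bytes());
        for id in postings.iter() {
            out.extend_from_slice(&id.to_le_bytes());
        }
    }
    out
}

/// Read a little-endian u32 at `*pos` and advance the cursor.
/// Returns `None` if the buffer is truncated.
fn read_u32(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let end = pos.checked_add(4)?;
    let s = bytes.get(*pos..end)?;
    *pos = end;
    Some(u32::from_le_bytes([s[0], s[1], s[2], s[3]]))
}

/// Walk the buffer once and build the in-memory index (token -> point ids).
/// Returns `None` on truncated or non-UTF-8 input instead of panicking.
fn load(bytes: &[u8]) -> Option<HashMap<String, Vec<u32>>> {
    let mut index = HashMap::new();
    let mut pos = 0;
    while pos < bytes.len() {
        let token_len = read_u32(bytes, &mut pos)? as usize;
        let token_end = pos.checked_add(token_len)?;
        let token = std::str::from_utf8(bytes.get(pos..token_end)?).ok()?;
        pos = token_end;

        let count = read_u32(bytes, &mut pos)?;
        let mut postings = Vec::new();
        for _ in 0..count {
            postings.push(read_u32(bytes, &mut pos)?);
        }
        index.insert(token.to_owned(), postings);
    }
    Some(index)
}
```

In this sketch loading is a single sequential pass over contiguous bytes, with no per-key lookups in a key-value store.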

Loading the in-memory full text payload index from mmap is 33x faster than from RocksDB, as shown in this simple benchmark:

  • bfb --text-payloads --text-payload-length 40 -n 500000 -d1 --indexing-threshold 100:

    • Old RocksDB based index: 2.4s, 2.5s, 2.5s, 2.6s = avg 2.5s
    • New mmap based index: 77.2ms, 75.1ms, 77.2ms, 74.7ms = avg 0.076s (33x faster!)
  • bfb --text-payloads --text-payload-length 100 --text-payload-vocabulary 50000 -n 500000 -d1 --indexing-threshold 100:

    • Old RocksDB based index: 6.6s, 6.6s, 6.7s, 6.6s = avg 6.6s
    • New mmap based index: 202.8ms, 200.6ms, 211.2ms, 201.8ms = avg 0.2s (33x faster!)
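For reference, the speedup figures can be recomputed from the raw timings listed above. This is a small sketch; the helper names `mean` and `speedup` are made up for illustration and are not part of bfb or Qdrant.

```rust
/// Arithmetic mean of a slice of timings, in seconds.
/// Returns NaN for an empty slice.
fn mean(samples: &[f64]) -> f64 {
    samples.iter().sum::<f64>() / samples.len() as f64
}

/// Speedup of the new timings over the old ones, as a ratio of means.
fn speedup(old: &[f64], new: &[f64]) -> f64 {
    mean(old) / mean(new)
}
```

Calling `speedup(&[2.4, 2.5, 2.5, 2.6], &[0.0772, 0.0751, 0.0772, 0.0747])` gives roughly 32.9, and `speedup(&[6.6, 6.6, 6.7, 6.6], &[0.2028, 0.2006, 0.2112, 0.2018])` gives roughly 32.5, both consistent with the rounded 33x reported above.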

Tasks

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you formatted your code locally using cargo +nightly fmt --all command prior to submission?
  3. Have you checked your code using cargo clippy --all --all-features command?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully run tests with your changes locally?

@coszio
Contributor

coszio commented May 6, 2025

33x faster!

Gainz 💪

@timvisee timvisee force-pushed the payload-index-ram-on-mmap branch from 9aead56 to c365ee9 Compare May 7, 2025 09:08
@timvisee timvisee force-pushed the payload-index-ram-on-mmap-full-text branch 2 times, most recently from 67c5c27 to f7ed774 Compare May 7, 2025 15:34
Base automatically changed from payload-index-ram-on-mmap to dev May 8, 2025 10:34
@timvisee timvisee force-pushed the payload-index-ram-on-mmap-full-text branch from f7ed774 to 08244e9 Compare May 8, 2025 10:35
@timvisee timvisee marked this pull request as ready for review May 8, 2025 10:36
@timvisee timvisee requested a review from JojiiOfficial May 8, 2025 10:36

@timvisee timvisee merged commit 01e61ae into dev May 12, 2025
17 checks passed
@timvisee timvisee deleted the payload-index-ram-on-mmap-full-text branch May 12, 2025 08:58
generall added a commit that referenced this pull request May 22, 2025
* Add storage enum to full text index

* Add compressed mmap posting list iterator

* Don't check second condition and return early if possible

* Use helper function for creating empty iterator

* Construct immutable inverted index from mmap inverted index

* Construct immutable text index from mmap text index

* Load immutable text index from mmap storage if on-disk is false

* Match implementation for mutable to immutable inverted index, drop empty

* Point tokens should be none if empty

* Fix chunk reader iterator not buffering remaining postings correctly

* Add immutable mmap full text index to congruence test

* Debug assert deleted points have no tokens

* Add another test

* Fix point deletion regression in immutable full text index

* Use consistent function naming

* Restructure full text index opening and loading, make it consistent

* When immutable full text index is loaded into memory, clear mmap cache

* Remove init from immutable text index

* Prevent potential panic

* avoid panic

* Fix custom iterator size reporting, add test for it

---------

Co-authored-by: generall <andrey@vasnetsov.com>
@coderabbitai coderabbitai bot mentioned this pull request Aug 19, 2025