Skip to content

Conversation

timvisee
Copy link
Member

@timvisee timvisee commented Apr 25, 2025

Add a new payload index variant that functions as RAM wrapper around mmap based payload indices. The RAM wrapper functions as immutable payload storage having everything in-memory, but uses the mmap payload index as backing storage.

Pulling everything into RAM is more efficient as there is less indirection. Sharing the mmap storage for this allows to quickly switch between an in-memory and on-disk index type.

In terms of implementation, the new index types are cloned from the immutable RocksDB variant. It's internals have been adjusted to switch from RocksDB backing storage to mmaps.

Index performance during search was measured before. This repurposes the immutable index, and so performance will be the same.

Load time on startup using the new index does change. Here's a basic benchmark measuring load time:

  • bfb --int-payloads 100 -n 5000000 -d1 --indexing-threshold 100:
    • Old RocksDB based index:
      • num: 3.13s, 3.19s, 3.25s, 3.20s = avg 3.1925s
      • map: 2.22s, 2.27s, 2.35s, 2.26s = avg 2.275s
    • New mmap based index:
      • num: 2.78s, 2.78s, 2.94s, 2.94s = avg 2.86s (10% faster)
      • map: 1.09s, 1.11s, 1.21s, 1.17s = avg 1.145s (50% faster)

Tasks

  • Implement RAM wrapper for other payload index types (will be done in separate PR)
  • Remove inefficient intermediate structures
  • Properly load/implement new structures
  • Benchmark

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you formatted your code locally using cargo +nightly fmt --all command prior to submission?
  3. Have you checked your code using cargo clippy --all --all-features command?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully ran tests with your changes locally?

@timvisee timvisee force-pushed the payload-index-ram-on-mmap branch from 42de216 to 585d2dd Compare April 25, 2025 16:49
@timvisee timvisee changed the title Add RAM wrappers for mmap payload indices Add in-memory payload indices on mmap storage Apr 29, 2025
@timvisee timvisee marked this pull request as ready for review April 30, 2025 09:33
@timvisee timvisee force-pushed the payload-index-ram-on-mmap branch from a80f044 to e37a6f8 Compare April 30, 2025 09:34

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

@timvisee timvisee changed the title Add in-memory payload indices on mmap storage Add in-memory payload indices on mmap storage (numeric, map) Apr 30, 2025
@timvisee timvisee requested a review from coszio May 6, 2025 16:00
coderabbitai[bot]

This comment was marked as resolved.

Comment on lines 16 to +19
Mutable,
Immutable,
Mmap,
RamMmap,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is only for tests, but now we have multiple storages which can be used under the same settings.

We should document this somewhere in the code

mutable on_disk legacy storage new storage
Mutable Mutable
Mutable Mutable
Mmap Mmap
Immutable RamMmap

@timvisee timvisee force-pushed the payload-index-ram-on-mmap branch from 9aead56 to c365ee9 Compare May 7, 2025 09:08
@timvisee
Copy link
Member Author

timvisee commented May 7, 2025

In c365ee9 I made the interface more consistent. It now has better method names, and also loads mmap storage in a separate stage.

There is more to be renamed on a higher level, but I vote to do that in a separate PR to not increase the blast radius of this.

cc @coszio

coderabbitai[bot]

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

@timvisee timvisee merged commit 424de4b into dev May 8, 2025
17 checks passed
@timvisee timvisee deleted the payload-index-ram-on-mmap branch May 8, 2025 10:34
generall pushed a commit that referenced this pull request May 22, 2025
* Add RAM wrapper to mmap numeric index

* Add RAM wrapper to mmap map index

* Fix test compilation

* Merge RAM wrappers into existing immutable indices

Do this to minimize code duplication since the majority of the code is
exactly the same.

* Minor tweaks

* Remove unnecessary pub visibility, remove unused assertion

* Test immutable numeric index on mmap

* In numeric index test, remove some points

* Add map index test

* Restructure index opening and loading, make it consistent

* don't pre-collect intermediate mapping

* review nits

* Make init unreachable for immutable and mmap, clear must consume index

---------

Co-authored-by: Luis Cossío <luis.cossio@outlook.com>
@coderabbitai coderabbitai bot mentioned this pull request Jun 6, 2025
9 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants