Skip to content

Conversation

coszio
Copy link
Contributor

@coszio coszio commented Jun 13, 2025

The mutable inverted index currently uses a Vec as a container for ids, this means that, while it is expensive to keep adding random values to it, it is still good to binary-search over it.

From the history, we can see that it was switched from BTreeSet to Vec in the past. However, indexing with Vec means that we have to shift elements all over the place every time we insert a new id which is lower than the max in the list. This creates a bottleneck directly on ingestion, not during optimization, which "breaks" the mental model of ingestion being separated from optimization because it's supposed to be quick.

Since changing the container is a small (but big impact) change, I've ran the following experiment:

// For upsert
bfb -n 5000000 --text-payloads --text-payload-length 100 --vectors-per-point 0 --max-id 2000000

// For queries
bfb -n 100000 --text-payloads --text-payload-length 100 --vectors-per-point 0 --skip-create --skip-upload --scroll

These were the results:

Container Final upsert RPS Median Scroll RPS Memory usage
Vec < 2,300 4373 1725M
BTreeSet ~10,996 2399 2741M
RoaringBitmap ~15,115 5201 1322M

@coszio coszio requested review from timvisee and generall June 13, 2025 15:44

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

Copy link
Member

@timvisee timvisee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome!

Are there more places we can potentially benefit from this?

Copy link
Contributor

@ffuugoo ffuugoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took me a little bit to understand why/how bitset would work here. Nice gainz! 💪

@coszio coszio merged commit 3051a4f into dev Jun 16, 2025
18 checks passed
@coszio coszio deleted the faster-mutable-posting-list branch June 16, 2025 13:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants