Skip to content

Conversation

proton-lisandro-pin
Copy link
Contributor

@proton-lisandro-pin proton-lisandro-pin commented Jun 5, 2025

What problem are we solving?

#6804

How are we solving the problem?

This PR introduces a new CompactMap() implementation for in-memory map of needle indices, optimized for memory usage.

The idea behind it is replacing the bucket+overflow approach with a map of sorted indices segments, which are in turn accessed through binary search. This guarantees a best-case scenario (ordered inserts/updates) of O(1) and a worst case scenario of O(log n) runtime; notably, memory usage is unaffected by index ordering.

Note that even at O(log n), the clock time for both reads and writes is very low, with weed benchmark actually running slightly faster than the existing implementation.

The end result is 20-30% improvement in memory usage, compared with v3.89. See #6804 for more details and benchmark results.

How is the PR tested?

Every single existing unit and integration test should pass without issues; the existing API and behavior for CompactMap() is unchanged.

Checks

  • I have added unit tests if possible.
  • I will add related wiki document changes and link to this PR after merging.

This slightly complicates the code, but makes a **massive** difference
in memory efficiency - preliminary results show a ~30% reduction in
heap usage, with no measurable performance impact otherwise.
@chrislusf
Copy link
Collaborator

Could you keep the old compact map implementation and tests in separate files, just for reference and easier comparison.

@proton-lisandro-pin
Copy link
Contributor Author

Could you keep the old compact map implementation and tests in separate files, just for reference and easier comparison.

As in, compact_map_old.go and compact_map_old_test.go?

@chrislusf
Copy link
Collaborator

As in, compact_map_old.go and compact_map_old_test.go?

That would work. Or name it as CompactMapWithOverflow, since old will change over time.

Also, the test can have a benchmark to compare the two implementations.

@proton-lisandro-pin
Copy link
Contributor Author

As in, compact_map_old.go and compact_map_old_test.go?

That would work. Or name it as CompactMapWithOverflow, since old will change over time.

Done!

Also, the test can have a benchmark to compare the two implementations.

I've uploaded benchmark results on #6804 (comment).

@chrislusf
Copy link
Collaborator

Also, the test can have a benchmark to compare the two implementations.

I've uploaded benchmark results on #6804 (comment).

I was thinking about https://pkg.go.dev/testing#hdr-Benchmarks

@chrislusf
Copy link
Collaborator

As in, compact_map_old.go and compact_map_old_test.go?

That would work. Or name it as CompactMapWithOverflow, since old will change over time.

Done!

I prefer no go build tags for this. You can put it in a sub directory if you don't want to deal with naming conflicts.

@proton-lisandro-pin
Copy link
Contributor Author

I prefer no go build tags for this. You can put it in a sub directory if you don't want to deal with naming conflicts.

Done. Note that i still had to tweak a couple lines, as compact_map.go shares a lot of code with the rest of weed/storage/needle_map 😞

@chrislusf chrislusf merged commit bed0a64 into seaweedfs:master Jun 5, 2025
7 of 13 checks passed
@proton-lisandro-pin proton-lisandro-pin deleted the new_needle_map branch June 6, 2025 10:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants