Rewrite needle_map.CompactMap() for more efficient memory usage #6813

Merged
merged 4 commits into from
May 23, 2025

proton-lisandro-pin
Contributor

@proton-lisandro-pin proton-lisandro-pin commented May 23, 2025

What problem are we solving?

#6804

How are we solving the problem?

CompactMap() relies on fixed-size storage buckets of 10,000 elements, plus separate storage for overflow (i.e. out-of-order) needle IDs. The problem is that most of that fixed bucket space ends up allocated but never used; this is particularly true for out-of-order needle IDs, but it also happens with perfectly ordered needle writes.
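To make the allocation problem concrete, here is a minimal sketch comparing the memory reserved by a fixed-size bucket with the memory behind a slice that grows on demand. The `entry`, `fixedBucket`, `fixedBytes` and `sliceBytes` names and the field layout are assumptions for illustration only, not the actual SeaweedFS types.

```go
package main

import (
	"fmt"
	"unsafe"
)

// entry is a hypothetical needle record; the real field layout may differ.
type entry struct {
	key    uint64
	offset uint32
	size   uint32
}

// bucketCap mirrors the 10,000-element bucket size from the description.
const bucketCap = 10000

// fixedBucket reserves space for bucketCap entries as soon as it exists.
type fixedBucket struct {
	entries [bucketCap]entry
	used    int
}

// fixedBytes reports the memory reserved by one fixed bucket, used or not.
func fixedBytes() uintptr {
	return unsafe.Sizeof(fixedBucket{})
}

// sliceBytes reports the backing-array memory of a slice that holds
// n entries appended one at a time.
func sliceBytes(n int) uintptr {
	var entries []entry
	for i := 0; i < n; i++ {
		entries = append(entries, entry{key: uint64(i)})
	}
	return uintptr(cap(entries)) * unsafe.Sizeof(entry{})
}

func main() {
	fmt.Println("fixed bucket bytes:", fixedBytes())
	fmt.Println("slice bytes for 3 entries:", sliceBytes(3))
}
```

A bucket that only ever receives a handful of needles still pays for all 10,000 slots in the fixed layout, while the slice pays only for what `append` has grown it to.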

This MR reworks CompactMap() so:

  • Underlying data structures are simplified.
  • Needle buckets are now variable-size slices instead of fixed-size arrays.
  • Data handling is performed via optimized built-in functions such as copy() instead of ad-hoc loops.
  • if and for sections are unrolled whenever possible.
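The combination of variable-size slices and copy() can be sketched as follows. This is an illustrative example under stated assumptions, not the code from this MR: the `needle` and `bucket` types and the `set`, `get` and `keys` methods are hypothetical names. The bucket keeps its needles sorted by key, finds the insertion point with a binary search, and shifts the tail with the built-in copy() instead of a hand-written loop.

```go
package main

import (
	"fmt"
	"sort"
)

// needle is a hypothetical record; the real field layout may differ.
type needle struct {
	key    uint64
	offset uint32
	size   uint32
}

// bucket holds needles in a variable-size slice, kept sorted by key.
type bucket struct {
	needles []needle
}

// search returns the index of the first needle whose key is >= key.
func (b *bucket) search(key uint64) int {
	return sort.Search(len(b.needles), func(j int) bool {
		return b.needles[j].key >= key
	})
}

// set inserts n at its sorted position, or overwrites an existing key.
func (b *bucket) set(n needle) {
	i := b.search(n.key)
	if i < len(b.needles) && b.needles[i].key == n.key {
		b.needles[i] = n
		return
	}
	b.needles = append(b.needles, needle{}) // grow by one slot
	copy(b.needles[i+1:], b.needles[i:])    // shift the tail right
	b.needles[i] = n
}

// get looks up a needle by key using binary search.
func (b *bucket) get(key uint64) (needle, bool) {
	i := b.search(key)
	if i < len(b.needles) && b.needles[i].key == key {
		return b.needles[i], true
	}
	return needle{}, false
}

// keys returns the stored keys in slice order.
func (b *bucket) keys() []uint64 {
	out := make([]uint64, 0, len(b.needles))
	for _, n := range b.needles {
		out = append(out, n.key)
	}
	return out
}

func main() {
	b := &bucket{}
	for _, k := range []uint64{30, 10, 20} { // out-of-order writes
		b.set(needle{key: k, size: uint32(k)})
	}
	fmt.Println(b.keys()) // prints [10 20 30]
}
```

With this layout, out-of-order keys land in the same sorted slice as in-order ones, so no separate overflow storage is needed in the sketch; whether the MR handles overflow exactly this way is not stated in the description.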

The result is a dramatic improvement in memory usage, at the cost of slightly increased memory fragmentation, whose impact will be entirely platform-dependent.

#6804 has more details, but this MR reduces the memory usage of weed processes by ~95% in best-case scenarios and ~99% in worst-case scenarios, with no measurable performance impact.

How is the PR tested?

Every single existing unit and integration test should pass without issues; the existing API and behavior for CompactMap() is unchanged.

Checks

  • I have added unit tests if possible.
  • I will add related wiki document changes and link to this PR after merging.

Has no measurable performance impact, but greatly increases memory
efficiency when out-of-order needle IDs are written.

TBD what's the impact on memory fragmentation, though it will
be very platform dependent.
@chrislusf chrislusf merged commit 2e1506c into seaweedfs:master May 23, 2025
7 of 8 checks passed
@proton-lisandro-pin proton-lisandro-pin deleted the needle_map_fix branch May 23, 2025 15:48
@chrislusf chrislusf mentioned this pull request Jun 3, 2025
2 tasks

2 participants