Fix #735: improve throughput of zone allocation #2443

densh · 2021-12-05T16:23:31Z

This change replaces naive implementation of the zone allocator we had previously with one that performs bump allocation for small allocations (<2K).

Allocation throughput after this change:

2 bytes : 21115 ns / 1000 allocations
4 bytes : 20467 ns / 1000 allocations
8 bytes : 20535 ns / 1000 allocations
16 bytes : 20005 ns / 1000 allocations
32 bytes : 19696 ns / 1000 allocations
64 bytes : 19915 ns / 1000 allocations
128 bytes : 20309 ns / 1000 allocations
256 bytes : 27632 ns / 1000 allocations
512 bytes : 41469 ns / 1000 allocations
1024 bytes : 55459 ns / 1000 allocations
2048 bytes : 83120 ns / 1000 allocations
4096 bytes : 773972 ns / 1000 allocations
8192 bytes : 1755135 ns / 1000 allocations
16384 bytes : 4041602 ns / 1000 allocations

Allocation throughput before this change:

2 bytes : 81216 ns / 1000 allocations
4 bytes : 81147 ns / 1000 allocations
8 bytes : 80590 ns / 1000 allocations
16 bytes : 82284 ns / 1000 allocations
32 bytes : 85628 ns / 1000 allocations
64 bytes : 80658 ns / 1000 allocations
128 bytes : 90329 ns / 1000 allocations
256 bytes : 91666 ns / 1000 allocations
512 bytes : 93411 ns / 1000 allocations
1024 bytes : 122811 ns / 1000 allocations
2048 bytes : 419922 ns / 1000 allocations
4096 bytes : 1087628 ns / 1000 allocations
8192 bytes : 2239114 ns / 1000 allocations
16384 bytes : 4660193 ns / 1000 allocations

Additionally, this change dramatically reduces memory overhead of small zone allocations. Previously each allocation had a corresponding malloc invocation, that was persisted in a linked list of boxed pointers. Now, the linked list only stores large contiguous memory areas and zone allocator bump allocates within them, without any additional book keeping per allocation.

densh · 2021-12-05T16:24:39Z

The benchmark that I used for throughput estimates:

object Test {
  def run(count: Int): Long = {
    val n = 1000
    val k = 1000
    val results = new Array[Long](n)
    var i = 0
    while (i < n) {
      val start = nanoTime()
      Zone { implicit z =>
        var j = 0
        while (j < k) {
          alloc[Byte](count)
          j += 1
        }
      }
      val end = nanoTime()
      results(i) = end - start
      i += 1
    }
    val sorted = results.sorted
    sorted((n * 0.5).toInt)
  }
  def main(args: Array[String]): Unit = {
    for (i <- 1 to 14) {
      val size = scala.math.pow(2, i).toInt

      // warmup runs
      run(size)
      run(size)
      run(size)
      run(size)
      run(size)

      val time = run(size)

      println(s"${size} bytes : ${time} ns / 1000 allocations")
    }
  }
}

This change replaces naive implementation of the zone allocator we had previously with one that performs bump allocation for small allocations (<2K). Allocation throughput after this change: ``` 2 bytes : 21115 ns / 1000 allocations 4 bytes : 20467 ns / 1000 allocations 8 bytes : 20535 ns / 1000 allocations 16 bytes : 20005 ns / 1000 allocations 32 bytes : 19696 ns / 1000 allocations 64 bytes : 19915 ns / 1000 allocations 128 bytes : 20309 ns / 1000 allocations 256 bytes : 27632 ns / 1000 allocations 512 bytes : 41469 ns / 1000 allocations 1024 bytes : 55459 ns / 1000 allocations 2048 bytes : 83120 ns / 1000 allocations 4096 bytes : 773972 ns / 1000 allocations 8192 bytes : 1755135 ns / 1000 allocations 16384 bytes : 4041602 ns / 1000 allocations ``` Allocation throughput before this change: ``` 2 bytes : 81216 ns / 1000 allocations 4 bytes : 81147 ns / 1000 allocations 8 bytes : 80590 ns / 1000 allocations 16 bytes : 82284 ns / 1000 allocations 32 bytes : 85628 ns / 1000 allocations 64 bytes : 80658 ns / 1000 allocations 128 bytes : 90329 ns / 1000 allocations 256 bytes : 91666 ns / 1000 allocations 512 bytes : 93411 ns / 1000 allocations 1024 bytes : 122811 ns / 1000 allocations 2048 bytes : 419922 ns / 1000 allocations 4096 bytes : 1087628 ns / 1000 allocations 8192 bytes : 2239114 ns / 1000 allocations 16384 bytes : 4660193 ns / 1000 allocations ```

densh force-pushed the topic/fast-zones branch from c8b9ded to 4a743fe Compare December 5, 2021 16:38

Add removed internal ZoneImpl to mima filters

188791b

WojciechMazur approved these changes Dec 6, 2021

View reviewed changes

WojciechMazur linked an issue Dec 6, 2021 that may be closed by this pull request

Optimize zone allocation for speed #735

Closed

WojciechMazur merged commit f3a367c into scala-native:master Dec 6, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix #735: improve throughput of zone allocation #2443

Fix #735: improve throughput of zone allocation #2443

Uh oh!

densh commented Dec 5, 2021 •

edited

Loading

Uh oh!

densh commented Dec 5, 2021 •

edited

Loading

Uh oh!

Uh oh!

Fix #735: improve throughput of zone allocation #2443

Fix #735: improve throughput of zone allocation #2443

Uh oh!

Conversation

densh commented Dec 5, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

densh commented Dec 5, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

densh commented Dec 5, 2021 •

edited

Loading

densh commented Dec 5, 2021 •

edited

Loading