Skip to content

Conversation

densh
Copy link
Member

@densh densh commented Dec 5, 2021

This change replaces naive implementation of the zone allocator we had previously with one that performs bump allocation for small allocations (<2K).

Allocation throughput after this change:

2 bytes : 21115 ns / 1000 allocations
4 bytes : 20467 ns / 1000 allocations
8 bytes : 20535 ns / 1000 allocations
16 bytes : 20005 ns / 1000 allocations
32 bytes : 19696 ns / 1000 allocations
64 bytes : 19915 ns / 1000 allocations
128 bytes : 20309 ns / 1000 allocations
256 bytes : 27632 ns / 1000 allocations
512 bytes : 41469 ns / 1000 allocations
1024 bytes : 55459 ns / 1000 allocations
2048 bytes : 83120 ns / 1000 allocations
4096 bytes : 773972 ns / 1000 allocations
8192 bytes : 1755135 ns / 1000 allocations
16384 bytes : 4041602 ns / 1000 allocations

Allocation throughput before this change:

2 bytes : 81216 ns / 1000 allocations
4 bytes : 81147 ns / 1000 allocations
8 bytes : 80590 ns / 1000 allocations
16 bytes : 82284 ns / 1000 allocations
32 bytes : 85628 ns / 1000 allocations
64 bytes : 80658 ns / 1000 allocations
128 bytes : 90329 ns / 1000 allocations
256 bytes : 91666 ns / 1000 allocations
512 bytes : 93411 ns / 1000 allocations
1024 bytes : 122811 ns / 1000 allocations
2048 bytes : 419922 ns / 1000 allocations
4096 bytes : 1087628 ns / 1000 allocations
8192 bytes : 2239114 ns / 1000 allocations
16384 bytes : 4660193 ns / 1000 allocations

Additionally, this change dramatically reduces memory overhead of small zone allocations. Previously each allocation had a corresponding malloc invocation, that was persisted in a linked list of boxed pointers. Now, the linked list only stores large contiguous memory areas and zone allocator bump allocates within them, without any additional book keeping per allocation.

@densh
Copy link
Member Author

densh commented Dec 5, 2021

The benchmark that I used for throughput estimates:

object Test {
  def run(count: Int): Long = {
    val n = 1000
    val k = 1000
    val results = new Array[Long](n)
    var i = 0
    while (i < n) {
      val start = nanoTime()
      Zone { implicit z =>
        var j = 0
        while (j < k) {
          alloc[Byte](count)
          j += 1
        }
      }
      val end = nanoTime()
      results(i) = end - start
      i += 1
    }
    val sorted = results.sorted
    sorted((n * 0.5).toInt)
  }
  def main(args: Array[String]): Unit = {
    for (i <- 1 to 14) {
      val size = scala.math.pow(2, i).toInt

      // warmup runs
      run(size)
      run(size)
      run(size)
      run(size)
      run(size)

      val time = run(size)

      println(s"${size} bytes : ${time} ns / 1000 allocations")
    }
  }
}

This change replaces naive implementation of the zone allocator
we had previously with one that performs bump allocation for
small allocations (<2K).

Allocation throughput after this change:

```
2 bytes : 21115 ns / 1000 allocations
4 bytes : 20467 ns / 1000 allocations
8 bytes : 20535 ns / 1000 allocations
16 bytes : 20005 ns / 1000 allocations
32 bytes : 19696 ns / 1000 allocations
64 bytes : 19915 ns / 1000 allocations
128 bytes : 20309 ns / 1000 allocations
256 bytes : 27632 ns / 1000 allocations
512 bytes : 41469 ns / 1000 allocations
1024 bytes : 55459 ns / 1000 allocations
2048 bytes : 83120 ns / 1000 allocations
4096 bytes : 773972 ns / 1000 allocations
8192 bytes : 1755135 ns / 1000 allocations
16384 bytes : 4041602 ns / 1000 allocations
```

Allocation throughput before this change:

```
2 bytes : 81216 ns / 1000 allocations
4 bytes : 81147 ns / 1000 allocations
8 bytes : 80590 ns / 1000 allocations
16 bytes : 82284 ns / 1000 allocations
32 bytes : 85628 ns / 1000 allocations
64 bytes : 80658 ns / 1000 allocations
128 bytes : 90329 ns / 1000 allocations
256 bytes : 91666 ns / 1000 allocations
512 bytes : 93411 ns / 1000 allocations
1024 bytes : 122811 ns / 1000 allocations
2048 bytes : 419922 ns / 1000 allocations
4096 bytes : 1087628 ns / 1000 allocations
8192 bytes : 2239114 ns / 1000 allocations
16384 bytes : 4660193 ns / 1000 allocations
```
@WojciechMazur WojciechMazur linked an issue Dec 6, 2021 that may be closed by this pull request
@WojciechMazur WojciechMazur merged commit f3a367c into scala-native:master Dec 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize zone allocation for speed
2 participants