Description
We have been looking at ways to improve the performance of analysis deserialization. In large builds this can be a major factor in the responsiveness of the build when Zinc determines that little or no actual compilation is needed.
A quick way to profile this is to check out and compile scala/scala, then run the following:
sbt> consoleProject
...
scala> def timed[T](f: => T) = { val start = System.nanoTime; try f finally { println( ((System.nanoTime - start) / 1000 / 1000) + " ms") } }
timed: [T](f: => T)T
scala> for (i <- 1 to 1000) { println(timed(sbt.internal.inc.FileAnalysisStore.binary(file("/Users/jz/code/scala/target/compiler/zinc/inc_compile.zip")).get)) }
400 ms
Optional[CompileResult(analysis: Analysis: 1607 Scala sources, 7 Java sources, 2414 classes, 1073 external source dependencies, 1 binary dependency, setup: MiniSetup(output: SingleOutput(/tmp/dummy), options: MiniOptions(classpathHash: [Lxsbti.compile.FileHash;@7d93d001, scalacOptions: [Ljava.lang.String;@1417548d, javacOptions: [Ljava.lang.String;@797cdb4a), compilerVersion: 2.13.6, order: JavaThenScala, storeApis: true, extra: [Lxsbti.T2;@2db4ba0), hasModified: false)]
468 ms
Optional[CompileResult(analysis: Analysis: 1607 Scala sources, 7 Java sources, 2414 classes, 1073 external source dependencies, 1 binary dependency, setup: MiniSetup(output: SingleOutput(/tmp/dummy), options: MiniOptions(classpathHash: [Lxsbti.compile.FileHash;@612b9022, scalacOptions: [Ljava.lang.String;@4fb613b6, javacOptions: [Ljava.lang.String;@500f3781), compilerVersion: 2.13.6, order: JavaThenScala, storeApis: true, extra: [Lxsbti.T2;@8d1f06b), hasModified: false)]
467 ms
Optional[CompileResult(analysis: Analysis: 1607 Scala sources, 7 Java sources, 2414 classes, 1073 external source dependencies, 1 binary dependency, setup: MiniSetup(output: SingleOutput(/tmp/dummy), options: MiniOptions(classpathHash: [Lxsbti.compile.FileHash;@3546dbd3, scalacOptions: [Ljava.lang.String;@4a92556e, javacOptions: [Ljava.lang.String;@6308c082), compilerVersion: 2.13.6, order: JavaThenScala, storeApis: true, extra: [Lxsbti.T2;@64dd2620), hasModified: false)]
468 ms
...
Attach async-profiler:
$ git clone https://github.com/jvm-profiling-tools/async-profiler && cd async-profiler
$ make clean all
$ ./profiler.sh -d 20 --reverse -f out.html --minwidth 0.5 $PID
Opportunities
Inefficient creation of Array
javaList.asScala.iterator.map(f).toArray
forgoes the chance to allocate the resulting array with the correct size and needs an intermediate buffer. Try something like:
implicit class EfficientTraverse[T](seq: JList[T]) {
-  def toZincArray[R: scala.reflect.ClassTag](f: T => R): Array[R] =
-    seq.asScala.iterator.map(f).toArray
+  def toZincArray[R: scala.reflect.ClassTag](f: T => R): Array[R] = {
+    seq.stream().map(x => f(x)).toArray(new Array[R](_))
+  }
}
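As a self-contained sketch of why the pre-sized conversion helps (the standalone object and demo values here are hypothetical, only `toZincArray` and the iterator baseline come from the diff above): `Stream.toArray(IntFunction)` allocates the result array once at the final size, while the iterator version grows an intermediate buffer and copies at the end.

```scala
import java.util.{Arrays => JArrays, List => JList}
import scala.jdk.CollectionConverters._
import scala.reflect.ClassTag

object ZincArrayDemo {
  // Pre-sized conversion: the stream knows its size, so toArray(IntFunction)
  // allocates the result array once. (R is a reference type in Zinc's usage.)
  def toZincArray[T, R: ClassTag](seq: JList[T])(f: T => R): Array[R] =
    seq.stream().map[R](x => f(x)).toArray(len => new Array[R](len))

  // Baseline from the issue text: iterator + growable buffer + final copy.
  def toArrayViaIterator[T, R: ClassTag](seq: JList[T])(f: T => R): Array[R] =
    seq.asScala.iterator.map(f).toArray

  def main(args: Array[String]): Unit = {
    val names: JList[String] = JArrays.asList("println", "Predef", "List")
    val a = toZincArray(names)(_.toUpperCase)
    val b = toArrayViaIterator(names)(_.toUpperCase)
    assert(a.sameElements(b))
    println(a.mkString(","))
  }
}
```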
Dead code creates wasted Vector
@@ -695,9 +699,11 @@ final class ProtobufReaders(mapper: ReadMapper, currentVersion: Schema.Version)
val name = usedName.getName.intern()
val useScopes = util.EnumSet.noneOf(classOf[UseScope])
val len = usedName.getScopesCount
- val scopes = for {
+ for {
i <- 0 to len - 1
- } yield useScopes.add(fromUseScope(usedName.getScopes(i), usedName.getScopesValue(i)))
+ } {
+ useScopes.add(fromUseScope(usedName.getScopes(i), usedName.getScopesValue(i)))
+ }
UsedName.make(name, useScopes)
String interning isn't free
Interning likely-duplicated strings as we read them from the protobuf trades higher CPU for a lower memory footprint. Maybe it is worth introducing a name table at the start of the protobuf and replacing string-typed fields with indices into this table?
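A hedged sketch of that idea (the `NameTable` name and shape are hypothetical, not Zinc's schema): the writer interns each string into a table and emits an index, the serialized table is written once up front, and the reader materialises each distinct name exactly once with no per-field interning.

```scala
import scala.collection.mutable

// Hypothetical name table: each distinct string is stored once and referenced
// by a stable Int index. String-typed protobuf fields become indices into it.
final class NameTable {
  private val ids = mutable.LinkedHashMap.empty[String, Int]
  // Writer side: intern a name, returning its index.
  def idOf(name: String): Int = ids.getOrElseUpdate(name, ids.size)
  // Serialized form: the distinct names in index order.
  def entries: Array[String] = ids.keysIterator.toArray
}

object NameTableDemo {
  def main(args: Array[String]): Unit = {
    val table = new NameTable
    val encoded = Seq("println", "Predef", "println").map(table.idOf)
    // "println" is stored once; its second use is just an index.
    assert(encoded == Seq(0, 1, 0))
    // Reader side: decode indices against the table, no interning needed.
    val names = table.entries
    assert(encoded.map(names(_)) == Seq("println", "Predef", "println"))
    println(names.mkString(","))
  }
}
```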
Eager Relation Building is costly
All maps that are deserialized are converted to a Relation[A, B], a pair of maps: one holding the forward relation, the other the inverse. The inverse map has type Map[B, Set[A]].
The usedNames map appears to be the largest. Is the inverse relation actually used? If not, refactor Zinc to avoid building it up!
The other cost here is building immutable Map/Sets, rather than just upcasting mutable ones to collection.{Map,Set}. This may well be worth it if clients of MRelationsNameHashing use its operations to construct new versions of it; these would structurally share with the base version. But it's worth analysing how these are used.
It looks like the value in the forward relation in UsedName is not actually accessed as a Set[UsedName]; a sequential collection would suffice.
private def filteredDependencies(dependent: Set[String]): Set[String] = {
  dependent.filter {
    case from if isScalaClass(from) =>
      val affectedNames = usedNames.forward(from).filter(modifiedNames.isModified)
      // ...
  }
}
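For reference, the two-map shape described above can be sketched like this (a minimal stand-in, not Zinc's actual `Relation` implementation), making visible the cost of eagerly building the inverse:

```scala
// Minimal stand-in for Zinc's Relation[A, B]: a forward Map[A, Set[B]] plus
// the eagerly-built inverse Map[B, Set[A]] whose cost is discussed above.
final case class SimpleRelation[A, B](forward: Map[A, Set[B]], reverse: Map[B, Set[A]])

object SimpleRelation {
  def fromForward[A, B](fwd: Map[A, Set[B]]): SimpleRelation[A, B] = {
    // Building the inverse touches every (a, b) pair once, and allocates a
    // fresh immutable Set per inverse key.
    val rev = fwd.iterator
      .flatMap { case (a, bs) => bs.iterator.map(b => b -> a) }
      .toSeq
      .groupBy(_._1)
      .map { case (b, pairs) => b -> pairs.iterator.map(_._2).toSet }
    SimpleRelation(fwd, rev)
  }
}

object RelationDemo {
  def main(args: Array[String]): Unit = {
    val usedNames = SimpleRelation.fromForward(
      Map("A.scala" -> Set("println", "Predef"), "B.scala" -> Set("println")))
    // The inverse answers "which sources use this name?" queries...
    assert(usedNames.reverse("println") == Set("A.scala", "B.scala"))
    // ...but if nothing asks them, its construction is pure overhead.
    println(usedNames.reverse)
  }
}
```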
Furthermore, the list of used names for a file is only consulted if there is a memberRef relation between the file with a changed API and the candidate file for invalidation. This is an opportunity for laziness.
Lazy parsing?
If the used names data for many compilation units is actually not needed, can we avoid reading it from disk and parsing it in the first place? This would require an on-disk format with random access into the map of used names for a given unit.
Protobuf's documentation notes:
Protocol Buffers are not designed to handle large messages. As a general rule of thumb, if you are dealing in messages larger than a megabyte each, it may be time to consider an alternate strategy.
That said, Protocol Buffers are great for handling individual messages within a large data set. Usually, large data sets are really just a collection of small pieces, where each small piece may be a structured piece of data. Even though Protocol Buffers cannot handle the entire set at once, using Protocol Buffers to encode each piece greatly simplifies your problem: now all you need is to handle a set of byte strings rather than a set of structures.
Protocol Buffers do not include any built-in support for large data sets because different situations call for different solutions. Sometimes a simple list of records will do while other times you may want something more like a database. Each solution should be developed as a separate library, so that only those who need it need to pay the costs.
I could imagine an incremental change to the current format that includes an entry in inc_compile.zip for each compilation unit.
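A hedged sketch of that incremental change (the entry naming scheme and class names here are hypothetical): each unit's used-names blob becomes its own zip entry, so invalidation can read the bytes for a single unit on demand instead of parsing the whole store up front.

```scala
import java.io.File
import java.nio.file.Files
import java.util.zip.{ZipEntry, ZipFile, ZipOutputStream}

// Hypothetical layout: one zip entry per compilation unit. ZipFile gives
// random access by entry name, so unconsulted units are never read or parsed.
final class LazyUsedNamesStore(zip: ZipFile) {
  def usedNamesFor(unit: String): Option[Array[Byte]] =
    Option(zip.getEntry(s"usednames/$unit")).map { entry =>
      val in = zip.getInputStream(entry)
      try in.readAllBytes() finally in.close()
    }
}

object LazyUsedNamesDemo {
  def main(args: Array[String]): Unit = {
    // Write a store with a single unit's used names.
    val f = File.createTempFile("inc_compile", ".zip")
    val out = new ZipOutputStream(Files.newOutputStream(f.toPath))
    out.putNextEntry(new ZipEntry("usednames/A.scala"))
    out.write("println,Predef".getBytes("UTF-8"))
    out.closeEntry()
    out.close()

    // Read back only the entry we need; other units cost nothing.
    val store = new LazyUsedNamesStore(new ZipFile(f))
    assert(store.usedNamesFor("A.scala").map(new String(_, "UTF-8")).contains("println,Predef"))
    assert(store.usedNamesFor("B.scala").isEmpty)
    println("ok")
  }
}
```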
Moar Hashing?
Another #brainstorm: what if we compressed the used-names section by recording hashes of the names? I.e. rather than recording that A.scala -> ["println", "Predef", ...]
, we instead record that it references ["println".hashCode, "Predef".hashCode, ...]
? Saving Ints rather than Strings would reduce the size of the persisted form and avoid UTF-8 decoding and String interning on deserialization.
In case of hash collisions we would get over-compilation (although it would not propagate far.)
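A minimal sketch of the hashed representation (class and method names are hypothetical): membership tests against the hash set can yield false positives on collisions, which matches the note above; a false positive only triggers extra recompilation, never missed invalidation.

```scala
// Persist Int hash codes instead of the names themselves. A "maybe used"
// answer can be wrong only in the over-approximating direction, so collisions
// cause over-compilation but never under-compilation.
final class HashedUsedNames(hashes: Set[Int]) {
  def mayContain(name: String): Boolean = hashes.contains(name.hashCode)
}

object HashedUsedNames {
  def fromNames(names: Iterable[String]): HashedUsedNames =
    new HashedUsedNames(names.iterator.map(_.hashCode).toSet)
}

object HashedUsedNamesDemo {
  def main(args: Array[String]): Unit = {
    val h = HashedUsedNames.fromNames(Seq("println", "Predef"))
    assert(h.mayContain("println"))      // a real use is always found
    assert(!h.mayContain("unusedName"))  // absent (barring a hash collision)
    println("ok")
  }
}
```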