Hi, I did a short profiling session of Dokka in `kotlinx.coroutines`.
The methodology was pretty straightforward: warm up the Gradle daemon, run `./gradlew cleanDokkaHtmlMultiModule cleanDokkaHtmlPartial dokkaHtmlMultiModule`, and profile it with async-profiler in CPU/alloc modes. The corresponding flame graphs are attached.
The root cause is quite straightforward: for text-based HTML blocks Dokka invokes `parseWithNormalisedSpaces`, which is harmless in itself, but under the hood it invokes the Jsoup parser unconditionally. That is not only a slow operation by itself, but it also allocates a 64K temporary buffer for each text element.
PR #2730 invokes Jsoup conditionally by optimistically looking for `&` in the text first and applying Jsoup only when necessary (also, it's probably worth doing it manually anyway, but it's beyond the scope of my change).
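To illustrate the idea, here is a minimal sketch of such a fast path. It is not the code from the PR: the class `EntityFastPath`, the method `decode`, and the `slowDecode` parameter are hypothetical names, and `slowDecode` merely stands in for the expensive Jsoup-based path so that the example stays self-contained.

```java
import java.util.function.UnaryOperator;

// Hypothetical illustration; not Dokka's actual implementation.
class EntityFastPath {
    /**
     * Returns the text unchanged when it cannot contain an HTML entity,
     * and falls back to the expensive decoder otherwise.
     *
     * @param text       raw text of an HTML block
     * @param slowDecode stand-in for the costly parser-based decoding (assumption)
     */
    static String decode(String text, UnaryOperator<String> slowDecode) {
        // Every HTML entity starts with '&', so text without one needs no decoding.
        if (text.indexOf('&') < 0) {
            return text;
        }
        // Slow path: only reached for the texts that may contain entities.
        return slowDecode.apply(text);
    }
}
```

With this shape, plain text such as `"plain text"` never reaches the slow path, while `"a &amp; b"` is still handed to the decoder, so the per-element parser cost and its temporary buffer are paid only for texts that contain `&`.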
It's quite hard to measure the impact of the change on coroutines because a lot of other tasks are running at the same time (ktor is probably a better candidate to test against; GC there takes 90% of the CPU time).
My numbers are the following:
- No outstanding `char[]` allocations in the profile (new hotspots are identified)
- 10-15% less GC in the profile (35% -> ~20-25% of all execution time)
- In `no-daemon` mode, it saves around 15% of CPU time and reduces peak CPU consumption by ~200% (on a 16-core OS X machine)