Skip to content

Conversation

lmb
Copy link
Collaborator

@lmb lmb commented Apr 21, 2025

The Builder currently uses binary.Write with a shared bytes.Buffer. Replace this with binary.Append and a byte slice.

This drops the number of allocations by 99% and gives a small speed up in benchmarks. The overall amount of memory allocated increases a little bit, but I can't figure out why.

The []byte is threaded through the encoder.deflate* methods because assigning to a shared buffer in encoder incurs write barrier overheads.

core: 1
goos: linux
goarch: amd64
pkg: github.com/cilium/ebpf/btf
cpu: 13th Gen Intel(R) Core(TM) i7-1365U
             │  base.txt   │            append3.txt            │
             │   sec/op    │   sec/op     vs base              │
Marshaler      11.31m ± 1%   11.41m ± 2%  +0.93% (p=0.041 n=6)
BuildVmlinux   140.7m ± 4%   145.6m ± 1%  +3.50% (p=0.002 n=6)
geomean        39.89m        40.77m       +2.21%

             │   base.txt   │             append3.txt             │
             │     B/op     │     B/op      vs base               │
Marshaler      3.286Mi ± 0%   3.469Mi ± 0%   +5.58% (p=0.002 n=6)
BuildVmlinux   37.31Mi ± 0%   42.22Mi ± 0%  +13.16% (p=0.002 n=6)
geomean        11.07Mi        12.10Mi        +9.31%

             │   base.txt    │            append3.txt             │
             │   allocs/op   │  allocs/op   vs base               │
Marshaler       12627.0 ± 0%    208.0 ± 0%  -98.35% (p=0.002 n=6)
BuildVmlinux   165.689k ± 0%   1.508k ± 0%  -99.09% (p=0.002 n=6)
geomean          45.74k         560.1       -98.78%

@github-actions github-actions bot added the breaking-change Changes exported API label Apr 21, 2025
@lmb lmb force-pushed the btf-marshal-append branch from e9e25c0 to 93e5fed Compare April 22, 2025 10:21
@github-actions github-actions bot removed the breaking-change Changes exported API label Apr 22, 2025
@lmb lmb force-pushed the btf-marshal-append branch from 93e5fed to 7402b81 Compare April 22, 2025 10:23
@lmb lmb marked this pull request as ready for review April 22, 2025 10:24
@lmb lmb requested review from dylandreimerink and a team as code owners April 22, 2025 10:24
@lmb
Copy link
Collaborator Author

lmb commented Apr 22, 2025

Overall I'm a bit confused by the outcome of this PR. I'd have thought that dropping allocations this much would give a good boost to runtime, but it almost has a negative effect. Can't figure it out. Maybe one of you has an idea?

@dylandreimerink
Copy link
Member

dylandreimerink commented Apr 23, 2025

I'd have thought that dropping allocations this much would give a good boost to runtime, but it almost has a negative effect. Can't figure it out. Maybe one of you has an idea?

My guess is that its mostly about the type of allocation we are eliminating. The make([]byte, unsafe.Sizeof(*bt)) was small, static sized, and short lived. So it will have benefited heavily from the free-list allocator as described in https://go.dev/src/runtime/malloc.go

The bytes.Buffer was stored in a sync pool which likely helped somewhat. And in the end a bytes.Buffer is just a slice with a custom resizing strategy which is slightly different from what the go runtime does in append.

Even though improvement is slight, I do like this a lot more 👍

The Builder currently uses binary.Write with a shared
bytes.Buffer. Replace this with binary.Append and a byte
slice.

This drops the number of allocations by 99% and gives a small
speed up in benchmarks. The overall amount of memory allocated
increases a little bit, but I can't figure out why.

The []byte is threaded through the encoder.deflate* methods because
assigning to a shared buffer in encoder incurs write barrier
overheads.

    core: 1
    goos: linux
    goarch: amd64
    pkg: github.com/cilium/ebpf/btf
    cpu: 13th Gen Intel(R) Core(TM) i7-1365U
                 │  base.txt   │            append3.txt            │
                 │   sec/op    │   sec/op     vs base              │
    Marshaler      11.31m ± 1%   11.41m ± 2%  +0.93% (p=0.041 n=6)
    BuildVmlinux   140.7m ± 4%   145.6m ± 1%  +3.50% (p=0.002 n=6)
    geomean        39.89m        40.77m       +2.21%

                 │   base.txt   │             append3.txt             │
                 │     B/op     │     B/op      vs base               │
    Marshaler      3.286Mi ± 0%   3.469Mi ± 0%   +5.58% (p=0.002 n=6)
    BuildVmlinux   37.31Mi ± 0%   42.22Mi ± 0%  +13.16% (p=0.002 n=6)
    geomean        11.07Mi        12.10Mi        +9.31%

                 │   base.txt    │            append3.txt             │
                 │   allocs/op   │  allocs/op   vs base               │
    Marshaler       12627.0 ± 0%    208.0 ± 0%  -98.35% (p=0.002 n=6)
    BuildVmlinux   165.689k ± 0%   1.508k ± 0%  -99.09% (p=0.002 n=6)
    geomean          45.74k         560.1       -98.78%

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
@lmb lmb force-pushed the btf-marshal-append branch from 7402b81 to 15c3114 Compare April 23, 2025 16:24
@lmb lmb requested a review from dylandreimerink April 23, 2025 16:25
@lmb lmb merged commit 08bddf3 into cilium:main Apr 23, 2025
42 of 45 checks passed
@lmb lmb deleted the btf-marshal-append branch April 23, 2025 17:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants