textparse: Optimized protobuf parser with custom streaming unmarshal. #15731

bwplotka · 2024-12-30T17:52:38Z

Depends on #15966

Previous attempts #15726 #15729

Design

We have an opportunity to stream protobuf unmarshalling. It is already streamed per metric family to some extent, but no memory is reused across families. With this new implementation we:

Have a single MetricStreamingDecoder that allows progressing per MetricFamily. It reuses the previous MetricFamily object (e.g. the internal bytes used for lazy decoding of metrics).
It also allows progressing per Metric, by unmarshalling MetricFamilies WITHOUT the metrics array. We only save the byte position of each metric so we can unmarshal it later.
When unmarshalling a Metric we skip labels (we save the position of each label in the serialized proto for later). The decoder offers a separate Labels(*labels.ScratchBuilder) method that parses labels and puts them directly into the builder.
All strings are yoloStrings, so nothing is copied. The only "copy" is for labels, because they are used later in appending and so their memory can't be reused.
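The lazy approach described in this list can be sketched with the standard library only. This is a minimal illustration, not the decoder from this PR: `span`, `scanField`, `appendBytes` and this `yoloString` are hypothetical names, and the field numbers (1 for the name and label fields, 4 for metrics) are assumed for the example. The scanner records where each nested message lives without decoding it, and strings are created as zero-copy views over the input.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"unsafe"
)

// span is a [start, end) byte range inside the buffer that was scanned.
type span struct{ start, end int }

// scanField walks one serialized protobuf message and records where the
// payload of every length-delimited field with number `field` lives.
// Nothing is decoded; all other fields are skipped. dst may be reused.
func scanField(msg []byte, field uint64, dst []span) ([]span, error) {
	i := 0
	for i < len(msg) {
		tag, n := binary.Uvarint(msg[i:])
		if n <= 0 {
			return dst, errors.New("malformed tag")
		}
		i += n
		switch tag & 7 { // low 3 bits hold the wire type
		case 0: // varint
			_, n := binary.Uvarint(msg[i:])
			if n <= 0 {
				return dst, errors.New("malformed varint")
			}
			i += n
		case 1: // fixed64
			i += 8
		case 5: // fixed32
			i += 4
		case 2: // length-delimited
			l, n := binary.Uvarint(msg[i:])
			if n <= 0 || l > uint64(len(msg)-i-n) {
				return dst, errors.New("malformed length")
			}
			i += n
			if tag>>3 == field {
				dst = append(dst, span{i, i + int(l)})
			}
			i += int(l)
		default:
			return dst, errors.New("unsupported wire type")
		}
	}
	if i != len(msg) {
		return dst, errors.New("truncated message")
	}
	return dst, nil
}

// yoloString views b as a string without copying. The result is only
// valid while b is neither modified nor reused.
func yoloString(b []byte) string {
	return unsafe.String(unsafe.SliceData(b), len(b))
}

// appendBytes encodes one length-delimited field (test helper).
func appendBytes(dst []byte, field uint64, payload []byte) []byte {
	dst = binary.AppendUvarint(dst, field<<3|2)
	dst = binary.AppendUvarint(dst, uint64(len(payload)))
	return append(dst, payload...)
}

func main() {
	// Hand-built input: a "family" with a name (field 1) and two
	// "metrics" (field 4), each holding one label pair (field 1).
	var family []byte
	family = appendBytes(family, 1, []byte("http_requests_total"))
	for _, code := range []string{"200", "500"} {
		pair := appendBytes(appendBytes(nil, 1, []byte("code")), 2, []byte(code))
		family = appendBytes(family, 4, appendBytes(nil, 1, pair))
	}

	metrics, err := scanField(family, 4, nil)
	if err != nil {
		panic(err)
	}
	var labels, names, values []span // scratch slices reused per metric
	for _, m := range metrics {
		metric := family[m.start:m.end]
		labels, _ = scanField(metric, 1, labels[:0])
		for _, l := range labels {
			pair := metric[l.start:l.end]
			names, _ = scanField(pair, 1, names[:0])
			values, _ = scanField(pair, 2, values[:0])
			fmt.Printf("%s=%s\n",
				yoloString(pair[names[0].start:names[0].end]),
				yoloString(pair[values[0].start:values[0].end]))
		}
	}
}
```

The trade-off is the usual one for zero-copy strings: anything that must outlive the input buffer (labels, in the description above) still needs a real copy.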

Maintenance

While it looks cumbersome to maintain, to make things efficient we have to have streaming/lazy decoding that reuses as much as possible, one way or another. This does not depend on gogo; we can do this on the Opaque API too, but I didn't want to change too much here. We could try to work with vtprotobuf or write our own plugin for those methods, but we can start by handcrafting now, since this proto is not changing at all (if it changes in the future, it will likely be the OpenMetrics proto).
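To illustrate what "reusing as much as possible" means in practice, here is a minimal sketch of a scratch object whose buffers are truncated rather than reallocated between families. `familyScratch`, `reset` and `load` are hypothetical names invented for this example, and the stored offsets are dummy values; the real decoder in this PR is structured differently.

```go
package main

import (
	"fmt"
	"testing"
)

// familyScratch holds per-family state that is reused across families.
type familyScratch struct {
	name    []byte
	offsets []int
}

// reset truncates the buffers but keeps their capacity, so the next
// family can fill them without new heap allocations.
func (s *familyScratch) reset() {
	s.name = s.name[:0]
	s.offsets = s.offsets[:0]
}

// load simulates parsing a family with n metrics into the scratch.
func (s *familyScratch) load(name string, n int) {
	s.reset()
	s.name = append(s.name, name...)
	for i := 0; i < n; i++ {
		s.offsets = append(s.offsets, i*16) // dummy byte offsets
	}
}

func main() {
	var reused familyScratch
	reused.load("warmup", 1000) // grow the buffers once

	withReuse := testing.AllocsPerRun(100, func() {
		reused.load("family", 1000)
	})
	withoutReuse := testing.AllocsPerRun(100, func() {
		var fresh familyScratch
		fresh.load("family", 1000)
	})
	fmt.Println("allocs/op with reuse:", withReuse)
	fmt.Println("allocs/op without reuse:", withoutReuse)
}
```

Running this should show the reused scratch allocating far less per iteration than the fresh one, which is the same effect the allocs/op column measures for the parser.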

bwplotka · 2025-01-14T14:15:56Z

Getting back to this after holidays...

bwplotka · 2025-02-03T14:47:41Z

This should be now good to go 🎉

cc @krajorama @beorn7 @bboreham

Results look solid. There's room for further allocation improvements, but I propose we iterate. Some of those need interface updates.

$ benchstat append-v1.txt append-v2.txt 
goos: darwin
goarch: arm64
pkg: github.com/prometheus/prometheus/scrape
cpu: Apple M1 Pro
                                                     │ append-v1.txt │           append-v2.txt            │
                                                     │    sec/op     │   sec/op     vs base               │
ScrapeLoopAppend/data=1Fam1000Gauges/fmt=PromText-2     280.4µ ± 10%   283.9µ ± 8%        ~ (p=0.937 n=6)
ScrapeLoopAppend/data=1Fam1000Gauges/fmt=OMText-2       265.1µ ± 10%   271.7µ ± 7%        ~ (p=0.818 n=6)
ScrapeLoopAppend/data=1Fam1000Gauges/fmt=PromProto-2    447.0µ ±  2%   299.8µ ± 7%  -32.93% (p=0.002 n=6)
ScrapeLoopAppend/data=59FamsAllTypes/fmt=PromText-2     168.9µ ±  3%   168.6µ ± 2%        ~ (p=0.937 n=6)
ScrapeLoopAppend/data=59FamsAllTypes/fmt=OMText-2       166.4µ ±  1%   167.5µ ± 2%   +0.68% (p=0.041 n=6)
ScrapeLoopAppend/data=59FamsAllTypes/fmt=PromProto-2    159.0µ ±  3%   146.1µ ± 2%   -8.07% (p=0.002 n=6)
geomean                                                 230.1µ         213.8µ        -7.10%

                                                     │ append-v1.txt │            append-v2.txt            │
                                                     │     B/op      │     B/op      vs base               │
ScrapeLoopAppend/data=1Fam1000Gauges/fmt=PromText-2    178.7Ki ±  8%   182.3Ki ± 6%        ~ (p=0.240 n=6)
ScrapeLoopAppend/data=1Fam1000Gauges/fmt=OMText-2      170.5Ki ± 10%   172.3Ki ± 2%        ~ (p=0.937 n=6)
ScrapeLoopAppend/data=1Fam1000Gauges/fmt=PromProto-2   665.7Ki ±  2%   212.5Ki ± 3%  -68.08% (p=0.002 n=6)
ScrapeLoopAppend/data=59FamsAllTypes/fmt=PromText-2    74.02Ki ±  1%   73.53Ki ± 3%        ~ (p=0.589 n=6)
ScrapeLoopAppend/data=59FamsAllTypes/fmt=OMText-2      72.87Ki ±  1%   73.53Ki ± 3%        ~ (p=0.485 n=6)
ScrapeLoopAppend/data=59FamsAllTypes/fmt=PromProto-2   169.2Ki ±  1%   110.6Ki ± 5%  -34.62% (p=0.002 n=6)
geomean                                                162.6Ki         125.9Ki       -22.56%

                                                     │ append-v1.txt │            append-v2.txt            │
                                                     │   allocs/op   │ allocs/op   vs base                 │
ScrapeLoopAppend/data=1Fam1000Gauges/fmt=PromText-2       11.00 ± 0%   11.00 ± 0%        ~ (p=1.000 n=6) ¹
ScrapeLoopAppend/data=1Fam1000Gauges/fmt=OMText-2         12.00 ± 0%   12.00 ± 0%        ~ (p=1.000 n=6) ¹
ScrapeLoopAppend/data=1Fam1000Gauges/fmt=PromProto-2    8012.00 ± 0%   21.00 ± 0%  -99.74% (p=0.002 n=6)
ScrapeLoopAppend/data=59FamsAllTypes/fmt=PromText-2       18.00 ± 0%   18.00 ± 0%        ~ (p=1.000 n=6) ¹
ScrapeLoopAppend/data=59FamsAllTypes/fmt=OMText-2         19.00 ± 0%   19.00 ± 0%        ~ (p=1.000 n=6) ¹
ScrapeLoopAppend/data=59FamsAllTypes/fmt=PromProto-2     1693.0 ± 0%   642.0 ± 0%  -62.08% (p=0.002 n=6)
geomean                                                   92.15        29.11       -68.41%
¹ all samples are equal
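For readers who want to sanity-check the "vs base" and geomean figures above, the arithmetic can be reproduced from the printed values. This is a small sketch with hypothetical helper names (`pctChange`, `geomean`); it works from the rounded numbers in the table, so results may differ from benchstat's in the last digit.

```go
package main

import (
	"fmt"
	"math"
)

// pctChange returns the relative change from base to next, in percent.
func pctChange(base, next float64) float64 {
	return (next/base - 1) * 100
}

// geomean returns the geometric mean of xs.
func geomean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += math.Log(x)
	}
	return math.Exp(sum / float64(len(xs)))
}

func main() {
	// PromProto rows for data=1Fam1000Gauges, taken from the table above.
	fmt.Printf("sec/op:    %.2f%%\n", pctChange(447.0, 299.8))
	fmt.Printf("B/op:      %.2f%%\n", pctChange(665.7, 212.5))
	fmt.Printf("allocs/op: %.2f%%\n", pctChange(8012, 21))

	// sec/op column of append-v1.txt, in microseconds.
	v1 := []float64{280.4, 265.1, 447.0, 168.9, 166.4, 159.0}
	fmt.Printf("geomean v1 sec/op: %.1fµ\n", geomean(v1))
}
```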

bwplotka · 2025-02-04T21:38:56Z

I have some ideas to optimize further (see https://pprof.me/14c6981f2f1f0ab8dce893729df1da67/?profileType=profile%253Aalloc_objects%253Acount%253Aspace%253Abytes&color_by=filename), but I will do those in the next iteration - this PR is already big.

bwplotka · 2025-02-04T21:39:08Z

/prombench help

prombot · 2025-02-04T21:39:10Z

Available Commands:

To start benchmark: /prombench <branch or git tag to compare with>
To restart benchmark: /prombench <branch or git tag to compare with>
To stop benchmark: /prombench cancel
To print help: /prombench help

Advanced Flags for start and restart Commands:

--bench.directory=<sub-directory of github.com/prometheus/test-infra/prombench>
- See the details here, defaults to manifests/prombench.
--bench.version=<branch | @commit>
- See the details here, defaults to master.

Examples:

/prombench v3.0.0
/prombench v3.0.0 --bench.version=@aca1803ccf5d795eee4b0848707eab26d05965cc --bench.directory=manifests/prombench

bwplotka · 2025-02-04T21:40:34Z

/prombench main --bench.version=bench/protofirst

prombot · 2025-02-04T21:40:37Z

⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️

Compared versions: PR-15731 and main

Custom benchmark version: bench/protofirst branch

After the successful deployment (check status here), the benchmarking results can be viewed at:

Available Commands:

To restart benchmark: /prombench restart main --bench.version=bench/protofirst
To stop benchmark: /prombench cancel
To print help: /prombench help

bwplotka · 2025-02-04T22:02:04Z

/prombench cancel

prombot · 2025-02-04T22:02:06Z

Benchmark cancel is in progress.

bwplotka · 2025-02-05T08:38:00Z

/prombench cancel

bwplotka · 2025-02-12T08:22:37Z

Rebased on top of #16012; otherwise good to go. cc @bboreham - thanks for the amazing review! 💪🏽

bwplotka · 2025-02-12T09:02:20Z

/prombench main --bench.version=bench/cross-feature/protofirst

prombot · 2025-02-12T09:02:25Z

⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️

Compared versions: PR-15731 and main

Custom benchmark version: bench/cross-feature/protofirst branch

After the successful deployment (check status here), the benchmarking results can be viewed at:

Available Commands:

To restart benchmark: /prombench restart main --bench.version=bench/cross-feature/protofirst
To stop benchmark: /prombench cancel
To print help: /prombench help

bwplotka · 2025-02-12T13:42:09Z

Summary

The current state:

Microbenchmarks from #15731 (comment) are still valid.

Protobuf parsing is:

15-40% faster (lower CPU time).
Allocating ~60% less memory.
Allocating 2x-700x fewer objects in memory.
The fastest (CPU time) scrape protocol now (~6% faster than PromText and OMText).
Still allocating ~30% more memory than the other protocols.

Macrobenchmarks from #15731 (comment) and #15731 (comment)

Slightly more allocs/s vs OpenMetrics Text.
Slightly higher CPU use vs OpenMetrics Text (although significantly lower than proto parsing without this optimization on main). This is likely due to the higher allocs/s (GC overhead).

RSS is on par or lower.

Next steps

I propose we cache more in #16020 (that change is more complex; this PR can be merged safely without #16020).

bwplotka · 2025-02-12T13:42:18Z

/prombench cancel

prombot · 2025-02-12T13:42:21Z

Benchmark cancel is in progress.

…(...) (#16012)

* model/textparse: Change parser interface Metric(...) string to Labels(...)

Simplified the interface given no one is using the return argument. Renamed for clarity too.

Found and discussed #15731 (comment)

Signed-off-by: bwplotka <bwplotka@gmail.com>

* Fixed comments; optimized not needed copy for om and text.

Signed-off-by: bwplotka <bwplotka@gmail.com>

---------

Signed-off-by: bwplotka <bwplotka@gmail.com>

Signed-off-by: bwplotka <bwplotka@gmail.com>

Update model/textparse/protobufparse.go
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

Addressing comments.
Signed-off-by: bwplotka <bwplotka@gmail.com>

decoder: reuse histograms and summaries.
Signed-off-by: bwplotka <bwplotka@gmail.com>

optimize help returning (5% of mem utilization).
Signed-off-by: bwplotka <bwplotka@gmail.com>

Apply suggestions from code review
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

Update prompb/io/prometheus/client/decoder.go
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

Fix build.
Signed-off-by: bwplotka <bwplotka@gmail.com>

bboreham

Let’s give it a go.

bwplotka · 2025-02-13T10:25:30Z

/prombench main --bench.version=bench/protofirst

prombot · 2025-02-13T10:25:33Z

⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️

Compared versions: PR-15731 and main

Custom benchmark version: bench/protofirst branch

After the successful deployment (check status here), the benchmarking results can be viewed at:

Available Commands:

To restart benchmark: /prombench restart main --bench.version=bench/protofirst
To stop benchmark: /prombench cancel
To print help: /prombench help

bwplotka mentioned this pull request Dec 30, 2024

scrape (histograms): Investigate (and address) protobuf scraping performance problems #14668

Closed

bwplotka force-pushed the protoopt branch 3 times, most recently from 78e9d70 to 1b22fd0 Compare December 31, 2024 12:58

bwplotka commented Dec 31, 2024

model/textparse/interface_test.go (outdated)

bwplotka requested review from krajorama and bboreham January 3, 2025 10:46

Base automatically changed from scrapebench to main January 14, 2025 14:15

bwplotka force-pushed the protoopt branch 2 times, most recently from 192d376 to 8333f3c Compare February 3, 2025 13:41

bwplotka marked this pull request as ready for review February 3, 2025 13:42

bwplotka force-pushed the protoopt branch 3 times, most recently from 91697a5 to 308d630 Compare February 3, 2025 14:47

bwplotka commented Feb 3, 2025

model/textparse/nhcbparse_test.go

bwplotka requested a review from beorn7 February 3, 2025 14:48

GiedriusS reviewed Feb 4, 2025

prompb/io/prometheus/client/decoder_test.go

prombot added the prombench label Feb 4, 2025

bwplotka requested a review from dgl as a code owner February 12, 2025 08:22

bwplotka added a commit that referenced this pull request Feb 12, 2025

textparse: Use cache for protoparse labels, break interface.

5ea4e40

Depends on #15731 Signed-off-by: bwplotka <bwplotka@gmail.com>

bwplotka mentioned this pull request Feb 12, 2025

textparse: Use cache for protoparse labels, break interface. #16020

Closed

bwplotka added a commit that referenced this pull request Feb 12, 2025

textparse: Use cache for protoparse labels, break interface.

7ea8e7f

Depends on #15731 Signed-off-by: bwplotka <bwplotka@gmail.com>

bwplotka added a commit that referenced this pull request Feb 12, 2025

textparse: Use cache for protoparse labels, break interface.

31d7c69

Depends on #15731 Signed-off-by: bwplotka <bwplotka@gmail.com>

bwplotka added a commit that referenced this pull request Feb 12, 2025

textparse: Use cache for protoparse labels, break interface.

2acc5d7

Depends on #15731 Signed-off-by: bwplotka <bwplotka@gmail.com>

bwplotka added a commit that referenced this pull request Feb 12, 2025

textparse: Use cache for protoparse labels, break interface.

c495795

Depends on #15731 Signed-off-by: bwplotka <bwplotka@gmail.com>

bwplotka force-pushed the protoopt branch from 9e38bfa to 0ce497d Compare February 12, 2025 15:49

bboreham approved these changes Feb 12, 2025

bboreham approved these changes Feb 13, 2025

model/textparse/protobufparse.go (outdated)

Update model/textparse/protobufparse.go

139f5ab

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

bwplotka enabled auto-merge (squash) February 13, 2025 10:29

bwplotka merged commit 733a5e9 into main Feb 13, 2025
45 of 46 checks passed

bwplotka deleted the protoopt branch February 13, 2025 10:38

This was referenced Feb 21, 2025

[pdata] Find replacement for now deprecated gogo/protobuf open-telemetry/opentelemetry-collector#7095

Open

Optimize zlabel usage for stringlabels thanos-io/thanos#8077

Open

bwplotka mentioned this pull request Mar 7, 2025

OM 2.0: OM protobuf future prometheus/OpenMetrics#296

Open

charleskorn mentioned this pull request Mar 13, 2025

Sync upstream Prometheus at 8356990 grafana/mimir-prometheus#841

Merged

bwplotka mentioned this pull request Apr 9, 2025

Add RW2 support grafana/mimir#11100

Merged

10 tasks
