remote write 2.0: new proto format with string interning #13052

npazosmendez · 2023-10-30T21:45:08Z

Context

This is the first of several PRs that will iteratively build the new Remote Write 2.0 format, which we intend to keep in the remote-write-1.1 feature branch. This PR focuses on introducing string interning to the new format. It continues the work from #11999, which we iterated on until arriving at the proposal in this PR.

The goal behind interning is to reduce resource usage in both senders and receivers: network egress, CPU and potentially memory. An open question was whether simply switching the compression algorithm from Snappy to something else would be sufficient and make interning unnecessary. We explored that question and concluded that interning is still a major win; see this comment for more details.

As a separate note, interning of all string data will be added to this format in follow-up PRs. This will cover exemplar labels as well as per-series metadata, by rebasing #11640 on top of the changes in this PR and reusing the symbol interning.

This PR combines work/help/suggestions from @cstyan, @alexgreenbank, @npazosmendez, @bwplotka, @pracucci, @bboreham.

How the interning works

The interning technique is simple: a write request includes a request-level []string whose purpose is to de-duplicate strings across timeseries. Each timeseries encodes its labels as indices into that array, using a []uint32 of the form <lbl1_key_index>,<lbl1_val_index>,<lbl2_key_index>,<lbl2_val_index>,...

See Prometheus Remote-Write 1.1 Specification for more details.

Benchmark

This benchmark uses the scripts in https://github.com/prometheus/prometheus/tree/remote-write-1.1/scripts/remotewrite11-bench to run (sender, receiver) pairs with the current remote write version (v1) and with the one this PR introduces (v11). Both senders scrape over 100 pods from a real Kubernetes namespace, which includes various Mimir microservices. I ran the benchmark for 2 hours.

These numbers show that string interning:

Greatly reduces senders' network egress (-47%) and receiver's CPU usage (-53%). See red boxes.
Reduces memory and CPU usage of senders, and does not increase memory usage of receivers. These margins are smaller (and somewhat noisy) and may not be representative of a final version of the protocol, but they show that the benefits mentioned above can be achieved without regressions, and possibly with further gains. See yellow boxes.

Go benchmark (old)

This benchmark was run using an old, early version of the PR. It is kept for future reference.

Details

These benchmarks compare CPU and payload sizes of the current and the new protocol, including construction of the request, marshaling and compression as done by the queue manager. The benchmarks are run with different fixtures that try to mimic X processes emitting Go runtime metrics (see createDummyTimeSeries).

$ go test -benchmem -run=^$ -bench "^BenchmarkBuild(Reduced)?WriteRequest$" github.com/prometheus/prometheus/storage/remote
goos: linux
goarch: amd64
pkg: github.com/prometheus/prometheus/storage/remote
cpu: 12th Gen Intel(R) Core(TM) i7-12700
BenchmarkBuildWriteRequest/2_instances-20                  91128             12449 ns/op              1933 compressedSize/op          80 B/op          1 allocs/op
BenchmarkBuildWriteRequest/10_instances-20                 19920             59286 ns/op              8156 compressedSize/op          80 B/op          1 allocs/op
BenchmarkBuildWriteRequest/1k_instances-20                   148           7437633 ns/op            784083 compressedSize/op          80 B/op          1 allocs/op
BenchmarkBuildReducedWriteRequest/2_instances-20           59606             19980 ns/op              1786 compressedSize/op          96 B/op          1 allocs/op
BenchmarkBuildReducedWriteRequest/10_instances-20          15636             75689 ns/op              5897 compressedSize/op          96 B/op          1 allocs/op
BenchmarkBuildReducedWriteRequest/1k_instances-20            163           7334976 ns/op            518392 compressedSize/op          96 B/op          1 allocs/op
PASS
ok      github.com/prometheus/prometheus/storage/remote 10.929s

The trend is what one would expect: the more repetition in the request, the faster the new encoding and the smaller the new payload compared with the original protocol. Notably, CPU is worse when the request has little repetition (2 and 10 instances), even though the compressed payload is already smaller in those runs. For a better understanding of how this behaves with a more realistic workload, see the e2e benchmark.
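For readers unfamiliar with custom columns such as compressedSize/op in the output above: Go benchmarks can report them with testing.B.ReportMetric. The following is a minimal sketch under stated assumptions, not the benchmark from this PR. `buildPayload` and `benchBuild` are hypothetical, the payload is plain text rather than a marshaled proto, and gzip from the standard library stands in for Snappy so the example has no external dependencies.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"strings"
	"testing"
)

// compressedSize returns the gzip-compressed length of payload.
// gzip is a stand-in here; the remote write path uses Snappy.
func compressedSize(payload []byte) int {
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf)
	if _, err := w.Write(payload); err != nil {
		panic(err)
	}
	if err := w.Close(); err != nil {
		panic(err)
	}
	return buf.Len()
}

// buildPayload is a hypothetical fixture: n "instances" that emit
// the same metric with mostly identical labels.
func buildPayload(n int) []byte {
	var sb strings.Builder
	for i := 0; i < n; i++ {
		fmt.Fprintf(&sb, "go_goroutines{job=\"mimir\",instance=\"pod-%d\"} 42\n", i)
	}
	return []byte(sb.String())
}

// benchBuild measures building and compressing the payload, and
// reports the compressed size as an extra benchmark column.
func benchBuild(b *testing.B) {
	var size int
	for i := 0; i < b.N; i++ {
		size = compressedSize(buildPayload(10))
	}
	// Shows up as "<value> compressedSize/op" next to ns/op.
	b.ReportMetric(float64(size), "compressedSize/op")
}

func main() {
	// testing.Init is needed when calling testing.Benchmark outside "go test".
	testing.Init()
	res := testing.Benchmark(benchBuild)
	fmt.Println(res)
}
```

In a real `_test.go` file the function would be named `BenchmarkXxx` and run with `go test -bench`, as in the command above.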

e2e benchmark (old)

This benchmark was run using an old, early version of the PR. It is kept for future reference.

Details

This benchmark runs two receivers and two senders, each sender/receiver pair running a different version: sender_1.0 => receiver_1.0 and sender_1.1 => receiver_1.1. Both senders scrape all the other benchmark processes and themselves, as well as an entire real Kubernetes namespace with more than 100 pods, including various services from Mimir.

TODO(@npazosmendez): share the util scripts to reproduce this benchmark with an arbitrary namespace.

Below are the values of some relevant counters after running everything for around 30 minutes:

CPU

process_cpu_seconds_total
sender-v1: 109.16
sender-v11: 109.13
receiver-v1: 164.45
receiver-v11: 127.49

Sender CPU is practically the same. The receiver using the new protocol uses ~22% less CPU.

Network

prometheus_remote_storage_bytes_total
sender-v1: 321159262
sender-v11: 243318645

The sender using the new protocol sent ~24% less data.

Memory

In-use heap size does not vary much (first screenshot below). The number of objects does change for the receivers, though: the one running the new protocol is almost consistently lower (second screenshot), which should reduce the pressure on the GC.
Also, the total number of bytes allocated is lower:

go_memstats_alloc_bytes_total
sender-v1: 7394260880
sender-v11: 7374288920
receiver-v1: 47286295416
receiver-v11: 33116164600

Senders are practically the same. The receiver with the new protocol allocated ~30% fewer bytes.
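The relative savings can be recomputed directly from the counter values listed in this section. A small sketch (`reductionPct` is just an illustrative helper):

```go
package main

import "fmt"

// reductionPct returns how much smaller "after" is than "before", in percent.
func reductionPct(before, after float64) float64 {
	return (before - after) / before * 100
}

func main() {
	// Values copied from the counters above (v1 first, then v11).
	fmt.Printf("receiver CPU seconds:  %.1f%% less\n", reductionPct(164.45, 127.49))
	fmt.Printf("sender bytes sent:     %.1f%% less\n", reductionPct(321159262, 243318645))
	fmt.Printf("receiver alloc bytes:  %.1f%% less\n", reductionPct(47286295416, 33116164600))
	fmt.Printf("sender CPU seconds:    %.1f%% less\n", reductionPct(109.16, 109.13))
}
```

This yields roughly 22% for receiver CPU, 24% for bytes sent and 30% for receiver allocations, while the sender CPU difference is well below 1%.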

GiedriusS

Stupid question but wouldn't a generic compression algorithm like Snappy catch cases like this where the same string(-s) are repeated over and over again? Maybe it was mentioned somewhere 🤔

bboreham · 2023-11-03T16:36:11Z

That's what we have today. However, Snappy does not do a great job (it's designed to be fast rather than tight), and it creates overhead in the receiver, which has to uncompress all the data.
If the receiver only gets one copy of each string, this saves space and probably garbage.

npazosmendez · 2023-11-07T20:58:36Z

We are experimenting with an alternative interning method that is giving better results. I will mark this as a draft in the meantime.

On a separate note: I'm working on some benchmarks to compare the current protocol + better compression algorithm vs this new proposed protocol. Will share those soon

npazosmendez · 2023-11-09T15:29:18Z

Here's an experiment on string interning vs using other compression algorithms with the current protocol: #13105 (comment)


bwplotka

LGTM, we agreed to merge it as-is and iterate, thanks for the initial cleanup!

bwplotka · 2023-12-28T18:08:06Z

cmd/prometheus/main.go

@@ -432,6 +434,9 @@ func main() {
 	a.Flag("enable-feature", "Comma separated feature names to enable. Valid options: agent, exemplar-storage, expand-external-labels, memory-snapshot-on-shutdown, promql-at-modifier, promql-negative-offset, promql-per-step-stats, promql-experimental-functions, remote-write-receiver (DEPRECATED), extra-scrape-metrics, new-service-discovery-manager, auto-gomaxprocs, no-default-scrape-port, native-histograms, otlp-write-receiver. See https://prometheus.io/docs/prometheus/latest/feature_flags/ for more details.").
 		Default("").StringsVar(&cfg.featureList)

+	a.Flag("remote-write-format", "remote write proto format to use, valid options: 0 (1.0), 1 (reduced format), 3 (min64 format)").


We probably can kill those later on

npazosmendez force-pushed the alexnico-remote-write-1-1 branch from 7c008b9 to ebdf073 Compare October 30, 2023 21:49

npazosmendez marked this pull request as ready for review November 2, 2023 13:21

npazosmendez requested review from jesusvazquez, csmarchbanks, cstyan, bwplotka and tomwilkie as code owners November 2, 2023 13:21

GiedriusS reviewed Nov 3, 2023


npazosmendez marked this pull request as draft November 7, 2023 20:58

cstyan mentioned this pull request Nov 8, 2023

[meta] Remote write 2.0 #13105

Closed

24 tasks

npazosmendez force-pushed the alexnico-remote-write-1-1 branch from d3de9d7 to 46b84ab Compare November 9, 2023 13:18

npazosmendez marked this pull request as ready for review November 29, 2023 20:06

cstyan and others added 15 commits December 19, 2023 14:27

replace snappy encoding library

4c4b9aa

Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

add new proto types

1ac2950

Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

add decode function for new write request proto

8194000

Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

add lookup table struct that is used to build the symbol table in new

0768e55

write request format Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

Implement code paths for new proto format

ce1e2ad

Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

update example server to include handler for new format

b7e3665

Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

Add new test client

91bdd93

Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

tests and new -> original proto mapping util

005ba7a

Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

add new proto support on receiver end

ab7c96a

Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

Fix test

7f7cf97

Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

no-brainer copypaste but more performance write support

5f5272e

Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

remove some comented code

12de4c4

Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

fix mocks and fixture

0b42138

Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

add basic reduce remote write handler benchmark

407e596

Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

refactor out common code between write methods

2e57d7e

Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

cstyan and others added 16 commits December 19, 2023 14:29

more cleanup, mostly linting fixes

a8639dd

Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

remove package-lock.json change again

58b1a34

Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

more cleanup, address review comments

7630577

Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

fix test panic

18bf4b8

Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

fix minor lint issue + use labels Range function since it looks like

3e48b8a

the tests fail to do `range labels.Labels` on CI Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

new interning format based on []string indeces

31d3956

Co-authored-by: bwplotka <bwplotka@gmail.com> Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

remove all new rw formats but the []string one

ec9300f

also adapt tests to the new format Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

cleanup rwSymbolTable

25c8bae

Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

add some TODOs for later

4cfd2ea

Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

don't reserve field 3 for new proto and add TODO

5aab80a

Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

fix custom marshaling

83325af

Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

lint

934de72

Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

additional merge fixes

dc0888c

Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

lint fixes

66f9386

Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

fix server example

d61fda9

Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

revert package-lock.json changes

a8224cc

Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

npazosmendez force-pushed the alexnico-remote-write-1-1 branch from 4d9d3d9 to a8224cc Compare December 19, 2023 17:33

npazosmendez added 2 commits December 19, 2023 15:56

update example prometheus version

8df1d63

Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

define separate proto types for remote write 2.0

48f9285

Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

npazosmendez force-pushed the alexnico-remote-write-1-1 branch from 25c752a to 48f9285 Compare December 21, 2023 13:03

npazosmendez added 5 commits December 21, 2023 10:08

lint

38c444b

Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

rename new proto types and move to separate pkg

fe41ed9

Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

update prometheus version for example

6d90d71

Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

make proto

175bd21

Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

make Metadata not nullable

baebe1c

Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

npazosmendez changed the title ~~remote write 1.1: new proto format with string interning~~ remote write 2.0: new proto format with string interning Dec 28, 2023

remove old MinSample proto message

acd0353

Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>

npazosmendez force-pushed the alexnico-remote-write-1-1 branch from 3b0ce6a to acd0353 Compare December 28, 2023 15:28

bwplotka approved these changes Dec 28, 2023


npazosmendez merged commit 6a03f5a into prometheus:remote-write-1.1 Dec 28, 2023
