ct: Support CTs in WAL; change sample record; use in PRW 2.0 #16046

bwplotka · 2025-02-17T13:30:33Z

Fixes #14218 and #14220

Rebased version of @ridwanmsharif #15254 with improvements.

This change does the following:

- Change appender interface to be CT aware (optional CT)
- Add created-timestamp-per-sample feature flag
- Add new sample record used only if CT is appended with the sample.
- Remote Write awareness of CT.
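
As a rough illustration of what "CT aware (optional CT)" and "used only if CT is appended" could mean, here is a minimal sketch. Every name in it (`sample`, `ctAppender`, `memAppender`, `hasCT`) is hypothetical and is not the interface introduced by this PR; it also assumes that a CT of 0 means "no CT known" and that a CT later than the sample timestamp is rejected.

```go
package main

import (
	"errors"
	"fmt"
)

// sample is a hypothetical float sample with an optional created timestamp.
// Assumption: CT == 0 means "unknown".
type sample struct {
	T  int64 // sample timestamp in milliseconds
	CT int64 // created timestamp in milliseconds, 0 if unknown
	V  float64
}

// ctAppender is a hypothetical CT-aware append interface, not the PR's API.
type ctAppender interface {
	Append(series string, t, ct int64, v float64) error
}

// memAppender keeps appended samples in memory, keyed by series name.
type memAppender struct {
	bySeries map[string][]sample
}

func newMemAppender() *memAppender {
	return &memAppender{bySeries: map[string][]sample{}}
}

// Append stores the sample; the CT check is an assumed validation rule.
func (a *memAppender) Append(series string, t, ct int64, v float64) error {
	if ct > t {
		return errors.New("created timestamp is after the sample timestamp")
	}
	a.bySeries[series] = append(a.bySeries[series], sample{T: t, CT: ct, V: v})
	return nil
}

// hasCT reports whether any sample carries a CT, i.e. whether a CT-carrying
// record would be needed instead of the plain sample record.
func hasCT(samples []sample) bool {
	for _, s := range samples {
		if s.CT != 0 {
			return true
		}
	}
	return false
}

func main() {
	mem := newMemAppender()
	var app ctAppender = mem
	_ = app.Append("http_requests_total", 2000, 1000, 42)
	_ = app.Append("up", 2000, 0, 1)
	fmt.Println(hasCT(mem.bySeries["http_requests_total"]), hasCT(mem.bySeries["up"]))
}
```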

bwplotka · 2025-02-17T14:53:53Z

/prombench main --bench.version=bench/cross-feature/ct-per-sample

prombot · 2025-02-17T14:53:56Z

⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️

Compared versions: PR-16046 and main

Custom benchmark version: bench/cross-feature/ct-per-sample branch

After the successful deployment (check status here), the benchmarking results can be viewed at:

Available Commands:

To restart benchmark: /prombench restart main --bench.version=bench/cross-feature/ct-per-sample
To stop benchmark: /prombench cancel
To print help: /prombench help

bwplotka · 2025-02-17T15:53:50Z

Functionality

OK, it seems to be working for the majority of incoming cumulatives. It does not work for native histograms yet, which is why there are still some without CTs:

Efficiency

In terms of perf (for samples), we see some overhead:

A lot more allocs/s

Slight CPU increase

Up to 115ms vs 80ms latency spikes on query_range:

Slight increase in RSS:

Weird stuff:

I see an odd, long-running notification/live API call that is not present on main...

pprof

PR
main

bwplotka · 2025-02-17T15:59:03Z

Huh, so based on the profiles, the majority of allocated objects (other than querying) still come from the protobuf parser (magic labels) and CreatedTimestamp validation (and error?).

No error logs on Prometheus though.

bwplotka · 2025-02-17T16:00:29Z

/prombench cancel

prombot · 2025-02-17T16:00:33Z

Benchmark cancel is in progress.

bwplotka · 2025-02-17T16:21:28Z

No regression in terms of memory on the query side. The majority of allocs still come from proto parsing. We might want to change the benchmark scenario to include proto parsing for both...

bwplotka · 2025-02-17T16:36:33Z

Decoding is heavy though (from WAL watcher), which is 2x heavier for some reason:

vs main

... oh I think we have to update our calculation here:

bwplotka · 2025-02-17T16:38:19Z

In terms of objects

cstyan · 2025-02-25T16:13:50Z

RW package changes look fine 👍

Re: the allocations issue, you're suggesting that the min size needs to account for the additional integer?
dec.Len() / (1 + 1 + 1 + 8); cap(samples) < minSize?
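
To make the arithmetic in that expression concrete, here is a small sketch. It assumes, as the quoted expression implies, that each varint-encoded field takes at least one byte and the float value takes 8 bytes, so dividing the remaining bytes by the smallest possible per-sample size gives an upper bound on the sample count, which is the capacity the output slice needs. The function name `sampleCapacity` is made up for illustration.

```go
package main

import "fmt"

// sampleCapacity returns the slice capacity needed to hold every sample that
// could be encoded in the remaining bytes of a record.
// Assumed layout per sample: 1+ byte ref delta, 1+ byte timestamp delta,
// optionally 1+ byte CT delta, and an 8-byte float value.
func sampleCapacity(remaining int, withCT bool) int {
	perSample := 1 + 1 + 8
	if withCT {
		perSample++ // the additional integer for the CT
	}
	return remaining / perSample
}

func main() {
	// With the extra CT field the same byte count can hold fewer samples,
	// so sizing with the old divisor over-allocates.
	fmt.Println(sampleCapacity(110, false), sampleCapacity(110, true))
}
```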

bwplotka · 2025-02-25T16:50:27Z

It feels so, yea. I want to repeat the experiment with both versions doing protobuf parsing though, for a fair comparison.

bwplotka · 2025-02-27T14:48:28Z

Next steps:

- Redo the benchmark with both versions using protobuf parsing, and double check the interesting query timing results.
- Review this PR, especially how I decided to form this flag.
- Connect with people who are working on OTLP deltas in Prometheus, because I think they use this feature 1:1 (cc @ArthurSens). In particular, if we go this route with deltas, is there anything worth changing in the struct/WAL format while we are at it? For example, we could consider renaming the "Created Timestamp" fields to something like "Start time", although we have already pushed for the created timestamp naming and maybe it's fine.

Fixes #14218 and #14220

Rebased version of #15254 with improvements.

This change does the following:

- Change appender interface to be CT aware (optional CT)
- Add created-timestamp-per-sample feature flag
- Add new sample record used only if CT is appended with the sample.
- Remote Write awareness of CT.

Signed-off-by: Ridwan Sharif <ridwanmsharif@google.com>
Signed-off-by: bwplotka <bwplotka@gmail.com>

# Conflicts:
# cmd/prometheus/main.go
# scrape/helpers_test.go
# storage/remote/write_handler_test.go

Signed-off-by: bwplotka <bwplotka@gmail.com>

bwplotka · 2025-03-03T10:28:50Z

Retrying the benchmark after optimizations in the watcher. The custom scenario has one update too: proto parsing is now enabled on both versions of Prometheus.

/prombench restart main --bench.version=bench/cross-feature/ct-per-sample

bwplotka · 2025-03-03T10:31:58Z

/prombench start main --bench.version=bench/cross-feature/ct-per-sample

prombot · 2025-03-03T10:32:01Z

Incorrect /prombench syntax; command requires one argument that matches (master|main|v[0-9]+\.[0-9]+\.[0-9]+\S*) regex.

Available Commands:

To start benchmark: /prombench <branch or git tag to compare with>
To restart benchmark: /prombench restart <branch or git tag to compare with>
To stop benchmark: /prombench cancel
To print help: /prombench help

Advanced Flags for start and restart Commands:

--bench.directory=<sub-directory of github.com/prometheus/test-infra/prombench>
- See the details here, defaults to manifests/prombench.
--bench.version=<branch | @commit>
- See the details here, defaults to master.

Examples:

/prombench v3.0.0
/prombench v3.0.0 --bench.version=@aca1803ccf5d795eee4b0848707eab26d05965cc --bench.directory=manifests/prombench

bwplotka · 2025-03-03T11:06:02Z

/prombench main --bench.version=bench/cross-feature/ct-per-sample

prombot · 2025-03-03T11:06:05Z

⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️

Compared versions: PR-16046 and main

Custom benchmark version: bench/cross-feature/ct-per-sample branch

After the successful deployment (check status here), the benchmarking results can be viewed at:

Available Commands:

To restart benchmark: /prombench restart main --bench.version=bench/cross-feature/ct-per-sample
To stop benchmark: /prombench cancel
To print help: /prombench help

bwplotka · 2025-03-03T16:52:02Z

Thanks to decoder/watcher fixes I see significant improvement in RSS and allocs/s vs main WITHOUT CTs in WAL 😱

CPU etc looks on par

Query engine timing is worse - not sure why:

CPU wise it looks like GC is more intensive, querying is actually using more CPU on main 🤔 https://pprof.me/33106dee65aec562860070b28ed2d895

Looks like simply more objects are allocated https://pprof.me/c23cf4fda6e7be3ca34d292a30fc5f13
Total memory seems better on this PR: https://pprof.me/0c232207d6fd0f7e461054bb972ebbb0

bwplotka · 2025-03-07T09:19:07Z

/prombench cancel

prombot · 2025-03-07T09:19:09Z

Benchmark cancel is in progress.

bwplotka · 2025-03-07T12:52:43Z

Adding a few PRs first to split this change into smaller parts.

bwplotka · 2025-03-07T12:53:15Z

e.g. #16072, #16156, #16182

bwplotka · 2025-07-02T14:25:22Z

TODO:

- Rebase it on main and use the type-and-unit feature.
- Double check whether we need "wlog: Optimized and refactored watcher code." #16182, or whether we can pursue this in a separate PR.
- Ensure type and unit are used when constructing PRWv2 (as per "[meta] PROM-39 type-and-unit-labels stability" #16610) (separate PR).
- Benchmark and find consensus on the WAL format (e.g. based on https://github.com/prometheus/test-infra/tree/bench/cross-feature/ct-per-sample/prombench/manifests/prombench but using the type-and-unit feature instead of metadata-wal-records).

The main non-trivial decision is about the WAL format. In this PR I propose storing the CT next to every sample value and timestamp (everything is diff-encoded). It did not show much overhead, but we have to convince others.

The alternative is perhaps to design something similar to native histogram records, i.e. create a record per distinct CT.
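
For readers unfamiliar with the first option, the sketch below shows one way a "CT next to every sample, diff-encoded" record could be laid out. It is an illustration only, not the encoding from this PR: the type `refSample`, the functions `encodeSamples` and `decodeSamples`, and the choice to store the CT as a varint delta against the first sample's timestamp are all assumptions.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"math"
)

// refSample is a hypothetical sample with a series reference and a CT.
type refSample struct {
	Ref   uint64
	T, CT int64
	V     float64
}

// encodeSamples writes an 8-byte base ref and 8-byte base timestamp, then per
// sample: ref delta, timestamp delta and CT delta as varints, plus the float bits.
func encodeSamples(samples []refSample) []byte {
	if len(samples) == 0 {
		return nil
	}
	first := samples[0]
	buf := make([]byte, 0, 16+len(samples)*(1+1+1+8))
	buf = binary.BigEndian.AppendUint64(buf, first.Ref)
	buf = binary.BigEndian.AppendUint64(buf, uint64(first.T))
	for _, s := range samples {
		buf = binary.AppendVarint(buf, int64(s.Ref)-int64(first.Ref))
		buf = binary.AppendVarint(buf, s.T-first.T)
		buf = binary.AppendVarint(buf, s.CT-first.T) // assumed: CT relative to base timestamp
		buf = binary.BigEndian.AppendUint64(buf, math.Float64bits(s.V))
	}
	return buf
}

// decodeSamples reverses encodeSamples.
func decodeSamples(b []byte) ([]refSample, error) {
	if len(b) == 0 {
		return nil, nil
	}
	if len(b) < 16 {
		return nil, errors.New("short record header")
	}
	baseRef := binary.BigEndian.Uint64(b)
	baseT := int64(binary.BigEndian.Uint64(b[8:]))
	b = b[16:]
	// Smallest possible sample is 3 one-byte varints plus 8 value bytes.
	out := make([]refSample, 0, len(b)/(1+1+1+8))
	for len(b) > 0 {
		var d [3]int64 // ref delta, timestamp delta, CT delta
		for i := range d {
			v, n := binary.Varint(b)
			if n <= 0 {
				return nil, errors.New("invalid varint")
			}
			d[i], b = v, b[n:]
		}
		if len(b) < 8 {
			return nil, errors.New("short value")
		}
		v := math.Float64frombits(binary.BigEndian.Uint64(b))
		b = b[8:]
		out = append(out, refSample{Ref: uint64(int64(baseRef) + d[0]), T: baseT + d[1], CT: baseT + d[2], V: v})
	}
	return out, nil
}

func main() {
	in := []refSample{{Ref: 7, T: 1000, CT: 400, V: 1.5}, {Ref: 9, T: 1015, CT: 400, V: 2}}
	rec := encodeSamples(in)
	out, err := decodeSamples(rec)
	fmt.Println(len(rec), out, err)
}
```

In this layout the cost of carrying a CT is one extra varint per sample, which stays small when the CT is close to the base timestamp. A record-per-distinct-CT design would avoid repeating it, at the price of more record types or more records.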

cc @ridwanmsharif

This was referenced Feb 17, 2025

storage: add new interface to append with CT #15254

Closed

append: expand the append interface to add ct to each Append call #15255

Closed

bwplotka force-pushed the ctwal branch 6 times, most recently from a1cd1e6 to 45933cc Compare February 17, 2025 14:53

prombot added the prombench label Feb 17, 2025

bwplotka marked this pull request as ready for review February 17, 2025 15:01

bwplotka requested review from jesusvazquez, cstyan and tomwilkie as code owners February 17, 2025 15:01

This was referenced Feb 18, 2025

Stabilize and extend Parca retention prometheus/test-infra#630

Open

textparse: CreatedTimestamp now returns int64 value; optimized proto CT parsing #16072

Merged

bwplotka force-pushed the ctwal branch 2 times, most recently from d16b65b to a736e37 Compare February 28, 2025 15:59

bwplotka force-pushed the ctwal branch 5 times, most recently from 2b310a0 to ddda640 Compare March 3, 2025 09:50

Optimize Decode/Encode and WAL watching.

d4dd997

Signed-off-by: bwplotka <bwplotka@gmail.com>

bwplotka force-pushed the ctwal branch from ddda640 to d4dd997 Compare March 3, 2025 09:52

bwplotka mentioned this pull request Mar 3, 2025

Add util/compression package to consolidate snappy/zstd use in Prometheus. #16156

Merged

bwplotka mentioned this pull request Mar 7, 2025

wlog: Optimized and refactored watcher code. #16182

Open

bwplotka marked this pull request as draft March 7, 2025 12:52

github-actions bot added the stale label May 7, 2025

github-actions bot removed the stale label Jul 5, 2025

This was referenced Aug 7, 2025

[meta] PROM-35 Remote Write (PRW2.0) Stability #16944

Open

prw: Remote Write 2.0 CT per Sample/Histogram #17036

Draft
