Labels: Reduce stringlabels memory usage for common labels #15988
Conversation
@bboreham I was hoping to get your thoughts on this approach.
Testing a variant of this patch on two servers with ~32M time series each, both scraping the same set of targets and using a custom list of strings to map, I see (vanilla 2.55.1):
vs (2.55.1 + patch):
And I'm seeing a ~4-5GB reduction in Go heap, so it does seem worth the effort.
Sorry, I haven't read the code yet, but I'm fine with the idea. Probably just the hard-coded list of strings.
A hardcoded list works well as a starting point IMHO, so we can start with that. When I generated a list of the most common strings from the last TSDB block I managed to reduce heap from ~70GB to ~61GB, so there's potential to make this a lot more efficient.
Can you run the labels package benchmarks please?
They are attached to my first comment, but might be outdated with the latest changes, so let me re-run them. It takes a while and it's late, so results might be ready tomorrow.
Regarding the hard-coded strings, I don't want this to descend into bike-shedding, but I do think as Kubernetes users we have quite a different profile from you.
Label names: instance
Label values: false
Metric names (excluding ones from home-grown software)
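For illustration, a hard-coded starter list mixing both profiles might look like the following sketch (the selection here is an example, not the PR's actual list):
var mappedStrings = []string{
	"", // index 0 must stay the empty string so empty values remain encodable
	"__name__", "instance", "job",
	"namespace", "pod", "container", // common on Kubernetes
	"true", "false",
}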
Isn't that quite an intensive operation, involving reading the whole index?
Latest benchmark results (folded)
That's right, and this is why ideally this list gets populated from actual usage.
Possibly, I didn't look into that much yet, but Prometheus will open recent blocks anyway so it might not be that bad; needs benchmarking. Any option that gives us a list from actual usage is worth following IMHO.
Benchmarks: there are a number of slow-downs, including a couple at +50% and one at +1943%, which deserve attention. Maybe we can code-golf them back a bit. I'll fire up PromBench so we get an idea whether this is microbenchmark exaggeration or not. (Next time you can skip the regex ones - they don't touch Labels.)
/prombench main |
⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️ Compared versions: main and this PR. After the successful deployment, the benchmarking results can be viewed at:
Running the code below: labels.go (folded)
package main
import (
	"cmp"
	"context"
	"flag"
	"fmt"
	"log"
	"os"
	"runtime/pprof"
	"slices"

	"github.com/prometheus/prometheus/model/labels"
	"github.com/prometheus/prometheus/tsdb"
	"github.com/prometheus/prometheus/tsdb/chunks"
)
var dbPath, blockID string
func openBlock(path, blockID string) (*tsdb.DBReadOnly, tsdb.BlockReader, error) {
	db, err := tsdb.OpenDBReadOnly(path, "", nil)
	if err != nil {
		return nil, nil, err
	}
	if blockID == "" {
		blockID, err = db.LastBlockID()
		if err != nil {
			return nil, nil, err
		}
	}
	b, err := db.Block(blockID, tsdb.DefaultPostingsDecoderFactory)
	if err != nil {
		return nil, nil, err
	}
	return db, b, nil
}
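// countLabels iterates every series in the block and counts how many series
// reference each label name and value string.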
func countLabels(dbPath, blockID string) (map[string]int, error) {
	ctx := context.Background()
	names := map[string]int{}
	db, block, err := openBlock(dbPath, blockID)
	if err != nil {
		return names, err
	}
	ir, err := block.Index()
	if err != nil {
		return names, err
	}
	p, err := ir.Postings(ctx, "", "")
	if err != nil {
		return names, err
	}
	chks := []chunks.Meta{}
	builder := labels.ScratchBuilder{}
	for p.Next() {
		if err = ir.Series(p.At(), &builder, &chks); err != nil {
			return names, err
		}
		builder.Labels().Range(func(l labels.Label) {
			names[l.Name]++
			names[l.Value]++
		})
	}
	if err = p.Err(); err != nil {
		return names, err
	}
	if err = ir.Close(); err != nil {
		return names, err
	}
	if err = db.Close(); err != nil {
		return names, err
	}
	return names, nil
}
type labelCost struct {
	name  string
	count int
	cost  int
}
func main() {
	flag.StringVar(&dbPath, "dbpath", "", "Path to the DB")
	flag.StringVar(&blockID, "blockid", "", "ID of the block to open")
	flag.Parse()
	f, err := os.Create("profile.prof")
	if err != nil {
		log.Fatalf("Can't create profile: %s", err)
	}
	defer f.Close()
	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatalf("Can't start to profile: %s", err)
	}
	defer pprof.StopCPUProfile()
	names, err := countLabels(dbPath, blockID)
	if err != nil {
		log.Fatalf("Failed to open block: %s", err)
	}
	costs := make([]labelCost, 0, len(names))
	for name, count := range names {
		// Mapping a string turns <length byte + payload> (1+len(name) bytes)
		// into a two-byte escape, so each occurrence saves len(name)-1 bytes.
		costs = append(costs, labelCost{name: name, count: count, cost: (len(name) - 1) * count})
	}
	slices.SortFunc(costs, func(a, b labelCost) int {
		return cmp.Compare(b.cost, a.cost)
	})
	for i, c := range costs {
		fmt.Println(c.name)
		if i > 256 {
			break
		}
	}
}
Takes a little over 2 minutes on the ~32M time series instance I was testing it on:
with most time spent decoding TSDB data:
2m of extra startup time does seem excessive, so this doesn't seem like an acceptable solution.
Just looked at the "Top 10 label names with high memory usage" section. We see an improvement on the hard-coded strings. The overall number is still large because the size of the value is included. Quoting here so the info is available after prombench is canceled:
release
pr
Some downstream users might want to transfer this data over the wire, so it would be cool to expose the mapped labels and version this somehow, i.e. create a const MappedLabelsVersion =
How would you do that? I can see how you would do it inside the Labels package, but then you don't need an export.
I think most of the slowdown comes from the extra check for the zero value when decoding strings. What happens is that if a string is mapped to our static mapping we read it via the extra code path, and every decode now pays for that branch. This simple benchmark demonstrates that:
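The benchmark itself isn't preserved in this thread; the following stand-in sketch illustrates the branch in question (decodePlain/decodeMapped are hypothetical helpers, not the PR's actual code). Save it as decode_test.go and run go test -bench .:
package mapsketch

import "testing"

// Static mapping sketch: index 0 is reserved for the empty string.
var mappedStrings = [256]string{0: "", 1: "__name__", 2: "instance", 3: "job"}

// A length-prefixed string: [3]foo.
var data = string([]byte{3, 'f', 'o', 'o'})

// decodePlain is the old hot path: read the length byte, slice the payload.
func decodePlain(s string) string {
	n := int(s[0])
	return s[1 : 1+n]
}

// decodeMapped adds the branch that mapped strings require: a zero length
// byte means the next byte is an index into mappedStrings.
func decodeMapped(s string) string {
	n := int(s[0])
	if n == 0 {
		return mappedStrings[s[1]]
	}
	return s[1 : 1+n]
}

var sink string

func BenchmarkDecodePlain(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = decodePlain(data)
	}
}

func BenchmarkDecodeMapped(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = decodeMapped(data)
	}
}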
Updated benchmarks (just the ones that matter):
Benchmark tests are running for 3 days! If this is intended, ignore this message; otherwise you can cancel it by commenting:
I've added another commit that populates the mapping from the last TSDB block. This should show better results in benchmarks, but I suspect that the benchmark suite run here always starts with an empty instance, so it won't really show any improvement.
Through the Bytes() method: https://github.com/prometheus/prometheus/blob/main/model/labels/labels_stringlabels.go#L62. Without exporting this new mapping, it's unclear what the bytes mean.
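For illustration, the kind of versioned export being suggested might look like this sketch (MappedLabelsVersion is the name proposed above; MappedStrings and mappedStrings are hypothetical):
package labels

// mappedStrings is the static index -> string mapping (sketch).
var mappedStrings []string

// MappedLabelsVersion would be bumped whenever the built-in mapping changes,
// letting consumers of Bytes() output check that both sides share a mapping.
const MappedLabelsVersion = 1

// MappedStrings returns a copy of the static mapping so downstream code can
// decode the zero-length escapes found in Bytes() output.
func MappedStrings() []string {
	out := make([]string, len(mappedStrings))
	copy(out, mappedStrings)
	return out
}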
I've done a whole symbol-table implementation, under the dedupelabels build tag.
Is there a line where this PR is acceptable or stops being acceptable? Does removing the last commit make it more acceptable?
I was responding to @GiedriusS, not commenting on the changes in this PR. Building a list from the last TSDB block is acceptable, though I might still do it as an add-on PR. However, we can use PromBench to check the impact. I think I can restart the process so it will see a block.
However, the (micro) benchmarks are still worrying.
stringlabels stores all time series labels as a single string using this format:
<length><name><length><value>[<length><name><length><value> ...]
So a label set for my_metric{job="foo", instance="bar", env="prod", blank=""} would be encoded as:
[8]__name__[9]my_metric[3]job[3]foo[8]instance[3]bar[3]env[4]prod[5]blank[0]
This is a huge improvement over the 'classic' labels implementation that stores all label names & values as separate strings. There is still some room for improvement though, since some strings are present more often than others. For example, __name__ will be present in the label set of every time series we store in HEAD, eating 1+8=9 bytes each time. Since __name__ is a well-known string, we can try to use a single byte to store it in our encoded string, rather than repeat it in full each time. To be able to store a string as a single byte we need to somehow signal that to the reader of the encoded string; for that we use the fact that zero-length strings are rare and generally not stored on time series. An encoded string with zero length now signals that it represents a mapped value - to learn the true value of this string we read the next byte, which gives us an index into a static mapping. That mapping must include the empty string, so that we can still encode empty strings using this scheme. Example of our mapping (minimal version):
0: ""
1: "__name__"
2: "instance"
3: "job"
With that mapping our example label set would be encoded as:
[0]1[9]my_metric[0]3[3]foo[0]2[3]bar[3]env[4]prod[5]blank[0]0
Which would mean 41 bytes instead of 56.
Given that we have more mapping slots available we could and should map more strings, but the tricky bit is how to populate this mapping with useful strings that will result in measurable memory savings. This is further complicated by the fact that the mapping must remain static and cannot be modified during the lifetime of a Prometheus process. There are a few options:
- Fill all 255 usable slots of our mapping byte with well-known generic strings. That provides some measurable savings for all Prometheus users and is essentially a slightly more compact stringlabels variant.
- Allow users to pass in a list of well-known strings via flags, which lets Prometheus operators reduce memory usage for any labels they know to be popular.
- Discover the most popular strings from the TSDB or WAL on startup. That's more complicated, and we might pick a list that is the best set of mapped strings at startup but is no longer the best set some time later.
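To make the encoding concrete, here is a minimal decode-side sketch (decodeString and mappedStrings are names assumed for this example, not the actual code from this PR):
package main

import "fmt"

// Static mapping; index 0 is reserved for the empty string (sketch).
var mappedStrings = []string{"", "__name__", "instance", "job"}

// decodeString reads one length-prefixed string starting at s[i] and returns
// the string plus the position just past it. A zero length byte signals that
// the next byte is an index into mappedStrings.
func decodeString(s string, i int) (string, int) {
	n := int(s[i])
	if n == 0 {
		return mappedStrings[s[i+1]], i + 2
	}
	return s[i+1 : i+1+n], i + 1 + n
}

func main() {
	// [0]1 [9]my_metric [0]3 [3]foo -> __name__="my_metric", job="foo"
	encoded := string([]byte{0, 1, 9}) + "my_metric" + string([]byte{0, 3, 3}) + "foo"
	for i := 0; i < len(encoded); {
		var name, value string
		name, i = decodeString(encoded, i)
		value, i = decodeString(encoded, i)
		fmt.Printf("%s=%q\n", name, value)
	}
}
Running it prints __name__="my_metric" and job="foo", showing how the [0]<index> escape round-trips.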
Benchmark results:

goos: linux
goarch: amd64
pkg: github.com/prometheus/prometheus/model/labels
cpu: 13th Gen Intel(R) Core(TM) i7-13800H
                                                    │  main.txt    │              new1.txt                │
                                                    │   sec/op     │    sec/op      vs base               │
String-20                                             863.8n ± 4%     873.0n ± 4%         ~ (p=0.353 n=10)
Labels_Get/with_5_labels/first_label/get-20           4.763n ± 1%     5.035n ± 0%    +5.72% (p=0.000 n=10)
Labels_Get/with_5_labels/first_label/has-20           3.439n ± 0%     3.967n ± 0%   +15.37% (p=0.000 n=10)
Labels_Get/with_5_labels/middle_label/get-20          7.077n ± 1%     9.588n ± 1%   +35.47% (p=0.000 n=10)
Labels_Get/with_5_labels/middle_label/has-20          5.166n ± 0%     6.990n ± 1%   +35.30% (p=0.000 n=10)
Labels_Get/with_5_labels/last_label/get-20            9.181n ± 1%    12.970n ± 1%   +41.26% (p=0.000 n=10)
Labels_Get/with_5_labels/last_label/has-20            8.101n ± 1%    11.640n ± 1%   +43.69% (p=0.000 n=10)
Labels_Get/with_5_labels/not-found_label/get-20       3.974n ± 0%     4.768n ± 0%   +19.98% (p=0.000 n=10)
Labels_Get/with_5_labels/not-found_label/has-20       3.974n ± 0%     5.033n ± 0%   +26.65% (p=0.000 n=10)
Labels_Get/with_10_labels/first_label/get-20          4.761n ± 0%     5.042n ± 0%    +5.90% (p=0.000 n=10)
Labels_Get/with_10_labels/first_label/has-20          3.442n ± 0%     3.972n ± 0%   +15.40% (p=0.000 n=10)
Labels_Get/with_10_labels/middle_label/get-20         10.62n ± 1%     14.85n ± 1%   +39.83% (p=0.000 n=10)
Labels_Get/with_10_labels/middle_label/has-20         9.360n ± 1%    13.375n ± 0%   +42.90% (p=0.000 n=10)
Labels_Get/with_10_labels/last_label/get-20           18.19n ± 1%     22.00n ± 0%   +20.97% (p=0.000 n=10)
Labels_Get/with_10_labels/last_label/has-20           16.51n ± 0%     20.50n ± 1%   +24.14% (p=0.000 n=10)
Labels_Get/with_10_labels/not-found_label/get-20      3.985n ± 0%     4.768n ± 0%   +19.62% (p=0.000 n=10)
Labels_Get/with_10_labels/not-found_label/has-20      3.973n ± 0%     5.045n ± 0%   +26.97% (p=0.000 n=10)
Labels_Get/with_30_labels/first_label/get-20          4.773n ± 0%     5.050n ± 1%    +5.80% (p=0.000 n=10)
Labels_Get/with_30_labels/first_label/has-20          3.443n ± 1%     3.976n ± 2%   +15.50% (p=0.000 n=10)
Labels_Get/with_30_labels/middle_label/get-20         31.93n ± 0%     43.50n ± 1%   +36.21% (p=0.000 n=10)
Labels_Get/with_30_labels/middle_label/has-20         30.53n ± 0%     41.75n ± 1%   +36.75% (p=0.000 n=10)
Labels_Get/with_30_labels/last_label/get-20          106.55n ± 0%     71.17n ± 0%   -33.21% (p=0.000 n=10)
Labels_Get/with_30_labels/last_label/has-20          104.70n ± 0%     69.21n ± 1%   -33.90% (p=0.000 n=10)
Labels_Get/with_30_labels/not-found_label/get-20      3.976n ± 1%     4.772n ± 0%   +20.03% (p=0.000 n=10)
Labels_Get/with_30_labels/not-found_label/has-20      3.974n ± 0%     5.032n ± 0%   +26.64% (p=0.000 n=10)
Labels_Equals/equal-20                                2.382n ± 0%     2.446n ± 0%    +2.67% (p=0.000 n=10)
Labels_Equals/not_equal-20                           0.2741n ± 2%    0.2662n ± 2%    -2.88% (p=0.001 n=10)
Labels_Equals/different_sizes-20                     0.2762n ± 3%    0.2652n ± 0%    -3.95% (p=0.000 n=10)
Labels_Equals/lots-20                                 2.381n ± 0%     2.386n ± 1%    +0.23% (p=0.011 n=10)
Labels_Equals/real_long_equal-20                      6.087n ± 1%     5.558n ± 1%    -8.70% (p=0.000 n=10)
Labels_Equals/real_long_different_end-20              5.030n ± 0%     4.699n ± 0%    -6.57% (p=0.000 n=10)
Labels_Compare/equal-20                               4.814n ± 1%     4.777n ± 0%    -0.77% (p=0.000 n=10)
Labels_Compare/not_equal-20                           17.55n ± 8%     20.92n ± 1%   +19.24% (p=0.000 n=10)
Labels_Compare/different_sizes-20                     3.711n ± 1%     3.707n ± 0%         ~ (p=0.224 n=10)
Labels_Compare/lots-20                                27.09n ± 3%     28.73n ± 2%    +6.05% (p=0.000 n=10)
Labels_Compare/real_long_equal-20                     27.91n ± 3%     15.67n ± 1%   -43.86% (p=0.000 n=10)
Labels_Compare/real_long_different_end-20             33.92n ± 1%     35.35n ± 1%    +4.22% (p=0.000 n=10)
Labels_Hash/typical_labels_under_1KB-20               59.63n ± 0%     59.67n ± 0%         ~ (p=0.897 n=10)
Labels_Hash/bigger_labels_over_1KB-20                 73.42n ± 1%     73.81n ± 1%         ~ (p=0.342 n=10)
Labels_Hash/extremely_large_label_value_10MB-20       720.3µ ± 2%     715.2µ ± 3%         ~ (p=0.971 n=10)
Builder-20                                            371.6n ± 4%    1191.0n ± 3%  +220.46% (p=0.000 n=10)
Labels_Copy-20                                        85.52n ± 4%    53.90n ± 48%   -36.97% (p=0.000 n=10)
geomean                                               13.26n          14.68n        +10.71%

                                                    │  main.txt   │             new1.txt             │
                                                    │    B/op     │    B/op     vs base              │
String-20                                             240.0 ± 0%    240.0 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_5_labels/first_label/get-20           0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_5_labels/first_label/has-20           0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_5_labels/middle_label/get-20          0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_5_labels/middle_label/has-20          0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_5_labels/last_label/get-20            0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_5_labels/last_label/has-20            0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_5_labels/not-found_label/get-20       0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_5_labels/not-found_label/has-20       0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_10_labels/first_label/get-20          0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_10_labels/first_label/has-20          0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_10_labels/middle_label/get-20         0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_10_labels/middle_label/has-20         0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_10_labels/last_label/get-20           0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_10_labels/last_label/has-20           0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_10_labels/not-found_label/get-20      0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_10_labels/not-found_label/has-20      0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_30_labels/first_label/get-20          0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_30_labels/first_label/has-20          0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_30_labels/middle_label/get-20         0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_30_labels/middle_label/has-20         0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_30_labels/last_label/get-20           0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_30_labels/last_label/has-20           0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_30_labels/not-found_label/get-20      0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_30_labels/not-found_label/has-20      0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Equals/equal-20                                0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Equals/not_equal-20                            0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Equals/different_sizes-20                      0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Equals/lots-20                                 0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Equals/real_long_equal-20                      0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Equals/real_long_different_end-20              0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Compare/equal-20                               0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Compare/not_equal-20                           0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Compare/different_sizes-20                     0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Compare/lots-20                                0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Compare/real_long_equal-20                     0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Compare/real_long_different_end-20             0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Hash/typical_labels_under_1KB-20               0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Hash/bigger_labels_over_1KB-20                 0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Hash/extremely_large_label_value_10MB-20       0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Builder-20                                            224.0 ± 0%    192.0 ± 0%   -14.29% (p=0.000 n=10)
Labels_Copy-20                                        224.0 ± 0%    192.0 ± 0%   -14.29% (p=0.000 n=10)
geomean                                                          ²                -0.73% ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                                                    │  main.txt   │             new1.txt             │
                                                    │  allocs/op  │  allocs/op  vs base              │
String-20                                             1.000 ± 0%    1.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_5_labels/first_label/get-20           0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_5_labels/first_label/has-20           0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_5_labels/middle_label/get-20          0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_5_labels/middle_label/has-20          0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_5_labels/last_label/get-20            0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_5_labels/last_label/has-20            0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_5_labels/not-found_label/get-20       0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_5_labels/not-found_label/has-20       0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_10_labels/first_label/get-20          0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_10_labels/first_label/has-20          0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_10_labels/middle_label/get-20         0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_10_labels/middle_label/has-20         0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_10_labels/last_label/get-20           0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_10_labels/last_label/has-20           0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_10_labels/not-found_label/get-20      0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_10_labels/not-found_label/has-20      0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_30_labels/first_label/get-20          0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_30_labels/first_label/has-20          0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_30_labels/middle_label/get-20         0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_30_labels/middle_label/has-20         0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_30_labels/last_label/get-20           0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_30_labels/last_label/has-20           0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_30_labels/not-found_label/get-20      0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Get/with_30_labels/not-found_label/has-20      0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Equals/equal-20                                0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Equals/not_equal-20                            0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Equals/different_sizes-20                      0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Equals/lots-20                                 0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Equals/real_long_equal-20                      0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Equals/real_long_different_end-20              0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Compare/equal-20                               0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Compare/not_equal-20                           0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Compare/different_sizes-20                     0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Compare/lots-20                                0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Compare/real_long_equal-20                     0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Compare/real_long_different_end-20             0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Hash/typical_labels_under_1KB-20               0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Hash/bigger_labels_over_1KB-20                 0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Hash/extremely_large_label_value_10MB-20       0.000 ± 0%    0.000 ± 0%        ~ (p=1.000 n=10) ¹
Builder-20                                            1.000 ± 0%    1.000 ± 0%        ~ (p=1.000 n=10) ¹
Labels_Copy-20                                        1.000 ± 0%    1.000 ± 0%        ~ (p=1.000 n=10) ¹
geomean                                                          ²                +0.00% ²
¹ all samples are equal
² summaries must be >0 to compute geomean

Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
This will populate the static mapping of strings to store as a single byte on startup. We use the last TSDB block as the source of data: iterate the index for each label and count how many time series reference each label pair.
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
There is definitely an increase in CPU usage, as per the screenshot from the previous benchmark run (I've changed the rate to 2h so it's easier to read). I can put this behind a dedicated build tag, but since this is modifying stringlabels I'm not sure what the long-term plans are; maybe now is a good time to declare stringlabels the default.
The reason to leave tags as-is was to avoid inconvenience for downstream users such as Thanos.
This makes building labels faster by adding a fast string->index lookup via a map. We now need to populate both the slice that maps index->string and the map that gives us string->index, so we add a labels.MapLabels() function which handles updating the static mapping.
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
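A hedged sketch of what that two-way mapping could look like; labels.MapLabels() is named in the commit message above, but the variable names and body below are illustrative:
package labels

var (
	mappedStrings []string        // index -> string, used when decoding
	mappedIndexes map[string]byte // string -> index, used when building labels
)

// MapLabels installs a new static mapping. Index 0 must remain the empty
// string so empty values stay encodable, and one byte can address at most
// 256 entries.
func MapLabels(strs []string) {
	mappedStrings = append([]string{""}, strs...)
	if len(mappedStrings) > 256 {
		mappedStrings = mappedStrings[:256]
	}
	mappedIndexes = make(map[string]byte, len(mappedStrings))
	for i, s := range mappedStrings {
		mappedIndexes[s] = byte(i)
	}
}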
I took some time to look at the performance of this PR over the weekend, and tried to speed it up. However, I think this kind of thing is a blocker:
It doesn't show up most of the time when scraping, as the initial construction of labels is cached. But when reading the WAL or creating a large set of new series, this overhead will be painful. So I don't think we could have this on by default. Do you find that this version is better than dedupelabels?
I did push a change that helps a bit with that, just didn't re-run all benchmarks yet.
I did leave a comment a while ago about
We need to call mapCommonLabelSymbols() once TSDB opens all blocks, but before we start to replay the WAL and populate the HEAD. There doesn't seem to be a way to do this right now, so add a hook we can use for it.
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
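A minimal sketch of such a hook, under the assumption that it lives in the tsdb package (the name and signature below are illustrative, not the PR's actual API):
package tsdb

// Block is elided here; the real type lives in the tsdb package.
type Block struct{ /* ... */ }

// BlocksLoadedHook is called after all blocks are opened but before WAL
// replay starts populating the head, so the static label mapping can be
// built from block data and frozen before any labels are created.
type BlocksLoadedHook func(blocks []*Block)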
Hello from the bug-scrub! I'm personally still unable to accept a benchmark result going 70% slower; in our cloud bills, CPU is about 4x the price of RAM. Discussed at the bug-scrub: we could add this variant under another build tag, letting people in situations similar to yours get the benefit.
Happy to put it behind a build tag. I raised #16064 to make stringlabels the default; if that gets merged I can add this as a new tag. I don't really want to add a new tag that's behind another tag.
Still thinking about whether this doesn't work better as a mod to dedupelabels.
Raised #16588 with a version that's behind a new build tag.