Native histograms: fix spurios counter reset when merging recoded chunk to normal chunk #14513

krajorama · 2024-07-26T08:00:09Z

Description of problem

Fix bug in compaction of overlapping chunks: if you have two identical histogram chunks A and B , the merge over them produces a single chunk C that has the same data as A and B.

Now add one sample to chunk B without creating a new chunk and call it B'. The merge produces a list of samples that contains all common elements plus the new one at the end.

And you would think that the merge produces a single chunk equal to B' , however that's not true in all cases!

If the extra histogram added to B has a new bucket, then B is recoded into B' to contain the new bucket. Which means that iterating over B' yields samples that have en extra empty bucket.

When we merge the chunks A, B' the histogram samples can come from either A or B' randomly(*) and if we get a sample from B' and then from A, then the sample from A is now not appendable to the result and in fact causes a new chunk and a counter reset!

The real case where I've seen it is a bit more complicated, but the upshot is that we see spikes in the rate/increase of the data after compaction.

Solution

Accept this case in appendable: previous histogram has extra empty buckets. This should be appendable and append should just fill in the empty buckets in the incoming histogram.

I ended up making two functions: expandIntSpansAndBuckets which is a merge of counterResetInAnyBucket and expandSpansForward AND expandFloatSpansAndBuckets which is a merge of counterResetInAnyFloatBucket and expandSpansForward . The merge was pretty straightforward, however it didn't allow for a single generic function. Also the new functions have repeated code, but I couldn't inline it without destroying performance.

Alternate solution discarded 1

I was thinking that maybe selecting the source chunk wisely in a merge to always choose the sample with most buckets would be a fix, but that's not going to work. Imagine chunk B' from time T1 to T3 that was recoded and has "big" histograms due to a "big" sample at time T3 (big=has more buckets). However for whatever reason it lacks a sample at time T2, which another chunk A has and that sample will be "small" , causing the counter reset on merge. Since B' doesn't have a sample at T2, there's no way to choose the bigger sample.

Alternate solution discarded 2

I'm not 100% sure we should discard this.

When returning histograms from a chunk iterator, do not return empty buckets. So basically sanitize the buckets. This means adding extra iteration and check on every readout :(

Alternate solution discarded 3

Add h = h.Copy().Compact(0) before line

prometheus/storage/series.go

Line 340 in 71c90c7

t, h = seriesIter.AtHistogram(nil)

. This would mean making a copy for every histogram. Doubles the compaction time pretty much.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

Allow appending to chunks when the histogram to be added is missing some buckets, but the missing buckets are empty in the chunk. For example bucket at index 5 is present in the chunk, but its value is 0 and the new histogram doesn't have a bucket at index 5. This fixes an issue of merging chunks where one chunk was recoded to retroactively have some empty buckets in all the histograms and we are merging in a histogram that doesn't have the empty bucket (because it was not recoded yet). The operation alters the histogram that is being added, however this has already been the case when appending gauge histograms. Thus the test TestHistogramSeriesToChunks in storage package is changed to explicitly test what happened to the appended histogram - Compact(0) call is removed. The new expandIntSpansAndBuckets and expandFloatSpansAndBuckets functions are a merge of expandSpansForward and counterResetInAnyBucket and counterResetInAnyFloatBucket. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

prometheus/prometheus#14513 Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

jesusvazquez

LGTM

I did not see anything that was clearly wrong and I appreciate the effort in writing the extensive description of the pull request. The topic is a bit complex and that guided me a bit.

If @fionaliao @carrieedwards or @zenador have a chance to have a look I'd appreciate that as well since they are more familiar with this than I am and also because maybe this also affects the out of order implementation.

jesusvazquez · 2024-07-30T13:37:04Z

tsdb/chunkenc/histogram_test.go

@@ -1368,3 +1447,50 @@ func TestHistogramAppendOnlyErrors(t *testing.T) {
 		require.EqualError(t, err, "histogram counter reset")
 	})
 }
+
+func BenchmarkAppendable(b *testing.B) {


Can you post these benchmark results?

│ /tmp/old.bench │ /tmp/new.bench │ │ sec/op │ sec/op vs base │ Appendable-8 173.1µ ± 2% 146.1µ ± 1% -15.61% (p=0.000 n=10) │ /tmp/old.bench │ /tmp/new.bench │ │ B/op │ B/op vs base │ Appendable-8 172.0 ± 1% 145.0 ± 1% -15.70% (p=0.000 n=10) │ /tmp/old.bench │ /tmp/new.bench │ │ allocs/op │ allocs/op vs base │ Appendable-8 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ ¹ all samples are equal

fionaliao

LGTM

fionaliao · 2024-07-30T18:02:05Z

tsdb/chunkenc/float_histogram.go

+	var aInter Insert
+	var bInter Insert


Why are these called aInter/bInter rather than aInsert/bInsert?

Historical reasons:

prometheus/tsdb/chunkenc/histogram_meta.go

Line 330 in 7fab72a

var inter Insert

No idea, but I wanted to keep the resemblance to the original

For prometheus/prometheus#14513 Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

* test(chunkenc): appending histograms with empty buckets and gaps Append such native histograms that have empty buckets and gaps between the indexes of those buckets. There is a special case for appending counter native histograms to a chunk in TSDB: if we append a histogram that is missing some buckets that are already in chunk, then usually that's a counter reset. However if the missing bucket is empty, meaning its value is 0, then we don't consider it missing. For this case to trigger , we need to write empty buckets into the chunk. Normally native histograms are compacted when we emit them , so this is very rare and compact make sure that there are no multiple continuous empty buckets with gaps between them. The code that I've added in #14513 did not take into account that you can bypass compact and write histograms with many empty buckets, with gaps between them. These are still valid, so the code has to account for them. Main fix in the expandIntSpansAndBuckets and expandFloatSpansAndBuckets function. I've also refactored them for clarity. Consequently needed to fix insert and adjustForInserts to also allow gaps between inserts. I've added some new test cases (data driven test would be nice here, too many cases). And removed the deprecated old function that didn't deal with empty buckets at all. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com> Co-authored-by: Björn Rabenstein <beorn@grafana.com>

) * test(chunkenc): appending histograms with empty buckets and gaps Append such native histograms that have empty buckets and gaps between the indexes of those buckets. There is a special case for appending counter native histograms to a chunk in TSDB: if we append a histogram that is missing some buckets that are already in chunk, then usually that's a counter reset. However if the missing bucket is empty, meaning its value is 0, then we don't consider it missing. For this case to trigger , we need to write empty buckets into the chunk. Normally native histograms are compacted when we emit them , so this is very rare and compact make sure that there are no multiple continuous empty buckets with gaps between them. The code that I've added in prometheus#14513 did not take into account that you can bypass compact and write histograms with many empty buckets, with gaps between them. These are still valid, so the code has to account for them. Main fix in the expandIntSpansAndBuckets and expandFloatSpansAndBuckets function. I've also refactored them for clarity. Consequently needed to fix insert and adjustForInserts to also allow gaps between inserts. I've added some new test cases (data driven test would be nice here, too many cases). And removed the deprecated old function that didn't deal with empty buckets at all. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com> Co-authored-by: Björn Rabenstein <beorn@grafana.com>

) * test(chunkenc): appending histograms with empty buckets and gaps Append such native histograms that have empty buckets and gaps between the indexes of those buckets. There is a special case for appending counter native histograms to a chunk in TSDB: if we append a histogram that is missing some buckets that are already in chunk, then usually that's a counter reset. However if the missing bucket is empty, meaning its value is 0, then we don't consider it missing. For this case to trigger , we need to write empty buckets into the chunk. Normally native histograms are compacted when we emit them , so this is very rare and compact make sure that there are no multiple continuous empty buckets with gaps between them. The code that I've added in prometheus#14513 did not take into account that you can bypass compact and write histograms with many empty buckets, with gaps between them. These are still valid, so the code has to account for them. Main fix in the expandIntSpansAndBuckets and expandFloatSpansAndBuckets function. I've also refactored them for clarity. Consequently needed to fix insert and adjustForInserts to also allow gaps between inserts. I've added some new test cases (data driven test would be nice here, too many cases). And removed the deprecated old function that didn't deal with empty buckets at all. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com> Co-authored-by: Björn Rabenstein <beorn@grafana.com> Signed-off-by: Andrew Hall <andrew.hall@grafana.com>

Unit test for appending histogram with missing empty bucket

60ea3da

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

krajorama requested a review from jesusvazquez as a code owner July 26, 2024 08:00

krajorama changed the title ~~Native hisotgram: fix spurios counter reset when merging recoded chunk to normal chunk~~ Native histograms: fix spurios counter reset when merging recoded chunk to normal chunk Jul 26, 2024

krajorama marked this pull request as draft July 26, 2024 08:10

krajorama force-pushed the fix-histogram-overlap-compaction-bug branch from 493c5fb to 265655d Compare July 28, 2024 13:33

krajorama marked this pull request as ready for review July 28, 2024 13:36

Fix typo

af97e30

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

krajorama added a commit to grafana/mimir-prometheus that referenced this pull request Jul 29, 2024

Test prom pr 14513 at af97e30daf54ee78f9cf71a635e28d7f295ead02

cbc4a8c

prometheus/prometheus#14513 Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

krajorama added a commit to grafana/mimir that referenced this pull request Jul 29, 2024

Test prom pr 14513

32799eb

prometheus/prometheus#14513 Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

krajorama closed this Jul 30, 2024

krajorama reopened this Jul 30, 2024

jesusvazquez approved these changes Jul 30, 2024

View reviewed changes

fionaliao approved these changes Jul 30, 2024

View reviewed changes

carrieedwards approved these changes Jul 31, 2024

View reviewed changes

krajorama merged commit 00ab05c into prometheus:main Aug 1, 2024
45 checks passed

krajorama mentioned this pull request Aug 1, 2024

Merge upstream Prometheus at 00ab05c3 grafana/mimir-prometheus#664

Merged

krajorama added a commit to grafana/mimir that referenced this pull request Aug 1, 2024

Vendor update mimir-prometheus at c9f0582f0dc9

2336bdb

For prometheus/prometheus#14513 Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

krajorama mentioned this pull request Aug 1, 2024

Vendor update mimir-prometheus at c9f0582f0dc9 grafana/mimir#8876

Merged

krajorama added a commit to grafana/mimir that referenced this pull request Aug 1, 2024

Vendor update mimir-prometheus at c9f0582f0dc9 (#8876)

c85d971

For prometheus/prometheus#14513 Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

krajorama mentioned this pull request Aug 6, 2024

Fix: panic: runtime error: index out of range [4] with length 4 #14598

Merged

krajorama mentioned this pull request Jul 21, 2025

fix(chunkenc): appending histograms with empty buckets #16893

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Native histograms: fix spurios counter reset when merging recoded chunk to normal chunk #14513

Native histograms: fix spurios counter reset when merging recoded chunk to normal chunk #14513

Uh oh!

krajorama commented Jul 26, 2024 •

edited

Loading

Uh oh!

jesusvazquez left a comment

Uh oh!

jesusvazquez Jul 30, 2024

Uh oh!

krajorama Jul 31, 2024

Uh oh!

fionaliao left a comment

Uh oh!

fionaliao Jul 30, 2024

Uh oh!

krajorama Jul 31, 2024

Uh oh!

Uh oh!

Uh oh!

Native histograms: fix spurios counter reset when merging recoded chunk to normal chunk #14513

Native histograms: fix spurios counter reset when merging recoded chunk to normal chunk #14513

Uh oh!

Conversation

krajorama commented Jul 26, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description of problem

Solution

Alternate solution discarded 1

Alternate solution discarded 2

Alternate solution discarded 3

Uh oh!

jesusvazquez left a comment

Choose a reason for hiding this comment

Uh oh!

jesusvazquez Jul 30, 2024

Choose a reason for hiding this comment

Uh oh!

krajorama Jul 31, 2024

Choose a reason for hiding this comment

Uh oh!

fionaliao left a comment

Choose a reason for hiding this comment

Uh oh!

fionaliao Jul 30, 2024

Choose a reason for hiding this comment

Uh oh!

krajorama Jul 31, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

krajorama commented Jul 26, 2024 •

edited

Loading