Native histograms: fix spurious counter reset when merging recoded chunk to normal chunk #14513
Conversation
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Allow appending to chunks when the histogram to be added is missing some buckets, as long as the missing buckets are empty in the chunk. For example, bucket index 5 is present in the chunk, but its value is 0, and the new histogram has no bucket at index 5.

This fixes an issue when merging chunks where one chunk was recoded to retroactively have some empty buckets in all of its histograms and we merge in a histogram that lacks the empty bucket (because it was not recoded yet).

The operation alters the histogram being added; however, this was already the case when appending gauge histograms. The test TestHistogramSeriesToChunks in the storage package is therefore changed to explicitly test what happens to the appended histogram: the Compact(0) call is removed.

The new expandIntSpansAndBuckets and expandFloatSpansAndBuckets functions are a merge of expandSpansForward with counterResetInAnyBucket and counterResetInAnyFloatBucket, respectively.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Compare: 493c5fb to 265655d
prometheus/prometheus#14513 Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
LGTM
I did not see anything that was clearly wrong, and I appreciate the effort put into the extensive pull request description. The topic is a bit complex, and the description guided me through it.
If @fionaliao, @carrieedwards, or @zenador have a chance to take a look, I'd appreciate that as well, since they are more familiar with this than I am, and also because this may affect the out-of-order implementation.
@@ -1368,3 +1447,50 @@ func TestHistogramAppendOnlyErrors(t *testing.T) {
		require.EqualError(t, err, "histogram counter reset")
	})
}

func BenchmarkAppendable(b *testing.B) {
Can you post these benchmark results?
│ /tmp/old.bench │ /tmp/new.bench │
│ sec/op │ sec/op vs base │
Appendable-8 173.1µ ± 2% 146.1µ ± 1% -15.61% (p=0.000 n=10)
│ /tmp/old.bench │ /tmp/new.bench │
│ B/op │ B/op vs base │
Appendable-8 172.0 ± 1% 145.0 ± 1% -15.70% (p=0.000 n=10)
│ /tmp/old.bench │ /tmp/new.bench │
│ allocs/op │ allocs/op vs base │
Appendable-8 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
¹ all samples are equal
LGTM
var aInter Insert
var bInter Insert
Why are these called aInter/bInter rather than aInsert/bInsert?
Historical reasons. See prometheus/tsdb/chunkenc/histogram_meta.go, line 330 in 7fab72a:

var inter Insert

No idea why, but I wanted to keep the resemblance to the original.
For prometheus/prometheus#14513 Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* test(chunkenc): appending histograms with empty buckets and gaps

Append native histograms that have empty buckets and gaps between the indexes of those buckets.

There is a special case for appending counter native histograms to a chunk in TSDB: if we append a histogram that is missing some buckets that are already in the chunk, then usually that's a counter reset. However, if the missing bucket is empty, meaning its value is 0, then we don't consider it missing. For this case to trigger, we need to write empty buckets into the chunk. Normally native histograms are compacted when we emit them, so this is very rare, and compact makes sure that there are no multiple consecutive empty buckets with gaps between them.

The code that I added in #14513 did not take into account that you can bypass compact and write histograms with many empty buckets, with gaps between them. These are still valid, so the code has to account for them. The main fix is in the expandIntSpansAndBuckets and expandFloatSpansAndBuckets functions. I've also refactored them for clarity. Consequently, I needed to fix insert and adjustForInserts to also allow gaps between inserts.

I've added some new test cases (a data-driven test would be nice here, too many cases) and removed the deprecated old function that didn't deal with empty buckets at all.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com>
Co-authored-by: Björn Rabenstein <beorn@grafana.com>
Description of problem
Fix a bug in the compaction of overlapping chunks: if you have two identical histogram chunks A and B, the merge over them produces a single chunk C that has the same data as A and B.
Now add one sample to chunk B without creating a new chunk, and call the result B'. The merge produces a list of samples that contains all the common elements plus the new one at the end.
You would think that the merge produces a single chunk equal to B'; however, that's not true in all cases!
If the extra histogram added to B has a new bucket, then B is recoded into B' to contain the new bucket, which means that iterating over B' yields samples that have an extra empty bucket.
When we merge the chunks A and B', the histogram samples can come from either A or B' randomly(*), and if we get a sample from B' and then one from A, the sample from A is no longer appendable to the result; in fact, it causes a new chunk and a counter reset!
The real case where I've seen this is a bit more complicated, but the upshot is that we see spikes in the rate/increase of the data after compaction.
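The failure mode described above can be illustrated with a toy model of the pre-fix appendability check. This is a hypothetical simplification, modelling buckets as a map from bucket index to count; the real code walks span/delta encodings.

```go
package main

import "fmt"

// naiveCounterReset is a toy model of the pre-fix check: any bucket
// present in prev but absent in cur counts as a counter reset, even
// when the previous value of that bucket was 0. That zero-bucket case
// is exactly what produces the spurious reset.
func naiveCounterReset(prev, cur map[int]int64) bool {
	for idx, v := range prev {
		c, ok := cur[idx]
		if !ok {
			return true // bucket vanished; pre-fix code never checked v == 0
		}
		if c < v {
			return true // count went down: genuine reset
		}
	}
	return false
}

func main() {
	// Sample from the recoded chunk B': bucket 5 exists but is empty.
	fromBPrime := map[int]int64{0: 2, 5: 0}
	// The same logical data from chunk A, never recoded: no bucket 5 at all.
	fromA := map[int]int64{0: 2}
	fmt.Println(naiveCounterReset(fromBPrime, fromA)) // → true (spurious!)
}
```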
Solution
Accept this case in appendable: the previous histogram has extra empty buckets. This should be appendable, and append should just fill in the empty buckets in the incoming histogram.
I ended up making two functions:
- expandIntSpansAndBuckets, which is a merge of counterResetInAnyBucket and expandSpansForward, and
- expandFloatSpansAndBuckets, which is a merge of counterResetInAnyFloatBucket and expandSpansForward.

The merge was pretty straightforward; however, it didn't allow for a single generic function. Also, the new functions have repeated code, but I couldn't inline it without destroying performance.
Alternate solution discarded 1
I was thinking that maybe selecting the source chunk wisely in a merge, always choosing the sample with the most buckets, would be a fix, but that's not going to work. Imagine chunk B' from time T1 to T3 that was recoded and has "big" histograms due to a "big" sample at time T3 (big = has more buckets). However, for whatever reason it lacks a sample at time T2, which another chunk A has, and that sample will be "small", causing the counter reset on merge. Since B' doesn't have a sample at T2, there's no way to choose the bigger sample.
Alternate solution discarded 2
I'm not 100% sure we should discard this.
When returning histograms from a chunk iterator, do not return empty buckets; basically, sanitize the buckets. This means adding an extra iteration and check on every readout :(
Alternate solution discarded 3
Add h = h.Copy().Compact(0) before prometheus/storage/series.go, line 340 in 71c90c7.