-
Notifications
You must be signed in to change notification settings - Fork 9.7k
fix(promql): Ensure native histogram values are copied in subqueries #16879
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
17ff99f
to
df388e6
Compare
Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com>
df388e6
to
ae8ed03
Compare
Hi @beorn7 @fpetkovski, |
Thanks for catching this. We suspected there are more cases where we don't copy enough, see #16581. But it needs some investigation if this PR is the right way to solve this particular instance. @krajorama might have state on this (actually better than myself). I'll preliminarily assign this to him. I'm out sick and will take some time to look at this myself. |
I've literally made the same patch more than a year ago https://github.com/prometheus/prometheus/pull/13659/files for a different reason 😞 . I don't remember the exact reason we abandoned that PR, but I'm sure we thought we can sort out the copy/not to copy problem later, as described in the spec and #16581. To me copying is always safe, and the code is the same as mine, so I'm just reviewing the test part basically. |
I've used the debugger to step through subquery eval and the histograms returned from the We could fix the issue in |
I've spent half a day trying to figure out a simpler test to trigger the error. But the error isn't as simple as setting the same pointer twice in the As far as I understand the problem really starts in MemoizedSeriesIterator which is used in evaluating the subquery. This iterator calls into chunkenc.FloatHistogram.AtFloatHistogram which does a shallow copy of the histogram. Which means that the result of the subquery will have histograms that share the same physical spans, buckets. Thus setting any of the spans/buckets will result in setting all. I've tested an alternative fix where I patch the MemoizedSeriesIterator:
Anyway, this is not as efficient as your fix, so I've added it here for demonstration only. What really threw me was that the error isn't deterministic, even though the code looks deterministic. The only thing running in parallel is the GC, but I thought that we didn't really use that due to the pooling. But setting GOGC=off makes the problem much worse, one iteration of the subquery is enough to trigger the problem. The bug comes out non-deterministic even then , due to And the failure is that the pool contains histograms with shared spans/buckets, so when we do the On the other hand if sync.Pool is faster at letting go of the unused pool item, then we get to I don't really see how we could make the test deterministic, unless we make the use of the pool optional. Which could make actual sense in some scenarios, when we do not have to make a copy after all. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I've tried to come up with a simpler test, but turned out to be too hard as explained in #16879 (comment)
@bboreham suggested to check if https://pkg.go.dev/testing/synctest has some way of getting rid of the non determinism in sync.Pool for the test. If not , could be worth an issue on Go. I've put this on my backlog, but @harry671003 if you got some time, feel free to check yourself please. NOTE: Prometheus needs to update the GO version first as this is a new feature since 1.24. |
Thanks for the reviews and merge! Appreciate all the insights - learned a lot through this. Regarding the |
…metheus#16879) Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com> Signed-off-by: Andrew Hall <andrew.hall@grafana.com>
Notes
While testing Thanos PromQL engine I found a very subtle bug in Prometheus engine. The NH samples returned from the iterator from subquery is not copied. This causes the query results from the Prometheus engine to be incorrect results.
Reproducing the bug
I originally discovered this this bug by comparing the results against Thanos engine. I also added a test in this PR.
This bug occurs when a subquery is run before other queries.
In the below test, when a subquery is included
min_over_time({__name__="http_request_duration_seconds"}[1h:1m])
the results of the rate() and delta() are wrong. I'm suspecting this is because of array reuse inmatrixIterSlice()
(and maybe other places?).Failure