-
Notifications
You must be signed in to change notification settings - Fork 9.7k
fix(bstream/writeByte): ensure it appends only one byte #14854
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Antoine Pultier <antoine.pultier@sintef.no>
ad18ea5
to
afaf2c8
Compare
Signed-off-by: Antoine Pultier <antoine.pultier@sintef.no>
Signed-off-by: Antoine Pultier <antoine.pultier@sintef.no>
Looking at the test package main
import (
"fmt"
"github.com/prometheus/prometheus/tsdb/chunkenc"
)
func main() {
byteArrays := [][]byte{
[]byte("\000\001\200\364\356\006@\307p\000\000\000\000\000\000"),
[]byte("\000\001\200\364\356\006@\307p\000\000\000\000\000"), // new version
}
for _, bytes := range byteArrays {
chunk_reader := chunkenc.NewXORChunk()
chunk_reader.Reset(bytes)
it := chunk_reader.Iterator(nil)
for j := 0; ; j++ {
next_val := it.Next()
if next_val == chunkenc.ValNone {
break
}
t, v := it.At()
fmt.Printf("Value at %v: %v\n", t, v)
}
fmt.Println()
}
} |
Thanks for the find. The proposed change looks plausible to me. However, this code is essentially vendored from https://github.com/dgryski/go-tsz/blob/03b7d791f4fe79270d114e1ea6929332dbf028b7/bstream.go#L59 . So it might be interesting to discuss this with @dgryski . If this change makes sense and there is no catch that we are missing here, then the upstream library should probably be changed, too. @jesusvazquez @roidelapluie @codesome @bwplotka WDYT about this? |
I implemented the original paper in a weekend for fun, more or less. I never ended up using the package for anything, although prometheus and grafana's metrictank did pick it up (and they both vendored it in and hacked it up for their own purposes.). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks plausible to me as well, but I'm not super familiar with bstream
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Should we add a changelog entry? |
Changelog entries are for user-visible changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all you thoughts, and most importantly @fungiboletus for the find and fix.
Let's ship it!
Thanks everyone! It's great to see this merged. I feel better about the many hours I worked on this. |
While reading the code to understand this PR, I found a couple more small performance improvements: #14932. |
Hi,
This pull/merge request fixes a bug in bstream. Sometimes, it appends two bytes when only one is necessary.
This is a somewhat risky pull request in critical historical code. I'm submitting it as an improvement suggestion, but I would understand if the maintainers decide against it. I also only tested from my point of view, and I don't know about all the potential implications.
Still, I think the change fixes a minor bug and may bring some little performances.
For context, I'm writing a Prometheus chunk parser in Rust using nom, and an assertion seldomly failed because some of my testing XOR chunks generated by Prometheus had a whole byte of padding instead of 0 to 7 bits as documented.
I have checked my parser many times, but I think Prometheus stores an extra useless byte of padding in XOR chunks once in a while. I have not done stats, but from far it looks in the magnitude of 1% of the chunks.
It's not a bigger problem than a waste of a byte because the Prometheus parser does not read the padding bits.
I think the cause is that when
bstream/writeByte
is called and 0 bits of space are left, the function adds two bytes of space instead of one. Most of the time, the second byte will be used later, so it's not wasted. But if this happens at the end of the chunk, it is wasted.From my point of view, a simple fix is to shortcut the bit operations and add the byte to the stream. It's simpler and slightly faster as adding a byte that is luckily aligned skip bit operations. It seems to work on my machine, but I would suggest heavy testing.
An alternative is to update
chunks.md
and say xor chunks can have from 0 to 8 padding bits. It goes against the efforts of saving as much space as possible in the chunks, but it's historical code.The change doesn't seem to impact the parser. Old chunks can still be read by an updated Prometheus after the change, and new chunks can be read by older Prometheus versions. It's a removed byte in a non-read part of the chunk data. CRC32C checks will still pass, but making a new chunk with the same data will result in a 1-byte shorter chunk with a different CRC32C.
Here is some code to create an XOR Chunk featuring 8 bits of padding before the change: