-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Ensure ByteStrings are compact when calling ToString. #3148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Yep, if I revert my change, then that test passes (but of course the other tests that I added fail). As far as I can tell, it's the |
Ok, found the bug - I didn't realise |
if (IsCompact) | ||
return encoding.GetString(_buffers[0].Array, _buffers[0].Offset, _buffers[0].Count); | ||
|
||
byte[] buffer = ToArray(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What advantage does that give over using StringBuilder
? I'm just curious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi! The problem is that when you have a ByteString
composed of several ByteString
s and some of those ByteStrings
represent incomplete Unicode sequences (e.g. "AB"
in Unicode is 4 bytes long, and if you split the ByteString
after 3 bytes, resulting in a ByteString
representing 1-and-a-half characters), then the resulting string will be incorrect because some of its components will be incorrect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the existing code and tests used UTF-8, which encodes characters from the ASCII character set using only a single byte (so the issue may not have come up before). As soon as I switched to Unicode it gave wrong results unless I called .Compact()
before calling .ToString()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that there are multiple test cases for UTF8, Unicode, et al here. Looks like a good change to me. We'll get this into 1.3.2 |
Fixes #3147.