Skip to content

Conversation

tintoy
Copy link
Contributor

@tintoy tintoy commented Oct 12, 2017

Fixes #3147.

@Aaronontheweb
Copy link
Member

@tintoy looks like some of the JSON framing specs are failing for Akka.Streams, but the Akka.Persistence issues might be related to this: #3149 - should find out once CI has a change to run, but you should take a look at the failing specs for streams.

@tintoy
Copy link
Contributor Author

tintoy commented Oct 12, 2017

Yep, if I revert my change, then that test passes (but of course the other tests that I added fail). As far as I can tell, it's the UnfoldResource test that seems to be wrong (although I'm not sure why since I find the code a little hard to follow). It specifies a chunk size of 50, but then expects a string that's only 10 long, which doesn't sound right.

@tintoy
Copy link
Contributor Author

tintoy commented Oct 12, 2017

Ok, found the bug - I didn't realise ByteBuffer is an alias for ArraySegment<byte>, so I was ignoring the other properties of the ArraySegment in the case that IsCompact == true. I'm just running the remaining streams tests then I'll push a fix.

if (IsCompact)
return encoding.GetString(_buffers[0].Array, _buffers[0].Offset, _buffers[0].Count);

byte[] buffer = ToArray();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What advantage does that give over using StringBuilder? I'm just curious.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! The problem is that when you have a ByteString composed of several ByteStrings and some of those ByteStrings represent incomplete Unicode sequences (e.g. "AB" in Unicode is 4 bytes long, and if you split the ByteString after 3 bytes, resulting in a ByteString representing 1-and-a-half characters), then the resulting string will be incorrect because some of its components will be incorrect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the existing code and tests used UTF-8, which encodes characters from the ASCII character set using only a single byte (so the issue may not have come up before). As soon as I switched to Unicode it gave wrong results unless I called .Compact() before calling .ToString().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Aaronontheweb
Copy link
Member

I like that there are multiple test cases for UTF8, Unicode, et al here. Looks like a good change to me. We'll get this into 1.3.2

@Aaronontheweb Aaronontheweb merged commit 48204e2 into akkadotnet:dev Oct 18, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants