niw (Contributor) commented Jan 3, 2021

Problems

ANTLRInputStream in the Swift runtime uses an array of Character as its internal representation, and converts each element to an Int Unicode code point to supply to the lexer. However, a Swift Character represents a Unicode grapheme, not a single Unicode code point. If the original String contains characters built from multiple Unicode code points, such as the family emoji (👨‍👩‍👧‍👦, which is a single Character in Swift but is actually U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466), ANTLRInputStream is not able to read each Unicode code point.
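To make the mismatch concrete, here is a minimal sketch using only the Swift standard library. It does not touch the ANTLR runtime; the variable names are made up for illustration.

```swift
// The family emoji from the description, written as explicit escapes.
let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"

// Iterating a String yields Characters (graphemes), so the whole
// ZWJ sequence is seen as one element.
let characterCount = family.count

// The unicodeScalars view yields one element per code point.
let scalarValues = family.unicodeScalars.map { Int($0.value) }

print(characterCount)      // number of Characters
print(scalarValues.count)  // number of code points
print(scalarValues.map { String($0, radix: 16, uppercase: true) })
```

A lexer that matches on code points needs the second view, since the first collapses all seven code points into one element.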

Solution

  • Use an array of UnicodeScalar instead.
  • Add unit tests to ensure ANTLRInputStream can read each Unicode code point.
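The idea behind the two bullets can be sketched roughly as follows. `ScalarInputStream` is a hypothetical, heavily simplified stand-in and not the actual ANTLRInputStream code from this branch. It assumes a 1-based lookahead method named `LA` and an `EOF` value of -1, and it leaves out negative lookahead, marks, and seeking. Copying `unicodeScalars` into an Array once gives constant-time access by integer position, whereas `String.UnicodeScalarView` is not a random-access collection, so offsetting an index in it has to walk the string.

```swift
/// Hypothetical simplified input stream backed by [UnicodeScalar].
/// Not the real ANTLR runtime class.
struct ScalarInputStream {
    static let EOF = -1

    private let data: [UnicodeScalar]
    private var position = 0

    init(_ input: String) {
        // Copy once so that later lookups are plain array accesses.
        data = Array(input.unicodeScalars)
    }

    /// Total number of code points in the input.
    var size: Int { data.count }

    /// Returns the code point `i` positions ahead (1-based), or EOF.
    /// This sketch only supports i >= 1.
    func LA(_ i: Int) -> Int {
        let index = position + i - 1
        guard i >= 1, index < data.count else { return ScalarInputStream.EOF }
        return Int(data[index].value)
    }

    /// Advances by one code point, stopping at the end of input.
    mutating func consume() {
        if position < data.count { position += 1 }
    }
}

// A check in the spirit of the unit tests mentioned above:
// every code point must be readable individually.
var stream = ScalarInputStream("a\u{1F468}\u{200D}\u{1F469}")
var codePoints: [Int] = []
while stream.LA(1) != ScalarInputStream.EOF {
    codePoints.append(stream.LA(1))
    stream.consume()
}
print(codePoints == [0x61, 0x1F468, 0x200D, 0x1F469])
```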

@niw niw force-pushed the fix_swift_input_stream branch from 86c89b4 to 65aacab Compare January 8, 2021 02:39
Turns out, using `UnicodeScalarView` is extremely slow.
hanjoes (Member) commented Oct 11, 2021

Will try to merge it as part of #3301.

@parrt parrt added this to the 4.9.3 milestone Oct 11, 2021
@parrt parrt merged commit c293e23 into antlr:master Oct 11, 2021