Skip to content

Conversation

jcking
Copy link
Collaborator

@jcking jcking commented Dec 7, 2021

Historically ANTLRv4 C++ runtime has relied on deprecated facilities provided by the standard library to convert between UTF-8, UTF-16 (Windows only) and UTF-32. More recently https://github.com/nemtrif/utfcpp was added as an option. ANTLR has historically kept external dependencies for its runtimes to a minimum, basically nothing except the languages standard library. This change restores that by implementing full UTF-8 and UTF-32 handling based on a port off CEL's C++ implementation which is itself based on Go's unicode/utf8 package. UTF-16 support (indirectly via ws2s and s2ws) has been dropped as it was only relevant for Windows. On Windows users should use MultiByteToWideChar and WideCharToMultiByte.

With this change ANTLRv4 C++ is now dependency free, has full UTF-8 and UTF-32 handling, and ANTLRInputStream can support invalid UTF-8 strings by replacing illegal bytes with the Unicode replacement character.

@jcking
Copy link
Collaborator Author

jcking commented Dec 7, 2021

@parrt FYI Will hopefuly be ready to review later this week.

To make life easier for everyone, we are contributing a port of CEL's C++ UTF-8 decoding and encoding logic.

@jcking
Copy link
Collaborator Author

jcking commented Dec 7, 2021

@mike-lischke FYI, not ready for review yet. Hopefully later this week.

@jcking jcking force-pushed the cpp-cleanup branch 2 times, most recently from 1555652 to aa5b233 Compare December 7, 2021 16:42
@jcking jcking force-pushed the cpp-cleanup branch 2 times, most recently from 033ec8f to fdf6b9f Compare December 9, 2021 19:05
@jcking jcking marked this pull request as ready for review December 9, 2021 19:23
@jcking
Copy link
Collaborator Author

jcking commented Dec 9, 2021

@mike-lischke This is ready now. I added unit tests as well that use googletest and hooked up the scripts to run them.

Copy link
Member

@mike-lischke mike-lischke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love it! Just fix the small casing issues please.

@mike-lischke
Copy link
Member

@jcking Not sure if that would show up in the patch: did you also remove the utf8cpp submodule from git?

@KvanTTT
Copy link
Member

KvanTTT commented Dec 10, 2021

There are some problems with tests. Do they related to these changes?

@jcking
Copy link
Collaborator Author

jcking commented Dec 10, 2021

@mike-lischke I do not see any submodule utf8cpp present. What path is the submodule at?

@mike-lischke
Copy link
Member

@jcking You are right, there's no submodule in git. I didn't check first, but answered from memory. So from this side we should be fine here.

@KvanTTT The tests are odd. All those marked as failure actually seem to hang. It's difficult to say what's wrong there. Also Swift tests failed, which are not affected by this patch. The CircleCI tests succeed, however, I think the failures have nothing to do with this patch.

@parrt Even though some tests fail, I'm pretty sure this patch is ok. So I'd say we can merge it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants