Fix range of make_Cc #26
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed, please reply here (e.g.
I signed it!
We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for the commit author(s). If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
UnicodeTablesGenerator uses Unicode data from ICU4J to generate Unicode tables for consumption by RE2/J. Output is google-java-formatted before it is written. No new runtime dependencies are added to RE2/J.

The generator uses ICU4J 4.8.2, which bundles Unicode 6.0.0. This keeps it compatible with Java 8, which RE2/J targets. Consideration should be given to how we might upgrade to later Unicode versions without introducing inconsistencies (e.g. RE2/J matches something that shouldn't match according to java.lang.Character data).

There are some differences in the generated tables:
* The new tables do not contain binary property character ranges (e.g. ASCII_Hex_digit), as those tables are currently unused in RE2/J.
* The Cc (control) char class now contains NUL (U+0000). This is correct and was also the subject of google#26.

See https://github.com/google/re2j/files/4725343/diff.txt for a full list of differences between the old tables and the new.
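To illustrate what a correct Cc table should contain, here is a minimal sketch that derives the Cc ranges from java.lang.Character. It is not the UnicodeTablesGenerator from the commit and does not use ICU4J; the class name `CcRanges` and the method `controlRanges` are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class CcRanges {
  // Hypothetical helper: collects maximal [lo, hi] runs of code points whose
  // general category is Cc according to java.lang.Character (not ICU4J).
  static List<int[]> controlRanges() {
    List<int[]> ranges = new ArrayList<>();
    int start = -1; // -1 means "not currently inside a run"
    // Iterate one past MAX_CODE_POINT so that a run ending at the last
    // code point would still be closed.
    for (int cp = 0; cp <= Character.MAX_CODE_POINT + 1; cp++) {
      boolean isCc =
          cp <= Character.MAX_CODE_POINT && Character.getType(cp) == Character.CONTROL;
      if (isCc && start < 0) {
        start = cp; // a new run begins
      } else if (!isCc && start >= 0) {
        ranges.add(new int[] {start, cp - 1}); // the run ended at cp - 1
        start = -1;
      }
    }
    return ranges;
  }

  public static void main(String[] args) {
    for (int[] r : controlRanges()) {
      System.out.printf("U+%04X..U+%04X%n", r[0], r[1]);
    }
  }
}
```

Running `main` prints two ranges, `U+0000..U+001F` and `U+007F..U+009F`. The first range starts at NUL, which is the code point the old table left out.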
This was (finally) fixed. Thank you for the contribution and sorry it took so long.
The Unicode character class 'Cc' must contain the NUL character (U+0000),
but the current code does not include it.
http://www.fileformat.info/info/unicode/category/Cc/list.htm
In java.util.regex.Pattern, the Cc class is implemented correctly.
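The reference behaviour can be checked with a small sketch against the JDK alone. It only exercises java.util.regex.Pattern and does not call RE2/J; the class name `CcNulCheck` and the helper `jdkMatchesCc` are hypothetical names used for illustration.

```java
import java.util.regex.Pattern;

public class CcNulCheck {
  // Hypothetical helper: true if the JDK regex engine treats the given
  // code point as a member of the Cc (control) general category.
  static boolean jdkMatchesCc(int codePoint) {
    String s = new String(Character.toChars(codePoint));
    return Pattern.compile("\\p{Cc}").matcher(s).matches();
  }

  public static void main(String[] args) {
    System.out.println(jdkMatchesCc(0x0000)); // NUL
    System.out.println(jdkMatchesCc(0x007F)); // DEL
    System.out.println(jdkMatchesCc('A'));    // an ordinary letter
  }
}
```

With the JDK, the first two calls return `true` and the last returns `false`. After this fix, RE2/J is expected to agree on U+0000.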