@aoking aoking commented Jan 19, 2017

The Unicode general category 'Cc' must contain the NULL character (U+0000),
but the current code does not include it.
http://www.fileformat.info/info/unicode/category/Cc/list.htm

In java.util.regex.Pattern, the Cc pattern is implemented correctly.
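
The JDK behaviour referred to above can be checked with a small sketch like the one below. The class name `CcNulCheck` and its helper methods are hypothetical names chosen for illustration; only `java.util.regex.Pattern` and `java.lang.Character` from the standard library are used, and nothing here exercises RE2/J itself.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of RE2/J or of this pull request.
final class CcNulCheck {
    // Returns true if the JDK's \p{Cc} class matches the given code point.
    static boolean jdkCcMatches(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return Pattern.compile("\\p{Cc}").matcher(s).matches();
    }

    // Returns true if java.lang.Character assigns the code point to
    // general category Cc (Character.CONTROL).
    static boolean isControl(int codePoint) {
        return Character.getType(codePoint) == Character.CONTROL;
    }

    public static void main(String[] args) {
        System.out.println(jdkCcMatches(0x0000)); // NUL matched by \p{Cc}
        System.out.println(isControl(0x0000));    // NUL categorised as Cc
        System.out.println(jdkCcMatches('A'));    // a letter, for contrast
    }
}
```

Running the same three checks against the regex engine under test would presumably show whether its Cc table agrees with the JDK for NUL.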

@googlebot
Collaborator

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please let us know the company's name.

@aoking
Author

aoking commented Jan 19, 2017

I signed it!

@googlebot
Collaborator

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for the commit author(s). If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.

sjamesr added a commit to sjamesr/re2j that referenced this pull request Jun 3, 2020
UnicodeTablesGenerator uses Unicode data from ICU4J to generate Unicode
tables for consumption by RE2/J. Output is google-java-formatted before
it is written.

No new runtime dependencies are added to RE2/J.

The generator uses ICU4J 4.8.2 which bundles Unicode 6.0.0. This keeps
it compatible with Java 8, which RE2/J targets. Consideration should be
given for how we might upgrade to later Unicode versions without
introducing inconsistencies (e.g. RE2/J matches something that shouldn't
match according to java.lang.Character data).

There are some differences in the generated tables:
  * the new tables do not contain binary property character ranges (e.g.
    ASCII_Hex_digit), as those tables are currently unused in RE2/J.

  * Cc (control) char class now contains NUL (U+0000); this is correct
    and was also the subject of google#26.

See https://github.com/google/re2j/files/4725343/diff.txt for a full
list of differences between the old tables and the new.
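
As a rough illustration of what a range table for Cc looks like, and of the consistency concern with java.lang.Character mentioned in the commit message, the sketch below derives the Cc ranges from `Character.getType`. This is an assumption-laden stand-in: the commit says the real generator reads its data from ICU4J, whereas this sketch uses only the JDK, and the class name `CcRanges` and the `int[]{lo, hi}` representation are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the actual UnicodeTablesGenerator uses ICU4J data
// and its output format is not shown in this thread.
final class CcRanges {
    // Returns inclusive [lo, hi] ranges of code points whose general
    // category, according to java.lang.Character, is Cc (CONTROL).
    static List<int[]> controlRanges() {
        List<int[]> ranges = new ArrayList<>();
        int start = -1; // start of the currently open range, or -1 if none
        // Iterate one past MAX_CODE_POINT so an open range is always closed.
        for (int cp = 0; cp <= Character.MAX_CODE_POINT + 1; cp++) {
            boolean inCc = cp <= Character.MAX_CODE_POINT
                    && Character.getType(cp) == Character.CONTROL;
            if (inCc && start < 0) {
                start = cp;
            } else if (!inCc && start >= 0) {
                ranges.add(new int[] {start, cp - 1});
                start = -1;
            }
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (int[] r : controlRanges()) {
            System.out.printf("U+%04X..U+%04X%n", r[0], r[1]);
        }
    }
}
```

The first range produced starts at U+0000, which is the entry the old tables were missing.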
sjamesr added a commit that referenced this pull request Jun 3, 2020
@sjamesr
Contributor

sjamesr commented Jun 3, 2020

This was (finally) fixed. Thank you for the contribution and sorry it took so long.

@sjamesr sjamesr closed this Jun 3, 2020