@sjamesr sjamesr commented Jun 2, 2021

Previously, the parser consumed a \Q...\E section one UTF-16 char at a
time, so a rune requiring a surrogate pair was incorrectly treated as two
separate characters.

E.g.

String source = new StringBuilder().appendCodePoint(110781).toString();

Before this change:
Parser.parse(source, ...) matches \x{1b0bd}
Parser.parse("\\Q" + source + "\\E", ...) matches \x{d82c}\x{dcbd}

After this change:
Parser.parse(source, ...) matches \x{1b0bd}
Parser.parse("\\Q" + source + "\\E", ...) matches \x{1b0bd}
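
For illustration only, here is a minimal sketch of the difference between walking a string by UTF-16 char and walking it by code point, using only JDK calls. It is not the actual Parser code; the class name QuotedLiteralSketch and the helpers perChar and perCodePoint are hypothetical names made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of re2j.
class QuotedLiteralSketch {
  // Walks the string one UTF-16 char at a time, so a supplementary
  // code point shows up as its two surrogate halves.
  static List<String> perChar(String s) {
    List<String> out = new ArrayList<>();
    for (int i = 0; i < s.length(); i++) {
      out.add(String.format("\\x{%x}", (int) s.charAt(i)));
    }
    return out;
  }

  // Walks the string one code point at a time, advancing by one or two
  // chars depending on whether the code point needs a surrogate pair.
  static List<String> perCodePoint(String s) {
    List<String> out = new ArrayList<>();
    for (int i = 0; i < s.length(); ) {
      int cp = s.codePointAt(i);
      out.add(String.format("\\x{%x}", cp));
      i += Character.charCount(cp);
    }
    return out;
  }

  public static void main(String[] args) {
    String source = new StringBuilder().appendCodePoint(110781).toString();
    System.out.println(perChar(source));      // [\x{d82c}, \x{dcbd}]
    System.out.println(perCodePoint(source)); // [\x{1b0bd}]
  }
}
```

The per-char walk reproduces the two-surrogate result shown under "Before this change", and the per-code-point walk reproduces the single rune shown under "After this change".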

Fixes #123.

@google-cla google-cla bot added the cla: yes label Jun 2, 2021
@sjamesr sjamesr force-pushed the surrogate_pair_codepoints branch from 8b4c92d to 1a41124 Compare June 2, 2021 05:31

codecov-commenter commented Jun 2, 2021

Codecov Report

Merging #143 (ed25d21) into master (ddfb693) will increase coverage by 0.20%.
The diff coverage is 100.00%.

❗ Current head ed25d21 differs from pull request most recent head 1012489. Consider uploading reports for the commit 1012489 to get more accurate results

@@            Coverage Diff             @@
##           master     #143      +/-   ##
==========================================
+ Coverage   89.06%   89.27%   +0.20%     
==========================================
  Files          18       18              
  Lines        3019     3022       +3     
  Branches      607      607              
==========================================
+ Hits         2689     2698       +9     
+ Misses        192      186       -6     
  Partials      138      138              
Impacted Files                            Coverage Δ
java/com/google/re2j/Parser.java          88.10% <100.00%> (+0.04%) ⬆️
java/com/google/re2j/MachineInput.java    67.28% <0.00%> (+5.60%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@sjamesr sjamesr force-pushed the surrogate_pair_codepoints branch from 1a41124 to a432619 Compare June 2, 2021 05:34
@sjamesr sjamesr force-pushed the surrogate_pair_codepoints branch from ed25d21 to 1012489 Compare June 2, 2021 05:48
@sjamesr sjamesr merged commit f9c0a8c into google:master Jun 2, 2021
Successfully merging this pull request may close these issues.

Quoted codepoint is not matched while unquoted is matched