Fix quoting of codepoints requiring surrogate pairs. #143

sjamesr · 2021-06-02T05:30:57Z

Previously, the parser would process a \Q...\E section one UTF-16 char at
a time. Runes requiring a surrogate pair would be incorrectly treated as
two individual characters, one per surrogate.

E.g.

String source = new StringBuilder().appendCodePoint(110781).toString();

Before this change:
Parser.parse(source, ...) matches \x{1b0bd}
Parser.parse("\\Q" + source + "\\E", ...) matches \x{d82c}\x{dcbd}

After this change:
Parser.parse(source, ...) matches \x{1b0bd}
Parser.parse("\\Q" + source + "\\E", ...) matches \x{1b0bd}

Fixes #123.
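
For reviewers less familiar with UTF-16, here is a minimal standalone sketch of the difference between walking a string by char and by code point. It is not the code in Parser.java; the class QuotedLiteralDemo and its two helper methods are hypothetical names used only for illustration, and only standard java.lang and java.util calls are used.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: QuotedLiteralDemo is a hypothetical class, not part of RE2/J.
class QuotedLiteralDemo {
  // Walks the string one UTF-16 code unit at a time, so a supplementary
  // code point shows up as two separate surrogate values.
  static List<String> byChar(String s) {
    List<String> out = new ArrayList<>();
    for (int i = 0; i < s.length(); i++) {
      out.add(String.format("\\x{%x}", (int) s.charAt(i)));
    }
    return out;
  }

  // Walks the string one code point at a time, advancing by one or two
  // chars depending on whether the code point needs a surrogate pair.
  static List<String> byCodePoint(String s) {
    List<String> out = new ArrayList<>();
    for (int i = 0; i < s.length(); ) {
      int cp = s.codePointAt(i);
      out.add(String.format("\\x{%x}", cp));
      i += Character.charCount(cp);
    }
    return out;
  }

  public static void main(String[] args) {
    String source = new StringBuilder().appendCodePoint(110781).toString();
    System.out.println(byChar(source));      // [\x{d82c}, \x{dcbd}]
    System.out.println(byCodePoint(source)); // [\x{1b0bd}]
  }
}
```

The by-char walk reproduces the two-surrogate result described under "Before this change", and the by-code-point walk reproduces the single rune described under "After this change".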

codecov-commenter · 2021-06-02T05:32:36Z

Codecov Report

Merging #143 (ed25d21) into master (ddfb693) will increase coverage by 0.20%.
The diff coverage is 100.00%.

❗ Current head ed25d21 differs from pull request most recent head 1012489. Consider uploading reports for the commit 1012489 to get more accurate results

@@            Coverage Diff             @@
##           master     #143      +/-   ##
==========================================
+ Coverage   89.06%   89.27%   +0.20%     
==========================================
  Files          18       18              
  Lines        3019     3022       +3     
  Branches      607      607              
==========================================
+ Hits         2689     2698       +9     
+ Misses        192      186       -6     
  Partials      138      138

Impacted Files	Coverage Δ
java/com/google/re2j/Parser.java	`88.10% <100.00%> (+0.04%)`	⬆️
java/com/google/re2j/MachineInput.java	`67.28% <0.00%> (+5.60%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ddfb693...1012489. Read the comment docs.

google-cla bot added the cla: yes label Jun 2, 2021

sjamesr force-pushed the surrogate_pair_codepoints branch from 8b4c92d to 1a41124 Compare June 2, 2021 05:31

sjamesr force-pushed the surrogate_pair_codepoints branch from 1a41124 to a432619 Compare June 2, 2021 05:34

sjamesr force-pushed the surrogate_pair_codepoints branch from ed25d21 to 1012489 Compare June 2, 2021 05:48

sjamesr merged commit f9c0a8c into google:master Jun 2, 2021
