Fix #1673: regex ANY_CHARACTER "\\n|." ANY_CHARACTER idiom now works #1691
This PR was motivated by Issue #1673, "regex RE2 wrongly ignores newline
in OR groups".
The defect was a botched rewrite of the Parser.scala method
matchRune() in an attempt to remove multiple return statements.
The issue is now fixed for the two patterns which would evoke
the presenting problem. The patterns "\n|." and ".|\n"
are both used to indicate a match to ANY_CHARACTER, including
newline. Because regex alternation is defined to prefer the
left term, the two patterns execute slightly different code paths.
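
As a point of reference, the intended behavior of the two idioms can be checked against `java.util.regex` on the JVM. This is only a sketch of the expected semantics, not code from this PR; the object and method names are hypothetical.

```scala
import java.util.regex.Pattern

// Hypothetical helper, for illustration only.
object AnyCharacterIdiom {
  // True when the pattern matches the entire input (no flags set).
  def fullMatch(regex: String, input: String): Boolean =
    Pattern.compile(regex).matcher(input).matches()

  def main(args: Array[String]): Unit = {
    // Both spellings of the idiom should accept an ordinary
    // character and a newline.
    println(fullMatch("\\n|.", "a"))   // true
    println(fullMatch("\\n|.", "\n"))  // true
    println(fullMatch(".|\\n", "a"))   // true
    println(fullMatch(".|\\n", "\n"))  // true
    // A bare dot, with no flags, does not accept a newline.
    println(fullMatch(".", "\n"))      // false
  }
}
```
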
The defect was in SN 0.4.0 code, so this PR is not a candidate for
backport to the SN 0.3.n series.
Two unit-test cases were added to ParserSuite.scala. Each tests
one of the two patterns which evoked Issue #1673.
Details for future maintainers.
The FLAGS argument to testParseDump() in NOMATCHNL_TESTS is 0.
That is, all flags are clear. This means MATCH_NL is clear, which
matches Java 8 DOTALL being clear: dot does not match newline.
Most of the rest of the suite uses FLAGS which match PERL:
`TEST_FLAGS = MATCH_NL | PERL_X | UNICODE_GROUPS`
This sets dot to match newline and does not evoke Issue #1673.
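
The Java side of that comparison can be illustrated with a small sketch. It uses only `java.util.regex.Pattern` and its `DOTALL` flag; the object and method names are hypothetical, and it does not touch the suite's own flag constants.

```scala
import java.util.regex.Pattern

// Hypothetical helper, for illustration only.
object DotAndNewline {
  // Does a bare "." match a single newline under the given Java flags?
  def dotMatchesNewline(flags: Int): Boolean =
    Pattern.compile(".", flags).matcher("\n").matches()

  def main(args: Array[String]): Unit = {
    // Flags clear: the case the text compares to MATCH_NL being clear.
    println(dotMatchesNewline(0))              // false
    // DOTALL set: the case the text compares to MATCH_NL being set.
    println(dotMatchesNewline(Pattern.DOTALL)) // true
  }
}
```
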
Before this PR, Parser.scala with MATCH_NL clear compiled uses of
the patterns to "dnl{}"; that is, ROP.ANY_CHAR_NOT_NL.
As of this PR, the patterns get compiled to the correct and
expected "dot{}"; that is, ROP.ANY_CHAR. See NONMATCHNL_TESTS.
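
The expected result can be pictured with a toy model of the simplification. This is not the Parser.scala implementation: the node types and the `simplify` function below are hypothetical and exist only to show why an alternation of "any character except newline" and a newline literal should collapse to "any character", regardless of which term is on the left.

```scala
// Toy model, for illustration only; none of these names come from Parser.scala.
object ToyAlternationModel {
  sealed trait Node
  case object AnyCharNotNL extends Node            // plays the role of ROP.ANY_CHAR_NOT_NL
  case object AnyChar extends Node                 // plays the role of ROP.ANY_CHAR
  final case class Literal(ch: Char) extends Node
  final case class Alternate(left: Node, right: Node) extends Node

  // Collapse the two spellings of the idiom into a single AnyChar node.
  def simplify(node: Node): Node = node match {
    case Alternate(Literal('\n'), AnyCharNotNL) => AnyChar // "\n|."
    case Alternate(AnyCharNotNL, Literal('\n')) => AnyChar // ".|\n"
    case other                                  => other
  }

  def main(args: Array[String]): Unit = {
    println(simplify(Alternate(Literal('\n'), AnyCharNotNL))) // AnyChar
    println(simplify(Alternate(AnyCharNotNL, Literal('\n')))) // AnyChar
    println(simplify(AnyCharNotNL))                           // AnyCharNotNL
  }
}
```
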
Documentation:
Testing:
X86_64 only. All tests pass.