Skip to content

Fix #1673: regex ANY_CHARACTER "\\n|." ANY_CHARACTER idiom now works #1691

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

LeeTibbert
Copy link
Contributor

@LeeTibbert LeeTibbert commented Aug 23, 2019

  • This PR was motivated by Issue regex RE2 wrongly ignores newline in OR groups #1673 " regex RE2 wrongly ignores newline
    in OR groups".

    The defect was a botched rewrite of the Parser.scala method
    matchRune() in an attempt to remove multiple return statements.

    The issue is now fixed for the two patterns which would evoke
    the presenting problem. The patterns "\n|." and ".|\n"
    are both used to indicate a match to ANY_CHARACTER, including
    newline. Because regex is defined to have a preference for the
    left term, they execute slightly different code paths.

  • The defect was in SN 0.4.0 code, so this PR is not a candidate for
    backport to the SN 0.3.n series.

  • Two unit-test cases were added to ParserSuite.scala. Each tests
    one of the two patterns which evoked Issue regex RE2 wrongly ignores newline in OR groups #1673.

  • Details for future maintainers.

    • The FLAGS argument to testParseDump() in NOMATCHNL_TESTS as 0.
      That is, all flags are clear. This mean MATCH_NL is clear, which
      matches Java 8 DOTALL being clear: dot does not match newline.

    • Most of rest of Suite uses FLAGS which match PERL:
      TEST_FLAGS = MATCH_NL | PERL_X | UNICODE_GROUPS`
      This sets dot to match newline and does not evoke Issue regex RE2 wrongly ignores newline in OR groups #1673.

    • Before this PR, Parser.scala with MATCH_NL clear compiled uses of
      the patterns to "dnl{}"; that is, ROP.ANY_CHAR_NOT_NL.
      As of this PR, the patterns get compiled to the correct and
      expected "dnl{}", ROP.ANY_CHAR. See NONMATCHNL_TESTS .

Documentation:

  • The standard changelog entry is requested.

Testing:

  • Built and tested ("test-all") in release-fast mode using sbt 1.2.8 on
    X86_64 only . All tests pass.

@LeeTibbert LeeTibbert force-pushed the PR_sn_regex_OrAnyCharParser_I1673 branch from 738c9e3 to fe754c1 Compare August 23, 2019 08:51
…m now works

  * This PR was motivated by Issue scala-native#1673 " regex RE2 wrongly ignores
    newline in OR groups".

    The defect was a botched rewrite of the Parser.scala method
    matchRune() in an attempt to remove multiple `return` statements.

    The issue is now fixed for the two patterns which would evoke
    the presenting problem.  The patterns "\\n|." and ".|\\n"
    are both used to indicate a match to ANY_CHARACTER, including
    newline. Because regex is defined to have a preference for the
    left term, they execute slightly different code paths.

  * Two unit-test cases were added to ParserSuite.scala. Each tests
    one of the two patterns which evoked Issue scala-native#1673.

  * Details for future maintainers.

    + The FLAGS argument to testParseDump() in NOMATCHNL_TESTS as 0.
    That is, all flags are clear. This mean MATCH_NL is clear, which
    matches Java 8 DOTALL being clear: dot does not match newline.

    Most of rest of Suite uses FLAGS which match PERL:
   `TEST_FLAGS = MATCH_NL | PERL_X | UNICODE_GROUPS`
    This sets dot to match newline and does not evoke Issue scala-native#1673.

    + Before this PR, Parser.scala with MATCH_NL clear compiled uses of
      the patterns to "dnl{}"; that is, ROP.ANY_CHAR_NOT_NL.
      As of this PR, the patterns get compiled to the correct and
      expected "dnl{}", ROP.ANY_CHAR. See NONMATCHNL_TESTS .

Documentation:

    * The standard changelog entry is requested.

Testing:

  * Built and tested ("test-all") in debug mode using sbt 1.2.8 on
    X86_64 only . All tests pass.
@LeeTibbert LeeTibbert force-pushed the PR_sn_regex_OrAnyCharParser_I1673 branch from fe754c1 to 2c37853 Compare May 15, 2020 21:47
@LeeTibbert
Copy link
Contributor Author

preemptively rebased. Travis CI is all green.

@WojciechMazur
Copy link
Contributor

Rebased and merged in #2402. Thank you for publishing the original solution

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants