-
Notifications
You must be signed in to change notification settings - Fork 383
Improved sn.regex named group support. #1699
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
LeeTibbert
wants to merge
3
commits into
scala-native:master
from
LeeTibbert:PR_sn_regex_Improved_NamedGroup_Support
Closed
Improved sn.regex named group support. #1699
LeeTibbert
wants to merge
3
commits into
scala-native:master
from
LeeTibbert:PR_sn_regex_Improved_NamedGroup_Support
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
* This PR requires earlier PRs: scala-native#1397, scala-native#1693, and scala-native#1698. Travis CI will fail until those are merged. * This PR ports [re2j](https://github.com/google/re2j/) PR scala-native#49, "Named group support", dated 2018-03-04, commit d0ec5a7cfec67a08735b720a956c92d1440b3789 to Scala Native. My thanks and appreciation to the re2j developer @sjamesr for the re2j PR. * Some named group support was activated/implemented when re2j was originally ported to Scala Native. This PR removes that code in favor to the quote "new" re2j named group support. I believe there is no major functional change. As more fully described below, a few Exceptions and their messages have been changed to conform to Java 8. My hope is that using some close relative of the re2j code will help future devos, myself included, identify and track relevant changes in the re2j repository. Requiescat in Pace, sn.regex named group support, we hardly knew ye! * Deviations of note from re2j code: + Adjusted Exception thrown and message to match Java 8. The re2j message was better in the sense that it gave the name not found, but SN is trying to hew close to Java 8. + Converted from namedGroups from Map[String, Integer] to Map[String, Int] to avoid boxing values which are known to be Ints. + Re-worked the named group methods start(name), end(name), and group(name) to use a common subroutine rather than open coding. This makes it easier to get the lookup & exception handling right and consistent. * unit-test points were ported/added: + sn.regex.MatcherSuite + sn.regex.ParserSuite + java.util.regex.MatcherSuite + java.util.regex.PatternSuite The majority of the testing of both Perl and Java syntax named group handling is done in sn.regex to stay close to the code which actually does the work. Documentation: * The standard changelog entry is requested. Testing: * Built and tested ("test-all") in release-fast mode using sbt 1.2.8 on X86_64 only . All tests pass.
This is a micro-edit point fix. A better comprehensive, deterministic fix is being exercised in work-in-progress for a future PR. Prior to this commit, the "stack safe" test could intermittently fail with duplicate groups. This fix increases the size of the string to drastically reduce the chance of a duplicate name. Note that this code to generate a group name can still generate, with a 10 in 62 probability, a digit for the first character of the group name. Such a name is not a valid Java 8 group name and will cause the test to fail intermittently and in a hard to debug manner. Again, a better fix is on the way. Both fixes are in a future PR to simplify the merge process.
LeeTibbert
pushed a commit
to LeeTibbert/scala-native-fork
that referenced
this pull request
Sep 2, 2019
…e V1.3 #### This PR requires earlier PR scala-native#1699 & earlier. Travis CI will fail until those are merged. * This PR addresses a number of PRs in the parent code [re2j] (https://github.com/google/re2j/). It brings sn.regex up to date with all relevant PRs in the re2j repository dated 2019-08-21, commit 86028a5d722956800c6d6257e713d539e1bdd24d. In particular, it incorporates all relevant PRs included in re2j Release V1.3, dated 2019-07-22, commit 222899bc5e058380935311eba419a722f00b3d69 + This PR is most closely related to re2j PR "Add named capture group support to appendReplacement " dated 2018-03-05, commit 0e87e2a2bb403edb4196698b6707eaa2c524cf96 It looks like some of that work had been done as part of the original port to sn.regex. This PR more closely aligns the work, adding test cases and such. + re2j PR "simplify Matcher.appendReplacement(StringBuffer, String)" dated 2019-03-05 is not relevant because the subject code was eliminated during the port to sn.regex because it was never used. + Other re2j PRs up to and including the top of the stack dated 2019-08-21 were considered and found to be not relevant. * My thanks and appreciation to the re2j developers for the re2j PRs. * Deviations of note from re2j code: + Some Exceptions were changed from IllegalArgumentException to the IllegalStateException used by Java 8. The corresponding messages were changed to match Java 8 as closely as feasible. In many case the re2j message was better because it gave the name of the duplicated group or such. For sn.regex, Java 8 alignment is more important. Some of the indices given by "near index X" are only true but the value of X may not be as good an approximation as on JVM. Similarly, sn.regex may echo a failing group in printStack() as only the near context which failed. Java 8 tends to give the entire string, making it easier to see a duplicate or such. Improvements for the next generation. * unit-test points were ported from re2j PR and additional test points created, particularly for Exception type and messages. * The code used by the "stack safe" test in NamedGroupSuite.scala was made more robust & more correct. A probabilistic fix had been added to PR scala-native#1699 to deal with one of two Heisenbugs. This PR makes a deterministic fixes to both. See file for details. Heisenbugs, especially in automated test code, are a massive pain. Normally in a test Suite, Random would be initialized with a seed so that the results would be reproducible, run over run. This was not done in NamedGroupSuite so that more group names would be tested over the long run. Yes, this can lead to Heisenbugs. A trade-off. I chose this approach because I want to know if there is a bad case lurking. Documentation: * The standard changelog entry is requested. Testing: * Built and tested ("test-all") in release-fast mode using sbt 1.2.8 on X86_64 only . All tests pass.
Rebased and merged in #2407, thank you for providing the original solution. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR requires earlier PR Port/implement s.n.regex.Regexp hashCode(), equals(), & tests. #1698.
Travis CI will fail until that is merged.
This PR ports re2j PR Build failures cont'd #49,
"Named group support", dated 2018-03-04,
commit d0ec5a7cfec67a08735b720a956c92d1440b3789 to Scala Native.
My thanks and appreciation to the re2j developer @sjamesr for the
re2j PR.
Some named group support was activated/implemented when re2j was
originally ported to Scala Native. This PR removes that code
in favor to the quote "new" re2j named group support. I believe
there is no major functional change. As more fully described below,
a few Exceptions and their messages have been changed to conform to Java 8.
My hope is that using some close relative of the re2j code will
help future devos, myself included, identify and track relevant changes in
the re2j repository.
Requiescat in Pace, sn.regex named group support, we hardly knew ye!
Deviations of note from re2j code:
Adjusted Exception thrown and message to match Java 8.
The re2j message was better in the sense that it gave the
name not found, but SN is trying to hew close to Java 8.
Converted from namedGroups from Map[String, Integer] to
Map[String, Int] to avoid boxing values which are known to be Ints.
Re-worked the named group methods start(name), end(name), and
group(name) to use a common subroutine rather than open coding.
This makes it easier to get the lookup & exception handling
right and consistent.
unit-test points were ported/added:
The majority of the testing of both Perl and Java syntax named group
handling is done in sn.regex to stay close to the code which
actually does the work.
Documentation:
Testing:
X86_64 only . All tests pass.