Description
@lolgab kindly reported on Gitter an issue where specs2 code on SN 0.4.0-?
fails on a complicated Scala ReadAllIn regex. That code is under license, so
I will not post it here, but it is available in the specs2 GitHub repository.
Thank you, @lolgab, for the report. Per our off-line discussion, I am creating this
Issue so that the report does not go missing or dormant.
Problem
After intensive investigation, the presenting issue boils down to a small reproducing
case: SN, using RE2, with a regex of "\n|." fails to match a string consisting of a
single newline, "\n". Scala JVM and a large number of other regex variants
make the match successfully.
NOTE WELL: The OR vertical bar is essential to evoking the failure. A match
on '\n' alone succeeds on all tested platforms, as expected.
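The reproducing case can be sketched as below. This is a minimal sketch, assuming the standard `java.util.regex.Pattern` API that the SN javalib mirrors; the object name `NewlineAlternationRepro` and the helper `fullMatch` are hypothetical names introduced only for illustration. On the JVM both lines should print `true`; per the report above, the first check is the one that fails on SN.

```scala
import java.util.regex.Pattern

object NewlineAlternationRepro {
  // Hypothetical helper: true when `regex` matches the whole of `input`.
  def fullMatch(regex: String, input: String): Boolean =
    Pattern.compile(regex).matcher(input).matches()

  def main(args: Array[String]): Unit = {
    val newline = "\n"

    // Alternation of newline and dot: the failing case described above.
    println("\\n|. matches a newline: " + fullMatch("\\n|.", newline))

    // Newline alone, no vertical bar: reported to succeed everywhere.
    println("\\n matches a newline: " + fullMatch("\\n", newline))
  }
}
```

Running the same object on both the JVM and SN gives a direct side-by-side comparison of the two platforms.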
Short term
The "((\n|.)*)" idiom is common in code to say "match any character, including
newline". It is used where the regex DOTALL flag is not available, or in code written
before the flag became available or common. This is probably the first of many reports of this failure.
Probably the best short term solution is to edit the javalib documentation to
describe this problem and to document one of the many (4 or more) workarounds.
My personal favorite workaround is to substitute "((\s|\S)*)". Second
favorite is to use a range-limited inline DOTALL, "((?s:\n|.)*)".
Both are documented and work with Java 8 JVM (and above). Both
require code changes, so neither is a good medium- to long-term solution.
Documenting them may reduce the frustration of people who encounter this issue.
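A rough sketch of how the original idiom and the two workarounds compare on multi-line input follows. It assumes `java.util.regex.Pattern` and writes the scoped inline DOTALL with Java's `(?s:...)` syntax; the object name `AnyCharWorkarounds` and the helper `matchesAll` are hypothetical. On the JVM all three patterns should report `true`. The output on SN is not shown here and would need to be checked there.

```scala
import java.util.regex.Pattern

object AnyCharWorkarounds {
  // Hypothetical helper: true when `regex` matches the whole of `input`.
  def matchesAll(regex: String, input: String): Boolean =
    Pattern.compile(regex).matcher(input).matches()

  // The common idiom that triggers the reported failure on SN.
  val original = "((\\n|.)*)"

  // Workaround 1: whitespace or non-whitespace covers every character.
  val whitespaceClasses = "((\\s|\\S)*)"

  // Workaround 2: DOTALL scoped to one group, so '.' also matches newline.
  val scopedDotAll = "((?s:\\n|.)*)"

  def main(args: Array[String]): Unit = {
    val text = "first line\nsecond line\n"
    for (p <- Seq(original, whitespaceClasses, scopedDotAll))
      println(p + " -> " + matchesAll(p, text))
  }
}
```

Either workaround is a drop-in replacement for the pattern string only; the surrounding matcher code does not need to change.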
Analysis
Solving this one is going to take time, cunning, & patience.
There are a number of software layers and a number of porting hops involved.
Surprisingly, this case appears to not be tested in the extensive RE2 tests.
- The SN javalib and scalanative.regex Pattern & regex.Matcher appear to be innocent.
- The problem shows when calling straight into the regex.RE2mumble code directly
  beneath the regex.Matcher code.
- The probable next step is to do a binary search on the porting. The problem
  is present in the SN port. Is it present in close-to-original Go, C or C++ code?
  If not, is it present in the re2j code from which the current re2s regex was ported?