-
Notifications
You must be signed in to change notification settings - Fork 288
storage/url: allow wildcards to match new line characters #505
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -20,7 +20,7 @@ func createExcludesFromWildcard(inputExcludes []string) ([]*regexp.Regexp, error | |||
for _, input := range inputExcludes { | |||
if input != "" { | |||
regexVersion := wildCardToRegexp(input) | |||
regexpCompiled, err := regexp.Compile(regexVersion) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kucukaslan thank you for debugging this and creating the PR. I have two cosmetic comments:
-
Regex pattern is built in
wildCardToRegexp
function. Prefixing thepatternRegex
in that function with(?s)
will be more convenient. We can also drop a comment about why we are using this flag. -
regexp.Compile("(?s)", pattern)
function calls can be encapsulated in its own method where we prepend(?s)
with a an explanation. In this way the regex flag(?s)
will not be visible in multiple places in the code.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for review.
As a first note, I assume these two options are alternative options. So my further comments will be about which one is preferable and why.
On option 1
Although the regexp
that wildCardToRegexp
prepared does not work accurately with new line character, I'm rather hesitated to add (?s)
in that method. Because the (?s)
is, at least technically, a flag and my concern is that would specifying flags be an expected behaviour of wildCardToRegexp
method (or adding flags is the responsibility of the caller of that method)?
On option 2
I'm not sure what kind of function do you mean.
a) If you mean a function that only takes pattern and returns the new string with (?s)
it's doable and I'm fine with it.
b) But if you mean to replace whole regexp.Compile("(?s)", pattern)
then I've a few concerns:
- In the tests we use its other variant
regexp.MustCompile
which panics instead of returning error. Then we should either write seperate method for it too1, or same method should take another parameter "isMustCompile". - If we use the same wrapper method for both Compile and MustCompile the surrounding method2 still needs to receive the error so it should return both the regex end the error even tough MustCompile won't return an error.
My preference would be Option 2.a if we want to handle both Compile and MustCompile together.
Footnotes
-
Or do you think changing wildCardToRegexp method will already fix this. ↩
-
https://github.com/peak/s5cmd/blob/acb97e9fa4d186ee47307f011fbe03f90b3e2a57/storage/url/url.go#L202-L226 ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've extracted the regular expression preperation to three util functions: WildCardToRegexp, MatchFromStartToEnd and AddNewLineFlag.
Since the exclude.go was in command
package but the setPrefixAndFilter
was in storage/url
they must be exported.
Neither of the packages seemed a correct place, so I placed them to strutil
package. It might be better to rename strutil
to, more generic name, util
.
'setPrefixAndFilter' preferred *?
over *
which was preferred by exclude.go
. As far as I understood, they won't affect the results since we always use full text match. But it may have some effect in the speed.
Use greedy regex quantifier "*" instead of reluctant "*?". Since we are always matching the full keys, afaik, they're equivalent as far as we're concerned. See also https://docs.microsoft.com/en-us/dotnet/standard/base-types/quantifiers-in-regular-expressions#greedy-and-lazy-quantifiers
Co-Authored-By: İlkin Balkanay <ilkinulas@gmail.com>
This commit adds
s
flag 1 to URL's filter Regexp to "let.
match\n
".Footnotes
https://github.com/google/re2/wiki/Syntax#:~:text=text%20(default%20false)-,s,-let%20.%20match ↩