Skip to content

storage/url: allow wildcards to match new line characters #505

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Sep 20, 2022
Merged

storage/url: allow wildcards to match new line characters #505

merged 4 commits into from
Sep 20, 2022

Conversation

kucukaslan
Copy link
Contributor

@kucukaslan kucukaslan commented Sep 2, 2022

This commit adds s flag 1 to URL's filter Regexp to "let . match \n".

Footnotes

  1. https://github.com/google/re2/wiki/Syntax#:~:text=text%20(default%20false)-,s,-let%20.%20match

@kucukaslan kucukaslan requested a review from a team as a code owner September 2, 2022 08:18
@kucukaslan kucukaslan requested review from igungor and ilkinulas and removed request for a team September 2, 2022 08:18
@@ -20,7 +20,7 @@ func createExcludesFromWildcard(inputExcludes []string) ([]*regexp.Regexp, error
for _, input := range inputExcludes {
if input != "" {
regexVersion := wildCardToRegexp(input)
regexpCompiled, err := regexp.Compile(regexVersion)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kucukaslan thank you for debugging this and creating the PR. I have two cosmetic comments:

  1. Regex pattern is built in wildCardToRegexp function. Prefixing the patternRegex in that function with (?s) will be more convenient. We can also drop a comment about why we are using this flag.

  2. regexp.Compile("(?s)", pattern) function calls can be encapsulated in its own method where we prepend (?s) with a an explanation. In this way the regex flag (?s) will not be visible in multiple places in the code.

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for review.

As a first note, I assume these two options are alternative options. So my further comments will be about which one is preferable and why.

On option 1

Although the regexp that wildCardToRegexp prepared does not work accurately with new line character, I'm rather hesitated to add (?s) in that method. Because the (?s) is, at least technically, a flag and my concern is that would specifying flags be an expected behaviour of wildCardToRegexp method (or adding flags is the responsibility of the caller of that method)?

On option 2

I'm not sure what kind of function do you mean.
a) If you mean a function that only takes pattern and returns the new string with (?s) it's doable and I'm fine with it.
b) But if you mean to replace whole regexp.Compile("(?s)", pattern) then I've a few concerns:

  1. In the tests we use its other variant regexp.MustCompile which panics instead of returning error. Then we should either write seperate method for it too1, or same method should take another parameter "isMustCompile".
  2. If we use the same wrapper method for both Compile and MustCompile the surrounding method2 still needs to receive the error so it should return both the regex end the error even tough MustCompile won't return an error.

My preference would be Option 2.a if we want to handle both Compile and MustCompile together.

Footnotes

  1. Or do you think changing wildCardToRegexp method will already fix this.

  2. https://github.com/peak/s5cmd/blob/acb97e9fa4d186ee47307f011fbe03f90b3e2a57/storage/url/url.go#L202-L226

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've extracted the regular expression preperation to three util functions: WildCardToRegexp, MatchFromStartToEnd and AddNewLineFlag.

Since the exclude.go was in command package but the setPrefixAndFilter was in storage/url they must be exported.
Neither of the packages seemed a correct place, so I placed them to strutil package. It might be better to rename strutil to, more generic name, util.

'setPrefixAndFilter' preferred *? over * which was preferred by exclude.go. As far as I understood, they won't affect the results since we always use full text match. But it may have some effect in the speed.

Use greedy regex quantifier "*" instead of reluctant "*?". Since we are always matching the full keys, afaik, they're equivalent as far as we're concerned.
See also https://docs.microsoft.com/en-us/dotnet/standard/base-types/quantifiers-in-regular-expressions#greedy-and-lazy-quantifiers
Co-Authored-By: İlkin Balkanay <ilkinulas@gmail.com>
ilkinulas
ilkinulas previously approved these changes Sep 8, 2022
@ilkinulas ilkinulas merged commit 83ce8bc into peak:master Sep 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants