Skip to content

Fix cached download file extension for certain URLs #3657

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Feb 13, 2018
Merged

Fix cached download file extension for certain URLs #3657

merged 5 commits into from
Feb 13, 2018

Conversation

kabel
Copy link
Contributor

@kabel kabel commented Jan 10, 2018

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes? Here's an example.
  • Have you successfully run brew tests with your changes locally?

PHP URLs have the downloadable file in the middle of the pathname. If no extension is detected, continue up the pathname. Ex: https://php.net/get/php-7.2.1.tar.xz/from/this/mirror should cache to php-7.2.1.tar.xz.

kabel added 2 commits January 9, 2018 19:56
PHP URLs have the downloadable file in the middle of the pathname. If no
extension is detected, continue up the pathname.
@MikeMcQuaid
Copy link
Member

PHP URLs have the downloadable file in the middle of the pathname. If no extension is detected, continue up the pathname. Ex: https://php.net/get/php-7.2.1.tar.xz/from/this/mirror should cache to php-7.2.1.tar.xz.

My concern is this is a widely generic change that will affect the behaviour of a bunch of other formulae in a way that may not be correct. The downloaded filename is based off the formula and version and not the upstream name.

@kabel
Copy link
Contributor Author

kabel commented Jan 11, 2018

Right, I should have used a better example. I can certainly change it to make it fix the very specific case of php's file suffix, but I don't think it will change any other formulae that don't also exhibit the same problem. That is, tarballs in the cache directory that don't have any extension. I could also crawl through hombrew-core to get a better representation of downloaded resources and put in more tests.

@@ -303,7 +303,11 @@ def ext
# We can't use basename_without_params, because given a URL like
# https://example.com/download.php?file=foo-1.0.tar.gz
# the extension we want is ".tar.gz", not ".php".
Pathname.new(@url).extname[/[^?]+/]
url_pathname = Pathname.new(@url)
until ext = url_pathname.extname[/[^?]+/]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably handle the case where the extname never matches and/or dirname fails.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kabel Ping on this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I addressed this in 0254605 by checking the return value of dirname.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kabel That's not sufficient. For example, if url_pathname ends up as / then this will be an infinite loop. I think it's worth changing the approach so this loop can never be infinite.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did test it out through irb. I could add the root / as another escape condition, but these are going to be URLs that we're dealing with. So the top-level domain would (in most cases where it going that far) be returned as the extension. If the URL didn't have a domain (file:///), dirname would still default to .. I guess it can't hurt to be even more robust.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kabel I'm sorry if I'm not being explicit enough but I'm just not happy with a change that adds an infinite loop if the return values are ever something you do not expect. Instead of making this loop indefinitely until there's an escape condition instead consider splitting on / and iterating over that or another approach that makes this a bounded loop.

@MikeMcQuaid
Copy link
Member

I can certainly change it to make it fix the very specific case of php's file suffix, but I don't think it will change any other formulae that don't also exhibit the same problem. That is, tarballs in the cache directory that don't have any extension.

Ok. I think it's fine then if better error handling is added.

Should never reach this as most URL's will have a domain name that will
be caught as an extname.
@@ -303,7 +303,11 @@ def ext
# We can't use basename_without_params, because given a URL like
# https://example.com/download.php?file=foo-1.0.tar.gz
# the extension we want is ".tar.gz", not ".php".
Pathname.new(@url).extname[/[^?]+/]
url_pathname = Pathname.new(@url)
until ext = url_pathname.extname[/[^?]+/]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kabel That's not sufficient. For example, if url_pathname ends up as / then this will be an infinite loop. I think it's worth changing the approach so this loop can never be infinite.

@@ -303,7 +303,11 @@ def ext
# We can't use basename_without_params, because given a URL like
# https://example.com/download.php?file=foo-1.0.tar.gz
# the extension we want is ".tar.gz", not ".php".
Pathname.new(@url).extname[/[^?]+/]
url_pathname = Pathname.new(@url)
until ext = url_pathname.extname[/[^?]+/]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kabel I'm sorry if I'm not being explicit enough but I'm just not happy with a change that adds an infinite loop if the return values are ever something you do not expect. Instead of making this loop indefinitely until there's an escape condition instead consider splitting on / and iterating over that or another approach that makes this a bounded loop.

@kabel
Copy link
Contributor Author

kabel commented Feb 12, 2018

@MikeMcQuaid I understand the displeasure with the potential unbounded loop. Should be better now with .ascend.

@MikeMcQuaid
Copy link
Member

Nice find on ascend @kabel (TIL) and thanks again for the PR.

@MikeMcQuaid MikeMcQuaid merged commit 2a9660f into Homebrew:master Feb 13, 2018
@kabel kabel deleted the php-cache-ext branch March 7, 2018 03:21
@Homebrew Homebrew locked and limited conversation to collaborators May 4, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants