rdeltour
Member

URLs were not normalized before existence checks were performed, so percent-encoded URLs sometimes triggered `RSC-001` or `RSC-007` errors.

This commit introduces a new `normalize(URL)` method in the `URLUtils` class. URLs are now normalized before they are checked, notably for resource and ID existence checks.

Important Note:
URL normalization is not well-defined. Some percent-encoding normalization is described in RFC 3986, but it is not defined in the URL standard. Normalization (as useful for EPUBCheck) also depends on the URL scheme.
The normalization we apply is quite naïve and might need to be improved in the future. It should, however, cover the majority of real-world HTTP URL scenarios.
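
For illustration only, here is a minimal sketch of the kind of percent-encoding normalization RFC 3986 (section 6.2.2) describes: hex digits in escapes are uppercased, and escapes of unreserved characters are decoded. This is not the `URLUtils.normalize(URL)` code from this commit; the class name `PercentEncodingSketch` and the `String`-based signature are assumptions made for the example.

```java
import java.util.Locale;

// Hypothetical helper, NOT the EPUBCheck implementation.
final class PercentEncodingSketch {

  // RFC 3986 section 2.3: ALPHA / DIGIT / "-" / "." / "_" / "~"
  private static boolean isUnreserved(int c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
        || (c >= '0' && c <= '9')
        || c == '-' || c == '.' || c == '_' || c == '~';
  }

  private static boolean isHex(char c) {
    return (c >= '0' && c <= '9')
        || (c >= 'a' && c <= 'f')
        || (c >= 'A' && c <= 'F');
  }

  // Normalizes percent-encoded triplets and leaves everything else untouched.
  static String normalize(String url) {
    StringBuilder out = new StringBuilder(url.length());
    int i = 0;
    while (i < url.length()) {
      char c = url.charAt(i);
      boolean isTriplet = c == '%'
          && i + 2 < url.length()
          && isHex(url.charAt(i + 1))
          && isHex(url.charAt(i + 2));
      if (isTriplet) {
        String hex = url.substring(i + 1, i + 3);
        int value = Integer.parseInt(hex, 16);
        if (isUnreserved(value)) {
          // Escapes of unreserved characters are equivalent to the character itself.
          out.append((char) value);
        } else {
          // Keep the escape, but use uppercase hex digits.
          out.append('%').append(hex.toUpperCase(Locale.ROOT));
        }
        i += 3;
      } else {
        // Malformed escapes and ordinary characters are copied as-is.
        out.append(c);
        i++;
      }
    }
    return out.toString();
  }
}
```

With this sketch, `chapter%2d1.xhtml` and `chapter-1.xhtml` normalize to the same string, so an existence check comparing normalized forms would treat them as the same resource. Reserved characters such as `%2f` stay encoded, because decoding them could change how the URL is parsed.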

Fix #1479

rdeltour added this to the Next maintenance release milestone Apr 28, 2023
rdeltour requested a review from mattgarrish April 28, 2023 13:10
rdeltour self-assigned this Apr 28, 2023
rdeltour linked an issue Apr 28, 2023 that may be closed by this pull request
Base automatically changed from fix/1480/messages to main April 28, 2023 15:19
rdeltour merged commit 0323668 into main Apr 28, 2023
rdeltour deleted the fix/1479/url-normalization branch April 28, 2023 15:19
Successfully merging this pull request may close these issues.

URI escaped filepaths not always correctly identified