Normalization: don't decode percent-encoded reserved characters #366

@janklimo

Given the following example URL:

url = "https://i.guim.co.uk/img/media/97b07b907a75e7f1b4aecb092f8181ca63d0ad44/2_254_1183_709/master/1183.jpg?width=1200&height=630&quality=85&auto=format&fit=crop&overlay-align=bottom%2Cleft&overlay-width=100p&overlay-base64=L2ltZy9zdGF0aWMvb3ZlcmxheXMvdGctZGVmYXVsdC5wbmc&enable=upscale&s=4c9af90b3d91c2269bad342e6b78d577"
addressable_uri = Addressable::URI.parse(url)
addressable_uri.normalize.to_s
=> "https://i.guim.co.uk/img/media/97b07b907a75e7f1b4aecb092f8181ca63d0ad44/2_254_1183_709/master/1183.jpg?width=1200&height=630&quality=85&auto=format&fit=crop&overlay-align=bottom,left&overlay-width=100p&overlay-base64=L2ltZy9zdGF0aWMvb3ZlcmxheXMvdGctZGVmYXVsdC5wbmc&enable=upscale&s=4c9af90b3d91c2269bad342e6b78d577"

Normalization changes overlay-align=bottom%2Cleft to overlay-align=bottom,left.

This looks harmless, but the change results in a 401 response instead of the image itself.
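To illustrate why the change can matter even though it looks harmless, here is a small sketch using only Ruby's standard `uri` library. The two query strings are shortened from the example above; the variable names are just for illustration. Both forms decode to the same key/value pairs, but the raw strings differ, so any server-side check that works on the raw query string would see two different inputs. Whether that is what happens on this particular server is an assumption.

```ruby
require "uri"

# Shortened versions of the query strings before and after normalization.
original   = "overlay-align=bottom%2Cleft&overlay-width=100p"
normalized = "overlay-align=bottom,left&overlay-width=100p"

# After form-decoding, both produce identical key/value pairs.
same_decoded = URI.decode_www_form(original) == URI.decode_www_form(normalized)

# The raw strings themselves are not byte-for-byte equal.
same_raw = original == normalized

puts "decoded pairs equal: #{same_decoded}"
puts "raw strings equal:   #{same_raw}"
```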

Looking at RFC 3986, I believe this deviates from the spec, which (to my understanding) says that percent-encoded sub-delims should not be decoded during normalization:

URIs that differ in the replacement of a reserved character with its
corresponding percent-encoded octet are not equivalent.
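A minimal sketch of the behavior I would expect, assuming the normalization rules in RFC 3986 sections 6.2.2.1 and 6.2.2.2: decode only percent-encoded unreserved characters and uppercase the hex digits of everything else. `normalize_percent_encoding` is a hypothetical helper written for this issue, not part of Addressable's API.

```ruby
# Hypothetical helper for illustration; not Addressable's implementation.
# Unreserved set per RFC 3986 section 2.3: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED_CHAR = /\A[A-Za-z0-9\-._~]\z/

def normalize_percent_encoding(component)
  component.gsub(/%([0-9A-Fa-f]{2})/) do
    hex = Regexp.last_match(1)
    char = hex.hex.chr
    if char.match?(UNRESERVED_CHAR)
      char                # safe to decode: unreserved character
    else
      "%#{hex.upcase}"    # keep encoded, only normalize hex case
    end
  end
end

puts normalize_percent_encoding("overlay-align=bottom%2Cleft")
# prints "overlay-align=bottom%2Cleft"
puts normalize_percent_encoding("user=%7Ejan&sep=%2c")
# prints "user=~jan&sep=%2C"
```

With this rule, the comma in the example URL stays as %2C, while an encoded unreserved character such as %7E is still decoded.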

This Stack Overflow post supports that reading. I also came across #320, which touches on the same issue.

Please correct me if I'm reading this wrong 👍


Duplicates of this issue:


Maintainer notes:
