Incorrect normalization behaviour on character sequence '%e2%80%b3' #160

@gh2k

Specifically, normalizing a URI that contains this sequence produces an incorrect result:

1.9.3-p392 :019 > u = Addressable::URI.parse('http://example.org/%e2%80%b3')
 => #<Addressable::URI:0xd005e8 URI:http://example.org/%e2%80%b3> 
1.9.3-p392 :020 > u.normalize!
 => #<Addressable::URI:0xd005e8 URI:http://example.org/%E2%80%B2%E2%80%B2> 

Note that the normalized URL no longer matches the original: the path now contains %E2%80%B2 twice instead of the uppercased %E2%80%B3.

I think this is related to Addressable::IDNA.unicode_normalize_kc.

Specifically:

1.9.3-p392 :013 > s = Addressable::URI.unencode('%e2%80%b3')
 => "″" 
1.9.3-p392 :014 > Addressable::IDNA.unicode_normalize_kc(s)
 => "′′" 

The output is now two UTF-8 characters (U+2032 PRIME, twice), when previously it was one (U+2033 DOUBLE PRIME).
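
For comparison, the same effect can be reproduced without Addressable. The sketch below assumes Ruby 2.2 or later (for the standard library's `String#unicode_normalize`, which is not available on the 1.9.3 used above) and uses a hypothetical helper, `codepoints_of`, purely for display. It suggests that the character is split by compatibility normalization (NFKC) in general, while canonical normalization (NFC) leaves it alone; it does not show how Addressable implements `unicode_normalize_kc` internally.

```ruby
require 'uri'

# Percent-decode the sequence from the report using only the standard library.
decoded = URI.decode_www_form_component('%e2%80%b3')

# Canonical (NFC) versus compatibility (NFKC) normalization of the same string.
nfc  = decoded.unicode_normalize(:nfc)
nfkc = decoded.unicode_normalize(:nfkc)

# Hypothetical display helper: lists a string's code points as U+XXXX.
def codepoints_of(str)
  str.codepoints.map { |cp| format('U+%04X', cp) }
end

puts "decoded: #{codepoints_of(decoded).join(' ')}"  # U+2033
puts "NFC:     #{codepoints_of(nfc).join(' ')}"      # U+2033
puts "NFKC:    #{codepoints_of(nfkc).join(' ')}"     # U+2032 U+2032
```

If that holds, any normalization step that applies NFKC to the decoded path will change which characters the URL encodes, which matches the behaviour reported here.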
