
Alternative handling of illegal IDNs (such as domains with emojis) #18

@pjsg

Description

The decode method can raise an exception when it finds characters that are not acceptable in IDNA2008. I believe these characters are acceptable under UTS46.

idna.decode("xn--co8ha.tk")

There is no way to tell decode that it should apply UTS46 rules. UTS46 (Section 4.3) says:

Like [RFC3490], this will always produce a converted Unicode string. Unlike ToASCII of [RFC3490], this always signals whether or not there was an error.

The decode method currently indicates whether there was an error, but it does not always produce a converted Unicode string.
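To make the "always return a string, and separately signal the error" contract concrete, here is a minimal sketch using only the standard library. The name `to_unicode_lenient` is hypothetical and is not part of the idna package. It only undoes Punycode per label; it does not perform UTS46 mapping or validation, so treat it as an illustration of the return shape rather than a conforming implementation.

```python
from typing import Tuple


def to_unicode_lenient(domain: str) -> Tuple[str, bool]:
    """Return (best-effort Unicode form, had_error).

    Hypothetical helper: a string is always returned, and the flag
    reports whether any label could not be converted.
    """
    labels = []
    had_error = False
    for label in domain.split("."):
        if label.lower().startswith("xn--"):
            try:
                # Strip the ACE prefix and undo Punycode (RFC 3492).
                label = label[4:].encode("ascii").decode("punycode")
            except UnicodeError:
                # Keep the original label, but record the failure.
                had_error = True
        labels.append(label)
    return ".".join(labels), had_error


text, had_error = to_unicode_lenient("xn--co8ha.tk")
```

With the domain from this issue, `text` is the two-chicken name and `had_error` is `False`; a label with malformed Punycode is passed through unchanged with `had_error` set to `True`.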

The domain name above is a valid domain name and can be accessed: http://🐔🐔.tk/

Encoding this domain name also fails, even with uts46=True and transitional=True.

The built-in Python call (Python 2 syntax)

"xn--co8ha.tk".decode("idna")

does produce the right answer.
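For anyone trying this on Python 3, where str has no decode method, the equivalent goes through bytes. This is a small sketch assuming the standard library's built-in "idna" codec, which implements IDNA2003 (RFC 3490), not IDNA2008 or UTS46.

```python
# Python 3: the ASCII (A-label) form is held as bytes.
ascii_name = b"xn--co8ha.tk"

# The built-in "idna" codec applies IDNA2003 ToUnicode to each label.
unicode_name = ascii_name.decode("idna")   # '🐔🐔.tk'

# Encoding the result back gives the original ASCII form.
round_trip = unicode_name.encode("idna")   # b'xn--co8ha.tk'
```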

I would stick with the Python IDNA2003 implementation, except that I need improved handling of the German ß character.
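The ß problem can be seen with the standard library alone. In this sketch, the built-in "idna" codec (IDNA2003) maps ß to "ss" before encoding, so the original spelling is lost. The last two lines use the raw "punycode" codec only to show what the label looks like when ß is kept as-is, which is the form IDNA2008 permits; they are not a substitute for a real IDNA2008 encoder.

```python
name = "straße.de"

# IDNA2003 (built-in codec): nameprep maps ß to "ss".
idna2003_form = name.encode("idna")            # b'strasse.de'

# Punycode of the label with ß preserved, for comparison only.
label_unmapped = "straße".encode("punycode")   # b'strae-oqa'
idna2008_style = b"xn--" + label_unmapped + b".de"
```

The two results name different domains, which is why falling back to the IDNA2003 codec is not a drop-in workaround here.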
