Should `utils.iana_name()` return the actual IANA name? #572

`charset_normalizer.utils.iana_name('utf-8')` returns `'utf_8'`, which does not appear anywhere on https://www.iana.org/assignments/character-sets/character-sets.xhtml. There the encoding is called `UTF-8`, or equivalently `utf-8`, since the page notes that "no distinction is made between use of upper and lower case letters".
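To make the mismatch concrete: because IANA names are case-insensitive, `utf-8` and `UTF-8` are the same name, but `utf_8` differs by more than case. The sketch below uses a hypothetical helper `is_case_variant` (not part of charset_normalizer) to show this, and notes that the stdlib's own `codecs.lookup()` already reports the hyphenated spelling.

```python
import codecs


def is_case_variant(name, iana_name):
    """Hypothetical helper: True if name differs from iana_name only in letter case."""
    # IANA charset names are compared case-insensitively.
    return name.casefold() == iana_name.casefold()


print(is_case_variant("utf-8", "UTF-8"))  # True
print(is_case_variant("utf_8", "UTF-8"))  # False: "_" is not "-"
print(codecs.lookup("utf_8").name)        # utf-8
```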

(The concrete use case that prompted this was serving arbitrary files over HTTP and generating an appropriate `content-type: text/plain; charset=UTF-8` header for them. I was quite surprised to get `charset=utf_8` instead, which browsers don't recognize, so they decode the file with the wrong encoding.)
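As a workaround for the HTTP case, one option is to keep a small explicit mapping from Python's canonical codec names to IANA names and omit the `charset` parameter when no mapping is known. This is only a sketch under that assumption: `_CODEC_TO_IANA`, `iana_charset` and `content_type_for` are hypothetical names, not charset_normalizer API, and the table lists just a few encodings.

```python
import codecs

# Hypothetical table: Python CodecInfo.name -> IANA registered name.
# Only a handful of entries are shown; a real table would need many more.
_CODEC_TO_IANA = {
    "utf-8": "UTF-8",
    "utf-16": "UTF-16",
    "ascii": "US-ASCII",
    "iso8859-1": "ISO-8859-1",
    "cp1252": "windows-1252",
}


def iana_charset(encoding):
    """Return the IANA name for a Python encoding name, or None if unknown."""
    try:
        # codecs.lookup() accepts any alias Python knows ("utf_8", "UTF8", ...)
        # and returns a CodecInfo whose .name is Python's canonical spelling.
        canonical = codecs.lookup(encoding).name
    except LookupError:
        return None
    return _CODEC_TO_IANA.get(canonical)


def content_type_for(encoding, mime="text/plain"):
    """Build a Content-Type value, leaving out charset if no IANA name is known."""
    charset = iana_charset(encoding)
    return f"{mime}; charset={charset}" if charset else mime


print(content_type_for("utf_8"))    # text/plain; charset=UTF-8
print(content_type_for("latin_1"))  # text/plain; charset=ISO-8859-1
```

Omitting the parameter for unmapped encodings avoids sending a label that browsers would not recognize, at the cost of letting the client guess.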

I've looked at the current implementation, which is based on `encodings.aliases` from the stdlib. But that module [explicitly talks about normalizing](https://github.com/python/cpython/blob/main/Lib/encodings/aliases.py#L6) the names beforehand, because AFAIU it is [meant to look up Python modules](https://github.com/python/cpython/blob/main/Lib/encodings/__init__.py#L8), whose naming rules are quite different from those of IANA encoding names. So I'm not sure it's an appropriate data source for this use case. Or am I completely misunderstanding something here? I'd be grateful for any light someone could shed on this.
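The stdlib behaviour described above can be seen directly. This snippet only illustrates `encodings.aliases` and `codecs.lookup()`; it does not reproduce charset_normalizer's own code. The alias table maps normalized spellings to codec module names such as `utf_8`, which is where an underscore-style name would come from if it is used as the source of "IANA" names.

```python
import codecs
from encodings.aliases import aliases

# Keys are normalized alias spellings; values are codec *module* names.
print(aliases["utf8"])    # utf_8
print(aliases["latin1"])  # latin_1

# Module names use underscores, so they are not IANA spellings,
# and the IANA spelling itself is not a key in the table.
print("utf_8" in aliases.values())  # True
print("UTF-8" in aliases)           # False

# codecs.lookup() performs the normalization and resolves the codec.
print(codecs.lookup("UTF-8").name)  # utf-8
```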
