Closed
Labels
bug (Something isn't working), detection (Related to the charset detection mechanism, chaos/mess/coherence)
Description
I'm updating the charset-normalizer package in OpenWrt (with Python 3.11.6) and tried the example in https://charset-normalizer.readthedocs.io/en/latest/user/handling_result.html#handling-result:
from charset_normalizer import from_bytes

my_byte_str = 'Bсеки човек има право на образование.'.encode('cp1251')
# Assign return value so we can fully exploit result
result = from_bytes(
    my_byte_str
).best()
print(result.encoding)  # cp1251
In 3.3.0 this would print cp1251, but in 3.3.1 this prints cp1257 (str(result) returns 'Bńåźč ÷īāåź čģą ļšąāī ķą īįšąēīāąķčå.').
I also tried the French phrase from https://charset-normalizer.readthedocs.io/en/latest/index.html#introduction:
my_byte_str = 'Bonjour, je suis à la recherche d\'une aide sur les étoiles'.encode('cp1252')
and from_bytes(my_byte_str).best() also reports the encoding cp1257.
I have compiled the package for arm, aarch64 and x86_64 and I get the same results.
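For context, here is a minimal stdlib-only sketch (it does not use charset-normalizer) showing that the bytes from both examples decode without error under cp1257 as well as under their original code page. This is presumably why the detector has to choose between candidates heuristically. The helper name decode_as and the samples dictionary are made up for this illustration.

```python
# Stdlib-only sketch: the same bytes are valid in more than one single-byte
# code page, so decoding alone cannot tell the candidates apart.
samples = {
    "cp1251": "Bсеки човек има право на образование.",
    "cp1252": "Bonjour, je suis à la recherche d'une aide sur les étoiles",
}


def decode_as(text, source_encoding, candidate):
    """Encode text with source_encoding, then decode the bytes as candidate."""
    return text.encode(source_encoding).decode(candidate)


for source_encoding, text in samples.items():
    for candidate in (source_encoding, "cp1257"):
        decoded = decode_as(text, source_encoding, candidate)
        print(f"{source_encoding} bytes read as {candidate}: {decoded}")
```

Reading the cp1251 bytes as cp1257 reproduces the string returned by str(result) in 3.3.1, so the wrong output is consistent with the bytes simply being decoded as cp1257.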