[BUG] Incorrect encoding detected in 3.3.1 #371

I'm updating the charset-normalizer package in OpenWrt (with Python 3.11.6) and tried the example in https://charset-normalizer.readthedocs.io/en/latest/user/handling_result.html#handling-result:

```python3
from charset_normalizer import from_bytes

my_byte_str = 'Bсеки човек има право на образование.'.encode('cp1251')

# Assign return value so we can fully exploit result
result = from_bytes(
    my_byte_str
).best()

print(result.encoding)  # expected: cp1251
```

In 3.3.0 this printed `cp1251`, but in 3.3.1 it prints `cp1257` (`str(result)` returns `'Bńåźč ÷īāåź čģą ļšąāī ķą īįšąēīāąķčå.'`).

I also tried the French phrase from https://charset-normalizer.readthedocs.io/en/latest/index.html#introduction:

```python3
my_byte_str = 'Bonjour, je suis à la recherche d\'une aide sur les étoiles'.encode('cp1252')
```

and `from_bytes(my_byte_str).best()` also reports the encoding as `cp1257`.

I have compiled the package for arm, aarch64 and x86_64 and I get the same results.
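
For context, here is a minimal standard-library sketch (it does not call charset-normalizer) showing that both sample byte strings also decode under `cp1257` without raising an error. That means a strict decode alone cannot rule `cp1257` out. It does not explain why 3.3.1 ranks `cp1257` first. The helper name `decode_as_cp1257` and the `samples` dict are only illustrative.

```python
# Illustrative only: checks that the sample bytes are also valid cp1257.
# Uses Python's built-in codecs, not charset-normalizer.
samples = {
    "cp1251": "Bсеки човек има право на образование.",
    "cp1252": "Bonjour, je suis à la recherche d'une aide sur les étoiles",
}


def decode_as_cp1257(text, source_encoding):
    """Encode text with its real code page, then decode the bytes as cp1257."""
    raw = text.encode(source_encoding)
    # Raises UnicodeDecodeError if any byte is undefined in cp1257.
    return raw.decode("cp1257")


for source_encoding, text in samples.items():
    wrong = decode_as_cp1257(text, source_encoding)
    print(f"{source_encoding}: {wrong!r} (same as original: {wrong == text})")
```

Both samples decode cleanly but produce text that differs from the original, so the detector has to rely on its language and coherence heuristics to pick the right code page.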
