Incorrectly detecting valid UTF-16 as UTF-32LE, for which it is invalid

Given the following string:

``` python
u'\x000'.encode('utf-16')
```

`chardet.detect` as of 2.3.0 reports this as 'UTF-32LE' with a confidence of 1.0, but attempting to decode the bytes as UTF-32 fails with:

```
UnicodeDecodeError: 'utf32' codec can't decode bytes in position 4-5: truncated data
```
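A likely explanation, offered as a sketch and not a confirmed diagnosis of chardet's internals: the little-endian UTF-16 BOM (`FF FE`) followed by an encoded U+0000 (`00 00`) is byte-for-byte identical to the UTF-32LE BOM (`FF FE 00 00`). The snippet below builds the little-endian bytes explicitly, so it does not depend on the byte order of the machine running it, and uses only the standard library. The variable names are illustrative.

```python
import codecs

# Build the little-endian form explicitly; plain 'utf-16' would use the
# native byte order of whatever machine runs this.
text = u'\x000'  # U+0000 followed by the digit '0'
data = codecs.BOM_UTF16_LE + text.encode('utf-16-le')

# The UTF-16LE BOM plus an encoded NUL equals the UTF-32LE BOM.
looks_like_utf32le = data.startswith(codecs.BOM_UTF32_LE)

# The bytes round-trip cleanly as UTF-16 ...
round_trips = data.decode('utf-16') == text

# ... but there are only 6 of them, so after the 4-byte BOM the remaining
# 2 bytes cannot form a whole 4-byte UTF-32 code unit.
try:
    data.decode('utf-32')
    utf32_ok = True
except UnicodeDecodeError:
    utf32_ok = False
```

If that reading is right, any UTF-16LE input with a BOM whose first character is U+0000 would hit the same ambiguity, and a length check (UTF-32 data must be a multiple of 4 bytes) would be one way to tell the two cases apart for short inputs like this one.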

I found this bug using [Hypothesis](http://hypothesis.readthedocs.org/en/latest/). I'd be happy to submit a pull request adding the test that found it if you'd like me to, though it is of course currently failing.

