Add API option to get all the encodings confidence #96 #111

mdamien · 2017-04-11T23:11:16Z

As discussed in #96, here's my solution for the encodings listing problem.

For reference, this was my first version, which just added a detect_all function to __init__.py:

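```python
# Lives in chardet/__init__.py; _bin_type and UniversalDetector are assumed to be
# in scope there (e.g. via `from .universaldetector import UniversalDetector`).
def detect_all(byte_str):
    """Return every candidate encoding above the threshold, best first."""
    if not isinstance(byte_str, _bin_type):
        raise TypeError('Expected object of {0} type, got: {1}'
                        ''.format(_bin_type, type(byte_str)))

    u = UniversalDetector()
    u.feed(byte_str)
    u.close()

    # One entry per top-level charset prober whose confidence clears the threshold.
    results = [
        {
            'encoding': prober.charset_name,
            'confidence': prober.get_confidence()
        }
        for prober in u._charset_probers
        if prober.get_confidence() > u.MINIMUM_THRESHOLD
    ]
    # Highest confidence first. This returns [] when no prober clears the threshold;
    # the later "fix corner case of when there is no good prober" commit appears
    # to target that case.
    return sorted(results, key=lambda r: -r['confidence'])
```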
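To see the filter-and-sort step in isolation, here is a minimal sketch that swaps chardet's real probers for a hypothetical StubProber class (it only mimics charset_name and get_confidence()). The 0.20 threshold is an assumption meant to mirror UniversalDetector.MINIMUM_THRESHOLD; check it against the chardet version in use.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Assumed value, meant to mirror UniversalDetector.MINIMUM_THRESHOLD.
MINIMUM_THRESHOLD = 0.20


@dataclass
class StubProber:
    """Hypothetical stand-in for a chardet prober: a name and a fixed confidence."""
    charset_name: str
    confidence: float

    def get_confidence(self) -> float:
        return self.confidence


def rank_candidates(probers, threshold: float = MINIMUM_THRESHOLD) -> List[Dict[str, Any]]:
    """Keep probers strictly above the threshold, sorted by descending confidence."""
    results = [
        {"encoding": p.charset_name, "confidence": p.get_confidence()}
        for p in probers
        if p.get_confidence() > threshold
    ]
    return sorted(results, key=lambda r: -r["confidence"])


probers = [
    StubProber("windows-1252", 0.73),
    StubProber("SHIFT_JIS", 0.10),
    StubProber("utf-8", 0.99),
]
print(rank_candidates(probers))
# -> [{'encoding': 'utf-8', 'confidence': 0.99}, {'encoding': 'windows-1252', 'confidence': 0.73}]
```

Note that a prober sitting exactly at the threshold is dropped, because the comparison is strict (>), as in the original snippet.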

dan-blanchard · 2017-04-18T00:21:54Z

I think for now the initial detect_all might actually be preferable. I know @sigmavirus24 wanted it to be a separate function.

dan-blanchard

Thanks for the PR. This has been something we wanted to add for a while. As I said in the other comment though, I think the detect_all function might actually be the better approach.
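The review weighs a separate detect_all function against a flag on detect (a later commit is titled "use detect_all instead of detect(.., all=True)"). The thread does not spell out the reasoning, but one common argument is that a flag makes the return type depend on an argument value. The sketch below contrasts the two shapes using canned, hypothetical results; detect_flag, _CANDIDATES and these toy detect/detect_all bodies are illustrative only, not chardet's code.

```python
from typing import Any, Dict, List, Union

Result = Dict[str, Any]

# Hypothetical canned candidates, standing in for what a detector would compute.
_CANDIDATES: List[Result] = [
    {"encoding": "utf-8", "confidence": 0.99},
    {"encoding": "windows-1252", "confidence": 0.73},
]


def detect_flag(byte_str: bytes, all: bool = False) -> Union[Result, List[Result]]:
    """Flag style (parameter named after the thread's detect(.., all=True)):
    returns a dict or a list depending on `all`."""
    return list(_CANDIDATES) if all else _CANDIDATES[0]


def detect_all(byte_str: bytes) -> List[Result]:
    """Separate-function style: always a list, best candidate first."""
    return list(_CANDIDATES)


def detect(byte_str: bytes) -> Result:
    """Separate-function style: always a single best guess."""
    return detect_all(byte_str)[0]
```

With separate functions, each call site knows its return type statically, whereas callers of detect_flag have to branch on (or remember) the flag.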


mdamien · 2017-04-18T08:31:16Z

@dan-blanchard no problem, I replaced the PR with the detect_all solution

mdamien · 2017-10-03T12:00:14Z

@dan-blanchard do you think this could be merged?

mdamien · 2017-10-03T15:11:04Z

@dan-blanchard thanks 😄

dan-blanchard mentioned this pull request Apr 12, 2017

chardet 3.0.0/3.0.1 broken #112 (Closed)

dan-blanchard requested review from dan-blanchard and sigmavirus24 April 12, 2017 19:41

mdamien force-pushed the all-encodings branch from e32f933 to 24a80dc April 13, 2017 10:05

dan-blanchard requested changes Apr 18, 2017


mdamien added 3 commits April 18, 2017 08:53

Add API option to get all the encodings confidence chardet#96

89df79e

make code more straightforward

by treating the self.done = True as a real finish point of the analysis

5351558

use detect_all instead of detect(.., all=True)

d9b8115
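The "make code more straightforward" commit message talks about treating self.done = True as a real finish point of the analysis. The diff is not shown here, so the following is only a toy illustration of that general pattern, assuming a detector that ignores further input once done is set; ToyDetector and its stop_after parameter are hypothetical.

```python
class ToyDetector:
    """Hypothetical, heavily simplified detector that stops analysing once done."""

    def __init__(self, stop_after: int) -> None:
        self.done = False
        self.chunks_seen = 0
        # Hypothetical stand-in for "some prober is already certain".
        self._stop_after = stop_after

    def feed(self, chunk: bytes) -> None:
        if self.done:
            # Finish point: once done is set, later chunks are ignored entirely.
            return
        self.chunks_seen += 1
        if self.chunks_seen >= self._stop_after:
            self.done = True


det = ToyDetector(stop_after=2)
for chunk in (b"a", b"b", b"c", b"d"):
    det.feed(chunk)
print(det.chunks_seen, det.done)  # -> 2 True
```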

mdamien force-pushed the all-encodings branch from 726d4a2 to d9b8115 April 18, 2017 08:24

fix corner case of when there is no good prober

ca83a26
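The last commit is titled "fix corner case of when there is no good prober". Its diff is not shown, so this is one possible strategy, not necessarily the one that was merged: when no candidate clears the threshold, return a one-element list holding a single best guess instead of an empty list. rank_with_fallback, its dict-based input format and the 0.20 default are assumptions for illustration.

```python
from typing import Any, Dict, List

Result = Dict[str, Any]


def rank_with_fallback(
    confidences: Dict[str, float], best_guess: Result, threshold: float = 0.20
) -> List[Result]:
    """Rank candidate encodings, best first; fall back to [best_guess] if none qualify.

    `confidences` maps encoding name -> confidence (hypothetical input format).
    """
    ranked = sorted(
        (
            {"encoding": enc, "confidence": conf}
            for enc, conf in confidences.items()
            if conf > threshold
        ),
        key=lambda r: -r["confidence"],
    )
    # Corner case: no candidate is good enough (or there were none at all),
    # so return the single best guess rather than an empty list.
    return ranked or [best_guess]


ascii_guess = {"encoding": "ascii", "confidence": 1.0}  # illustrative value only
print(rank_with_fallback({}, ascii_guess))
# -> [{'encoding': 'ascii', 'confidence': 1.0}]
print(rank_with_fallback({"utf-8": 0.9, "SHIFT_JIS": 0.1}, ascii_guess))
# -> [{'encoding': 'utf-8', 'confidence': 0.9}]
```

Always returning a non-empty list lets callers take result[0] without a length check.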

dan-blanchard merged commit c68f120 into chardet:master Oct 3, 2017
