mdamien
Contributor

@mdamien mdamien commented Apr 11, 2017

As discussed in #96, here's my solution for the encodings listing problem.

For reference, this was my first version, which just adds a detect_all function to __init__.py:

def detect_all(byte_str):
    # Reject anything that is not a byte string.
    if not isinstance(byte_str, _bin_type):
        raise TypeError('Expected object of {0} type, got: {1}'
                        .format(_bin_type, type(byte_str)))

    # Run the detector over the whole input in one pass.
    u = UniversalDetector()
    u.feed(byte_str)
    u.close()

    # Keep every prober whose confidence clears the detector's threshold.
    results = [
        {
            'encoding': prober.charset_name,
            'confidence': prober.get_confidence()
        }
        for prober in u._charset_probers
        if prober.get_confidence() > u.MINIMUM_THRESHOLD
    ]
    # Most confident guess first.
    return sorted(results, key=lambda r: -r['confidence'])
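
To make the filter-and-sort step easier to follow in isolation, here is a minimal sketch that runs without chardet. FakeProber, rank_probers, and the confidence values are hypothetical stand-ins invented for illustration, and the 0.20 threshold is an assumed value, not one taken from this PR.

```python
class FakeProber:
    """Hypothetical stand-in for a chardet prober (not part of chardet)."""

    def __init__(self, charset_name, confidence):
        self.charset_name = charset_name
        self._confidence = confidence

    def get_confidence(self):
        return self._confidence


# Assumed threshold for this sketch only.
MINIMUM_THRESHOLD = 0.20


def rank_probers(probers, threshold=MINIMUM_THRESHOLD):
    """Drop low-confidence guesses and sort the rest, best first."""
    results = [
        {'encoding': p.charset_name, 'confidence': p.get_confidence()}
        for p in probers
        if p.get_confidence() > threshold
    ]
    return sorted(results, key=lambda r: -r['confidence'])


# Made-up confidences, deliberately out of order.
probers = [
    FakeProber('windows-1252', 0.45),
    FakeProber('ascii', 0.05),
    FakeProber('utf-8', 0.87),
]
ranked = rank_probers(probers)
for guess in ranked:
    print(guess['encoding'], guess['confidence'])
```

The 'ascii' guess is dropped because it falls below the threshold, and the remaining guesses come back in descending order of confidence.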

@dan-blanchard
Member

I think for now the initial detect_all might actually be preferable. I know @sigmavirus24 wanted it to be a separate function.

Member

@dan-blanchard dan-blanchard left a comment

Thanks for the PR. This has been something we wanted to add for a while. As I said in the other comment though, I think the detect_all function might actually be the better approach.

@mdamien
Contributor Author

mdamien commented Apr 18, 2017

@dan-blanchard no problem, I replaced the PR with the detect_all solution

@mdamien
Contributor Author

mdamien commented Oct 3, 2017

@dan-blanchard do you think this could be merged?

@dan-blanchard dan-blanchard merged commit c68f120 into chardet:master Oct 3, 2017
@mdamien
Contributor Author

mdamien commented Oct 3, 2017

@dan-blanchard thanks 😄
