-
Notifications
You must be signed in to change notification settings - Fork 265
Turkish ISO-8859-9 detection support #21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I have no idea about side effects. With this value it works good. More then 0.6..... always detects Turkish(iso-8859-9) as iso-8859-2.
Merge? |
@hakanzy I assume you're looking for the "Merge" button. If that's the case you need to be a maintainer on the repository. If instead you're giving an obscure direction, all pull requests need a proper amount of review before being accepted. The maintainers of chardet are busy right now and do not have the time to dedicate to review. |
So I finally got a chance to look at this, and my initial thought was that things looked fine, but then I looked at the unit test results. We currently have the repo setup so that travis builds always look like they pass (because of known failures), which makes it difficult for people to see when their changes introduce new bugs. Anyway, in the case of this PR, we go from having 27 unit test failures (see #13), to 29. If you look at the failures:
you can see that with your change we would incorrectly predict ISO-8859-9 all over the place, so I think your threshold is too low. |
What do you suggest? What do you mean about low threshold? |
I believe the |
Yeah this is why I think we need to since with the upstream version of universal chardet from Mozilla. I expect they have updated models and ratios that would be far more accurate. |
I completely agree. I'm working on a bunch of SKLL stuff at the moment, but I hope to get a chance to tackle this in the next month or so. |
Sorry for the confusion, but because so many people were targeting
Your pull request was unfortunately the only one to have the proper target of If you don't mind, could you please create a new PR with Also, @sigmavirus24, I've looked into the upstream changes a little bit, and there were very few substantial changes that I've found so far (except for them eliminating detection of a number of codecs that we would like to continue to support). |
I didn't commit README.rst changes. I leave it to developers.