This is a master thread for collecting problems and reports related to incorrect and/or problematic predictions of the pre-trained models.
Why a master thread instead of separate issues?
GitHub now supports pinned issues, which lets us create master threads more easily without them getting buried.
Users often report issues that come down to incorrect predictions made by the pre-trained statistical models. Those are all good and valid, and can include very useful test cases. However, having a lot of open issues around minor incorrect predictions across various languages also makes it more difficult to keep track of the reports. Unlike bug reports, they're much more difficult to act on. Sometimes, mistakes a model makes can indicate deeper problems that occurred during training or when preprocessing the data. Sometimes they can give us ideas for how to use data augmentation to make the models less sensitive to very small variations like punctuation or capitalisation.
Other times, it's just something we have to accept. A model that's 90% accurate will, on average, get every 10th prediction wrong. A model that's 99% accurate will be wrong once in every 100 predictions.
The main reason we distribute pre-trained models is that it makes it easier for users to build their own systems by fine-tuning pre-trained models on their data. Of course, we want them to be as good as possible, and we're always optimising for the best compromise of speed, size and accuracy. But we won't be able to ship pre-trained models that are always correct on all data ever.
For many languages, we're also limited by the resources available, especially when it comes to data for named entity recognition. We've already made substantial investments into licensing training corpora, and we'll continue doing so (including running our own annotation projects with Prodigy ✨) – but this will take some time.
Reporting incorrect predictions in this thread
If you've come across suspicious predictions in the pre-trained models (tagger, parser, entity recognizer) or you want to contribute test cases for a given language, feel free to submit them here. (Test cases should be "fair" and useful for measuring the model's general accuracy, so single words, significant typos and very ambiguous parses aren't usually that helpful.)
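When reporting, it helps to include the spaCy version, the model name, the input text, and the predicted labels so the case is easy to reproduce. The snippet below is a minimal sketch of one way to capture that information (it assumes the `en_core_web_sm` model is installed; the example sentence is just an illustration, and any pre-trained pipeline can be substituted):

```python
import spacy

print("spaCy version:", spacy.__version__)

# Load a pre-trained English model (assumed to be installed).
nlp = spacy.load("en_core_web_sm")

text = "Apple is looking at buying U.K. startup for $1 billion"
doc = nlp(text)

# Part-of-speech tags and dependency parse predicted for each token
for token in doc:
    print(token.text, token.pos_, token.tag_, token.dep_, token.head.text)

# Named entities predicted by the entity recognizer
for ent in doc.ents:
    print(ent.text, ent.label_)
```

Including output like this alongside the expected analysis makes it much easier to turn a report into a reusable test case.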
You can check out our new models test suite for spaCy v2.1.0 to see the tests we're currently running.
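To give a rough idea of what a "fair" test case can look like, here is an illustrative pytest-style sketch. The fixture layout, example sentence, and expected labels below are assumptions for illustration only and don't reflect the actual structure of the test suite:

```python
import pytest
import spacy


@pytest.fixture(scope="session")
def nlp():
    # Assumes the small English model is installed; the real suite
    # may load and parametrize models differently.
    return spacy.load("en_core_web_sm")


def test_ner_recognises_person(nlp):
    doc = nlp("Barack Obama was born in Hawaii.")
    ents = [(ent.text, ent.label_) for ent in doc.ents]
    # A general-purpose English NER model should tag the name as PERSON.
    assert ("Barack Obama", "PERSON") in ents
```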