mauryaland
Contributor

This PR fixes an inconsistency between the pair (best_path, best_scores) and scores in the SequenceTagger model when the CRF is used for predictions.

In some cases, the tag with the highest score in scores is not the tag selected by the pair (best_path, best_scores). This PR should fix this behavior.

@alanakbik
Collaborator

Hello @mauryaland, thanks for the PR! Could you paste a minimal code example, for instance with an example sentence and one of the public NER models, to help me better understand the inconsistency?

@mauryaland
Contributor Author

Hello @alanakbik, here is an example:

from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load('fr-ner')
sentence = Sentence("La France est un beau pays à parcourir.")

tagger.predict(sentence)

print(sentence.get_spans('ner'))
>>> [<LOC-span (1,2): "La France">]

for token in sentence:
    print(f"{token} {token.get_tag('ner')} {token.tags_proba_dist} \n\n")

>>> Token: 1 La I-LOC (0.7660126090049744) {'ner': [<unk> (0.0008465868304483593), O (0.7660126090049744), I-PER (0.010933118872344494), I-LOC (0.20101521909236908), I-ORG (0.004218324087560177), I-MISC (0.010774117894470692), B-PER (0.00022157360217534006), B-LOC (0.0013388736406341195), B-MISC (0.0003332003252580762), B-ORG (0.000203109928406775), <START> (0.0), <STOP> (0.004103234503418207)]} 


Token: 2 France I-LOC (0.9984954595565796) {'ner': [<unk> (9.132213563134428e-08), O (8.645388849259916e-10), I-PER (0.0001087988493964076), I-LOC (0.9984954595565796), I-ORG (0.0005726731033064425), I-MISC (0.0007387555087916553), B-PER (2.118784436788701e-07), B-LOC (5.522951323655434e-05), B-MISC (6.367911709048713e-08), B-ORG (9.927322963676488e-08), <START> (0.0), <STOP> (2.8619269869523123e-05)]} 

....

As we can see for token 1 ("La"), the per-tag distribution (the variable scores in the source code) gives the highest score to the tag "O" (0.766), while "I-LOC" only gets 0.201. However, the tag taken from best_path is "I-LOC", and it is reported with a confidence of 0.766, which is the score of "O" rather than the score of "I-LOC".
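To illustrate how such a disagreement can arise, here is a minimal toy sketch. It is not flair's code: it assumes that best_path comes from Viterbi decoding over emission plus transition scores, while scores is a per-token softmax over the emissions only. The two-tag set, the emission values, the transition values, and the helper names softmax and viterbi are all made up for the illustration.

```python
import math

# Hypothetical two-tag set and toy scores, chosen only to show the effect.
TAGS = ["O", "I-LOC"]

def softmax(row):
    """Turn one token's raw tag scores into a probability distribution."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def viterbi(emissions, transitions):
    """Return the highest-scoring tag index sequence.

    transitions[i][j] is the score of moving from tag i to tag j.
    """
    n_tags = len(transitions)
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag for landing on tag j at this token.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
        score = new_score
        backpointers.append(pointers)
    # Follow the backpointers from the best final tag.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Token 1 slightly prefers "O" on its own; token 2 strongly prefers "I-LOC".
emissions = [[1.5, 0.2], [-3.0, 4.0]]
# Toy transitions: O -> I-LOC is penalised, I-LOC -> I-LOC is rewarded.
transitions = [[0.0, -3.0], [0.0, 1.0]]

best_path = viterbi(emissions, transitions)
probs = [softmax(row) for row in emissions]

viterbi_tags = [TAGS[i] for i in best_path]
argmax_tags = [TAGS[max(range(len(row)), key=row.__getitem__)] for row in probs]
mismatches = [t for t in range(len(best_path)) if viterbi_tags[t] != argmax_tags[t]]

for t in range(len(best_path)):
    # Probability of the decoded tag vs. the maximum of the distribution.
    print(viterbi_tags[t], probs[t][best_path[t]], argmax_tags[t], max(probs[t]))
```

In this toy setup the decoded path tags the first token as "I-LOC" while the per-token distribution puts most of its mass on "O", so pairing the decoded tag with the maximum of the distribution reports the wrong number. Reading the probability at the decoded tag's index, as the print statement does, is one possible way to keep the two consistent; it is not necessarily what this PR does.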

I hope that my example is clear. The same thing also happens with some models we trained on our own tags.

@alanakbik
Collaborator

@mauryaland thanks for adding the example!

@alanakbik
Collaborator

👍

@yosipk
Collaborator

yosipk commented Aug 5, 2019

👍
