When using `evaluate.combine`, some kwargs do not seem to get passed through to the sub-metrics, resulting in incorrect outputs.

Using the examples from `precision`:
```python
import evaluate
metric1 = evaluate.load('precision')
metric2 = evaluate.combine(['precision'])
print(metric1.compute(references=[0, 1, 0, 1, 0], predictions=[0, 0, 1, 1, 0], pos_label=0))
print(metric2.compute(references=[0, 1, 0, 1, 0], predictions=[0, 0, 1, 1, 0], pos_label=0))
```
Out:
```
{'precision': 0.6666666666666666}
{'precision': 0.5}
```
0.666... is the correct answer.
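To spell out the arithmetic: with `pos_label=0` there are three predictions of class 0 (indices 0, 1, 4), two of which match the references, giving 2/3 ≈ 0.667. The combined result of 0.5 is exactly what the default `pos_label=1` produces (one correct out of two predictions of class 1), which suggests the kwarg is being silently dropped rather than mis-handled.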
```python
import evaluate
metric1 = evaluate.load('precision')
metric2 = evaluate.combine(['precision'])
print(metric1.compute(references=[0, 1, 0, 1, 0], predictions=[0, 0, 1, 1, 0], sample_weight=[0.9, 0.5, 3.9, 1.2, 0.3]))
print(metric2.compute(references=[0, 1, 0, 1, 0], predictions=[0, 0, 1, 1, 0], sample_weight=[0.9, 0.5, 3.9, 1.2, 0.3]))
```
Out:
```
{'precision': 0.23529411764705882}
{'precision': 0.5}
```
0.235... is the correct answer.
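Here, 0.5 is again the value obtained when `sample_weight` is omitted entirely, so the keyword appears to be ignored rather than misapplied.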
This issue occurred with every metric I tried (precision, recall, and F1). Perhaps I am using the function incorrectly, but if so, this behaviour was very surprising to me.
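As a workaround, one can load the metrics individually and merge the result dicts by hand, so the kwargs actually reach each underlying `compute()` call. A minimal sketch (the `compute_combined` helper below is my own, not part of the `evaluate` API, and it assumes every listed metric accepts the same kwargs):

```python
import evaluate

def compute_combined(metric_names, **kwargs):
    # Hypothetical helper: load each metric separately so that extra
    # kwargs (pos_label, sample_weight, ...) reach each compute() call,
    # then merge the per-metric result dicts into one.
    results = {}
    for name in metric_names:
        metric = evaluate.load(name)
        results.update(metric.compute(**kwargs))
    return results

print(compute_combined(
    ['precision', 'recall'],
    references=[0, 1, 0, 1, 0],
    predictions=[0, 0, 1, 1, 0],
    pos_label=0,
))
# Expected (from the numbers above): {'precision': 0.666..., 'recall': 0.666...}
```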
Environment: macOS 13.2 on M1, Python 3.10.9, evaluate 0.4.0