Conversation

juliensimon

Although Evaluator.compute() accepts a **compute_parameters argument for metric-specific parameters, these parameters are not passed through to TextClassificationEvaluator.compute(), e.g.:

eval = evaluator("text-classification")
results = eval.compute(
    model_or_pipeline=model,
    tokenizer=tokenizer,
    data=eval_dataset, 
    metric=evaluate.load("f1"), 
    label_column=label_column,
    label_mapping=label_mapping,
    average=None
)
results

TypeError: compute() got an unexpected keyword argument 'average'

This PR fixes this, e.g.:

results = eval.compute(
    model_or_pipeline=model,
    tokenizer=tokenizer,
    data=eval_dataset, 
    metric=evaluate.load("f1"), 
    label_column=label_column,
    label_mapping=label_mapping,
    average=None
)
results

{'f1': array([0.61904762, 0.55172414, 0.64705882, 0.50574713, 0.80978261])}

@juliensimon added the enhancement (New feature or request) label on Jun 13, 2022
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@lvwerra
Member

lvwerra commented Jun 14, 2022

Thanks @juliensimon for working on this! Indeed, runtime kwargs for evaluations are an issue. However, I think adding them as trailing kwargs of the compute call is not very transparent: how would the user know that a given kwarg goes to the metric rather than to the pipeline? I can see two options to solve this:

  1. Add a metric_kwarg argument that takes a dictionary and is passed on to metric.compute (sketched below).
  2. Extend the evaluate.load function to take a default_setting argument that can be used to override the defaults in the compute signature. This would also help in other instances where we, for example, compose multiple metrics together.
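
A hypothetical sketch of option 1, reusing the setup from the example above (the metric_kwarg name and the forwarding are illustrative only, not part of the current API):

# Sketch only: metric_kwarg is a hypothetical parameter that compute()
# would forward verbatim to metric.compute(...)
eval = evaluator("text-classification")
results = eval.compute(
    model_or_pipeline=model,
    tokenizer=tokenizer,
    data=eval_dataset,
    metric=evaluate.load("f1"),
    label_column=label_column,
    label_mapping=label_mapping,
    metric_kwarg={"average": None},
)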

Also curious to hear what @lhoestq thinks.

@lhoestq
Member

lhoestq commented Jun 14, 2022

Or simply not have additional arguments in metric.compute, and pass them in evaluate.load instead? ^^

It might be simpler this way in terms of API.

(if that's not possible, option 1 is clearer IMO)
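
Roughly, the usage under this suggestion would look as follows (a sketch; passing metric kwargs to evaluate.load is the behaviour proposed here, not necessarily what released versions support):

import evaluate

# Sketch: the metric is configured once at load time, so compute() needs no extra kwargs.
metric = evaluate.load("f1", average=None)  # average stored at load time (proposed behaviour)
print(metric.compute(predictions=[0, 1, 2, 1], references=[0, 2, 2, 1]))
# with average=None this would return per-class F1 scores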

Julien Simon added 2 commits June 15, 2022 14:54
- Pass metric arguments in load() instead and propagate them
@lvwerra
Member

lvwerra commented Jun 15, 2022

I've also been thinking about this. I think it would be much clearer if all configuration is done within load; if users want to use different configs, they can just load two instances.

This would require fixing several metrics (and measurements/comparisons) but should be relatively low effort. We could also use this opportunity to make the configuration more accessible, as requested in #138.
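
For example, two differently configured instances of the same metric could be loaded side by side (a sketch assuming load accepts metric kwargs as proposed here, which is not guaranteed in released versions):

import evaluate

# Hypothetical load-time configuration: two independent instances of the same metric.
f1_per_class = evaluate.load("f1", average=None)   # returns per-class F1 scores
f1_macro = evaluate.load("f1", average="macro")    # returns a single macro-averaged score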

@juliensimon
Author

OK, here's another version, which I believe preserves pipeline args. Now I can do:

metric = evaluate.load("f1", average=None)
eval = evaluator("text-classification")
results = eval.compute(
    model_or_pipeline=model,
    tokenizer=tokenizer,
    data=eval_dataset, 
    metric=metric,
    label_column=label_column,
    label_mapping=label_mapping,
)
results
{'f1': array([0.61904762, 0.55172414, 0.64705882, 0.50574713, 0.80978261])}

@lvwerra
Member

lvwerra commented Jul 7, 2022

Closing this PR for now as we'll likely fix this with #169. Thanks @juliensimon for working on this!
