Conversation

juliensimon

Although Evaluator.compute() accepts a **compute_parameters argument for metric-specific parameters, these parameters are not passed through to TextClassificationEvaluator.compute(), e.g.:

eval = evaluator("text-classification")
results = eval.compute(
    model_or_pipeline=model,
    tokenizer=tokenizer,
    data=eval_dataset, 
    metric=evaluate.load("f1"), 
    label_column=label_column,
    label_mapping=label_mapping,
    average=None
)
results

TypeError: compute() got an unexpected keyword argument 'average'

This PR fixes this, e.g.:

results = eval.compute(
    model_or_pipeline=model,
    tokenizer=tokenizer,
    data=eval_dataset, 
    metric=evaluate.load("f1"), 
    label_column=label_column,
    label_mapping=label_mapping,
    average=None
)
results

{'f1': array([0.61904762, 0.55172414, 0.64705882, 0.50574713, 0.80978261])}

@juliensimon added the enhancement (New feature or request) label on Jun 13, 2022
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@lvwerra
Member

lvwerra commented Jun 14, 2022

Thanks @juliensimon for working on this! Indeed, runtime kwargs for evaluations are an issue. However, I think adding them as trailing kwargs of the compute call is not very transparent: how would the user know that a given kwarg goes to the metric rather than to the pipeline? I can see two options to solve this:

  1. Add a metric_kwarg argument that takes a dictionary and is passed on to metric.compute (sketched below).
  2. Extend the evaluate.load function to take a default_setting argument that can be used to override the defaults in the compute signature. This would also help in other instances where we, for example, compose multiple metrics together.
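
A hypothetical sketch of option 1, reusing the setup from the example above (the metric_kwarg name and the forwarding are illustrative only, not part of the current API):

# Sketch only: metric_kwarg is a hypothetical parameter that compute()
# would forward verbatim to metric.compute(...)
eval = evaluator("text-classification")
results = eval.compute(
    model_or_pipeline=model,
    tokenizer=tokenizer,
    data=eval_dataset,
    metric=evaluate.load("f1"),
    label_column=label_column,
    label_mapping=label_mapping,
    metric_kwarg={"average": None},
)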

Also curious to hear what @lhoestq thinks.

@lhoestq
Member

lhoestq commented Jun 14, 2022

Or simply not have additional arguments in metric.compute, and pass them in evaluate.load instead? ^^

It might be simpler this way in terms of API.

(if that's not possible, option 1 is clearer IMO)
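
Roughly, the usage under this suggestion would look as follows (a sketch; passing metric kwargs to evaluate.load is the behaviour proposed here, not necessarily what released versions support):

import evaluate

# Sketch: the metric is configured once at load time, so compute() needs no extra kwargs.
metric = evaluate.load("f1", average=None)  # average stored at load time (proposed behaviour)
print(metric.compute(predictions=[0, 1, 2, 1], references=[0, 2, 2, 1]))
# with average=None this would return per-class F1 scores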

Julien Simon added 2 commits June 15, 2022 14:54
- Pass metric arguments in load() instead and propagate them
@lvwerra
Member

lvwerra commented Jun 15, 2022

I've also been thinking about this. I think it would be much clearer if all configuration is done within load; if users want to use different configs, they can just load two instances.

This would require fixing several metrics (and measurements/comparisons) but should be relatively low effort. We could also use this opportunity to make the configuration more accessible, as requested in #138.
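
For example, two differently configured instances of the same metric could be loaded side by side (a sketch assuming load accepts metric kwargs as proposed here, which is not guaranteed in released versions):

import evaluate

# Hypothetical load-time configuration: two independent instances of the same metric.
f1_per_class = evaluate.load("f1", average=None)   # returns per-class F1 scores
f1_macro = evaluate.load("f1", average="macro")    # returns a single macro-averaged score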

@juliensimon
Author

OK, here's another version, which I believe preserves pipeline args. Now I can do:

metric = evaluate.load("f1", average=None)
eval = evaluator("text-classification")
results = eval.compute(
    model_or_pipeline=model,
    tokenizer=tokenizer,
    data=eval_dataset, 
    metric=metric,
    label_column=label_column,
    label_mapping=label_mapping,
)
results
{'f1': array([0.61904762, 0.55172414, 0.64705882, 0.50574713, 0.80978261])}

@lvwerra
Member

lvwerra commented Jul 7, 2022

Closing this PR for now as we'll likely fix this with #169. Thanks @juliensimon for working on this!
