Often models are evaluated on multiple metrics in a project. For example, a classification project might always want to report Accuracy, Precision, Recall, and F1 score. In scikit-learn, the widely used classification report covers exactly this case. This proposal takes that a step further and allows the user to freely compose metrics: similar to a DatasetDict, one could use a MetricsSuite like a Metric object.
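For reference, this is roughly what the classification report gives in a single call; this is standard scikit-learn usage and only serves as the comparison point, not as part of the proposal:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# One call reports precision, recall, F1 and support per class, plus accuracy.
X, y = make_classification(n_samples=200, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
predictions = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict(X_test)
print(classification_report(y_test, predictions))

The proposed MetricsSuite would extend this one-call convenience to an arbitrary, user-chosen set of metrics: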
metrics_suite = MetricsSuite(
    {
        "accuracy": load_metric("accuracy"),
        "recall": load_metric("recall")
    }
)
Or, for a generation task:
metrics_suite = MetricsSuite(
    {
        "bleu": load_metric("bleu"),
        "rouge": load_metric("rouge"),
        "perplexity": load_metric("perplexity")
    }
)
metrics_suite.add(predictions, references)
metrics_suite.compute()
>>> {"bleu": bleu_result_dict, "rouge": roughe_result_dict, "perplexity": perplexity_result_dict}
Alternatively, we could also flatten the returned dict, or offer flattening as an option. We could also add a summary option that defines how an overall result is calculated, e.g. summary="average" to average all metrics into a single summary value, or a custom function such as summary=lambda x: x["bleu"]**2 + 0.5*x["rouge"] + 2. This would allow creating simple, composed metrics without needing to define a new metric (e.g. for a custom benchmark).
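One possible shape for the summary handling, purely as a sketch: it assumes each metric's result is a dict whose numeric values can be averaged and that a callable summary receives the full results mapping (the summarize name and its error handling are hypothetical):

def summarize(results, summary):
    # results: {"bleu": {...}, "rouge": {...}, ...} as returned by MetricsSuite.compute()
    if callable(summary):
        # Custom composition over the per-metric results, chosen by the user.
        return summary(results)
    if summary == "average":
        # Average every numeric value found across all metric result dicts.
        numeric = [v for result in results.values()
                   for v in result.values() if isinstance(v, (int, float))]
        return sum(numeric) / len(numeric)
    raise ValueError(f"Unknown summary option: {summary!r}")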