Skip to content

Feature: compose multiple metrics into single object #8

@lvwerra

Description

@lvwerra

Often models are evaluated on multiple metrics in a project. E.g. a classification project might always want to report the Accuracy, Precision, Recall, and F1 score. In scikit-learn one use the classification report for that which is widely used. This takes this a step further and allows the user to freely compose metrics. Similar to a DatasetDict one could use the MetricSuite like a Metric object.

metrics_suite = MetricsSuite(
     {
        "accuray": load_metric("accuracy"),
        "recall": load_metric("recall")
     }
)

metrics_suite = MetricsSuite(
     {
        "bleu": load_metric("bleu"),
        "rouge": load_metric("rouge"),
        "perplexity": load_metric("perplexity")
     }
)

metrics_suite.add(predictions, references)
metrics_suite.compute()
>>> {"bleu": bleu_result_dict, "rouge": roughe_result_dict, "perplexity": perplexity_result_dict}

Alternatively, we could also flatten the return dict or have it as an option. We could also add a summary option that defines how an overall result is calculated. E.g. summary="average" averages all the metrics into a summary metric or a custom function with summary=lambda x: x["bleu"]**2 + 0.5*["rouge"]+2. This would allow to create simple, composed metrics without the needing to define a new metric (e.g. for a custom benchmark).

cc @douwekiela @lewtun

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions