Refactor for loading multiple evaluation categories

In addition to `Metric`, we also want to add other types of evaluations such as `Comparison` (#34) and `Measurement` (#35), following the internal discussion (https://huggingface.slack.com/archives/C035S5G2J3D/p1652200198598789). Technically, these all behave the same way: they take some inputs and compute scores. As such, they could largely be one class (essentially what `Metric` is today), which means we could also load them in the same fashion:

```python
import evaluate

metric = evaluate.load("accuracy")
comparison = evaluate.load("mcnemar")
measure = evaluate.load("npmi")
```
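A rough sketch of what "one class" could look like, assuming a hypothetical `EvaluationModule` base class with a `module_type` tag (none of these names exist yet; `Agreement` and `MeanLength` are toy stand-ins for a comparison and a measurement, not real modules):

```python
from abc import ABC, abstractmethod


class EvaluationModule(ABC):
    """Hypothetical single base class for metrics, comparisons and measurements.

    All three take some inputs and return a dict of scores; only the
    `module_type` tag differs between them.
    """

    module_type: str = "metric"

    @abstractmethod
    def compute(self, **inputs) -> dict:
        """Return a dict mapping score names to values."""


class Accuracy(EvaluationModule):
    # Metric: compares predictions against references.
    module_type = "metric"

    def compute(self, predictions, references) -> dict:
        correct = sum(p == r for p, r in zip(predictions, references))
        return {"accuracy": correct / len(references)}


class Agreement(EvaluationModule):
    # Hypothetical comparison: fraction of inputs on which two models agree.
    module_type = "comparison"

    def compute(self, predictions1, predictions2) -> dict:
        same = sum(a == b for a, b in zip(predictions1, predictions2))
        return {"agreement": same / len(predictions1)}


class MeanLength(EvaluationModule):
    # Hypothetical measurement: average number of whitespace tokens per text.
    module_type = "measurement"

    def compute(self, data) -> dict:
        return {"mean_length": sum(len(t.split()) for t in data) / len(data)}
```

The inputs differ per module, but the calling convention (`compute(...)` returning a dict) is the same, which is what would let a single `load` serve all three.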

While each type can live in its own folder in the repository, this can cause name clashes when the same name is used by two types (e.g. `perplexity` can be both a metric and a measurement). This could be solved with an additional argument, like `load("perplexity", type="metric")`, that resolves such conflicts. I think this would be fine.
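To see which names would actually need the `type` argument, something like the following could list the clashes. This is a minimal sketch assuming the module names per folder are already known; `find_name_clashes` and its input format are hypothetical:

```python
from collections import defaultdict


def find_name_clashes(modules_by_type: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return the names that appear under more than one evaluation type.

    `modules_by_type` maps a type ("metric", "comparison", "measurement") to the
    module names in that type's folder (a hypothetical stand-in for listing the
    repository folders).
    """
    types_by_name = defaultdict(list)
    for module_type, names in modules_by_type.items():
        for name in names:
            types_by_name[name].append(module_type)
    # Keep only names claimed by two or more types.
    return {name: types for name, types in types_by_name.items() if len(types) > 1}
```

For example, with `perplexity` in both the metric and measurement folders, the result is `{"perplexity": ["metric", "measurement"]}`.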

However, there is a second conflict with Spaces: since each metric (or comparison/measurement) would have its own Space with a widget, the conflicts are not so easy to resolve there, unless we create an org for each evaluation type, e.g. `evaluate-metrics`, `evaluate-comparisons`, `evaluate-measurements`. Each evaluation type would then be pushed to a separate org.
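With one org per type, the Space ID can be derived from the name and the type alone. A minimal sketch, assuming the org names above and a hypothetical `space_id` helper:

```python
# Hypothetical mapping from evaluation type to the org hosting its Spaces.
TYPE_TO_ORG = {
    "metric": "evaluate-metrics",
    "comparison": "evaluate-comparisons",
    "measurement": "evaluate-measurements",
}


def space_id(name: str, module_type: str) -> str:
    """Build the Space repo ID for a module, e.g. "evaluate-metrics/accuracy"."""
    if module_type not in TYPE_TO_ORG:
        raise ValueError(
            f"Unknown type {module_type!r}; expected one of {sorted(TYPE_TO_ORG)}"
        )
    return f"{TYPE_TO_ORG[module_type]}/{name}"
```

Here `space_id("perplexity", "metric")` gives `"evaluate-metrics/perplexity"` while `space_id("perplexity", "measurement")` gives `"evaluate-measurements/perplexity"`, so the two Spaces no longer collide.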

If that solution sounds good, then we could implement the following behaviour:
- if no `type` is provided, we cycle through metric/comparison/measurement and return the first match
- if a `type` is provided, we only look under that type and raise an error if no module with that name exists there
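The two rules could look roughly like this. `resolve` is hypothetical, and `available` stands in for whatever existence check the loader would do against the folders or orgs (the parameter is named `type` only to mirror the proposed `load` argument):

```python
from typing import Optional

# Lookup order when no type is given, as proposed above.
SEARCH_ORDER = ("metric", "comparison", "measurement")


def resolve(name: str, available: dict[str, set[str]], type: Optional[str] = None) -> str:
    """Return the evaluation type under which `name` should be loaded.

    `available` maps each type to the module names that exist for it
    (a hypothetical stand-in for querying the repository or Hub).
    """
    if type is not None:
        # Explicit type: only look there, and fail if the module is missing.
        if type not in SEARCH_ORDER:
            raise ValueError(f"Unknown type {type!r}; expected one of {SEARCH_ORDER}")
        if name not in available.get(type, set()):
            raise FileNotFoundError(f"No {type} named {name!r}")
        return type
    # No type: cycle through the types in order and return the first match.
    for candidate in SEARCH_ORDER:
        if name in available.get(candidate, set()):
            return candidate
    raise FileNotFoundError(f"No metric, comparison or measurement named {name!r}")
```

With `perplexity` registered as both a metric and a measurement, `resolve("perplexity", available)` returns `"metric"` because metrics are checked first, whereas `resolve("perplexity", available, type="measurement")` returns `"measurement"`.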

What do you think @douwekiela @lhoestq?

Refactor for loading multiple evaluation categories #38