Skip to content

Conversation

meg-huggingface
Copy link
Contributor

… to utilize.

@meg-huggingface meg-huggingface requested a review from sashavor May 10, 2022 05:15
@sashavor
Copy link
Contributor

Tagging @lvwerra since we haven't really discussed what the inputs and outputs of measurements should be (as opposed to metrics) -- maybe worth discussing, so once we start adding more, they all have a similar format?

@lvwerra
Copy link
Member

lvwerra commented May 10, 2022

Hi @meg-huggingface, thanks a lot for adding the data measurements! This is also connected to #18. A few general remarks about the PR:

Measurement API

Similar to how it is done in #34 I think it would be great if we establish a uniform interface for all measurements. Following metrics we could establish the following:

Metrics

from evaluate import load_metric

metric = load_metric("accuracy")
metric.compute(references=references, predictions=predictions)
>>> {"accuracy": 0.7}

Measurement

from evaluate import load_measure

measure = load_measure("npmi")
measure.compute(data=data)
>>> {"nmpi": 0.88}

A uniform interface will help users familiar with the metrics part of evaluate/datasets easily adapt to measurements.

Classes

It would simplify making the measurements uniform if they built on top of a class to metrics (see here). The above could be easiest achieved by creating a Measurement class that inherits what's currently called Metric. We probably want to refactor this a bit such that:

EvaluationModule #or whatever we want to call this
 |-- Metric
 |-- Comparison
 |-- Measurement

Where EvaluationModule is essentially Metric and then Metric as well as the other subclasses inherit from it.

Template

Once we have the class structure we could add a template for measurements in templates/ and expand the evaluate-cli create command to allow to create a new measurement and push it to the hub.

Miscellanious

  • For the README template of measurements we could probably build on @sashavor's template for metrics templates/{{ cookiecutter.metric_slug }}/README.md
  • Dependencies: I see that there is e.g. a streamlit dependency - is this necessary?
  • it would be good to use the evaluate logger for consistency

cc @lhoestq @douwekiela

@lvwerra
Copy link
Member

lvwerra commented May 23, 2022

Ok with you to close this PR and create smaller PRs like #44 for each measurement with the same structure? Also what do you think @sashavor?

@sashavor
Copy link
Contributor

Yes! Perfect :)

@lvwerra lvwerra closed this May 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants