Add support for other languages for rouge #108

@alexyalunin

I calculate rouge with

from datasets import load_metric
rouge = load_metric("rouge")
rouge_output = rouge.compute(
    predictions=["тест тест привет"],
    references=["тест тест пока"],
    rouge_types=["rouge2"],
)["rouge2"].mid
print(rouge_output)

The result is
Score(precision=0.0, recall=0.0, fmeasure=0.0)
It seems that the rouge_score library this metric uses strips every character that is not a Latin letter or a digit:
rouge_score/tokenize.py runs text = re.sub(r"[^a-z0-9]+", " ", six.ensure_str(text)), so Cyrillic input is left with no tokens at all.
Please add support for other languages.
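
A minimal sketch of the suspected cause, using only the standard library. It applies the pattern quoted above to the same strings, next to a hypothetical Unicode-aware pattern. The helper names (tokenize, bigrams) and the \w-based pattern are my own assumptions for illustration, not part of rouge_score.

```python
import re

# Pattern quoted in the report above.
ASCII_ONLY = re.compile(r"[^a-z0-9]+")
# Hypothetical alternative (an assumption, not the library's code): in Python 3,
# \w matches letters and digits in any script. Unlike the original pattern,
# it also keeps underscores.
UNICODE_AWARE = re.compile(r"[^\w]+")


def tokenize(text, pattern):
    """Lowercase, replace each run matched by `pattern` with a space, then split."""
    return pattern.sub(" ", text.lower()).split()


def bigrams(tokens):
    """Return the list of adjacent token pairs."""
    return list(zip(tokens, tokens[1:]))


prediction = "тест тест привет"
reference = "тест тест пока"

# With the ASCII-only pattern, every character is replaced and nothing survives.
ascii_tokens = tokenize(prediction, ASCII_ONLY)
print(ascii_tokens)

# With the Unicode-aware pattern, the Cyrillic words are kept as tokens.
unicode_tokens = tokenize(prediction, UNICODE_AWARE)
print(unicode_tokens)

# Bigrams shared by prediction and reference once the tokens survive.
shared = set(bigrams(unicode_tokens)) & set(
    bigrams(tokenize(reference, UNICODE_AWARE))
)
print(shared)
```

With the ASCII-only pattern the prediction yields an empty token list, which would explain the all-zero score. With the Unicode-aware pattern, one of the two bigrams in the prediction also appears in the reference, so a nonzero ROUGE-2 score would be expected.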
