
Conversation

sashavor
Contributor

.... but there's a bug that's driving me mad, @lvwerra, can you help? It's probably something silly, but I can't figure it out 🙈

When I call compute(), it says:

>>> lengths.compute('hello')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: compute() takes 1 positional argument but 2 were given

But I'm only passing a single argument, so why is it saying I'm giving 2?...

@sashavor sashavor requested a review from lvwerra May 17, 2022 15:46
@sashavor sashavor marked this pull request as draft May 17, 2022 15:46
@lvwerra
Member

lvwerra commented May 17, 2022

I think it should be:

>>> lengths.compute(texts=['hello'])

Does that work?

@sashavor
Contributor Author

It gives another error:

>>> lengths.compute(texts=['hello'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sasha/Documents/HuggingFace/evaluate/src/evaluate/module.py", line 427, in compute
    self._finalize()
  File "/home/sasha/Documents/HuggingFace/evaluate/src/evaluate/module.py", line 379, in _finalize
    file_paths, filelocks = self._get_all_cache_files()
  File "/home/sasha/Documents/HuggingFace/evaluate/src/evaluate/module.py", line 296, in _get_all_cache_files
    raise ValueError(
ValueError: Evaluation module cache file doesn't exist. Please make sure that you call `add` or `add_batch` at least once before calling `compute`.
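
For reference, the add/add_batch path the error message refers to would look something like this, if I understand the API right (the module name and the texts keyword are just my guesses):

>>> import evaluate
>>> lengths = evaluate.load('text_length')               # placeholder name
>>> lengths.add_batch(texts=['hello', 'another text'])   # buffers inputs in the cache file
>>> lengths.add_batch(texts=['a third text'])
>>> lengths.compute()                                     # consumes everything buffered via add/add_batch

But I'm already passing the texts directly to compute(), so I'm not sure why it's looking for a cache file.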

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented May 17, 2022

The documentation is not available anymore as the PR was closed or merged.

@lvwerra
Member

lvwerra commented May 17, 2022

I think this is the culprit:

features=datasets.Features({
    'texts': datasets.Value('string'),
})

The feature names have to match the _compute args.
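
Roughly, the two places have to line up like this; just a quick sketch from memory, not the exact code in your PR, and the class and info fields are just how I would write it:

import datasets
import evaluate


class TextLength(evaluate.Measurement):
    def _info(self):
        return evaluate.MeasurementInfo(
            description="Average length of the input texts.",
            citation="",
            inputs_description="texts: a list of strings",
            features=datasets.Features({
                # this key has to match the argument name of _compute below
                'texts': datasets.Value('string'),
            }),
        )

    def _compute(self, texts):  # <- same name as the feature key above
        lengths = [len(text.split()) for text in texts]
        return {"average_length": sum(lengths) / len(lengths)}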

@sashavor
Contributor Author

Oh I didn't know that! Thank you for pointing it out.

Are you ok with returning both average_length and all the individual lengths? I think both would potentially be useful as measurements:

{'average_length': 273.66, 'all lengths': [335, 253, 121, 157, 424, 142, 129, 359, 600, 268, 331, 157, 152, 166, 436, 240, 89, 1078, 95, 214, 259, 182, 230, 393, 136, 241, 309, 135, 281, 479, 183, 179, 170, 244, 475, 141, 174, 344, 272, 140, 156, 148, 994, 253, 663, 163, 143, 136, 164, 150]}

Member

@lvwerra lvwerra left a comment


Thanks for working on this @sashavor!

I left a few comments on the script. In addition, it would be great if you could add a README.md, requirements.txt, and app.py. You can have a look at the template for that.
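
The app.py in particular is tiny; it just loads the module and launches the standard Gradio widget, along these lines (the module name here is a placeholder):

import evaluate
from evaluate.utils import launch_gradio_widget

# standard Spaces app for an evaluate module; swap in the actual module id
module = evaluate.load("text_length")
launch_gradio_widget(module)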

"""Returns the lengths"""
lengths = [len(word_tokenize(text)) for text in texts]
average_length = mean(lengths)
return {"average_length": average_length, "all lengths": lengths}
Member


I am in favour of keeping the outputs simple where possible and just returning the aggregated score. If somebody wanted the score per sample, they could still do:

for text in texts:
    score = measure.compute(texts=[text])
    ...

Contributor Author


OK, so it would only work with a single input string, not a list?

Member

@lvwerra lvwerra May 18, 2022


No, what I mean is: if you want the aggregate statistics, you do:

score = measure.compute(texts=all_texts)

If you really need the result per sample, you could do:

for text in texts:
    score = measure.compute(texts=[text])
    ...

Does that make sense?

Contributor Author


So you propose only returning the average_length?

@lvwerra lvwerra marked this pull request as ready for review May 18, 2022 16:25
Member

@lvwerra lvwerra left a comment


Just a few minor comments. LGTM 🚀

@sashavor sashavor merged commit ac19337 into main May 19, 2022
@lvwerra lvwerra mentioned this pull request May 19, 2022