Add metric for judging numeric relative error #459

r-barnes · 2021-06-20T16:40:57Z

Also alphabetizes some import lists

Unfortunately, I was unable to test this locally.

r-barnes · 2021-06-22T06:32:57Z

bigbench/api/test_task_metrics.py

+    targets, responses = zip(*test_data)
+    scores = metrics.exact_str_match_fn(targets=targets, responses=responses)
+
+    expected_scores = {"exact_str_match": 9/18}


Fixed, thanks!

r-barnes · 2021-06-22T06:32:53Z

bigbench/api/test_task_metrics.py

+        (0, "i am not a number"), # Bad
+    ]
+    targets, responses = zip(*test_data)
+    scores = metrics.exact_str_match_fn(targets=targets, responses=responses)


Fixed, thanks!
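
For readers following the thread, the two test fragments above share one pattern: a list of (target, response) pairs is transposed with zip(*test_data) and the resulting score is compared against a hand-counted fraction. The sketch below illustrates only that pattern. The function exact_match_fraction and the sample pairs are hypothetical stand-ins, not the code in bigbench/api/task_metrics.py.

```python
from typing import Dict, Sequence


def exact_match_fraction(
    targets: Sequence[str], responses: Sequence[str]
) -> Dict[str, float]:
    """Hypothetical stand-in metric: fraction of responses equal to their target."""
    hits = sum(t == r for t, r in zip(targets, responses))
    return {"exact_str_match": hits / len(targets)}


# Each entry is a (target, response) pair; the data is made up for illustration.
test_data = [
    ("1", "1"),  # Good
    ("2", "2"),  # Good
    ("3", "three"),  # Bad
    ("4", "i am not a number"),  # Bad
]

# zip(*pairs) transposes the list of pairs into two parallel tuples.
targets, responses = zip(*test_data)
scores = exact_match_fraction(targets=targets, responses=responses)

# Two of the four pairs match, so the expected score is 2/4.
expected_scores = {"exact_str_match": 2 / 4}
assert scores == expected_scores
```

Writing the expected value as an unreduced fraction, as the diff does with 9/18, keeps the count of good cases visible next to the total.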

r-barnes · 2021-06-22T06:33:24Z

@RomanPlusPlus : I've fixed the issues you pointed out above and this should be ready for review again.

r-barnes · 2021-06-22T21:55:05Z

@RomanPlusPlus : All tests pass.

RomanPlusPlus · 2021-06-23T05:16:30Z

@r-barnes, I'm not an assigned reviewer, sorry for being unclear. Not sure why I'm listed as a reviewer in the Reviewers field. But thank you for the fixes anyway!

lewkowycz

Hello, thanks a lot for adding this! It is great to have submitters who add custom metrics! It looks great, and I added some comments with minor modifications.

bigbench/api/task_metrics.py

bigbench/api/test_task_metrics.py

r-barnes · 2021-06-26T22:02:11Z

@lewkowycz : Thanks for your review. I've made the changes you requested.

r-barnes · 2021-06-28T17:04:45Z

The most recent push fixes the metric's name in GENERATIVE_FN to numeric_match_with_0_1_relative_error.

Waiting for this to merge now.
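
For context, here is a minimal sketch of what a metric with this name might compute. It rests on assumptions not confirmed in this thread: that "0_1" means a relative tolerance of 0.1, that non-numeric responses count as misses, and that a zero target requires an exact match. The function name and signature are illustrative, not the merged implementation.

```python
from typing import Dict, Sequence, Union

# Assumption: the "0_1" in the metric name denotes a 10% relative tolerance.
REL_TOL = 0.1
METRIC_NAME = "numeric_match_with_0_1_relative_error"

Number = Union[str, int, float]


def numeric_match_sketch(
    targets: Sequence[Number], responses: Sequence[Number]
) -> Dict[str, float]:
    """Fraction of responses within REL_TOL relative error of their targets."""
    if len(targets) != len(responses):
        raise ValueError("targets and responses must have the same length")
    if not targets:
        return {METRIC_NAME: 0.0}

    good = 0
    for target, response in zip(targets, responses):
        try:
            t = float(target)
            r = float(response)
        except (TypeError, ValueError):
            continue  # Assumption: text that does not parse as a number is a miss.
        if t == 0:
            # Relative error is undefined for a zero target; require exact equality.
            good += int(r == 0)
        elif abs(r - t) / abs(t) <= REL_TOL:
            good += 1
    return {METRIC_NAME: good / len(targets)}


scores = numeric_match_sketch(
    targets=[10, 10, 0, 5],
    responses=["10.5", "12", "i am not a number", "5"],
)
```

In this example the first response is off by 5% and the last is exact, while "12" is off by 20% and the third does not parse, so two of the four responses are counted as matches.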

r-barnes · 2021-06-28T17:18:46Z

@lewkowycz : Can we merge this after the tests pass? The tests for #372 won't pass until this lands and our reviewer is withholding acceptance until the tests pass.

lewkowycz · 2021-06-28T17:26:11Z

Yes, thanks for adding this again!

…etric Add metric for judging numeric relative error

google-cla bot added the "cla: yes" label Jun 20, 2021

r-barnes mentioned this pull request Jun 20, 2021

Benchmarks for ability to do math with physical units #275

Closed

RomanPlusPlus reviewed Jun 20, 2021

RomanPlusPlus reviewed Jun 21, 2021

r-barnes force-pushed the richard/add_numeric_rel_error_metric branch from 568b803 to 21175ac on June 22, 2021 06:32

r-barnes force-pushed the richard/add_numeric_rel_error_metric branch 4 times, most recently from b5ca56a to 4f0a818, on June 22, 2021 20:37

RomanPlusPlus approved these changes Jun 23, 2021

r-barnes mentioned this pull request Jun 23, 2021

Unit conversion #372

Merged

guygurari requested a review from lewkowycz June 24, 2021 19:34

lewkowycz suggested changes Jun 24, 2021

r-barnes force-pushed the richard/add_numeric_rel_error_metric branch from 4f0a818 to ec4e54b on June 26, 2021 22:02

lewkowycz approved these changes Jun 28, 2021

Add metric for judging numeric relative error

a6bb46d

r-barnes force-pushed the richard/add_numeric_rel_error_metric branch from ec4e54b to a6bb46d on June 28, 2021 17:03

lewkowycz merged commit dc11656 into google:main Jun 28, 2021

r-barnes deleted the richard/add_numeric_rel_error_metric branch June 28, 2021 23:51

Sohl-Dickstein pushed a commit that referenced this pull request Jun 29, 2021

Merge pull request #459 from r-barnes/richard/add_numeric_rel_error_m…

0d8936d

…etric Add metric for judging numeric relative error
