Physical intuition #405

sajantanand · 2021-06-02T00:15:23Z

We submit a task testing a model's physics intuition by posing it multiple-choice questions on chemical bonds, relativity, classical mechanics, fundamental forces, and atomic physics.

google-cla · 2021-06-02T00:15:57Z

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and then comment "@googlebot I fixed it." If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

edonoway · 2021-06-02T00:23:41Z

@googlebot I fixed it.

chiafullo · 2021-06-16T19:39:41Z

@sajantanand We are now requesting that task authors please include an explicit section in the README.md file called "data source", listing exactly where you got your data from (or explicitly state that you made it up yourself). Many thanks!

chiafullo · 2021-06-17T17:57:56Z

@Alicia-Parrish, you are one of the assigned reviewers for this task. Please provide feedback and be sure to notify me with a formal "accept" or "does not meet criteria" by Tuesday, June 22 (if you have not already done so).

chiafullo · 2021-06-17T17:58:54Z

@aletheap, you are one of the assigned reviewers for this task. Please provide feedback and be sure to notify me with a formal "accept" or "does not meet criteria" by Tuesday, June 22 (if you have not already done so).

Alicia-Parrish · 2021-06-18T18:25:14Z

Hi! I'm one of the reviewers for this task. Overall, it looks very well-put together and well justified. Below is my full review:

Correctness: The task.json file appears correctly formatted, but the workflow is still awaiting approval to run and see if it passes all the checks. It appears to work in the provided colab notebook.

Formatting: The task is formatted in a way that is easy for humans to read and interpret.

Specificity: The justification is clear and makes the task well-motivated. One small point: in the technical details section, I was a little confused by the framing "The model combines each answer with the question to form a complete, grammatically correct sentence," since this seems to apply only to the subset of examples that are sentence completions.

Thoroughness: The task seems to control for some possible confounds by introducing near-minimal pairs of examples (for example, using both 'diamonds' and 'diamond gemstones' in an otherwise identical question).

Difficulty: The authors report that current language models perform at around chance on the task.

Not solvable by memorizing the Internet: Some of these phrasings sound very much like textbook/quiz questions, but they are mostly phrasings that do not already exist on the internet, as far as I can tell. There are a few exceptions, for example the exact phrase "The bonds between atoms in a water molecule are covalent bond" forms the answer to one of the test questions and exists on answers.com. However, on the whole, the task requires some consolidation of knowledge, so I think most of the questions are not solvable by memorization.

Novelty: To my knowledge, this task is novel.

Justification: The README clearly explains what the task intends to measure, and the choice of examples matches the stated goals. The design considerations section was very clear and helpful in understanding how the examples were constructed.

Size: The size is within the specified limits, but would likely be more robust with more examples.

Compute resources: I expect no practical challenges due to limited computational resources with this task.

Other: Typos in task.json: line 415 'potasium' -> 'potassium'; line 312 'hm/h' -> 'km/h'

@chiafullo accept

sajantanand · 2021-06-19T06:32:14Z

Hi Alicia, thanks for the thorough review! I've fixed the typos and edited the README to be more clear about the two different types of questions, multiple choice and sentence completion.
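For readers unfamiliar with the format, the multiple-choice scoring behind the README wording discussed above can be sketched in Python. This is a minimal sketch, not the BIG-bench implementation: `cond_log_prob` is a hypothetical stand-in for a model's log-probability of a continuation given a context, and the toy example is made up. Each choice in `target_scores` is ranked as a continuation of the same `input`, and the target score of the top-ranked choice is returned.

```python
from typing import Callable, Dict

# Hypothetical scorer type: log-probability of `continuation` given `context`.
# This is an illustrative assumption, not a real BIG-bench function.
CondLogProb = Callable[[str, str], float]


def score_multiple_choice(example: Dict, cond_log_prob: CondLogProb) -> float:
    """Return the target score of the choice the model ranks highest."""
    context = example["input"]
    choices = example["target_scores"]
    # Rank every candidate answer as a continuation of the same input.
    best = max(choices, key=lambda choice: cond_log_prob(context, choice))
    return float(choices[best])


# Toy usage with a made-up example and a scorer that prefers "A".
toy = {"input": "Toy question?", "target_scores": {"A": 1, "B": 0}}
print(score_multiple_choice(toy, lambda ctx, cont: 0.0 if cont == "A" else -1.0))
```

Because every choice is scored as a continuation of the same input, the procedure is the same for question-style and sentence-completion items; the only difference is whether the input plus the choice reads as a single sentence.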

chiafullo · 2021-06-22T20:24:03Z

The number of tasks we received this round is tremendous. With so many tasks to review, we have decided to extend the review period to Tuesday, June 29.

Reviewers: if the submitter has made revisions, please be sure to notify me with a formal "accept" or "does not meet criteria" by Tuesday, June 29 (if you haven't already done so).

this is an automated message

sajantanand · 2021-06-29T17:18:55Z

@aletheap Just wanted to check if you will have a chance to review this task before the deadline. Thanks!

sajantanand · 2021-07-02T17:44:45Z

@chiafullo since we only got one review, is there anything I need to do so that this task can be merged?

chiafullo · 2021-07-07T20:26:06Z

Your task has been accepted in the initial review stage. The next stage of this process is for us to assign your task a meta-reviewer for final review and merge. An assigned meta-reviewer will follow-up by commenting on the PR should it need further revisions.

cdfreeman-google · 2021-07-20T16:08:45Z

bigbench/benchmark_tasks/physical_intuition/task.json

+  {
+      "input": "Helicopters create lift by exerting a force on air, pushing it down. In which direction does the air exert a force on the helicopter?",
+      "target_scores": {
+        "Up": 0,


I believe the correct answer here should be "Up" (it is currently scored 0).

cdfreeman-google · 2021-07-20T16:14:08Z

bigbench/benchmark_tasks/physical_intuition/task.json

+      }
+    },
+  {
+      "input": "I am holding a ball and then let go. Which direction does the ball travel?",


These two assume some extra (albeit common-sense) constraints implicit in the question, namely that the holder is undergoing acceleration and that their feet are in the direction of the acceleration. I'm tentatively okay with this, but depending on how precise you want to be, an "indeterminate" answer might be better here.

cdfreeman-google · 2021-07-20T16:15:52Z

bigbench/benchmark_tasks/physical_intuition/task.json

+      }
+    },
+  {
+      "input": "I throw a ball. What direction is the ball moving right before it hits the ground?",


This is also technically down and to the right, and in many cases the ball could still be moving faster to the right than it is moving down.
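To make this point concrete, here is a small Python sketch under the simplest assumptions: a purely horizontal throw, no air resistance, and g = 9.81 m/s². The speed and height values are illustrative, not taken from the task. Only the vertical component grows while the ball falls, so a ball thrown at 10 m/s from 1.5 m lands with a vertical speed of roughly 5.4 m/s and is still moving faster sideways than downward.

```python
import math
from typing import Tuple


def impact_velocity(vx: float, height: float, g: float = 9.81) -> Tuple[float, float]:
    """Velocity components just before impact for a ball thrown horizontally.

    Assumes no air resistance and a purely horizontal launch, so the vertical
    speed comes only from falling `height` metres.
    """
    vy = -math.sqrt(2 * g * height)  # negative means downward
    return vx, vy  # horizontal velocity is unchanged without drag


vx, vy = impact_velocity(vx=10.0, height=1.5)
angle = math.degrees(math.atan2(-vy, vx))
print(f"horizontal {vx:.2f} m/s, vertical {vy:.2f} m/s, {angle:.1f} deg below horizontal")
```

Any throw whose horizontal speed exceeds sqrt(2gh) lands at less than 45 degrees below the horizontal, which is why "down" alone is an incomplete answer.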

cdfreeman-google · 2021-07-20T16:28:26Z

Hello, I'm the meta-reviewer for this task!

This is a fun common sense task. I've made a couple of nit comments in the json file, but otherwise this is ready for merging, which I'll do shortly. Please handle these nits at your convenience (just add me as a reviewer to a PR where you make these changes) :)

sajantanand · 2021-07-29T05:08:30Z

@cdfreeman-google I have opened a PR #497 to address these errors in our questions. Thanks for the thorough review!

sajantanand and others added 16 commits June 1, 2021 10:07

Starting new physical intuition branch. (7a7580d)

Update task.json (ba91ea2)

Removing blank space. (e967fef)

Fixing formatting issues. (3e70fbf)

Update README.md (ffd4e3d)

Update task.json (91635de)

Update README.md (24ce69d)

Update README.md (8503d45)

Update README.md (6c84b06)

Update README.md (89af5b2)

Update README.md (9f06e1d)

Update task.json (a6d2889)

Update README.md (e0e30d7)

Update README.md (4bf21d7)

Update README.md (7c0f88f)

Merge branch 'google:main' into physical_intuition (446856d)

google-cla bot added the cla: no label Jun 2, 2021

google-cla bot added cla: yes contributor license agreement: yes and removed cla: no labels Jun 2, 2021

edonoway and others added 3 commits June 1, 2021 18:02

Update README.md (27135ca)

Update task.json (e37f183)

Fix optional JSON parameter. (79e0561)

chiafullo added the task submission label Jun 2, 2021

Data Source. (591475d)

Alicia Review (c89cb2a)

Sohl-Dickstein force-pushed the main branch 2 times, most recently from 3fcd8da to 0afe508, June 29, 2021 23:05

cdfreeman-google reviewed Jul 20, 2021

cdfreeman-google approved these changes Jul 20, 2021

cdfreeman-google merged commit 1e9f60f into google:main Jul 20, 2021

sajantanand mentioned this pull request Jul 29, 2021

Addressing meta-reviewer's comments on physical_intuition task #497 (merged)
