Physical intuition #405
We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google. ℹ️ Googlers: Go here for more info.
@googlebot I fixed it.
@sajantanand We are now requesting that task authors please include an explicit section in the README.md file called "data source", listing exactly where you got your data from (or explicitly state that you made it up yourself). Many thanks!
@Alicia-Parrish you are one of the assigned reviewers for this task. Please provide feedback and be sure to notify me with a formal "accept" or "does not meet criteria" by Tuesday, June 22 (if you have not already done so).
@aletheap you are one of the assigned reviewers for this task. Please provide feedback and be sure to notify me with a formal "accept" or "does not meet criteria" by Tuesday, June 22 (if you have not already done so).
Hi! I'm one of the reviewers for this task. Overall, it looks very well put together and well justified. Below is my full review:
Correctness: The
Formatting: The task is formatted in a way that is easy for humans to read and interpret.
Specificity: The justification is clear and makes the task well motivated. One small point: in the technical details section, I was a little confused by the framing "The model combines each answer with the question to form a complete, grammatically correct sentence," as this seems to apply only to the subset of examples that are sentence completions.
Thoroughness: The task seems to control for some possible confounds by introducing near-minimal pairs of examples (for example, using questions with both 'diamonds' and 'diamond gemstones' in an otherwise identical question).
Difficulty: The authors report that current language models perform at around chance on the task.
Not solvable by memorizing the Internet: Some of these phrasings sound very much like textbook/quiz questions, but as far as I can tell most of them do not already exist on the internet. There are a few exceptions. For example, the exact phrase "The bonds between atoms in a water molecule are covalent bond" forms the answer to one of the test questions and exists on answers.com. On the whole, however, the task requires some consolidation of knowledge, so I think most of the questions are not solvable by memorization.
Novelty: To my knowledge, this task is novel.
Justification: The README clearly explains what the task intends to measure, and the choice of examples matches the stated goals. The design considerations section was very clear and helpful in understanding how the examples were constructed.
Size: The size is within the specified limits, but the task would likely be more robust with more examples.
Compute resources: I expect no practical challenges due to limited computational resources with this task.
Other: Typos in
@chiafullo accept
Hi Alicia, thanks for the thorough review! I've fixed the typos and edited the README to be clearer about the two different types of questions, multiple choice and sentence completion.
The number of tasks we received this round is tremendous. With so many tasks to review, we have decided to extend the review period to Tuesday, June 29. Reviewers: if the submitter has made revisions, please be sure to notify me with a formal "accept" or "does not meet criteria" by Tuesday, June 29 (if you haven't already done so). This is an automated message.
@aletheap Just wanted to check if you will have a chance to review this task before the deadline. Thanks!
3fcd8da to 0afe508
@chiafullo Since we only got one review, is there anything I need to do so that this task can be merged?
Your task has been accepted in the initial review stage. The next stage of this process is for us to assign your task a meta-reviewer for final review and merge. An assigned meta-reviewer will follow up by commenting on the PR should it need further revisions.
{
  "input": "Helicopters create lift by exerting a force on air, pushing it down. In which direction does the air exert a force on the helicopter?",
  "target_scores": {
    "Up": 0,
I believe this should be "Up"
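A mis-keyed answer like the one flagged above can be caught mechanically. The following is a minimal sketch, assuming each example carries a `target_scores` mapping in which the correct choice is scored 1. The helper name `examples_without_correct_choice` and the "Down" entry with its score are hypothetical illustrations, not taken from the task file.

```python
import json


def examples_without_correct_choice(examples):
    """Return the inputs of examples in which no choice is scored 1."""
    flagged = []
    for example in examples:
        scores = example.get("target_scores", {})
        if not any(score == 1 for score in scores.values()):
            flagged.append(example["input"])
    return flagged


# Illustrative data only: the question text is quoted from the diff above,
# but the "Down" choice and its score are assumptions made for this sketch.
raw = """
[
  {
    "input": "Helicopters create lift by exerting a force on air, pushing it down. In which direction does the air exert a force on the helicopter?",
    "target_scores": {"Up": 0, "Down": 0}
  }
]
"""

examples = json.loads(raw)
for question in examples_without_correct_choice(examples):
    print("No choice scored 1:", question)
```

Running a check like this over the whole task file before review would surface any question whose answer key marks nothing as correct.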
  }
},
{
  "input": "I am holding a ball and then let go. Which direction does the ball travel?",
These two assume some extra (albeit commonsensical) constraints implicit in the question, i.e., that the holder is undergoing acceleration and that their feet point in the direction of the acceleration. I'm tentatively okay with this, but depending on how precise you want to be, an "indeterminate" answer might be better here.
  }
},
{
  "input": "I throw a ball. What direction is the ball moving right before it hits the ground?",
Technically, the ball is also moving down and to the right, and in many cases it could still be moving faster to the right than it is moving down.
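To make this point concrete, here is a rough sketch of the velocity components just before impact. It assumes a ball thrown horizontally to the right, no air resistance, and g = 9.81 m/s^2. The function name `impact_velocity`, the 10 m/s throw speed, and the 1.5 m release height are illustrative assumptions, not values from the task.

```python
import math


def impact_velocity(horizontal_speed, release_height, g=9.81):
    """Return (vx, vy) in m/s just before landing for a horizontal throw.

    Assumes no air resistance, so vx stays constant and the vertical speed
    follows from energy conservation: vy = -sqrt(2 * g * h), with up positive.
    """
    vx = horizontal_speed
    vy = -math.sqrt(2.0 * g * release_height)
    return vx, vy


# Illustrative numbers (assumed): thrown at 10 m/s from a height of 1.5 m.
vx, vy = impact_velocity(10.0, 1.5)
print(f"vx = {vx:.2f} m/s, vy = {vy:.2f} m/s")
print("Faster sideways than downward:", abs(vx) > abs(vy))
```

Under these assumptions the ball is travelling at roughly 5.4 m/s downward but 10 m/s to the right when it lands, so "down" alone does not fully describe its direction.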
Hello, I'm the meta-reviewer for this task! This is a fun common sense task. I've made a couple of nit comments in the json file, but otherwise this is ready for merging, which I'll do shortly. Please handle these nits at your convenience (just add me as a reviewer to a PR where you make these changes) :)
@cdfreeman-google I have opened a PR #497 to address these errors in our questions. Thanks for the thorough review!
We submit a task testing a model's physics intuition by posing it multiple-choice questions on chemical bonds, relativity, classical mechanics, fundamental forces, and atomic physics.