
@xxman-google (Contributor) commented Jun 26, 2025

What does this PR do?

Adds support for evaluating multiple-choice benchmarks such as GPQA and MMLU.

Issues

N/A

Usage

  • Run the evaluation on GPQA as follows:
    uv run examples/run_eval.py --config examples/configs/evals/gpqa_eval.yaml

On GPQA, the Qwen2.5-7B model scores 32.0 on average over 3 runs.
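Scoring a multiple-choice benchmark comes down to extracting the chosen option from each response, comparing it with the gold option, and averaging accuracy across runs (as with the 3-run average above). The sketch below is a hypothetical illustration, not this PR's grader: it assumes four options (A to D) and that each response states its choice as "Answer: X". The names `extract_choice`, `accuracy`, and `mean_accuracy` are made up for this example.

```python
import re
from statistics import mean
from typing import Optional

# Assumed answer format (hypothetical): the response states "Answer: X" with X in A-D,
# optionally as "(x)". The trailing \b stops a word such as "Because" from being read as B.
ANSWER_RE = re.compile(r"Answer:\s*\(?([A-D])\b", re.IGNORECASE)


def extract_choice(response: str) -> Optional[str]:
    """Return the last answer letter stated in the response, or None if there is none."""
    matches = ANSWER_RE.findall(response)
    return matches[-1].upper() if matches else None


def accuracy(responses: list[str], gold: list[str]) -> float:
    """Percentage of responses whose extracted letter equals the gold letter."""
    correct = sum(extract_choice(r) == g for r, g in zip(responses, gold))
    return 100.0 * correct / len(gold)


def mean_accuracy(runs: list[list[str]], gold: list[str]) -> float:
    """Average the per-run accuracy over independent sampling runs."""
    return mean(accuracy(run, gold) for run in runs)


gold = ["A", "C"]
runs = [
    ["... so the answer is clear. Answer: A", "Answer: B"],  # 50.0
    ["Answer: (a)", "Answer: C"],                            # 100.0
    ["I am not sure.", "Answer: C"],                         # 50.0
]
print(mean_accuracy(runs, gold))
```

Taking the last stated letter is one common convention for responses that reason before answering; a stricter grader might instead reject responses that name more than one option.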

Before your PR is "Ready for review"

Pre-checks:

  • Make sure you have read and followed the Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests.
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

@terrykong requested a review from yuki-97 June 26, 2025 16:25

@yuki-97 (Contributor) left a comment

Awesome! Thanks for contributing; overall LGTM. I left some comments.

@github-actions bot added the documentation label Jun 27, 2025

@xxman-google (Contributor, Author) left a comment

Thanks for the review. All comments have been addressed. PTAL again.

@yuki-97 (Contributor) commented Jun 30, 2025

Thanks @xxman-google, all changes LGTM!

Could you also update the docs?

  1. Add a list of currently supported datasets to docs/guides/eval.md.
  2. Add brief examples of how to use the supported datasets and local datasets in the "Run the Evaluation Script" section.
  3. Update the outdated examples, such as this one:

    RL/docs/guides/eval.md, lines 54 to 65 in cfb803d:

    # Override specific config values via command line
    # Example: Evaluation of DeepScaleR-1.5B-Preview on MATH-500 using 8 GPUs
    # Pass@1 accuracy averaged over 16 samples for each problem
    uv run python examples/run_eval.py \
    generation.model_name=agentica-org/DeepScaleR-1.5B-Preview \
    generation.temperature=0.6 \
    generation.top_p=0.95 \
    generation.vllm_cfg.max_model_len=32768 \
    data.dataset_name=HuggingFaceH4/MATH-500 \
    data.dataset_key=test \
    eval.num_tests_per_prompt=16 \
    cluster.gpus_per_node=8
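The "Pass@1 accuracy averaged over 16 samples for each problem" in the snippet above can be read as: for each problem, the fraction of its samples that are correct, then the mean over problems. With n samples of which c are correct, c / n is the unbiased pass@1 estimate for that problem. A minimal sketch, assuming a hypothetical `correct` list holding one list of per-sample booleans for each problem; this is not NeMo RL's actual scoring code.

```python
from statistics import mean


def pass_at_1(correct: list[list[bool]]) -> float:
    """Hypothetical helper: correct[i][j] is True if sample j of problem i is right."""
    # Per problem, pass@1 is estimated as c / n (correct samples / total samples);
    # the benchmark score is the mean of these estimates over all problems.
    return mean(sum(samples) / len(samples) for samples in correct)


# Two problems with four samples each (the snippet uses eval.num_tests_per_prompt=16).
scores = [[True, True, False, False], [True, True, True, True]]
print(pass_at_1(scores))  # 0.75
```

When every problem has the same number of samples, this equals the flat mean over all problem-sample pairs.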

@yuki-97 (Contributor) commented Jun 30, 2025

Also, your PR appears to have a DCO error. It looks like commit 42155d9 is not signed.

@xxman-google (Contributor, Author) commented

Thanks @yuki-666. The docs are updated and the DCO error is also fixed.

@yuki-97 (Contributor) commented Jun 30, 2025

@xxman-google It seems there is a conflict. Could you try merging main again to see if that resolves it?

This branch has conflicts that must be resolved. Conflicting file: examples/configs/eval.yaml

xxman-google and others added 21 commits June 30, 2025 06:06
Signed-off-by: Xuehan Xiong <xxman@google.com>
Signed-off-by: Xuehan <xxman@google.com>
Signed-off-by: Xuehan Xiong <xxman@google.com>
Signed-off-by: Xuehan <xxman@google.com>
Signed-off-by: Xuehan Xiong <xxman@google.com>
Signed-off-by: Xuehan <xxman@google.com>
Signed-off-by: Xuehan Xiong <xxman@google.com>
Signed-off-by: Xuehan <xxman@google.com>
Signed-off-by: Xuehan Xiong <xxman@google.com>
Signed-off-by: Xuehan <xxman@google.com>
Signed-off-by: Xuehan Xiong xxman@google.com
Signed-off-by: Xuehan Xiong <xxman@google.com>
Signed-off-by: Xuehan <xxman@google.com>
Signed-off-by: Xuehan Xiong <xxman@google.com>
Signed-off-by: Xuehan <xxman@google.com>
Signed-off-by: Xuehan Xiong xxman@google.com
Signed-off-by: Xuehan Xiong <xxman@google.com>
Signed-off-by: Xuehan <xxman@google.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Xuehan <xxman@google.com>
…IA-NeMo#565)

Signed-off-by: Parth Chadha <pchadha@nvidia.com>
Signed-off-by: Xuehan <xxman@google.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Parth Chadha <pchadha@nvidia.com>
Co-authored-by: yuki <48991475+yuki-666@users.noreply.github.com>
Signed-off-by: Xuehan <xxman@google.com>
…eMo#494)

Signed-off-by: KiddoZhu <zhaochengz@nvidia.com>
Signed-off-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Signed-off-by: Xuehan <xxman@google.com>
Signed-off-by: Xuehan Xiong xxman@google.com
Signed-off-by: Xuehan <xxman@google.com>
Signed-off-by: Xuehan Xiong <xxman@google.com>
Signed-off-by: Xuehan <xxman@google.com>
…IDIA-NeMo#516)

Signed-off-by: Luis Vega <vegaluisjose@users.noreply.github.com>
Signed-off-by: Terry Kong <terrycurtiskong@gmail.com>
Signed-off-by: Luis Vega <2478335+vegaluisjose@users.noreply.github.com>
Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
Co-authored-by: Luis Vega <2478335+vegaluisjose@users.noreply.github.com>
Signed-off-by: Xuehan <xxman@google.com>
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Parth Chadha <pchadha@nvidia.com>
Co-authored-by: Parth Chadha <pchadha@nvidia.com>
Signed-off-by: Xuehan <xxman@google.com>
…NeMo#570)

Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: Xuehan <xxman@google.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Xuehan <xxman@google.com>
)

Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Xuehan <xxman@google.com>
…req (NVIDIA-NeMo#574)

Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Xuehan <xxman@google.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: Xuehan <xxman@google.com>
@parthchadha enabled auto-merge July 2, 2025 18:00
Signed-off-by: Xuehan Xiong <xxman@google.com>
auto-merge was automatically disabled July 2, 2025 19:57

Head branch was pushed to by a user without write access

butsugiri and others added 10 commits July 2, 2025 21:20
Signed-off-by: Shun Kiyono <shun.kiyono@sbintuitions.co.jp>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Signed-off-by: Atsunori Fujita <afujita@nvidia.com>
Signed-off-by: Parth Chadha <pchadha@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Xuehan <xxman@google.com>
Signed-off-by: Anna Shors <ashors@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: Xuehan <xxman@google.com>
@github-actions bot added the CI label Jul 2, 2025
Signed-off-by: Xuehan <xxman@google.com>
@github-actions bot removed the CI label Jul 2, 2025
@parthchadha enabled auto-merge July 2, 2025 21:39
@parthchadha added this pull request to the merge queue Jul 2, 2025
@github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Jul 2, 2025
@parthchadha added this pull request to the merge queue Jul 2, 2025
Merged via the queue into NVIDIA-NeMo:main with commit 3f6d52f Jul 3, 2025
13 of 14 checks passed
@xxman-google deleted the xx/new_eval branch July 3, 2025 02:59
therealnaveenkamal pushed a commit to therealnaveenkamal/RL that referenced this pull request Jul 7, 2025
)

Signed-off-by: Xuehan Xiong <xxman@google.com>
Signed-off-by: Xuehan <xxman@google.com>
Signed-off-by: Xuehan Xiong xxman@google.com
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Parth Chadha <pchadha@nvidia.com>
Signed-off-by: KiddoZhu <zhaochengz@nvidia.com>
Signed-off-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Signed-off-by: Luis Vega <vegaluisjose@users.noreply.github.com>
Signed-off-by: Terry Kong <terrycurtiskong@gmail.com>
Signed-off-by: Luis Vega <2478335+vegaluisjose@users.noreply.github.com>
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Anna Shors <ashors@nvidia.com>
Signed-off-by: Dheeraj Peri <peri.dheeraj@gmail.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Yuki Huang <scsse_hqh2016@126.com>
Signed-off-by: Shun Kiyono <shun.kiyono@sbintuitions.co.jp>
Signed-off-by: Wei Du <wedu@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Atsunori Fujita <afujita@nvidia.com>
Co-authored-by: yuki <48991475+yuki-666@users.noreply.github.com>
Co-authored-by: Parth Chadha <pchadha@nvidia.com>
Co-authored-by: Zhaocheng Zhu <zhaochengz@nvidia.com>
Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Co-authored-by: Luis Vega <vegaluisjose@users.noreply.github.com>
Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
Co-authored-by: Luis Vega <2478335+vegaluisjose@users.noreply.github.com>
Co-authored-by: Yi-Fu Wu <yifu.wu@gmail.com>
Co-authored-by: Anna Shors <ashors@nvidia.com>
Co-authored-by: Dheeraj Peri <peri.dheeraj@gmail.com>
Co-authored-by: Yuki Huang <scsse_hqh2016@126.com>
Co-authored-by: Shun Kiyono <shunk52@gmail.com>
Co-authored-by: Wei Du <wedu@nvidia.com>
Co-authored-by: Charlie Truong <chtruong@nvidia.com>
Co-authored-by: atfujita <40932835+AtsunoriFujita@users.noreply.github.com>
YzjiaoNvd pushed a commit to YzjiaoNvd/NeMo-RL that referenced this pull request Jul 14, 2025
jialei777 pushed a commit to jialei777/nemo-rl that referenced this pull request Jul 23, 2025
Signed-off-by: Jialei Chen <jialeic@google.com>
KiddoZhu added a commit that referenced this pull request Jul 28, 2025
Labels: documentation, external