feat: Fix CPU offloading + add options for FSDP offload and activation ckpting #123


Merged

SahilJain314 merged 30 commits into main from yifu/cpu_offload2

Apr 16, 2025

Contributor

yfw commented Apr 2, 2025 (edited)

What does this PR do?

Addresses #33 (FSDP1) and #67 by:

Fixing the CPU offloading implementation so that HFPolicyWorker's GPU memory usage is closer to 0 during vLLM generation.
Adding an option to use FSDP's built-in CPU offloading. With FSDP CPU offloading enabled, only the forward and backward passes run on the GPU; everything else, including the optimizer step, runs on the CPU.
Adding an option to enable activation checkpointing.
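The idea behind "manual" offloading can be sketched as follows: before generation, move the policy's parameters, buffers, and optimizer state to CPU and release cached GPU memory; before training, move them back. This is a minimal sketch, not the code in this PR; the helper names `offload_to_cpu`, `onload_to_device`, and `_move_optimizer_state` are hypothetical, and FSDP-wrapped models may need additional handling.

```python
import torch
from torch import nn


def _move_optimizer_state(optimizer: torch.optim.Optimizer, device) -> None:
    """Hypothetical helper: move per-parameter optimizer state tensors to `device`."""
    for state in optimizer.state.values():
        for key, value in state.items():
            # Leave Adam's scalar "step" counter on the device the optimizer chose.
            if key != "step" and torch.is_tensor(value):
                state[key] = value.to(device)


def offload_to_cpu(model: nn.Module, optimizer: torch.optim.Optimizer) -> None:
    """Hypothetical helper: free GPU memory held by the training state."""
    # Module.to() moves parameters (and their .grad) plus registered buffers.
    # By default parameters are updated in place, so optimizer references stay valid.
    model.to("cpu")
    # Optimizer state (e.g. Adam's exp_avg / exp_avg_sq) is not moved by model.to().
    _move_optimizer_state(optimizer, "cpu")
    if torch.cuda.is_available():
        # Return unoccupied cached blocks so another process (e.g. vLLM) can use them.
        torch.cuda.empty_cache()


def onload_to_device(model: nn.Module, optimizer: torch.optim.Optimizer, device) -> None:
    """Hypothetical helper: bring the training state back before the next train step."""
    model.to(device)
    _move_optimizer_state(optimizer, device)
```

Moving buffers as well as parameters matters here: anything left on the GPU (buffers, optimizer moments) keeps memory allocated during generation.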

To test the impact of these settings, I ran a sweep across different models, context lengths, and offload types. The results are here: https://docs.google.com/spreadsheets/d/1lWtw6-jbq4TAlM5Bu5Z3jNQjwAa_CEUX3bsKuMXHEkM/edit?usp=sharing

Update: the expandable segments changes are no longer part of this PR and will be added in a separate PR.

Offload Types:

Main: the previous implementation on the main branch
Manual: "fixed" CPU offloading with the changes in this PR. This will be the default setting after this PR.
FSDP: FSDP's built-in CPU offloading
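For the "FSDP" offload type, PyTorch's FSDP1 takes a `CPUOffload` argument. The sketch below only shows how a boolean option could map onto that argument; the helper name `fsdp_cpu_offload_kwargs` is hypothetical, and how this PR wires the option into `FullyShardedDataParallel` is not shown here.

```python
from torch.distributed.fsdp import CPUOffload


def fsdp_cpu_offload_kwargs(fsdp_offload_enabled: bool) -> dict:
    """Hypothetical helper: extra kwargs for FullyShardedDataParallel (FSDP1)."""
    # With offload_params=True, FSDP keeps sharded parameters and their gradients
    # on CPU and copies parameters to the GPU only for forward/backward, so the
    # optimizer step runs on CPU.
    return {"cpu_offload": CPUOffload(offload_params=fsdp_offload_enabled)}


# Usage inside an initialized process group (not executed here):
#   from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
#   model = FSDP(model, **fsdp_cpu_offload_kwargs(True))
#   optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
```

This trades GPU memory for host-device transfers and a CPU optimizer step, which is consistent with the step-time overhead reported below.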

Key Takeaways:

"Manual" offloading generally maintains or improves step time while using less allocated and reserved memory than "Main".
FSDP CPU offloading generally incurs some overhead in Step 1 and Step 3 times compared with "Manual" offloading.
- An exception is the 8B model at 7500 sequence length (row 23), which is the longest sequence length I tested that did not OOM with "Manual" offloading.
Using expandable_segments keeps reserved memory closer to allocated memory, but the effect is most evident with the "Main" offload type; reserved memory is already generally low for the "Manual" and FSDP offload types.
- One case where it does make a difference for the "Manual" offload type is Llama3.1-8B at 7500 context length, which OOMs without expandable_segments but runs (slowly) with expandable_segments (rows 22 and 23).
For the 8B model at 8k context length (#67, "FSDP1 memory usage still too high (8b on 8k seqlen not fitting)"), FSDP CPU offloading is the only offload type in the sweep that runs without OOMing (rows 30 and 31).
Enabling expandable_segments also incurs some step-time overhead, most evident in the SFT case for the 8B model at 8k context (rows 50-55), where runs with expandable_segments are ~3x slower than runs without it.
Activation checkpointing allows GRPO training of Llama3.1-8B at 8k context length on 1 node with 8 GPUs.
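PyTorch exposes expandable_segments through the `PYTORCH_CUDA_ALLOC_CONF` environment variable, a comma-separated list of `option:value` pairs. How the sweep set it is not shown here; the sketch below, with the hypothetical helper `enable_expandable_segments`, appends the option while preserving any other allocator settings.

```python
import os
from typing import MutableMapping


def enable_expandable_segments(env: MutableMapping[str, str] = os.environ) -> str:
    """Hypothetical helper: set expandable_segments:True in PYTORCH_CUDA_ALLOC_CONF."""
    current = env.get("PYTORCH_CUDA_ALLOC_CONF", "")
    options = [opt for opt in current.split(",") if opt]
    # Drop any existing expandable_segments setting so this one takes effect.
    options = [opt for opt in options if not opt.startswith("expandable_segments")]
    options.append("expandable_segments:True")
    env["PYTORCH_CUDA_ALLOC_CONF"] = ",".join(options)
    return env["PYTORCH_CUDA_ALLOC_CONF"]
```

The variable must be set before the process makes its first CUDA allocation, and in a multi-process setup each worker process needs it in its own environment.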

Issues

Closes #33 (FSDP1 part; the FSDP2 part was already merged in the dtensor branch)
Closes #67

Usage

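A usage sketch, assuming the policy config exposes two boolean options named `fsdp_offload_enabled` and `activation_checkpointing_enabled` (hypothetical key names; check the example configs such as examples/configs/grpo_math_1B.yaml for the real ones). It also assumes the "Manual" and "FSDP" offload types are mutually exclusive, as in the sweep above.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class PolicyMemoryOptions:
    """Memory-related policy options (hypothetical key names, for illustration only)."""

    fsdp_offload_enabled: bool = False  # use FSDP's built-in CPU offloading
    activation_checkpointing_enabled: bool = False  # recompute activations in backward

    @classmethod
    def from_policy_cfg(cls, cfg: Mapping[str, Any]) -> "PolicyMemoryOptions":
        # Missing keys fall back to the defaults above.
        return cls(
            fsdp_offload_enabled=bool(cfg.get("fsdp_offload_enabled", False)),
            activation_checkpointing_enabled=bool(
                cfg.get("activation_checkpointing_enabled", False)
            ),
        )

    @property
    def offload_type(self) -> str:
        # Uses the "Offload Types" names from this PR; "Manual" is the default.
        return "FSDP" if self.fsdp_offload_enabled else "Manual"
```

For example, the Llama3.1-8B, 8k-context GRPO run reported in this PR used activation checkpointing without FSDP's CPU offloading, which corresponds to `{"fsdp_offload_enabled": False, "activation_checkpointing_enabled": True}` under these assumed names.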

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

...


          CPU offload changes

f8b9514

Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>

yfw mentioned this pull request

feat: Fix CPU offloading + add options for FSDP offload and expandable segments #122

Closed

4 tasks

yfw requested review from terrykong and SahilJain314

April 2, 2025 21:48

yfw added the Run CICD label

yfw requested a review from parthchadha

April 2, 2025 21:49

yfw commented

nemo_reinforcer/distributed/worker_groups.py (resolved)

yfw added Run CICD and removed Run CICD labels

Contributor

terrykong commented Apr 3, 2025

Do you mind also adding some metrics for us to track offloading in the unit tests?

https://github.com/NVIDIA/reinforcer/blob/main/docs/testing.md#tracking-metrics-in-unit-tests

terrykong reviewed

examples/configs/grpo_math_1B.yaml (outdated, resolved)


          Fix unit tests

c02d454

Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>

yfw added Run CICD and removed Run CICD labels

parthchadha reviewed

examples/configs/grpo_math_1B.yaml (outdated, resolved)

yfw added 2 commits

April 3, 2025 20:00


          Add memory tracking in unit test

7229b87

Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>


          Merge branch 'main' into yifu/cpu_offload2

c299a75

yfw added Run CICD and removed Run CICD labels

Contributor Author

yfw commented Apr 4, 2025

Do you mind also adding some metrics for us to track offloading in the unit tests?

https://github.com/NVIDIA/reinforcer/blob/main/docs/testing.md#tracking-metrics-in-unit-tests

Added tracking metrics in the unit test.
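The sweep compares allocated and reserved GPU memory, and PyTorch reports both directly. A minimal sketch of a snapshot a test could record after offloading; the helper name `gpu_memory_snapshot_gb` is hypothetical, and the mechanism the unit tests actually use to track metrics is described in docs/testing.md, not here.

```python
import torch


def gpu_memory_snapshot_gb(device=None) -> dict:
    """Hypothetical helper: allocated/reserved CUDA memory in GB (zeros without a GPU)."""
    if not torch.cuda.is_available():
        return {"allocated_gb": 0.0, "reserved_gb": 0.0}
    gb = 1024 ** 3
    return {
        # Memory currently occupied by live tensors.
        "allocated_gb": torch.cuda.memory_allocated(device) / gb,
        # Memory held by the caching allocator, including unused cached blocks.
        "reserved_gb": torch.cuda.memory_reserved(device) / gb,
    }
```

After offloading, a test could assert that both values stay below a small threshold; reserved memory exceeding allocated memory indicates cached blocks that `torch.cuda.empty_cache()` could release.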

yfw added 2 commits

April 4, 2025 11:26


          ruff

3b13a9a

Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>


          Remove expandable_segments_enabled

95c4508

Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>

yfw added Run CICD and removed Run CICD labels

yfw added 3 commits

April 4, 2025 14:09


          whitespace

44edbe8

Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>


          Offload/onload buffers

412f57d

Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>


          Activation checkpointing

ba48445

Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>

yfw force-pushed the yifu/cpu_offload2 branch from 4acb7c9 to ba48445

April 5, 2025 02:42

Contributor Author

yfw commented Apr 5, 2025

Added activation checkpointing, which allows GRPO training of Llama3.1-8B at 8k context length on 1 node with 8 GPUs without using FSDP's CPU offloading: https://wandb.ai/nvidia/reinforcer-dev-yifu-mem/runs/hr7cuz2f
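Activation checkpointing saves memory by storing only a block's inputs during the forward pass and recomputing its intermediate activations during backward. A minimal sketch with `torch.utils.checkpoint`; the wrapper class `CheckpointedBlock` is hypothetical and is not how this PR applies checkpointing (for Hugging Face models, `model.gradient_checkpointing_enable()` is another option).

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint


class CheckpointedBlock(nn.Module):
    """Hypothetical wrapper: recompute `block`'s activations in backward."""

    def __init__(self, block: nn.Module, enabled: bool = True):
        super().__init__()
        self.block = block
        self.enabled = enabled

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.enabled and self.training and torch.is_grad_enabled():
            # Only x is saved; the block's intermediates are recomputed during backward.
            return checkpoint(self.block, x, use_reentrant=False)
        return self.block(x)
```

Wrapping each transformer layer this way costs roughly one extra forward pass per step in exchange for not holding every layer's activations, which is what makes long contexts fit.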

yfw changed the title ~~feat: Fix CPU offloading + add options for FSDP offload and expandable segments~~ feat: Fix CPU offloading + add options for FSDP offload and activation checkpointing


          Fix test

989ea20

Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>

yfw dismissed SahilJain314’s stale review via 989ea20

April 16, 2025 17:40

yfw added Run CICD and removed Run CICD labels


          Longer cicd timeout

6dd553c

Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>

github-actions bot added the CI label

yfw added Run CICD and removed Run CICD labels


          30 min timeout

31223ac

Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>

yfw added Run CICD and removed Run CICD labels


          Merge remote-tracking branch 'origin' into yifu/cpu_offload2

dd3e713

Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>

yfw added Run CICD and removed Run CICD labels

SahilJain314 added the CI:L0 label

SahilJain314 previously approved these changes


Contributor

SahilJain314 left a comment

LGTM. I've been using this for my dev and don't see any issues.


          Merge remote-tracking branch 'origin' into yifu/cpu_offload2

75c0d2a

Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>

yfw dismissed SahilJain314’s stale review via 75c0d2a

April 16, 2025 22:02

yfw added CI:L0 and removed CI:L0 labels

parthchadha approved these changes

SahilJain314 approved these changes

SahilJain314 enabled auto-merge

April 16, 2025 22:44

SahilJain314 added this pull request to the merge queue

Merged via the queue into main with commit 6db2f7a

19 checks passed

SahilJain314 deleted the yifu/cpu_offload2 branch

April 16, 2025 23:20

terrykong added a commit that referenced this pull request


          Squashed commit of the following:

ce9abe7

commit ebb46c3
Author: Anna Shors <ashors@nvidia.com>
Date:   Wed Apr 30 15:03:46 2025 -0700

    fix: fix dtype of empty `token_ids` for consistency (#290)

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit cf8f045
Author: Anna Shors <ashors@nvidia.com>
Date:   Wed Apr 30 15:03:19 2025 -0700

    chore: Remove outdated comment in DPO config (#293)

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit 04f30bb
Author: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Date:   Wed Apr 30 12:19:47 2025 -0700

    fix: Fixed max seqlen not respected correctly (#299)

    Signed-off-by: Sahil Jain <sahilj@nvidia.com>

commit daac5d9
Author: Anna Shors <ashors@nvidia.com>
Date:   Tue Apr 29 17:30:05 2025 -0700

    chore: Remove online hf checkpointing (#285)

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit 3cd8be8
Author: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Date:   Tue Apr 29 15:18:37 2025 -0700

    feat: Remove 'last 100' hack for math verifier (#287)

    Signed-off-by: Sahil Jain <sahilj@nvidia.com>
    Signed-off-by: Terry Kong <terryk@nvidia.com>
    Co-authored-by: Terry Kong <terryk@nvidia.com>

commit 506910a
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Tue Apr 29 11:29:22 2025 -0700

    test: add a test that checks if recipes can be merged into the base config (#288)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit af43261
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Tue Apr 29 09:18:14 2025 -0700

    chore: add isort rules and pyflakes in ruff/precommit (#291)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 8b0837c
Author: yuki <48991475+yuki-666@users.noreply.github.com>
Date:   Tue Apr 29 23:57:41 2025 +0800

    ci: add eval functional test (#269)

    Signed-off-by: Yuki Huang <yukih@nvidia.com>

commit 68beb6d
Author: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Date:   Mon Apr 28 23:35:01 2025 -0700

    feat: rename ratio_eps_{min/max} to ratio_clip_{min/max} for clarity (#283)

    Signed-off-by: Sahil Jain <sahilj@nvidia.com>

commit 2f5d22f
Author: Hemil Desai <hemild@nvidia.com>
Date:   Mon Apr 28 16:09:00 2025 -0700

    feat: Add hydra style overrides to SFT (#208)

    Signed-off-by: Hemil Desai <hemild@nvidia.com>
    Signed-off-by: ashors1 <ashors@nvidia.com>
    Co-authored-by: ashors1 <ashors@nvidia.com>

commit 8a22c44
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Mon Apr 28 15:11:03 2025 -0700

    feat: publish convergence/release runs (#214)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit af94d43
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Mon Apr 28 15:02:19 2025 -0700

    fix: fixes #264 where tied weights check didn't work on fsdp1 (#284)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>
    Signed-off-by: Parth Chadha <parth29@gmail.com>
    Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>

commit 1363dba
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Mon Apr 28 12:44:56 2025 -0700

    fix: improve port selection and exiting early from ray.sub (#272)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 044f385
Author: Andrew Schilling <85314306+aschilling-nv@users.noreply.github.com>
Date:   Mon Apr 28 14:22:55 2025 -0500

    docs: Correcting build issues and CI (#270)

    Signed-off-by: Andrew Schilling <aschilling@nvidia.com>

commit 0fae6bc
Author: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Date:   Mon Apr 28 11:08:51 2025 -0700

    feat: Updated Name to NeMo RL (#265)

    Signed-off-by: Sahil Jain <sahilj@nvidia.com>

commit 34cae3a
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Mon Apr 28 08:16:51 2025 -0700

    fix: add bibtex entry (#273)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>

commit ee0d2c8
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Sat Apr 26 20:15:38 2025 -0700

    docs: instruct users to git clone before beginning (#257)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 09f5416
Author: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Date:   Fri Apr 25 13:46:41 2025 -0700

    feat: E2E multi-turn RL example with a sliding puzzle game (#242)

    Signed-off-by: Sahil Jain <sahilj@nvidia.com>
    Signed-off-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>

commit 47e51d3
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Fri Apr 25 10:13:59 2025 -0700

    chore: better logging when insufficient resources (#271)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 98473c6
Author: Anna Shors <ashors@nvidia.com>
Date:   Thu Apr 24 22:28:05 2025 -0700

    fix: Update DPO and SFT configs to use dtensor (#256)

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit 2558444
Author: Anna Shors <ashors@nvidia.com>
Date:   Thu Apr 24 11:02:26 2025 -0700

    fix: Fix fsdp1 grad clipping and log grad norm (#251)

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit c8f0a01
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Wed Apr 23 17:58:43 2025 -0700

    docs: add qwen 32b instruction and add 0.3 planned features (#255)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 0a5f31d
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Wed Apr 23 17:49:06 2025 -0400

    fix: fix broken eval script (#253)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>

commit 2f8a140
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Wed Apr 23 12:47:18 2025 -0700

    ci: L1 default and increase test time (#252)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 1c7cbd9
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Wed Apr 23 12:52:13 2025 -0400

    fix: use find_tied_parameters api from HF for tied weight keys (#250)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>

commit 1788e4c
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Tue Apr 22 22:05:53 2025 -0400

    fix: raise error if tied weights model is being trained with fsdp1 or… (#229)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>
    Signed-off-by: Sahil Jain <sahilj@nvidia.com>
    Signed-off-by: Terry Kong <terryk@nvidia.com>
    Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
    Signed-off-by: Nathan McKimpson <nmckimpson@nvidia.com>
    Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
    Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
    Co-authored-by: Yi-Fu Wu <yifu.wu@gmail.com>
    Co-authored-by: mckimn <nmckimpson@nvidia.com>

commit 1fa4c7a
Author: Anna Shors <ashors@nvidia.com>
Date:   Tue Apr 22 16:38:50 2025 -0700

    fix: Fix indent in dtensor policy (#248)

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit ed546ae
Author: yuki <48991475+yuki-666@users.noreply.github.com>
Date:   Wed Apr 23 07:29:47 2025 +0800

    feat: streaming each dtensor in refit (#176)

    Signed-off-by: Yuki Huang <yukih@nvidia.com>
    Signed-off-by: Alex Qiu <alexq@nvidia.com>
    Signed-off-by: Parth Chadha <pchadha@nvidia.com>
    Co-authored-by: Alex Qiu <alexq@nvidia.com>
    Co-authored-by: Parth Chadha <pchadha@nvidia.com>

commit 5c62657
Author: Yi-Fu Wu <yifu.wu@gmail.com>
Date:   Tue Apr 22 14:14:40 2025 -0700

    feat: Importance sampling trick (#174)

    Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
    Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>

commit deaece6
Author: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Date:   Tue Apr 22 12:39:35 2025 -0700

    feat: Add support for multi-turn generations and RL (tools, games, etc) (#218)

    Signed-off-by: Sahil Jain <sahilj@nvidia.com>

commit 1245c50
Author: Anna Shors <ashors@nvidia.com>
Date:   Tue Apr 22 12:19:42 2025 -0700

    fix: Speed up DPO functional test (#241)

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit af369a3
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Tue Apr 22 12:17:03 2025 -0700

    fix: Move ray worker port range start from 20001 to 53001 (#235)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 756152c
Author: Anna Shors <ashors@nvidia.com>
Date:   Tue Apr 22 10:02:34 2025 -0700

    feat: Support multi-epoch training in SFT (#177)

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit bbdd671
Author: Anna Shors <ashors@nvidia.com>
Date:   Mon Apr 21 22:16:15 2025 -0700

    feat: DPO (#180)

    Signed-off-by: ashors1 <ashors@nvidia.com>
    Signed-off-by: Anna Shors <ashors@nvidia.com>
    Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
    Co-authored-by: Yi-Fu Wu <yifu.wu@gmail.com>

commit 88bc0fd
Author: mckimn <nmckimpson@nvidia.com>
Date:   Mon Apr 21 17:31:23 2025 -0700

    ci: Remove external config from project (#200)

    Signed-off-by: Nathan McKimpson <nmckimpson@nvidia.com>

commit 4a2e126
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Mon Apr 21 17:34:59 2025 -0400

    fix: skip vllm p2p check since its flaky (#238)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>

commit 22af21c
Author: Yi-Fu Wu <yifu.wu@gmail.com>
Date:   Mon Apr 21 12:41:29 2025 -0700

    feat: FSDP2 SFT (#206)

    Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>

commit e36f488
Author: Yi-Fu Wu <yifu.wu@gmail.com>
Date:   Mon Apr 21 12:41:24 2025 -0700

    fix: Fix missing import (#222)

    Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>

commit 98b7a90
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Sun Apr 20 10:06:09 2025 -0700

    docs: update docs everywhere to remove uv pip install which isn't reliable (#217)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit da191b4
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Sun Apr 20 07:56:55 2025 -0700

    feat: introduce a debug API for backoff and retries for RayVirtualCluster (#234)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 8780093
Author: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Date:   Fri Apr 18 17:06:54 2025 -0700

    feat: Add total logging of generations in training (#172)

    Signed-off-by: Sahil Jain <sahilj@nvidia.com>

commit ce2d121
Author: yuki <48991475+yuki-666@users.noreply.github.com>
Date:   Sat Apr 19 00:22:11 2025 +0800

    fix: fix chat_template in eval (#210)

    Signed-off-by: Yuki Huang <yukih@nvidia.com>

commit f8b6ba9
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Thu Apr 17 12:52:19 2025 -0700

    fix: grpo func test 10 step -> 3 step to speed up CI (#209)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 4a6f62b
Author: Gerald Shen <119401249+gshennvm@users.noreply.github.com>
Date:   Thu Apr 17 11:06:05 2025 -0700

    feat: Add FSDP2, DTensor SP/TP, activation checkpointing support (#131)

    Signed-off-by: Gerald Shen <geshen@nvidia.com>

commit 78a9834
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Thu Apr 17 10:03:34 2025 -0700

    fix: ci uses umask (#211)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 5ff10f6
Author: alexchiu <qiuzhaopeng@foxmail.com>
Date:   Thu Apr 17 08:38:45 2025 +0800

    fix: prevent division by zero in ClippedPGLossFn calculation (#166)

    Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>
    Signed-off-by: Alex Qiu <alexq@nvidia.com>
    Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>

commit 6db2f7a
Author: Yi-Fu Wu <yifu.wu@gmail.com>
Date:   Wed Apr 16 15:53:12 2025 -0700

    feat: Fix CPU offloading + add options for FSDP offload and activation ckpting (#123)

    Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
    Co-authored-by: Parth Chadha <pchadha@nvidia.com>
    Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>

commit 62ac8d2
Author: Charlie Truong <chtruong@nvidia.com>
Date:   Wed Apr 16 15:38:53 2025 -0500

    ci: Only include dependencies in test container (#203)

    Signed-off-by: Charlie Truong <chtruong@nvidia.com>
    Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>

commit b00fcc8
Author: Anna Shors <ashors@nvidia.com>
Date:   Wed Apr 16 13:23:40 2025 -0700

    fix: chat template improvements (#148)

    Signed-off-by: ashors1 <ashors@nvidia.com>
    Signed-off-by: Parth Chadha <pchadha@nvidia.com>
    Signed-off-by: Terry Kong <terryk@nvidia.com>
    Signed-off-by: Sahil Jain <sahilj@nvidia.com>
    Co-authored-by: Parth Chadha <pchadha@nvidia.com>
    Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
    Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>

commit df31f50
Author: Charlie Truong <chtruong@nvidia.com>
Date:   Wed Apr 16 13:13:58 2025 -0500

    ci: Run tests only in merge queue or when labeled (#159)

    Signed-off-by: Charlie Truong <chtruong@nvidia.com>

commit e3af337
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Wed Apr 16 09:23:30 2025 -0700

    feat: Upgrade to vllm v1 runtime (#170)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>
    Signed-off-by: Charlie Truong <chtruong@nvidia.com>
    Signed-off-by: Yuki Huang <yukih@nvidia.com>
    Signed-off-by: ashors1 <ashors@nvidia.com>
    Signed-off-by: Anna Shors <ashors@nvidia.com>
    Signed-off-by: Terry Kong <terryk@nvidia.com>
    Signed-off-by: Sahil Jain <sahilj@nvidia.com>
    Co-authored-by: Charlie Truong <chtruong@nvidia.com>
    Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
    Co-authored-by: yuki <48991475+yuki-666@users.noreply.github.com>
    Co-authored-by: Anna Shors <ashors@nvidia.com>
    Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>

commit dd7c2d7
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Tue Apr 15 16:00:04 2025 -0700

    fix: unit test script halts on first failure (#189)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 92c3f1d
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Tue Apr 15 15:42:01 2025 -0700

    feat: add a unique seed for each vllm llm engine (#171)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>

commit 2ae8935
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Tue Apr 15 14:41:21 2025 -0700

    docs: remove backticks from uv.md title (#179)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 9ac4e62
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Tue Apr 15 12:37:35 2025 -0700

    fix: convert DCP to HF script works without ray cluster (#185)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 8213014
Author: Andrew Schilling <85314306+aschilling-nv@users.noreply.github.com>
Date:   Tue Apr 15 13:55:54 2025 -0500

    docs: Correcting file names (#161)

    Signed-off-by: Andrew Schilling <aschilling@nvidia.com>

commit 4db3167
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Tue Apr 15 11:07:51 2025 -0700

    fix: default to less verbose logging + uv-venv log once per worker  (#141)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit bda6522
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Mon Apr 14 22:31:56 2025 -0700

    docs: run tests with --group test to avoid missing test deps (#188)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit c1fc972
Author: mckimn <nmckimpson@nvidia.com>
Date:   Mon Apr 14 20:43:51 2025 -0700

    ci: Update to include public/ folder for pages deployment (#182)

    Signed-off-by: Nathan McKimpson <nmckimpson@nvidia.com>

commit e9812f1
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Mon Apr 14 20:05:46 2025 -0700

    fix: don't use cuda-graphs for vllm generation (#187)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>

commit d7d4cd6
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Mon Apr 14 15:46:13 2025 -0700

    ci: labels for docs/L0/L1/L2 and run even if only doc test (#181)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 0637511
Author: yuki <48991475+yuki-666@users.noreply.github.com>
Date:   Tue Apr 15 05:24:07 2025 +0800

    feat: support arbitrary end_strings (#96)

    Signed-off-by: Yuki Huang <yukih@nvidia.com>
    Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>

commit c99585c
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Mon Apr 14 14:44:43 2025 -0700

    fix: allow configuring ray ports in ray.sub in case conflict on cluster (#173)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit a5547f2
Author: mckimn <nmckimpson@nvidia.com>
Date:   Mon Apr 14 09:18:02 2025 -0700

    docs: Fix doc build warnings and add external CI config (#157)

    Signed-off-by: Nathan McKimpson <nmckimpson@nvidia.com>

commit 32953be
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Fri Apr 11 10:18:03 2025 -0700

    fix: always test vllm (#167)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>

commit c00b8bc
Author: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Date:   Thu Apr 10 22:38:40 2025 -0700

    test: Add grpo/reinforce/ppo loss tests (prep for incoming vocab parallel changes) (#162)

    Signed-off-by: Sahil Jain <sahilj@nvidia.com>

Signed-off-by: Terry Kong <terryk@nvidia.com>

terrykong pushed a commit that referenced this pull request


          Tech pubs updates to file

7125a27

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

test: Add grpo/reinforce/ppo loss tests (prep for incoming vocab parallel changes) (#162)

Signed-off-by: Sahil Jain <sahilj@nvidia.com>

Tech pubs updates to file

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

fix typo

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

Incorporated Reviewer Comments in ReadMe

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

Tech Pubs updates to files

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

Tech Pubs updates to files

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

Tech Pubs updates to files

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

Tech Pubs updates to files

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

Tech Pubs updates to files

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

Tech Pubs updates to file

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

Tech pups updates to resolve some threads

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

Tech pubs updates to resolve some threads

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

Tech Pubs minor edits to files

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>


commit 1245c50
Author: Anna Shors <ashors@nvidia.com>
Date:   Tue Apr 22 12:19:42 2025 -0700

    fix: Speed up DPO functional test (#241)

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit af369a3
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Tue Apr 22 12:17:03 2025 -0700

    fix: Move ray worker port range start from 20001 to 53001 (#235)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 756152c
Author: Anna Shors <ashors@nvidia.com>
Date:   Tue Apr 22 10:02:34 2025 -0700

    feat: Support multi-epoch training in SFT (#177)

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit bbdd671
Author: Anna Shors <ashors@nvidia.com>
Date:   Mon Apr 21 22:16:15 2025 -0700

    feat: DPO (#180)

    Signed-off-by: ashors1 <ashors@nvidia.com>
    Signed-off-by: Anna Shors <ashors@nvidia.com>
    Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
    Co-authored-by: Yi-Fu Wu <yifu.wu@gmail.com>

commit 88bc0fd
Author: mckimn <nmckimpson@nvidia.com>
Date:   Mon Apr 21 17:31:23 2025 -0700

    ci: Remove external config from project (#200)

    Signed-off-by: Nathan McKimpson <nmckimpson@nvidia.com>

commit 4a2e126
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Mon Apr 21 17:34:59 2025 -0400

    fix: skip vllm p2p check since its flaky (#238)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>

commit 22af21c
Author: Yi-Fu Wu <yifu.wu@gmail.com>
Date:   Mon Apr 21 12:41:29 2025 -0700

    feat: FSDP2 SFT (#206)

    Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>

commit e36f488
Author: Yi-Fu Wu <yifu.wu@gmail.com>
Date:   Mon Apr 21 12:41:24 2025 -0700

    fix: Fix missing import (#222)

    Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>

commit 98b7a90
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Sun Apr 20 10:06:09 2025 -0700

    docs: update docs everywhere to remove uv pip install which isn't reliable (#217)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit da191b4
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Sun Apr 20 07:56:55 2025 -0700

    feat: introduce a debug API for backoff and retries for RayVirtualCluster (#234)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 8780093
Author: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Date:   Fri Apr 18 17:06:54 2025 -0700

    feat: Add total logging of generations in training (#172)

    Signed-off-by: Sahil Jain <sahilj@nvidia.com>

commit ce2d121
Author: yuki <48991475+yuki-666@users.noreply.github.com>
Date:   Sat Apr 19 00:22:11 2025 +0800

    fix: fix chat_template in eval (#210)

    Signed-off-by: Yuki Huang <yukih@nvidia.com>

commit f8b6ba9
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Thu Apr 17 12:52:19 2025 -0700

    fix: grpo func test 10 step -> 3 step to speed up CI (#209)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 4a6f62b
Author: Gerald Shen <119401249+gshennvm@users.noreply.github.com>
Date:   Thu Apr 17 11:06:05 2025 -0700

    feat: Add FSDP2, DTensor SP/TP, activation checkpointing support (#131)

    Signed-off-by: Gerald Shen <geshen@nvidia.com>

commit 78a9834
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Thu Apr 17 10:03:34 2025 -0700

    fix: ci uses umask (#211)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 5ff10f6
Author: alexchiu <qiuzhaopeng@foxmail.com>
Date:   Thu Apr 17 08:38:45 2025 +0800

    fix: prevent division by zero in ClippedPGLossFn calculation (#166)

    Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>
    Signed-off-by: Alex Qiu <alexq@nvidia.com>
    Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>

commit 6db2f7a
Author: Yi-Fu Wu <yifu.wu@gmail.com>
Date:   Wed Apr 16 15:53:12 2025 -0700

    feat: Fix CPU offloading + add options for FSDP offload and activation ckpting (#123)

    Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
    Co-authored-by: Parth Chadha <pchadha@nvidia.com>
    Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>

commit 62ac8d2
Author: Charlie Truong <chtruong@nvidia.com>
Date:   Wed Apr 16 15:38:53 2025 -0500

    ci: Only include dependencies in test container (#203)

    Signed-off-by: Charlie Truong <chtruong@nvidia.com>
    Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>

commit b00fcc8
Author: Anna Shors <ashors@nvidia.com>
Date:   Wed Apr 16 13:23:40 2025 -0700

    fix: chat template improvements (#148)

    Signed-off-by: ashors1 <ashors@nvidia.com>
    Signed-off-by: Parth Chadha <pchadha@nvidia.com>
    Signed-off-by: Terry Kong <terryk@nvidia.com>
    Signed-off-by: Sahil Jain <sahilj@nvidia.com>
    Co-authored-by: Parth Chadha <pchadha@nvidia.com>
    Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
    Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>

commit df31f50
Author: Charlie Truong <chtruong@nvidia.com>
Date:   Wed Apr 16 13:13:58 2025 -0500

    ci: Run tests only in merge queue or when labeled (#159)

    Signed-off-by: Charlie Truong <chtruong@nvidia.com>

commit e3af337
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Wed Apr 16 09:23:30 2025 -0700

    feat: Upgrade to vllm v1 runtime (#170)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>
    Signed-off-by: Charlie Truong <chtruong@nvidia.com>
    Signed-off-by: Yuki Huang <yukih@nvidia.com>
    Signed-off-by: ashors1 <ashors@nvidia.com>
    Signed-off-by: Anna Shors <ashors@nvidia.com>
    Signed-off-by: Terry Kong <terryk@nvidia.com>
    Signed-off-by: Sahil Jain <sahilj@nvidia.com>
    Co-authored-by: Charlie Truong <chtruong@nvidia.com>
    Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
    Co-authored-by: yuki <48991475+yuki-666@users.noreply.github.com>
    Co-authored-by: Anna Shors <ashors@nvidia.com>
    Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>

commit dd7c2d7
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Tue Apr 15 16:00:04 2025 -0700

    fix: unit test script halts on first failure (#189)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 92c3f1d
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Tue Apr 15 15:42:01 2025 -0700

    feat: add a unique seed for each vllm llm engine (#171)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>

commit 2ae8935
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Tue Apr 15 14:41:21 2025 -0700

    docs: remove backticks from uv.md title (#179)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 9ac4e62
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Tue Apr 15 12:37:35 2025 -0700

    fix: convert DCP to HF script works without ray cluster (#185)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 8213014
Author: Andrew Schilling <85314306+aschilling-nv@users.noreply.github.com>
Date:   Tue Apr 15 13:55:54 2025 -0500

    docs: Correcting file names (#161)

    Signed-off-by: Andrew Schilling <aschilling@nvidia.com>

commit 4db3167
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Tue Apr 15 11:07:51 2025 -0700

    fix: default to less verbose logging + uv-venv log once per worker  (#141)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit bda6522
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Mon Apr 14 22:31:56 2025 -0700

    docs: run tests with --group test to avoid missing test deps (#188)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit c1fc972
Author: mckimn <nmckimpson@nvidia.com>
Date:   Mon Apr 14 20:43:51 2025 -0700

    ci: Update to include public/ folder for pages deployment (#182)

    Signed-off-by: Nathan McKimpson <nmckimpson@nvidia.com>

commit e9812f1
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Mon Apr 14 20:05:46 2025 -0700

    fix: don't use cuda-graphs for vllm generation (#187)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>

commit d7d4cd6
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Mon Apr 14 15:46:13 2025 -0700

    ci: labels for docs/L0/L1/L2 and run even if only doc test (#181)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 0637511
Author: yuki <48991475+yuki-666@users.noreply.github.com>
Date:   Tue Apr 15 05:24:07 2025 +0800

    feat: support arbitrary end_strings (#96)

    Signed-off-by: Yuki Huang <yukih@nvidia.com>
    Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>

commit c99585c
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Mon Apr 14 14:44:43 2025 -0700

    fix: allow configuring ray ports in ray.sub in case conflict on cluster (#173)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit a5547f2
Author: mckimn <nmckimpson@nvidia.com>
Date:   Mon Apr 14 09:18:02 2025 -0700

    docs: Fix doc build warnings and add external CI config (#157)

    Signed-off-by: Nathan McKimpson <nmckimpson@nvidia.com>

commit 32953be
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Fri Apr 11 10:18:03 2025 -0700

    fix: always test vllm (#167)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>

commit c00b8bc
Author: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Date:   Thu Apr 10 22:38:40 2025 -0700

    test: Add grpo/reinforce/ppo loss tests (prep for incoming vocab parallel changes) (#162)

    Signed-off-by: Sahil Jain <sahilj@nvidia.com>

Signed-off-by: Terry Kong <terryk@nvidia.com>

KiddoZhu pushed a commit that referenced this pull request

df696f5 feat: Fix CPU offloading + add options for FSDP offload and activation ckpting (#123)

Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Co-authored-by: Parth Chadha <pchadha@nvidia.com>
Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>


Labels: CI:L0, CI