feat: fp8 block scaling #543
Conversation
Should we also add a config, RL/examples/configs/grpo_math_8B_fp8_L3_F1_G_i.yaml?
For example, the config below could be a good candidate (optionally with num_last_layers_in_bf16: 0 and num_first_layers_in_bf16: 0):
```yaml
# GRPO Algorithm Configuration
defaults: "grpo_math_1B.yaml"

grpo:
  num_prompts_per_step: 64
  num_generations_per_prompt: 32

loss_fn:
  use_importance_sampling_correction: true

policy:
  model_name: "meta-llama/Llama-3.1-8B-Instruct"
  tokenizer:
    name: ${policy.model_name} ## specify if you'd like to use a tokenizer different from the model's default
  train_global_batch_size: 512
  train_micro_batch_size: 1
  generation_batch_size: 32 # Only used when generating using HF backend
  logprob_batch_size: 2
  max_total_sequence_length: 4096
  precision: "bfloat16"
  fsdp_offload_enabled: false
  activation_checkpointing_enabled: false

  dtensor_cfg:
    enabled: true

  dynamic_batching:
    train_mb_tokens: 4096
    logprob_mb_tokens: 8192

  optimizer:
    name: "torch.optim.AdamW"
    kwargs:
      lr: 3.0e-7
      weight_decay: 0.01
      betas: [0.9, 0.999]
      eps: 1e-8

  scheduler:
    - name: "torch.optim.lr_scheduler.LinearLR"
      kwargs:
        start_factor: 0.1
        end_factor: 1.0
        # The scheduler iteration is per GRPO step and is decoupled from the optimizer step (may be >=1 per GRPO step)
        total_iters: 13
    - name: "torch.optim.lr_scheduler.ConstantLR"
      kwargs:
        factor: 1.0
        total_iters: 10000000000
    - milestones: [13]

  generation:
    backend: "vllm"
    max_new_tokens: ${policy.max_total_sequence_length}
    temperature: 1.0
    top_p: 1.0
    top_k: null
    stop_token_ids: null
    stop_strings: null
    vllm_cfg:
      precision: 'fp8'
      use_deep_gemm: true
      num_last_layers_in_bf16: 3
      num_first_layers_in_bf16: 1
      tensor_parallel_size: 1
      gpu_memory_utilization: 0.6
      max_model_len: ${policy.max_total_sequence_length}

cluster:
  gpus_per_node: 8
  num_nodes: 1
```
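For context on what `precision: 'fp8'` with `use_deep_gemm: true` implies, below is a minimal, illustrative PyTorch sketch of block-wise FP8 weight quantization with one scale per 128x128 block, the granularity DeepGEMM-style kernels use (activations are typically scaled per 1x128 group). The helper names are hypothetical; this is not the PR's actual implementation:

```python
# A minimal sketch of FP8 block scaling, assuming a PyTorch build with float8
# support (torch.float8_e4m3fn, available since ~2.1). Illustrative only.
import torch

FP8_MAX = 448.0  # max representable magnitude of float8 E4M3
BLOCK = 128

def quantize_blockwise(w: torch.Tensor):
    """Quantize a 2D weight to FP8 with one fp32 scale per 128x128 block."""
    rows, cols = w.shape
    assert rows % BLOCK == 0 and cols % BLOCK == 0, "pad to a multiple of 128 first"
    # View as (row_blocks, BLOCK, col_blocks, BLOCK) so each block can be reduced.
    blocks = w.view(rows // BLOCK, BLOCK, cols // BLOCK, BLOCK)
    amax = blocks.abs().amax(dim=(1, 3), keepdim=True).clamp(min=1e-12)
    scale = amax / FP8_MAX  # per-block scale mapping the block's max to FP8_MAX
    q = (blocks / scale).to(torch.float8_e4m3fn)
    return q.view(rows, cols), scale[:, 0, :, 0]

def dequantize_blockwise(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Inverse of quantize_blockwise, useful for checking round-trip error."""
    rows, cols = q.shape
    blocks = q.view(rows // BLOCK, BLOCK, cols // BLOCK, BLOCK).to(torch.float32)
    return (blocks * scale[:, None, :, None]).view(rows, cols)

w = torch.randn(256, 512)
q, s = quantize_blockwise(w)
print((dequantize_blockwise(q, s) - w).abs().max())  # small round-trip error
```

The `num_first_layers_in_bf16` / `num_last_layers_in_bf16` options in the config above presumably leave the most quantization-sensitive layers out of this step entirely, keeping them in bf16.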
Not super necessary immediately, but I think it'd be nice to include convergence plots in the repo as proof of convergence.
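Not part of this PR, but as a sketch of what generating such a plot could look like, here is a hedged example assuming per-step rewards are logged to a CSV with `step` and `reward` columns (the path and column names are hypothetical, not the repo's actual logging format):

```python
# Illustrative only: plot a reward-vs-step convergence curve from logged metrics.
import csv
import matplotlib.pyplot as plt

steps, rewards = [], []
with open("results/grpo_math_8B_fp8/metrics.csv") as f:  # hypothetical path
    for row in csv.DictReader(f):
        steps.append(int(row["step"]))
        rewards.append(float(row["reward"]))

plt.plot(steps, rewards, label="fp8 generation")
plt.xlabel("GRPO step")
plt.ylabel("mean reward")
plt.legend()
plt.savefig("fp8_convergence.png", dpi=150)
```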
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: Jimmy Zhang <133159885+jiemingz@users.noreply.github.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Qidong Su <qidongs@nvidia.com>
Signed-off-by: Julien Veron Vialard <jveronvialar@nvidia.com>
Co-authored-by: Sahil Jain <sahilj@nvidia.com>
Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
What does this PR do?
Adds FP8 block-scaled generation for the vLLM backend: `precision: 'fp8'` with a `use_deep_gemm` option, plus `num_first_layers_in_bf16` / `num_last_layers_in_bf16` to keep selected layers in bf16.
Issues
List issues that this PR closes (syntax):
Usage
```python
# Add a code snippet demonstrating how to use this
```
Before your PR is "Ready for review"
Pre checks:
Additional Information