
Conversation

@jiemingz (Contributor):

What does this PR do?

Adds FP8 block scaling support for generation with the vLLM backend.

Issues

Closes: FP8 vLLM inference

Usage

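A minimal sketch of how this might be enabled, assuming the `generation.vllm_cfg` keys discussed in the review below (`precision`, `use_deep_gemm`, and the optional bf16 layer counts); the key names and values are taken from the reviewer-suggested config in this thread, not a confirmed API:

```yaml
# Hypothetical usage sketch: enable FP8 block-scaled generation in the
# vLLM backend. Keys mirror the reviewer-suggested config in this thread.
policy:
  generation:
    backend: "vllm"
    vllm_cfg:
      precision: 'fp8'            # quantize vLLM generation to FP8
      use_deep_gemm: true         # use DeepGEMM block-scaled GEMM kernels
      num_first_layers_in_bf16: 1 # optionally keep leading layers in bf16
      num_last_layers_in_bf16: 3  # optionally keep trailing layers in bf16
```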

Before your PR is "Ready for review"

Pre-checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

@jiemingz force-pushed the jiemingz/fp8_block branch 4 times, most recently from fb57ec1 to 5b9c1ba on June 26, 2025
@jiemingz force-pushed the jiemingz/fp8_block branch 4 times, most recently from 53d8ec3 to 59e8b12 on July 8, 2025
@jiemingz force-pushed the jiemingz/fp8_block branch 3 times, most recently from 975df8c to 36c1710 on July 14, 2025
@jiemingz changed the title from "draft: fp8 block scaling" to "feat: fp8 block scaling" on Jul 14, 2025
@terrykong added the r0.3.0 (Release r0.3.0) label on Jul 14, 2025
@jiemingz force-pushed the jiemingz/fp8_block branch from c8304c0 to 5bc8868 on July 14, 2025
@jiemingz force-pushed the jiemingz/fp8_block branch from d68514a to e3a8daf on July 14, 2025
@jiemingz requested a review from vcuinv on July 14, 2025
@terrykong removed the r0.3.0 (Release r0.3.0) label on Jul 15, 2025
@rybakov previously approved these changes on Jul 16, 2025

@rybakov left a comment:


Should we also add a config, e.g. RL/examples/configs/grpo_math_8B_fp8_L3_F1_G_i.yaml? The config below could be a good candidate (optionally with num_last_layers_in_bf16: 0 and num_first_layers_in_bf16: 0):

```yaml
# GRPO Algorithm Configuration
defaults: "grpo_math_1B.yaml"

grpo:
  num_prompts_per_step: 64
  num_generations_per_prompt: 32

loss_fn:
  use_importance_sampling_correction: true

policy:
  model_name: "meta-llama/Llama-3.1-8B-Instruct"
  tokenizer:
    name: ${policy.model_name} ## specify if you'd like to use a tokenizer different from the model's default
  train_global_batch_size: 512
  train_micro_batch_size: 1
  generation_batch_size: 32 # Only used when generating using HF backend
  logprob_batch_size: 2
  max_total_sequence_length: 4096
  precision: "bfloat16"
  fsdp_offload_enabled: false
  activation_checkpointing_enabled: false

  dtensor_cfg:
    enabled: True

  dynamic_batching:
    train_mb_tokens: 4096
    logprob_mb_tokens: 8192

  optimizer:
    name: "torch.optim.AdamW"
    kwargs:
      lr: 3.0e-7
      weight_decay: 0.01
      betas: [0.9, 0.999]
      eps: 1e-8

  scheduler:
    - name: "torch.optim.lr_scheduler.LinearLR"
      kwargs:
        start_factor: 0.1
        end_factor: 1.0
        # The scheduler iteration is per GRPO step and is decoupled from the optimizer step (may be >=1 per GRPO step)
        total_iters: 13
    - name: "torch.optim.lr_scheduler.ConstantLR"
      kwargs:
        factor: 1.0
        total_iters: 10000000000
    - milestones: [13]

  generation:
    backend: "vllm"
    max_new_tokens: ${policy.max_total_sequence_length}
    temperature: 1.0
    top_p: 1.0
    top_k: null
    stop_token_ids: null
    stop_strings: null
    vllm_cfg:
      precision: 'fp8'
      use_deep_gemm: true
      num_last_layers_in_bf16: 3
      num_first_layers_in_bf16: 1
      tensor_parallel_size: 1
      gpu_memory_utilization: 0.6
      max_model_len: ${policy.max_total_sequence_length}

cluster:
  gpus_per_node: 8
  num_nodes: 1
```
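As background on what "fp8 block scaling" refers to, here is a conceptual sketch (an illustration, not this PR's implementation): each fixed-size block of a weight matrix gets its own scale, so an outlier value only degrades precision within its own block rather than across the whole tensor. The 128x128 block size and torch.float8_e4m3fn dtype are assumptions for illustration.

```python
# Conceptual sketch of FP8 block scaling (illustration only, not this PR's
# code): quantize a 2-D weight tensor in 128x128 blocks, one scale per block.
import torch

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for e4m3


def quantize_blockwise(w: torch.Tensor, block: int = 128):
    """Return (fp8 tensor, per-block scales); w must be a block multiple."""
    rows, cols = w.shape
    assert rows % block == 0 and cols % block == 0, "pad to a block multiple"
    # View as (row_blocks, block, col_blocks, block) to reduce per block.
    wb = w.reshape(rows // block, block, cols // block, block)
    amax = wb.abs().amax(dim=(1, 3), keepdim=True).clamp(min=1e-12)
    scale = amax / FP8_MAX                        # one scale per 128x128 block
    w_fp8 = (wb / scale).to(torch.float8_e4m3fn)
    return w_fp8.reshape(rows, cols), scale.squeeze(-1).squeeze(1)


def dequantize_blockwise(w_fp8: torch.Tensor, scale: torch.Tensor, block: int = 128):
    """Invert quantize_blockwise: rescale each block back to float32."""
    rows, cols = w_fp8.shape
    wb = w_fp8.to(torch.float32).reshape(rows // block, block, cols // block, block)
    return (wb * scale[:, None, :, None]).reshape(rows, cols)


if __name__ == "__main__":
    w = torch.randn(256, 512)
    w_q, s = quantize_blockwise(w)
    err = (dequantize_blockwise(w_q, s) - w).abs().max()
    print(f"max abs round-trip error: {err:.4f}")  # small relative to |w|
```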

@jiemingz force-pushed the jiemingz/fp8_block branch 2 times, most recently from 36a127e to b899f3b on July 23, 2025
@SahilJain314 (Contributor) left a comment:


Not super necessary immediately, but I think it'd be nice to include convergence plots in the repo as proof.
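For example, a sketch of what such a comparison could look like (the log file names and the "step"/"reward_mean" metric keys below are hypothetical, assuming each run writes JSONL metrics):

```python
# Hypothetical sketch: overlay reward curves from a bf16 run and an fp8 run.
# The log file names and the "step"/"reward_mean" keys are assumptions.
import json

import matplotlib.pyplot as plt

for path, label in [("grpo_bf16_metrics.jsonl", "bf16"),
                    ("grpo_fp8_metrics.jsonl", "fp8 block-scaled")]:
    steps, rewards = [], []
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            steps.append(rec["step"])
            rewards.append(rec["reward_mean"])
    plt.plot(steps, rewards, label=label)

plt.xlabel("GRPO step")
plt.ylabel("mean reward")
plt.legend()
plt.title("bf16 vs. fp8-generation convergence")
plt.savefig("fp8_convergence.png")
```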

@jiemingz force-pushed the jiemingz/fp8_block branch from b899f3b to f5401dc on July 24, 2025
@jiemingz added the CI:L1 (Run doctests, unit tests, and functional tests) label on Aug 20, 2025
@jiemingz added and then removed the CI:L1 label on Aug 21, 2025
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
@jiemingz force-pushed the jiemingz/fp8_block branch from efe09d7 to b2d7e9a on August 21, 2025
Signed-off-by: Jimmy Zhang <133159885+jiemingz@users.noreply.github.com>
@jiemingz added and then removed the CI:L1 label on Aug 21, 2025
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
@jiemingz added and then removed the CI:L1 label on Aug 21, 2025
@terrykong enabled auto-merge on August 21, 2025
@terrykong added this pull request to the merge queue on Aug 21, 2025
Merged via the queue into main with commit bc24887 on Aug 22, 2025; 34 of 36 checks passed
@terrykong deleted the jiemingz/fp8_block branch on August 22, 2025
jveronvialard pushed a commit that referenced this pull request on Aug 27, 2025
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: Jimmy Zhang <133159885+jiemingz@users.noreply.github.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Co-authored-by: Sahil Jain <sahilj@nvidia.com>
Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Signed-off-by: Julien Veron Vialard <jveronvialar@nvidia.com>
soodoshll pushed a commit to soodoshll/RL that referenced this pull request on Aug 28, 2025
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: Jimmy Zhang <133159885+jiemingz@users.noreply.github.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Co-authored-by: Sahil Jain <sahilj@nvidia.com>
Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Signed-off-by: Qidong Su <qidongs@nvidia.com>
soodoshll pushed a commit to soodoshll/RL that referenced this pull request on Sep 4, 2025, with the same sign-offs as above
Labels: CI:L1 (Run doctests, unit tests, and functional tests), documentation (Improvements or additions to documentation)

Successfully merging this pull request may close these issues: FP8 vLLM inference

6 participants