🎀 Simplify logging text #3219

qgallouedec · 2025-04-03T00:02:38Z


HuggingFaceDocBuilderDev · 2025-04-03T00:07:31Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

qgallouedec · 2025-04-03T00:50:34Z

from datasets import load_dataset
from trl import GRPOTrainer, GRPOConfig

dataset = load_dataset("trl-internal-testing/zen", "standard_prompt_only", split="train")


# Dummy reward function: count the number of unique characters in the completions
def reward_num_unique_chars(completions, **kwargs):
    return [len(set(c)) for c in completions]


training_args = GRPOConfig(
    log_completions=True,
    gradient_accumulation_steps=2,
    num_generations=3,
    per_device_train_batch_size=2,
    logging_steps=4,
    max_completion_length=8,
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    reward_funcs=reward_num_unique_chars,
    train_dataset=dataset,
    args=training_args,
)
trainer.train()

accelerate launch --num_processes 3 3219.py

wandb: WARNING The `run_name` is currently set to the same value as `TrainingArguments.output_dir`. If this was not intended, please specify a different run name by setting the `TrainingArguments.run_name` parameter.
wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
wandb: Currently logged in as: qgallouedec (huggingface) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.19.8
wandb: Run data is saved locally in /fsx/qgallouedec/trl/wandb/run-20250403_004247-47nd8ofl
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run trainer_output
wandb: ⭐️ View project at https://wandb.ai/huggingface/huggingface
wandb: 🚀 View run at https://wandb.ai/huggingface/huggingface/runs/47nd8ofl
  0%|                                                       | 0/12 [00:00<?, ?it/s][rank0]:[W403 00:42:49.628634618 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[rank1]:[W403 00:42:49.629350692 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[rank2]:[W403 00:42:49.629607447 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
{'loss': 0.0, 'grad_norm': 37.730892181396484, 'learning_rate': 6.666666666666666e-07, 'num_tokens': 594.0, 'mean_completion_length': 8.0, 'min_completion_length': 8.0, 'max_completion_length': 8.0, 'clipped_completions_ratio': 1.0, 'mean_terminated_completion_length': 0.0, 'min_terminated_completion_length': 0.0, 'max_terminated_completion_length': 0.0, 'rewards/reward_num_unique_chars/mean': 16.5625, 'rewards/reward_num_unique_chars/std': 2.098927229642868, 'reward': 16.562500476837158, 'reward_std': 1.9038410782814026, 'kl': 0.000272930920800718, 'clip_ratio': 0.0, 'epoch': 0.89}
 33%|███████████████▋                               | 4/12 [00:02<00:05,  1.51it/s]
╭──────────────────────────────────── Step 4 ─────────────────────────────────────╮
│ ┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓ │
│ ┃ Prompt                  ┃ Completion              ┃ reward_num_unique_chars ┃ │
│ ┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩ │
│ │ Explicit is             │  not a polynomial       │                   14.00 │ │
│ │                         │                         │                         │ │
│ │                         │ I am trying to          │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Explicit is             │  a great tool for       │                   16.00 │ │
│ │                         │ people who are          │                         │ │
│ │                         │ interested              │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Explicit is             │  a function in this     │                   15.00 │ │
│ │                         │ context. It is          │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Special cases aren't    │ . They might not have   │                   16.00 │ │
│ │ special                 │ as many options         │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Special cases aren't    │ : they're just special  │                   18.00 │ │
│ │ special                 │ cases of other          │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Special cases aren't    │ ! When we speak of      │                   17.00 │ │
│ │ special                 │ special cases,          │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Now is                  │  the time to help your  │                   16.00 │ │
│ │                         │ community. Let          │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Now is                  │  a good time to begin   │                   16.00 │ │
│ │                         │ planning for the        │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Now is                  │  the perfect time to be │                   18.00 │ │
│ │                         │ looking into your       │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Beautiful is better     │  the perfect.           │                   19.00 │ │
│ │ than                    │ As we all know,         │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Beautiful is better     │  beautiful. Beauty is   │                   19.00 │ │
│ │ than                    │ better than everything. │                         │ │
│ │                         │                         │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Beautiful is better     │  the average. And good  │                   16.00 │ │
│ │ than                    │ is better than          │                         │ │
│ └─────────────────────────┴─────────────────────────┴─────────────────────────┘ │
╰─────────────────────────────────────────────────────────────────────────────────╯
{'loss': 0.0009, 'grad_norm': 37.897972106933594, 'learning_rate': 3.333333333333333e-07, 'num_tokens': 1188.0, 'mean_completion_length': 8.0, 'min_completion_length': 8.0, 'max_completion_length': 8.0, 'clipped_completions_ratio': 1.0, 'mean_terminated_completion_length': 0.0, 'min_terminated_completion_length': 0.0, 'max_terminated_completion_length': 0.0, 'rewards/reward_num_unique_chars/mean': 16.770833492279053, 'rewards/reward_num_unique_chars/std': 2.1402209401130676, 'reward': 16.770833730697632, 'reward_std': 2.0098465979099274, 'kl': 0.021565582719631493, 'clip_ratio': 0.0, 'epoch': 1.89}
 67%|███████████████████████████████▎               | 8/12 [00:05<00:02,  1.57it/s]
╭──────────────────────────────────── Step 8 ─────────────────────────────────────╮
│ ┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓ │
│ ┃ Prompt                  ┃ Completion              ┃ reward_num_unique_chars ┃ │
│ ┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩ │
│ │ Unless explicitly       │  stated, what is the    │                   16.00 │ │
│ │                         │ relationship between    │                         │ │
│ │                         │ the                     │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Unless explicitly       │  stated otherwise, the  │                   18.00 │ │
│ │                         │ software being provided │                         │ │
│ │                         │ with                    │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Unless explicitly       │  stated, is it possible │                   18.00 │ │
│ │                         │ for a company           │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Special cases aren't    │ ! - Mathematical        │                   19.00 │ │
│ │ special                 │ Physics                 │                         │ │
│ │                         │                         │                         │ │
│ │                         │ # Special cases         │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Special cases aren't    │  2021                   │                   13.00 │ │
│ │ special                 │                         │                         │ │
│ │                         │ Given that              │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Special cases aren't    │                         │                   19.00 │ │
│ │ special                 │                         │                         │ │
│ │                         │ I'm on the third day    │                         │ │
│ │                         │ studying                │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ There should be one--   │  only one -- place      │                   17.00 │ │
│ │ and preferably          │ where you can find      │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ There should be one--   │  only one-- office or   │                   16.00 │ │
│ │ and preferably          │ building or a           │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ There should be one--   │  two-- windows on your  │                   17.00 │ │
│ │ and preferably          │ website.                │                         │ │
│ │                         │ It                      │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Readability             │ : What should I use for │                   17.00 │ │
│ │                         │ the new                 │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Readability             │  & Accessibility: When  │                   20.00 │ │
│ │                         │ are you ready to        │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Readability             │ . A book that reads as  │                   15.00 │ │
│ │                         │ if it                   │                         │ │
│ └─────────────────────────┴─────────────────────────┴─────────────────────────┘ │
╰─────────────────────────────────────────────────────────────────────────────────╯
{'loss': 0.002, 'grad_norm': 41.002220153808594, 'learning_rate': 0.0, 'num_tokens': 1770.0, 'mean_completion_length': 8.0, 'min_completion_length': 8.0, 'max_completion_length': 8.0, 'clipped_completions_ratio': 1.0, 'mean_terminated_completion_length': 0.0, 'min_terminated_completion_length': 0.0, 'max_terminated_completion_length': 0.0, 'rewards/reward_num_unique_chars/mean': 16.37499976158142, 'rewards/reward_num_unique_chars/std': 2.165749952197075, 'reward': 16.375000596046448, 'reward_std': 2.0700144916772842, 'kl': 0.051057277189102024, 'clip_ratio': 0.0, 'epoch': 2.89}
100%|██████████████████████████████████████████████| 12/12 [00:08<00:00,  1.64it/s]
╭──────────────────────────────────── Step 12 ────────────────────────────────────╮
│ ┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓ │
│ ┃ Prompt                  ┃ Completion              ┃ reward_num_unique_chars ┃ │
│ ┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩ │
│ │ In the face of          │  to answer. In today's  │                   18.00 │ │
│ │ ambiguity, refuse       │ media climate           │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ In the face of          │  to be misunderstood.   │                   15.00 │ │
│ │ ambiguity, refuse       │ To be misunderstood is  │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ In the face of          │  to answer.             │                   16.00 │ │
│ │ ambiguity, refuse       │ This is an allusion     │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ If the implementation   │  it will not be good    │                   17.00 │ │
│ │ is hard to explain,     │ for business.           │                         │ │
│ │                         │                         │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ If the implementation   │  I'm sure your project  │                   19.00 │ │
│ │ is hard to explain,     │ will be easier          │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ If the implementation   │  we can use (but not    │                   17.00 │ │
│ │ is hard to explain,     │ necessarily enforce     │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Readability             │                         │                    7.00 │ │
│ │                         │ A. 初级                 │                         │ │
│ │                         │ B                       │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Readability             │  of your paper depends  │                   17.00 │ │
│ │                         │ on several factors,     │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Readability             │  is the foundation from │                   16.00 │ │
│ │                         │ which most of the       │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Although practicality   │  the heart of a man     │                   13.00 │ │
│ │ beats                   │ many a times            │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Although practicality   │  aesthetics every day,  │                   16.00 │ │
│ │ beats                   │ it's a good             │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Although practicality   │  emotion in politics, I │                   16.00 │ │
│ │ beats                   │ can't help              │                         │ │
│ └─────────────────────────┴─────────────────────────┴─────────────────────────┘ │
╰─────────────────────────────────────────────────────────────────────────────────╯
{'train_runtime': 19.9479, 'train_samples_per_second': 2.557, 'train_steps_per_second': 0.602, 'train_loss': 0.0009722735073106984, 'epoch': 2.89}
100%|██████████████████████████████████████████████| 12/12 [00:18<00:00,  1.64it/s]
╭──────────────────────────────────── Step 12 ────────────────────────────────────╮
│ ┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓ │
│ ┃ Prompt                  ┃ Completion              ┃ reward_num_unique_chars ┃ │
│ ┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩ │
│ │ In the face of          │  to answer. In today's  │                   18.00 │ │
│ │ ambiguity, refuse       │ media climate           │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ In the face of          │  to be misunderstood.   │                   15.00 │ │
│ │ ambiguity, refuse       │ To be misunderstood is  │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ In the face of          │  to answer.             │                   16.00 │ │
│ │ ambiguity, refuse       │ This is an allusion     │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ If the implementation   │  it will not be good    │                   17.00 │ │
│ │ is hard to explain,     │ for business.           │                         │ │
│ │                         │                         │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ If the implementation   │  I'm sure your project  │                   19.00 │ │
│ │ is hard to explain,     │ will be easier          │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ If the implementation   │  we can use (but not    │                   17.00 │ │
│ │ is hard to explain,     │ necessarily enforce     │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Readability             │                         │                    7.00 │ │
│ │                         │ A. 初级                 │                         │ │
│ │                         │ B                       │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Readability             │  of your paper depends  │                   17.00 │ │
│ │                         │ on several factors,     │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Readability             │  is the foundation from │                   16.00 │ │
│ │                         │ which most of the       │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Although practicality   │  the heart of a man     │                   13.00 │ │
│ │ beats                   │ many a times            │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Although practicality   │  aesthetics every day,  │                   16.00 │ │
│ │ beats                   │ it's a good             │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Although practicality   │  emotion in politics, I │                   16.00 │ │
│ │ beats                   │ can't help              │                         │ │
│ └─────────────────────────┴─────────────────────────┴─────────────────────────┘ │
╰─────────────────────────────────────────────────────────────────────────────────╯
100%|██████████████████████████████████████████████| 12/12 [00:18<00:00,  1.58s/it]
wandb: 
wandb: 🚀 View run trainer_output at: https://wandb.ai/huggingface/huggingface/runs/47nd8ofl
wandb: Find logs at: wandb/run-20250403_004247-47nd8ofl/logs
[rank0]:[W403 00:43:08.870327835 ProcessGroupNCCL.cpp:1496] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
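For reference, the reward column in the tables above comes from the dummy reward function in the script. A minimal standalone check of that function is sketched below; the sample strings are hypothetical and are not taken from the run.

```python
# Same dummy reward as in the script: one score per completion,
# equal to the number of distinct characters it contains.
def reward_num_unique_chars(completions, **kwargs):
    return [len(set(c)) for c in completions]


# Hypothetical completions, not taken from the run above.
samples = ["hello", "aaa", " a b "]
rewards = reward_num_unique_chars(samples)
print(rewards)  # [4, 1, 3]
```

Whitespace counts as a character here, so the third sample scores 3 (space, `a`, `b`).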


qgallouedec · 2025-04-03T00:58:58Z

With num_completions_to_print=2 we get:

wandb: WARNING The `run_name` is currently set to the same value as `TrainingArguments.output_dir`. If this was not intended, please specify a different run name by setting the `TrainingArguments.run_name` parameter.
wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
wandb: Currently logged in as: qgallouedec (huggingface) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.19.8
wandb: Run data is saved locally in /fsx/qgallouedec/trl/wandb/run-20250403_005751-yij5hg8w
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run trainer_output
wandb: ⭐️ View project at https://wandb.ai/huggingface/huggingface
wandb: 🚀 View run at https://wandb.ai/huggingface/huggingface/runs/yij5hg8w
  0%|                                                       | 0/12 [00:00<?, ?it/s][rank0]:[W403 00:57:52.295133203 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[rank1]:[W403 00:57:52.296845625 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[rank2]:[W403 00:57:52.297741432 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
{'loss': 0.0, 'grad_norm': 37.730892181396484, 'learning_rate': 6.666666666666666e-07, 'num_tokens': 594.0, 'mean_completion_length': 8.0, 'min_completion_length': 8.0, 'max_completion_length': 8.0, 'clipped_completions_ratio': 1.0, 'mean_terminated_completion_length': 0.0, 'min_terminated_completion_length': 0.0, 'max_terminated_completion_length': 0.0, 'rewards/reward_num_unique_chars/mean': 16.5625, 'rewards/reward_num_unique_chars/std': 2.098927229642868, 'reward': 16.562500476837158, 'reward_std': 1.9038410782814026, 'kl': 0.000272930920800718, 'clip_ratio': 0.0, 'epoch': 0.89}
 33%|███████████████▋                               | 4/12 [00:02<00:05,  1.51it/s]
╭──────────────────────────────────── Step 4 ─────────────────────────────────────╮
│ ┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓ │
│ ┃ Prompt                  ┃ Completion              ┃ reward_num_unique_chars ┃ │
│ ┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩ │
│ │ Beautiful is better     │  beautiful. Beauty is   │                   19.00 │ │
│ │ than                    │ better than everything. │                         │ │
│ │                         │                         │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Explicit is             │  a great tool for       │                   16.00 │ │
│ │                         │ people who are          │                         │ │
│ │                         │ interested              │                         │ │
│ └─────────────────────────┴─────────────────────────┴─────────────────────────┘ │
╰─────────────────────────────────────────────────────────────────────────────────╯
{'loss': 0.0009, 'grad_norm': 37.897972106933594, 'learning_rate': 3.333333333333333e-07, 'num_tokens': 1188.0, 'mean_completion_length': 8.0, 'min_completion_length': 8.0, 'max_completion_length': 8.0, 'clipped_completions_ratio': 1.0, 'mean_terminated_completion_length': 0.0, 'min_terminated_completion_length': 0.0, 'max_terminated_completion_length': 0.0, 'rewards/reward_num_unique_chars/mean': 16.770833492279053, 'rewards/reward_num_unique_chars/std': 2.1402209401130676, 'reward': 16.770833730697632, 'reward_std': 2.0098465979099274, 'kl': 0.021565582719631493, 'clip_ratio': 0.0, 'epoch': 1.89}
 67%|███████████████████████████████▎               | 8/12 [00:05<00:02,  1.58it/s]
╭──────────────────────────────────── Step 8 ─────────────────────────────────────╮
│ ┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓ │
│ ┃ Prompt                  ┃ Completion              ┃ reward_num_unique_chars ┃ │
│ ┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩ │
│ │ Unless explicitly       │  stated, what is the    │                   16.00 │ │
│ │                         │ relationship between    │                         │ │
│ │                         │ the                     │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Special cases aren't    │  2021                   │                   13.00 │ │
│ │ special                 │                         │                         │ │
│ │                         │ Given that              │                         │ │
│ └─────────────────────────┴─────────────────────────┴─────────────────────────┘ │
╰─────────────────────────────────────────────────────────────────────────────────╯
{'loss': 0.002, 'grad_norm': 41.002220153808594, 'learning_rate': 0.0, 'num_tokens': 1770.0, 'mean_completion_length': 8.0, 'min_completion_length': 8.0, 'max_completion_length': 8.0, 'clipped_completions_ratio': 1.0, 'mean_terminated_completion_length': 0.0, 'min_terminated_completion_length': 0.0, 'max_terminated_completion_length': 0.0, 'rewards/reward_num_unique_chars/mean': 16.37499976158142, 'rewards/reward_num_unique_chars/std': 2.165749952197075, 'reward': 16.375000596046448, 'reward_std': 2.0700144916772842, 'kl': 0.051057277189102024, 'clip_ratio': 0.0, 'epoch': 2.89}
100%|██████████████████████████████████████████████| 12/12 [00:08<00:00,  1.63it/s]
╭──────────────────────────────────── Step 12 ────────────────────────────────────╮
│ ┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓ │
│ ┃ Prompt                  ┃ Completion              ┃ reward_num_unique_chars ┃ │
│ ┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩ │
│ │ If the implementation   │  it will not be good    │                   17.00 │ │
│ │ is hard to explain,     │ for business.           │                         │ │
│ │                         │                         │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Although practicality   │  emotion in politics, I │                   16.00 │ │
│ │ beats                   │ can't help              │                         │ │
│ └─────────────────────────┴─────────────────────────┴─────────────────────────┘ │
╰─────────────────────────────────────────────────────────────────────────────────╯
{'train_runtime': 15.5779, 'train_samples_per_second': 3.274, 'train_steps_per_second': 0.77, 'train_loss': 0.0009722735073106984, 'epoch': 2.89}
100%|██████████████████████████████████████████████| 12/12 [00:14<00:00,  1.63it/s]
╭──────────────────────────────────── Step 12 ────────────────────────────────────╮
│ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓ │
│ ┃ Prompt                   ┃ Completion             ┃ reward_num_unique_chars ┃ │
│ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩ │
│ │ In the face of           │  to answer.            │                   16.00 │ │
│ │ ambiguity, refuse        │ This is an allusion    │                         │ │
│ ├──────────────────────────┼────────────────────────┼─────────────────────────┤ │
│ │ In the face of           │  to be misunderstood.  │                   15.00 │ │
│ │ ambiguity, refuse        │ To be misunderstood is │                         │ │
│ └──────────────────────────┴────────────────────────┴─────────────────────────┘ │
╰─────────────────────────────────────────────────────────────────────────────────╯
100%|██████████████████████████████████████████████| 12/12 [00:14<00:00,  1.22s/it]
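The two-row tables above, whose rows appear in a different order than in the full 12-row tables, are consistent with a random subset being drawn before printing. A rough sketch of that idea follows, assuming random sampling without replacement; `sample_rows` and its parameters are hypothetical names for illustration, not the TRL implementation.

```python
import random


def sample_rows(prompts, completions, rewards, num_samples=None, seed=None):
    """Return at most `num_samples` (prompt, completion, reward) rows.

    Hypothetical helper: with num_samples=None every row is kept.
    """
    rows = list(zip(prompts, completions, rewards))
    if num_samples is None or num_samples >= len(rows):
        return rows
    rng = random.Random(seed)
    # Sample without replacement so no row is printed twice.
    return rng.sample(rows, num_samples)


# Illustrative data only.
prompts = ["Explicit is", "Now is", "Readability"]
completions = [" a great tool", " the time", " counts"]
rewards = [16.0, 16.0, 15.0]

subset = sample_rows(prompts, completions, rewards, num_samples=2, seed=0)
print(len(subset))  # 2
```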

qgallouedec · 2025-04-03T01:01:23Z

And with wandb_log_unique_prompts=True
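As a rough illustration of what logging unique prompts could mean, the sketch below keeps only the first row seen for each distinct prompt. This is an assumption about the flag's effect, and `keep_unique_prompts` is a hypothetical helper, not the code in this PR.

```python
def keep_unique_prompts(rows):
    """Keep the first row for each distinct prompt, preserving order.

    Hypothetical helper used only to illustrate deduplication by prompt.
    """
    seen = set()
    unique = []
    for row in rows:
        if row["prompt"] not in seen:
            seen.add(row["prompt"])
            unique.append(row)
    return unique


# Illustrative rows: with num_generations=3 each prompt appears three times.
rows = [
    {"prompt": "Explicit is", "completion": " a function", "reward": 15.0},
    {"prompt": "Explicit is", "completion": " a great tool", "reward": 16.0},
    {"prompt": "Explicit is", "completion": " not a polynomial", "reward": 14.0},
    {"prompt": "Now is", "completion": " the time", "reward": 16.0},
]

unique_rows = keep_unique_prompts(rows)
print([r["prompt"] for r in unique_rows])  # ['Explicit is', 'Now is']
```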

lewtun

Very elegant refactor - LGTM!

lewtun · 2025-04-03T06:13:33Z

trl/trainer/grpo_trainer.py

+        # maxlen is set to the total number of forward passes per step. This value of `maxlen` ensures we log only the
+        # final optimization step.
+        maxlen = self.accelerator.num_processes * args.per_device_train_batch_size * args.gradient_accumulation_steps
+        self._textual_logs = {


Very clever idea to use a queue!
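To make the `maxlen` reasoning concrete, here is a small sketch using the settings from the test script above (3 processes, per-device batch size 2, 2 gradient accumulation steps). It assumes the buffers are `collections.deque` objects, as the comment about a queue suggests; the variable names are illustrative.

```python
from collections import deque

# Values taken from the test script and launch command in this thread.
num_processes = 3
per_device_train_batch_size = 2
gradient_accumulation_steps = 2

# Total number of samples gathered across processes in one optimization step.
maxlen = num_processes * per_device_train_batch_size * gradient_accumulation_steps

# A bounded deque silently drops its oldest entries once it is full.
textual_logs = deque(maxlen=maxlen)

# Simulate two optimization steps, each contributing `maxlen` entries.
for step in (1, 2):
    for i in range(maxlen):
        textual_logs.append((step, i))

steps_in_buffer = {step for step, _ in textual_logs}
print(len(textual_logs), steps_in_buffer)  # 12 {2}
```

With these settings `maxlen` is 12, which matches the 12 rows in each full table earlier in the thread: entries from the previous step are evicted automatically, so only the latest step is left to log.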

Co-authored-by: Lewis Tunstall <lewis.c.tunstall@gmail.com>

lewtun and others added 3 commits April 2, 2025 19:43

Accumulate completions for logging

4cc6b8c

simplify logging steps

eb71f93

Merge branch 'main' into simplify-logging-text

040c34b

qgallouedec requested a review from lewtun April 3, 2025 00:56

qgallouedec added 2 commits April 3, 2025 00:57

only log in my process

971be3d

Merge branch 'simplify-logging-text' of https://github.com/huggingfac…

21c7697

…e/trl into simplify-logging-text

qgallouedec requested review from kashif, edbeeching and shirinyamani April 3, 2025 00:59

lewtun approved these changes Apr 3, 2025

View reviewed changes

edbeeching approved these changes Apr 3, 2025

View reviewed changes

qgallouedec added 2 commits April 3, 2025 07:37

Merge branch 'main' into simplify-logging-text

b8cf4a2

Merge branch 'main' into simplify-logging-text

4e97e6f

qgallouedec changed the title ~~Simplify logging text~~ 🎀 Simplify logging text Apr 5, 2025

qgallouedec merged commit 17e33cd into main Apr 5, 2025
8 of 10 checks passed

qgallouedec deleted the simplify-logging-text branch April 5, 2025 22:38

yxliu-TAMU pushed a commit to mincheolseong/ECEN743-GRPO-Project-Proposal that referenced this pull request Apr 20, 2025

🎀 Simplify logging text (huggingface#3219)

a1f41ea

Co-authored-by: Lewis Tunstall <lewis.c.tunstall@gmail.com>
