🎀 Simplify logging text #3219

Merged
merged 7 commits into from
Apr 5, 2025

qgallouedec
Member

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qgallouedec
Member Author

qgallouedec commented Apr 3, 2025

from datasets import load_dataset
from trl import GRPOTrainer, GRPOConfig

dataset = load_dataset("trl-internal-testing/zen", "standard_prompt_only", split="train")


# Dummy reward function: count the number of unique characters in the completions
def reward_num_unique_chars(completions, **kwargs):
    return [len(set(c)) for c in completions]


training_args = GRPOConfig(
    log_completions=True,
    gradient_accumulation_steps=2,
    num_generations=3,
    per_device_train_batch_size=2,
    logging_steps=4,
    max_completion_length=8,
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    reward_funcs=reward_num_unique_chars,
    train_dataset=dataset,
    args=training_args,
)
trainer.train()

accelerate launch --num_processes 3 3219.py
wandb: WARNING The `run_name` is currently set to the same value as `TrainingArguments.output_dir`. If this was not intended, please specify a different run name by setting the `TrainingArguments.run_name` parameter.
wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
wandb: Currently logged in as: qgallouedec (huggingface) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.19.8
wandb: Run data is saved locally in /fsx/qgallouedec/trl/wandb/run-20250403_004247-47nd8ofl
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run trainer_output
wandb: ⭐️ View project at https://wandb.ai/huggingface/huggingface
wandb: 🚀 View run at https://wandb.ai/huggingface/huggingface/runs/47nd8ofl
  0%|                                                       | 0/12 [00:00<?, ?it/s][rank0]:[W403 00:42:49.628634618 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[rank1]:[W403 00:42:49.629350692 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[rank2]:[W403 00:42:49.629607447 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
{'loss': 0.0, 'grad_norm': 37.730892181396484, 'learning_rate': 6.666666666666666e-07, 'num_tokens': 594.0, 'mean_completion_length': 8.0, 'min_completion_length': 8.0, 'max_completion_length': 8.0, 'clipped_completions_ratio': 1.0, 'mean_terminated_completion_length': 0.0, 'min_terminated_completion_length': 0.0, 'max_terminated_completion_length': 0.0, 'rewards/reward_num_unique_chars/mean': 16.5625, 'rewards/reward_num_unique_chars/std': 2.098927229642868, 'reward': 16.562500476837158, 'reward_std': 1.9038410782814026, 'kl': 0.000272930920800718, 'clip_ratio': 0.0, 'epoch': 0.89}
 33%|███████████████▋                               | 4/12 [00:02<00:05,  1.51it/s]
╭──────────────────────────────────── Step 4 ─────────────────────────────────────╮
│ ┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓ │
│ ┃ Prompt                  ┃ Completion              ┃ reward_num_unique_chars ┃ │
│ ┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩ │
│ │ Explicit is             │  not a polynomial       │                   14.00 │ │
│ │                         │                         │                         │ │
│ │                         │ I am trying to          │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Explicit is             │  a great tool for       │                   16.00 │ │
│ │                         │ people who are          │                         │ │
│ │                         │ interested              │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Explicit is             │  a function in this     │                   15.00 │ │
│ │                         │ context. It is          │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Special cases aren't    │ . They might not have   │                   16.00 │ │
│ │ special                 │ as many options         │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Special cases aren't    │ : they're just special  │                   18.00 │ │
│ │ special                 │ cases of other          │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Special cases aren't    │ ! When we speak of      │                   17.00 │ │
│ │ special                 │ special cases,          │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Now is                  │  the time to help your  │                   16.00 │ │
│ │                         │ community. Let          │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Now is                  │  a good time to begin   │                   16.00 │ │
│ │                         │ planning for the        │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Now is                  │  the perfect time to be │                   18.00 │ │
│ │                         │ looking into your       │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Beautiful is better     │  the perfect.           │                   19.00 │ │
│ │ than                    │ As we all know,         │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Beautiful is better     │  beautiful. Beauty is   │                   19.00 │ │
│ │ than                    │ better than everything. │                         │ │
│ │                         │                         │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Beautiful is better     │  the average. And good  │                   16.00 │ │
│ │ than                    │ is better than          │                         │ │
│ └─────────────────────────┴─────────────────────────┴─────────────────────────┘ │
╰─────────────────────────────────────────────────────────────────────────────────╯
{'loss': 0.0009, 'grad_norm': 37.897972106933594, 'learning_rate': 3.333333333333333e-07, 'num_tokens': 1188.0, 'mean_completion_length': 8.0, 'min_completion_length': 8.0, 'max_completion_length': 8.0, 'clipped_completions_ratio': 1.0, 'mean_terminated_completion_length': 0.0, 'min_terminated_completion_length': 0.0, 'max_terminated_completion_length': 0.0, 'rewards/reward_num_unique_chars/mean': 16.770833492279053, 'rewards/reward_num_unique_chars/std': 2.1402209401130676, 'reward': 16.770833730697632, 'reward_std': 2.0098465979099274, 'kl': 0.021565582719631493, 'clip_ratio': 0.0, 'epoch': 1.89}
 67%|███████████████████████████████▎               | 8/12 [00:05<00:02,  1.57it/s]
╭──────────────────────────────────── Step 8 ─────────────────────────────────────╮
│ ┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓ │
│ ┃ Prompt                  ┃ Completion              ┃ reward_num_unique_chars ┃ │
│ ┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩ │
│ │ Unless explicitly       │  stated, what is the    │                   16.00 │ │
│ │                         │ relationship between    │                         │ │
│ │                         │ the                     │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Unless explicitly       │  stated otherwise, the  │                   18.00 │ │
│ │                         │ software being provided │                         │ │
│ │                         │ with                    │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Unless explicitly       │  stated, is it possible │                   18.00 │ │
│ │                         │ for a company           │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Special cases aren't    │ ! - Mathematical        │                   19.00 │ │
│ │ special                 │ Physics                 │                         │ │
│ │                         │                         │                         │ │
│ │                         │ # Special cases         │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Special cases aren't    │  2021                   │                   13.00 │ │
│ │ special                 │                         │                         │ │
│ │                         │ Given that              │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Special cases aren't    │                         │                   19.00 │ │
│ │ special                 │                         │                         │ │
│ │                         │ I'm on the third day    │                         │ │
│ │                         │ studying                │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ There should be one--   │  only one -- place      │                   17.00 │ │
│ │ and preferably          │ where you can find      │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ There should be one--   │  only one-- office or   │                   16.00 │ │
│ │ and preferably          │ building or a           │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ There should be one--   │  two-- windows on your  │                   17.00 │ │
│ │ and preferably          │ website.                │                         │ │
│ │                         │ It                      │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Readability             │ : What should I use for │                   17.00 │ │
│ │                         │ the new                 │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Readability             │  & Accessibility: When  │                   20.00 │ │
│ │                         │ are you ready to        │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Readability             │ . A book that reads as  │                   15.00 │ │
│ │                         │ if it                   │                         │ │
│ └─────────────────────────┴─────────────────────────┴─────────────────────────┘ │
╰─────────────────────────────────────────────────────────────────────────────────╯
{'loss': 0.002, 'grad_norm': 41.002220153808594, 'learning_rate': 0.0, 'num_tokens': 1770.0, 'mean_completion_length': 8.0, 'min_completion_length': 8.0, 'max_completion_length': 8.0, 'clipped_completions_ratio': 1.0, 'mean_terminated_completion_length': 0.0, 'min_terminated_completion_length': 0.0, 'max_terminated_completion_length': 0.0, 'rewards/reward_num_unique_chars/mean': 16.37499976158142, 'rewards/reward_num_unique_chars/std': 2.165749952197075, 'reward': 16.375000596046448, 'reward_std': 2.0700144916772842, 'kl': 0.051057277189102024, 'clip_ratio': 0.0, 'epoch': 2.89}
100%|██████████████████████████████████████████████| 12/12 [00:08<00:00,  1.64it/s]
╭──────────────────────────────────── Step 12 ────────────────────────────────────╮
│ ┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓ │
│ ┃ Prompt                  ┃ Completion              ┃ reward_num_unique_chars ┃ │
│ ┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩ │
│ │ In the face of          │  to answer. In today's  │                   18.00 │ │
│ │ ambiguity, refuse       │ media climate           │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ In the face of          │  to be misunderstood.   │                   15.00 │ │
│ │ ambiguity, refuse       │ To be misunderstood is  │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ In the face of          │  to answer.             │                   16.00 │ │
│ │ ambiguity, refuse       │ This is an allusion     │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ If the implementation   │  it will not be good    │                   17.00 │ │
│ │ is hard to explain,     │ for business.           │                         │ │
│ │                         │                         │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ If the implementation   │  I'm sure your project  │                   19.00 │ │
│ │ is hard to explain,     │ will be easier          │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ If the implementation   │  we can use (but not    │                   17.00 │ │
│ │ is hard to explain,     │ necessarily enforce     │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Readability             │                         │                    7.00 │ │
│ │                         │ A. 初级                 │                         │ │
│ │                         │ B                       │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Readability             │  of your paper depends  │                   17.00 │ │
│ │                         │ on several factors,     │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Readability             │  is the foundation from │                   16.00 │ │
│ │                         │ which most of the       │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Although practicality   │  the heart of a man     │                   13.00 │ │
│ │ beats                   │ many a times            │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Although practicality   │  aesthetics every day,  │                   16.00 │ │
│ │ beats                   │ it's a good             │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Although practicality   │  emotion in politics, I │                   16.00 │ │
│ │ beats                   │ can't help              │                         │ │
│ └─────────────────────────┴─────────────────────────┴─────────────────────────┘ │
╰─────────────────────────────────────────────────────────────────────────────────╯
{'train_runtime': 19.9479, 'train_samples_per_second': 2.557, 'train_steps_per_second': 0.602, 'train_loss': 0.0009722735073106984, 'epoch': 2.89}
100%|██████████████████████████████████████████████| 12/12 [00:18<00:00,  1.64it/s]
╭──────────────────────────────────── Step 12 ────────────────────────────────────╮
│ ┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓ │
│ ┃ Prompt                  ┃ Completion              ┃ reward_num_unique_chars ┃ │
│ ┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩ │
│ │ In the face of          │  to answer. In today's  │                   18.00 │ │
│ │ ambiguity, refuse       │ media climate           │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ In the face of          │  to be misunderstood.   │                   15.00 │ │
│ │ ambiguity, refuse       │ To be misunderstood is  │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ In the face of          │  to answer.             │                   16.00 │ │
│ │ ambiguity, refuse       │ This is an allusion     │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ If the implementation   │  it will not be good    │                   17.00 │ │
│ │ is hard to explain,     │ for business.           │                         │ │
│ │                         │                         │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ If the implementation   │  I'm sure your project  │                   19.00 │ │
│ │ is hard to explain,     │ will be easier          │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ If the implementation   │  we can use (but not    │                   17.00 │ │
│ │ is hard to explain,     │ necessarily enforce     │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Readability             │                         │                    7.00 │ │
│ │                         │ A. 初级                 │                         │ │
│ │                         │ B                       │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Readability             │  of your paper depends  │                   17.00 │ │
│ │                         │ on several factors,     │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Readability             │  is the foundation from │                   16.00 │ │
│ │                         │ which most of the       │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Although practicality   │  the heart of a man     │                   13.00 │ │
│ │ beats                   │ many a times            │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Although practicality   │  aesthetics every day,  │                   16.00 │ │
│ │ beats                   │ it's a good             │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Although practicality   │  emotion in politics, I │                   16.00 │ │
│ │ beats                   │ can't help              │                         │ │
│ └─────────────────────────┴─────────────────────────┴─────────────────────────┘ │
╰─────────────────────────────────────────────────────────────────────────────────╯
100%|██████████████████████████████████████████████| 12/12 [00:18<00:00,  1.58s/it]
wandb: 
wandb: 🚀 View run trainer_output at: https://wandb.ai/huggingface/huggingface/runs/47nd8ofl
wandb: Find logs at: wandb/run-20250403_004247-47nd8ofl/logs
[rank0]:[W403 00:43:08.870327835 ProcessGroupNCCL.cpp:1496] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[Screenshot 2025-04-02 at 17 54 46] [Screenshot 2025-04-02 at 17 54 50]
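
For readers checking the reward column in the tables above: the dummy reward is just the number of distinct characters in each completion, whitespace and newlines included. Below is a minimal standalone sketch of that function on a few made-up strings. The last string is an assumption about the exact whitespace of one Step 8 completion, reconstructed from the wrapped table cell, so treat that comparison as illustrative only.

```python
def reward_num_unique_chars(completions, **kwargs):
    # One reward per completion: the size of its set of characters.
    return [len(set(c)) for c in completions]


# Made-up inputs, not taken from the run.
samples = ["hello", "aaa", "ab ab"]
rewards = reward_num_unique_chars(samples)
print(rewards)  # [4, 1, 3]

# Assumed reconstruction of the " 2021 / Given that" cell in the Step 8 table;
# the real completion's whitespace may differ.
assumed_completion = " 2021\n\nGiven that"
assumed_reward = reward_num_unique_chars([assumed_completion])[0]
print(assumed_reward)  # 13
```

If the whitespace assumption holds, this agrees with the 13.00 shown for that row.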

@qgallouedec qgallouedec requested a review from lewtun April 3, 2025 00:56
@qgallouedec
Member Author

qgallouedec commented Apr 3, 2025

With num_completions_to_print=2 we get:

wandb: WARNING The `run_name` is currently set to the same value as `TrainingArguments.output_dir`. If this was not intended, please specify a different run name by setting the `TrainingArguments.run_name` parameter.
wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
wandb: Currently logged in as: qgallouedec (huggingface) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.19.8
wandb: Run data is saved locally in /fsx/qgallouedec/trl/wandb/run-20250403_005751-yij5hg8w
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run trainer_output
wandb: ⭐️ View project at https://wandb.ai/huggingface/huggingface
wandb: 🚀 View run at https://wandb.ai/huggingface/huggingface/runs/yij5hg8w
  0%|                                                       | 0/12 [00:00<?, ?it/s][rank0]:[W403 00:57:52.295133203 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[rank1]:[W403 00:57:52.296845625 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[rank2]:[W403 00:57:52.297741432 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
{'loss': 0.0, 'grad_norm': 37.730892181396484, 'learning_rate': 6.666666666666666e-07, 'num_tokens': 594.0, 'mean_completion_length': 8.0, 'min_completion_length': 8.0, 'max_completion_length': 8.0, 'clipped_completions_ratio': 1.0, 'mean_terminated_completion_length': 0.0, 'min_terminated_completion_length': 0.0, 'max_terminated_completion_length': 0.0, 'rewards/reward_num_unique_chars/mean': 16.5625, 'rewards/reward_num_unique_chars/std': 2.098927229642868, 'reward': 16.562500476837158, 'reward_std': 1.9038410782814026, 'kl': 0.000272930920800718, 'clip_ratio': 0.0, 'epoch': 0.89}
 33%|███████████████▋                               | 4/12 [00:02<00:05,  1.51it/s]
╭──────────────────────────────────── Step 4 ─────────────────────────────────────╮
│ ┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓ │
│ ┃ Prompt                  ┃ Completion              ┃ reward_num_unique_chars ┃ │
│ ┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩ │
│ │ Beautiful is better     │  beautiful. Beauty is   │                   19.00 │ │
│ │ than                    │ better than everything. │                         │ │
│ │                         │                         │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Explicit is             │  a great tool for       │                   16.00 │ │
│ │                         │ people who are          │                         │ │
│ │                         │ interested              │                         │ │
│ └─────────────────────────┴─────────────────────────┴─────────────────────────┘ │
╰─────────────────────────────────────────────────────────────────────────────────╯
{'loss': 0.0009, 'grad_norm': 37.897972106933594, 'learning_rate': 3.333333333333333e-07, 'num_tokens': 1188.0, 'mean_completion_length': 8.0, 'min_completion_length': 8.0, 'max_completion_length': 8.0, 'clipped_completions_ratio': 1.0, 'mean_terminated_completion_length': 0.0, 'min_terminated_completion_length': 0.0, 'max_terminated_completion_length': 0.0, 'rewards/reward_num_unique_chars/mean': 16.770833492279053, 'rewards/reward_num_unique_chars/std': 2.1402209401130676, 'reward': 16.770833730697632, 'reward_std': 2.0098465979099274, 'kl': 0.021565582719631493, 'clip_ratio': 0.0, 'epoch': 1.89}
 67%|███████████████████████████████▎               | 8/12 [00:05<00:02,  1.58it/s]
╭──────────────────────────────────── Step 8 ─────────────────────────────────────╮
│ ┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓ │
│ ┃ Prompt                  ┃ Completion              ┃ reward_num_unique_chars ┃ │
│ ┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩ │
│ │ Unless explicitly       │  stated, what is the    │                   16.00 │ │
│ │                         │ relationship between    │                         │ │
│ │                         │ the                     │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Special cases aren't    │  2021                   │                   13.00 │ │
│ │ special                 │                         │                         │ │
│ │                         │ Given that              │                         │ │
│ └─────────────────────────┴─────────────────────────┴─────────────────────────┘ │
╰─────────────────────────────────────────────────────────────────────────────────╯
{'loss': 0.002, 'grad_norm': 41.002220153808594, 'learning_rate': 0.0, 'num_tokens': 1770.0, 'mean_completion_length': 8.0, 'min_completion_length': 8.0, 'max_completion_length': 8.0, 'clipped_completions_ratio': 1.0, 'mean_terminated_completion_length': 0.0, 'min_terminated_completion_length': 0.0, 'max_terminated_completion_length': 0.0, 'rewards/reward_num_unique_chars/mean': 16.37499976158142, 'rewards/reward_num_unique_chars/std': 2.165749952197075, 'reward': 16.375000596046448, 'reward_std': 2.0700144916772842, 'kl': 0.051057277189102024, 'clip_ratio': 0.0, 'epoch': 2.89}
100%|██████████████████████████████████████████████| 12/12 [00:08<00:00,  1.63it/s]
╭──────────────────────────────────── Step 12 ────────────────────────────────────╮
│ ┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓ │
│ ┃ Prompt                  ┃ Completion              ┃ reward_num_unique_chars ┃ │
│ ┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩ │
│ │ If the implementation   │  it will not be good    │                   17.00 │ │
│ │ is hard to explain,     │ for business.           │                         │ │
│ │                         │                         │                         │ │
│ ├─────────────────────────┼─────────────────────────┼─────────────────────────┤ │
│ │ Although practicality   │  emotion in politics, I │                   16.00 │ │
│ │ beats                   │ can't help              │                         │ │
│ └─────────────────────────┴─────────────────────────┴─────────────────────────┘ │
╰─────────────────────────────────────────────────────────────────────────────────╯
{'train_runtime': 15.5779, 'train_samples_per_second': 3.274, 'train_steps_per_second': 0.77, 'train_loss': 0.0009722735073106984, 'epoch': 2.89}
100%|██████████████████████████████████████████████| 12/12 [00:14<00:00,  1.63it/s]
╭──────────────────────────────────── Step 12 ────────────────────────────────────╮
│ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓ │
│ ┃ Prompt                   ┃ Completion             ┃ reward_num_unique_chars ┃ │
│ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩ │
│ │ In the face of           │  to answer.            │                   16.00 │ │
│ │ ambiguity, refuse        │ This is an allusion    │                         │ │
│ ├──────────────────────────┼────────────────────────┼─────────────────────────┤ │
│ │ In the face of           │  to be misunderstood.  │                   15.00 │ │
│ │ ambiguity, refuse        │ To be misunderstood is │                         │ │
│ └──────────────────────────┴────────────────────────┴─────────────────────────┘ │
╰─────────────────────────────────────────────────────────────────────────────────╯
100%|██████████████████████████████████████████████| 12/12 [00:14<00:00,  1.22s/it]
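
In the output above, each table shows 2 of the 12 rows, and they are not simply the first two, so the printed subset looks randomly sampled. Here is a minimal sketch of that idea in plain Python. The helper name `sample_rows_to_print` and its `seed` argument are hypothetical and are not TRL's API; only the `num_completions_to_print` name comes from this thread.

```python
import random


def sample_rows_to_print(rows, num_completions_to_print=None, seed=None):
    """Return the rows to display (hypothetical helper, not part of TRL).

    If `num_completions_to_print` is None or not smaller than the number of
    rows, every row is returned; otherwise a random subset of that size.
    """
    if num_completions_to_print is None or num_completions_to_print >= len(rows):
        return list(rows)
    return random.Random(seed).sample(list(rows), num_completions_to_print)


# Made-up (prompt, completion, reward) rows, not taken from the run.
rows = [(f"prompt {i}", f"completion {i}", float(i)) for i in range(12)]

subset = sample_rows_to_print(rows, num_completions_to_print=2, seed=0)
everything = sample_rows_to_print(rows)
print(len(subset), len(everything))  # 2 12
```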

@qgallouedec
Member Author

And with wandb_log_unique_prompts=True:

Screenshot 2025-04-02 at 18 00 53
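
Judging by its name, `wandb_log_unique_prompts=True` presumably keeps a single row per distinct prompt in the table sent to wandb. A minimal sketch of that kind of de-duplication, using a hypothetical helper that is not TRL's implementation and keeps the first row seen for each prompt:

```python
def keep_unique_prompts(rows):
    """Keep the first (prompt, completion, reward) row for each prompt.

    Hypothetical helper for illustration; which row survives per prompt in the
    real trainer is an assumption here.
    """
    seen = set()
    unique = []
    for prompt, completion, reward in rows:
        if prompt not in seen:
            seen.add(prompt)
            unique.append((prompt, completion, reward))
    return unique


# Made-up rows: 2 prompts with 3 generations each, mirroring num_generations=3.
rows = [
    ("Explicit is", "completion a", 14.0),
    ("Explicit is", "completion b", 16.0),
    ("Explicit is", "completion c", 15.0),
    ("Now is", "completion d", 16.0),
    ("Now is", "completion e", 16.0),
    ("Now is", "completion f", 18.0),
]

unique_rows = keep_unique_prompts(rows)
print(len(unique_rows))  # 2
```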

Member

@lewtun lewtun left a comment

Very elegant refactor - LGTM!

# maxlen is set to the total number of forward passes per step. This value of `maxlen` ensures we log only the
# final optimization step.
maxlen = self.accelerator.num_processes * args.per_device_train_batch_size * args.gradient_accumulation_steps
self._textual_logs = {

Very clever idea to use a queue!
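
To illustrate the queue idea: a `collections.deque` with `maxlen` silently drops its oldest entries once full, so a buffer sized to one optimization step always holds just the most recent step. The sketch below plugs in the values from the example script in this thread (3 processes, `per_device_train_batch_size=2`, `gradient_accumulation_steps=2`), which gives 12, the number of rows in the full tables above. The buffer name and the append loop are hypothetical stand-ins, and the sketch assumes every optimization step appends exactly `maxlen` gathered samples.

```python
from collections import deque

# Values from the example script (launched with --num_processes 3).
num_processes = 3
per_device_train_batch_size = 2
gradient_accumulation_steps = 2

maxlen = num_processes * per_device_train_batch_size * gradient_accumulation_steps

# Hypothetical stand-in for one field of the trainer's textual log buffer.
completions = deque(maxlen=maxlen)

# Simulate 4 optimization steps between two logging events (logging_steps=4).
for step in range(1, 5):
    for i in range(maxlen):
        completions.append(f"step{step}-sample{i}")

# Only the samples from the final step remain in the buffer.
print(maxlen, len(completions))          # 12 12
print(completions[0], completions[-1])   # step4-sample0 step4-sample11
```

The appeal of this design is that no explicit clearing is needed between logging events: older samples fall out of the buffer on their own.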

@qgallouedec qgallouedec changed the title Simplify logging text 🎀 Simplify logging text Apr 5, 2025
@qgallouedec qgallouedec merged commit 17e33cd into main Apr 5, 2025
8 of 10 checks passed
@qgallouedec qgallouedec deleted the simplify-logging-text branch April 5, 2025 22:38
yxliu-TAMU pushed a commit to mincheolseong/ECEN743-GRPO-Project-Proposal that referenced this pull request Apr 20, 2025
Co-authored-by: Lewis Tunstall <lewis.c.tunstall@gmail.com>

4 participants