
GRPOTrainer does not have a feature flag to prevent dataset shuffling #2998

@sidmadala

Description


Reproduction

In GRPOTrainer:

def __iter__(self):
    # E.g., [2, 4, 3, 1, 0, 6, 5] (num_samples = 7)
    indexes = torch.randperm(self.num_samples, generator=self.generator).tolist()

    #    [2, 4, 3, 1, 0, 6, 5]
    # -> [[2, 4, 3], [1, 0, 6], [5]]  (batch_size = 3)
    indexes = [indexes[i : i + self.batch_size] for i in range(0, len(indexes), self.batch_size)]

    #    [[2, 4, 3], [1, 0, 6], [5]]
    # -> [[2, 4, 3], [1, 0, 6]]
    indexes = [chunk for chunk in indexes if len(chunk) == self.batch_size]

    for chunk in indexes:
        for _ in range(self.repeat_count):
            for index in chunk:
                for _ in range(self.mini_repeat_count):
                    yield index

def __len__(self) -> int:
    return self.num_samples * self.mini_repeat_count * self.repeat_count

Observed behavior:

The dataset is shuffled even though I want to apply curriculum learning (no shuffling). I've pasted code below that works the way I want, but I'd love advice on creating a PR for this issue: this would be my first contribution, and I'm unsure which other files would need to change (e.g. GRPOConfig, PretrainedConfig). I also want to keep shuffling available as a config option, since others will likely want to retain the original behavior.
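The shuffle comes from the `torch.randperm` call in the sampler's `__iter__`. For illustration, the same iteration logic can be reproduced standalone (a sketch, with `random.Random` standing in for the torch generator; the class name here is made up for the demo):

```python
import random

class RepeatRandomSamplerSketch:
    """Standalone sketch of the sampler above; random.Random
    stands in for torch.randperm + generator."""

    def __init__(self, num_samples, batch_size, mini_repeat_count, repeat_count, seed=0):
        self.num_samples = num_samples
        self.batch_size = batch_size
        self.mini_repeat_count = mini_repeat_count
        self.repeat_count = repeat_count
        self.generator = random.Random(seed)

    def __iter__(self):
        # Random permutation of all indices, e.g. [2, 4, 3, 1, 0, 6, 5]
        indexes = self.generator.sample(range(self.num_samples), self.num_samples)
        # Chunk into batches and drop the incomplete final chunk
        chunks = [indexes[i : i + self.batch_size] for i in range(0, len(indexes), self.batch_size)]
        for chunk in (c for c in chunks if len(c) == self.batch_size):
            for _ in range(self.repeat_count):
                for index in chunk:
                    for _ in range(self.mini_repeat_count):
                        yield index

order = list(RepeatRandomSamplerSketch(num_samples=7, batch_size=3,
                                       mini_repeat_count=2, repeat_count=1))
# 2 full chunks of 3 survive (index from the dropped chunk never appears);
# each surviving index is yielded mini_repeat_count times back to back
```

This makes it easy to see that the order depends entirely on the permutation in the first line of `__iter__` — which is what a sequential variant would remove.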

Potential Solution:

Modify RepeatRandomSampler, or add a similar new class alongside it called RepeatSequentialSampler, with the iteration logic changed as follows:

def __iter__(self):
    # Sequential order, repeat each index `repeat_count` times
    indexes = [idx for idx in range(self.num_samples) for _ in range(self.repeat_count)]
    return iter(indexes)

def __len__(self):
    return self.num_samples * self.repeat_count
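A self-contained version of the proposal (the class name follows the suggestion above; subclassing `torch.utils.data.Sampler` is omitted here to keep the sketch runnable on its own):

```python
class RepeatSequentialSampler:
    """Sketch of the proposed sampler: same repeat semantics,
    but indices stay in dataset order so curriculum-ordered data
    is consumed sequentially (no torch.randperm)."""

    def __init__(self, num_samples: int, repeat_count: int = 1):
        self.num_samples = num_samples
        self.repeat_count = repeat_count

    def __iter__(self):
        # Sequential order, each index repeated `repeat_count` times
        return iter(idx for idx in range(self.num_samples)
                    for _ in range(self.repeat_count))

    def __len__(self) -> int:
        return self.num_samples * self.repeat_count


order = list(RepeatSequentialSampler(num_samples=3, repeat_count=2))
# -> [0, 0, 1, 1, 2, 2]
```

GRPOTrainer could then select between the two samplers based on a boolean field such as `shuffle_dataset` in GRPOConfig; that flag name is an assumption, not existing TRL API.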

System Info

  • Platform: Linux-5.15.0-113-generic-x86_64-with-glibc2.35
  • Python version: 3.12.9
  • PyTorch version: 2.5.1
  • CUDA device(s): NVIDIA A100-SXM4-80GB
  • Transformers version: 4.47.1
  • Accelerate version: 1.2.0
  • Accelerate config: not found
  • Datasets version: 3.1.0
  • HF Hub version: 0.29.1
  • TRL version: 0.12.1
  • bitsandbytes version: 0.45.0
  • DeepSpeed version: 0.16.1
  • Diffusers version: 0.32.2
  • Liger-Kernel version: 0.4.2
  • LLM-Blender version: not installed
  • OpenAI version: 1.64.0
  • PEFT version: 0.14.0

Checklist

  • I have checked that my issue isn't already filed (see open issues)
  • I have included my system information
  • Any code provided is minimal, complete, and reproducible (more on MREs)
  • Any code provided is properly formatted in code blocks (no screenshots, more on code blocks)
  • Any traceback provided is complete
