TypeError when not passing total_episodes in PPOv2Trainer

Hi! I've been trying to run `examples/scripts/ppo/ppo_tldr.py` and hit `TypeError: 'float' object cannot be interpreted as an integer` on the line `for update in range(1, args.num_updates + 1)`. After reading `ppov2_trainer.py`, I think the underlying issue originates here:

```python
        #########
        # calculate various batch sizes
        #########
        if args.total_episodes is None:  # allow the users to define episodes in terms of epochs.
            args.total_episodes = args.num_train_epochs * self.train_dataset_len
```

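`num_train_epochs` is a float argument in `TrainingArguments`, which `PPOv2Config` inherits from, so `args.total_episodes` becomes a float, and so do the values derived from it, such as `args.num_updates`. I suppose we should use `args.num_ppo_epochs` here.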
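To illustrate the failure in isolation, here is a minimal plain-Python sketch. It assumes that `num_updates` is computed as `total_episodes // batch_size`, and the dataset size and batch size are hypothetical values. Floor division of a float keeps the result a float, and `range()` rejects it with exactly the error above. The `int()` cast on the last line is shown only as one possible guard. It is not the fix proposed in this issue.

```python
# Minimal reproduction of the float -> range() failure, in plain Python.
# Names mirror the trainer's; the values and the `total_episodes // batch_size`
# step are assumptions for illustration only.

num_train_epochs = 1.0      # TrainingArguments declares this field as a float
train_dataset_len = 1000    # hypothetical dataset size
batch_size = 64             # hypothetical effective batch size

total_episodes = num_train_epochs * train_dataset_len  # float, since one operand is a float
num_updates = total_episodes // batch_size             # floor division of a float stays a float


def updates_run(n):
    """Count iterations of `for update in range(1, n + 1)`, or return None on TypeError."""
    try:
        return sum(1 for _ in range(1, n + 1))
    except TypeError as exc:
        print(f"TypeError: {exc}")
        return None


print(updates_run(num_updates))       # fails: range() does not accept floats
print(updates_run(int(num_updates)))  # works once the value is an int
```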


TypeError when not passing total_episodes in PPOv2Trainer #1740

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

TypeError when not passing total_episodes in PPOv2Trainer #1740

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions