Hi! I've been trying to run examples/scripts/ppo/ppo_tldr.py and hit a "TypeError: 'float' object cannot be interpreted as an integer" on the line for update in range(1, args.num_updates + 1). After reading ppov2_trainer.py, I think the underlying issue originates from:
    #########
    # calculate various batch sizes
    #########
    if args.total_episodes is None:  # allow the users to define episodes in terms of epochs.
        args.total_episodes = args.num_train_epochs * self.train_dataset_len
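num_train_epochs is a float argument in TrainingArguments, which PPOv2Config inherits, so args.total_episodes ends up as a float whenever it is left unset, and anything derived from it (such as args.num_updates) can no longer be passed to range(). I suppose we should use args.num_ppo_epochs here.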
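For reference, here is a minimal standalone sketch that reproduces what I think is happening, without needing TRL at all. The dataset length and batch size are made-up values, and the floor division used to derive num_updates is my assumption about how the trainer computes it. The int() cast at the end is just one possible workaround, not necessarily the fix the maintainers would want.

```python
# Standalone reproduction sketch; does not import TRL or transformers.
num_train_epochs = 3.0    # float, matching the TrainingArguments default
train_dataset_len = 1000  # hypothetical dataset size
batch_size = 64           # hypothetical batch size

# Mirrors the quoted line: float * int gives a float.
total_episodes = num_train_epochs * train_dataset_len

# Assumption: num_updates is derived by floor division, which keeps the float type.
num_updates = total_episodes // batch_size

error_message = None
try:
    for update in range(1, num_updates + 1):
        pass
except TypeError as err:
    error_message = str(err)
print(error_message)

# One possible workaround: cast to int before deriving anything else.
fixed_total_episodes = int(num_train_epochs * train_dataset_len)
fixed_num_updates = fixed_total_episodes // batch_size
for update in range(1, fixed_num_updates + 1):
    pass  # the training step would go here
```

With the cast in place the loop runs normally, since range() only ever sees integers.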