Feature request
The existing GRPO Trainer is pretty rigid in its structure: it only accepts an untokenized dataset and handles tokenization internally. This is problematic for VLMs, which often need to do additional work to construct the input. For example, Qwen2VL and Qwen2.5VL both require extra processing (generally in collate_fn) to inject image data.
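To make the problem concrete, here is a minimal, self-contained sketch of the kind of collate_fn a VLM needs. Everything in it is hypothetical: ToyVLProcessor is a toy stand-in, not a transformers or TRL class, and vlm_collate_fn and the "<image>" placeholder are illustrative names. It assumes, loosely modelled on dynamic-resolution models like Qwen2VL, that each image placeholder expands into a number of image tokens that depends on the image's size, so the prompt cannot be tokenized correctly from its text alone.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical placeholder string marking where an image goes in the prompt.
IMAGE_PLACEHOLDER = "<image>"

Image = List[List[float]]  # toy grayscale image: rows of pixel values


@dataclass
class ToyVLProcessor:
    """Hypothetical stand-in for a VLM processor (not a real transformers class).

    It mimics the behaviour a text-only tokenization step cannot reproduce:
    each image placeholder expands into a number of image tokens that depends
    on the image's size, and the pixel data has to travel with the token ids.
    """

    patch_size: int = 14
    image_token_id: int = 0
    pad_token_id: int = 1

    def num_image_tokens(self, image: Image) -> int:
        # One token per patch_size x patch_size patch in this toy version.
        height, width = len(image), len(image[0])
        return (height // self.patch_size) * (width // self.patch_size)

    def tokenize_text(self, text: str) -> List[int]:
        # Toy whitespace tokenizer; ids start at 2 so they never collide
        # with image_token_id or pad_token_id.
        return [2 + sum(map(ord, word)) % 1000 for word in text.split()]

    def __call__(self, text: str, images: List[Image]) -> Dict:
        parts = text.split(IMAGE_PLACEHOLDER)
        if len(parts) - 1 != len(images):
            raise ValueError("expected exactly one image per placeholder")
        ids: List[int] = []
        for i, part in enumerate(parts):
            ids.extend(self.tokenize_text(part))
            if i < len(images):
                # Size-dependent expansion: this needs the actual image.
                ids.extend([self.image_token_id] * self.num_image_tokens(images[i]))
        return {"input_ids": ids, "pixel_values": list(images)}


def vlm_collate_fn(examples: List[Dict], processor: ToyVLProcessor) -> Dict:
    """Hypothetical collate_fn: encode text and images together, then left-pad."""
    encoded = [processor(ex["prompt"], ex["images"]) for ex in examples]
    max_len = max(len(e["input_ids"]) for e in encoded)
    input_ids, attention_mask = [], []
    for e in encoded:
        n_pad = max_len - len(e["input_ids"])
        # Left padding keeps every prompt flush against the generation position.
        input_ids.append([processor.pad_token_id] * n_pad + e["input_ids"])
        attention_mask.append([0] * n_pad + [1] * len(e["input_ids"]))
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        # Flattened list of all images in the batch, in prompt order.
        "pixel_values": [img for e in encoded for img in e["pixel_values"]],
    }


if __name__ == "__main__":
    proc = ToyVLProcessor()
    small = [[0.0] * 28 for _ in range(28)]  # 28x28 -> 2*2 = 4 image tokens
    wide = [[0.0] * 56 for _ in range(28)]   # 28 rows x 56 cols -> 2*4 = 8 image tokens
    batch = vlm_collate_fn(
        [
            {"prompt": "<image> What is shown?", "images": [small]},
            {"prompt": "Describe <image> briefly", "images": [wide]},
        ],
        proc,
    )
    print([sum(mask) for mask in batch["attention_mask"]])  # [7, 10]
```

The point of the sketch is where the work happens: the image-dependent token expansion runs inside the collate step, which has access to the images. A trainer that only ever sees the prompt text and tokenizes it itself has no place to hook this in.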
There are probably other aspects of the design that need to be updated as well, but I'm not sure yet which ones.
Thanks for your consideration!
Motivation
GRPO on VLMs would be very cool!
Your contribution
I'm going to try modifying the TRL source code to support GRPO training of Qwen2.5VL; we'll see how that goes.