[misc] feat: support rmpad/data-packing in FSDP with transformers #91
Conversation
Shall we add a supported model list and raise an error if the model is not in the list?
Try to avoid using log_probs_from_logits_response_rmpad because there is an unpad op inside, and unpad is a CUDA-blocking op. Instead, we can directly unpad input_ids from the input.
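The approach suggested in this comment can be sketched as follows: a hypothetical helper (`logprobs_from_packed_logits` is not a name from the PR) that computes per-token log-probs directly on already-unpadded (packed) tensors, so no unpad op runs inside and nothing blocks the CUDA stream. The usual shift-by-one alignment between logits and labels is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def logprobs_from_packed_logits(logits_rmpad: torch.Tensor,
                                input_ids_rmpad: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: log-probs for packed (rmpad) sequences.

    logits_rmpad: (total_nnz, vocab_size) -- logits for all non-pad tokens
    input_ids_rmpad: (total_nnz,) -- the corresponding token ids
    Because inputs are already unpadded, no blocking unpad op is needed here.
    """
    logp = F.log_softmax(logits_rmpad.float(), dim=-1)
    return logp.gather(-1, input_ids_rmpad.unsqueeze(-1)).squeeze(-1)
```

The caller is then responsible for unpadding input_ids once at the input, rather than unpadding inside every log-prob call.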
I think this list depends on the transformers lib. Not sure where to get this list. I didn't find any docs about the feature in transformers.
Simply add potential models in the CI. If the model passes CI, then add it to the supported list. I guess we can target
Sure, I will write a new API for unpadding input_ids.
Shall we add test_transformers.py to CI? I didn't do it, as I think it only depends on the transformers version and the flash_attn version. So I guess the goal for the CI is to test whether the latest transformers + flash_attn would break our implementation.
After this PR, we should set a minimum version of transformers.
input_ids_rmpad, indices, cu_seqlens, max_seqlen_in_batch = unpad_input(
Could you try flash-attn < 2.7? I guess flash-attn 2.7 changed the API. @PeterSH6 We should either enforce the version in setup.py or adapt the API for flash-attn >= 2.7.
Or we can move the unpad logic into verl in case of future breakage.
Thanks for your help! :)
I think we should deal with both cases: greater than 2.7 and lower. A utility function would be enough for the two cases.
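The two-case utility function suggested here might look like the sketch below. The name `unpad_input_compat` is hypothetical; the sketch assumes flash-attn >= 2.7 returns extra trailing element(s) from `unpad_input`, normalizes the result to the 4-tuple the rest of the code expects, and falls back to a pure-torch reimplementation of the unpad logic when flash-attn is unavailable (the "move the unpad logic into verl" option).

```python
import torch
import torch.nn.functional as F

try:
    from flash_attn.bert_padding import unpad_input as _fa_unpad_input
except ImportError:
    _fa_unpad_input = None  # fall back to the pure-torch path below

def unpad_input_compat(hidden_states: torch.Tensor, attention_mask: torch.Tensor):
    """Return (unpadded_states, indices, cu_seqlens, max_seqlen_in_batch)
    regardless of the installed flash-attn version."""
    if _fa_unpad_input is not None:
        out = _fa_unpad_input(hidden_states, attention_mask)
        # flash-attn >= 2.7 appends extra element(s); keep only the first four
        return out[0], out[1], out[2], out[3]
    # Pure-torch fallback mirroring flash_attn.bert_padding.unpad_input
    seqlens = attention_mask.sum(dim=-1, dtype=torch.int32)
    indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()
    max_seqlen_in_batch = int(seqlens.max())
    cu_seqlens = F.pad(torch.cumsum(seqlens, dim=0, dtype=torch.int32), (1, 0))
    flat = hidden_states.reshape(-1, *hidden_states.shape[2:])
    return flat[indices], indices, cu_seqlens, max_seqlen_in_batch
```

Callers can then use the existing 4-value unpacking unchanged on both old and new flash-attn versions.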
from verl.utils.torch_functional import prepare_input_for_rmpad I find that this function is not used, and it is not in the codebase. Maybe we should just delete this line?
Sure, simply delete it. Sorry for the inconvenience.
It's OK. And there is another error I need help with.
set -e -x
python3 tests/e2e/arithmetic_sequence/rl/main_trainer.py \
Why does this CI only contain critic rmpad but not the actor?
This is due to the misalignment between the digit completion task and the Qwen model. It may have some bugs when using the CharTokenizer.
Are you using an actual reward model?
Could you toggle off reward model remove-padding for now? I guess we lack CI for a model-based reward function.
OK. Thanks~~~
Also, I guess you have to change AutoModelForTokenClassification to AutoModelForSequenceClassification in order to make non-rmpad version work. We will fix this as soon as possible and add a CI for this case. |
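For context on the class distinction discussed here: the rmpad value/reward path uses a token-classification head with num_labels=1 so that every token gets a scalar output, whereas a sequence-classification head yields one scalar per sequence. A minimal sketch using a tiny randomly initialized GPT-2 config (chosen only for illustration, not a real reward-model checkpoint):

```python
import torch
from transformers import GPT2Config, AutoModelForTokenClassification

# Tiny random config, just to show the head's output shape
# (num_labels=1 gives one scalar value per token).
cfg = GPT2Config(vocab_size=32, n_positions=16, n_embd=8,
                 n_layer=1, n_head=2, num_labels=1)
model = AutoModelForTokenClassification.from_config(cfg)

input_ids = torch.randint(0, 32, (2, 5))
out = model(input_ids=input_ids)
print(out.logits.shape)  # torch.Size([2, 5, 1]) -- one value per token
```

AutoModelForSequenceClassification with the same config would instead return logits of shape (2, 1), which is why the two auto-classes are not interchangeable between the rmpad and non-rmpad paths.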
…cengine#91)

* init commit of rmpad
* add rmpad test
* support rmpad in actor model
* add test for value model
* support rmpad in critic and rm
* fix actor return and fix num_labels and clean not used rmpad
* fix critic and benchmark
* update script
* fix critic
* lint
* fix util issue
* fix unnecessary unpad
* address issues
* fix args
* update test and update rmpad support model list
* fix typo
* fix typo and fix name
* rename rmpad to rename padding
* fix arch to model_type
* add ci for e2e rmpad and fix typo
* lint
* fix ci
* fix typo
* update tests for customize tokenizer in actor
* fix rmpad test
* update requirement of transformers as hf_rollout may have issue
…ne#91) Co-authored-by: ZYHowell <yhzhuang@cmu.edu>
Set
actor_rollout_ref.model.use_rmpad=True \
+critic.model.use_rmpad=True \
+reward_model.model.use_rmpad=True
to enable rmpad for different models. Default set to False.
Use AutoModelForTokenClassification for the Value and Reward Model instead of using SequenceClassification.
log_probs_from_logits_response_rmpad
Resolve: #53
Comparison using DeepSeek7b and GSM8k:
About 1.7x speedup compared to no rmpad (original cases)