
Conversation

PeterSH6
Collaborator

@PeterSH6 PeterSH6 commented Jan 10, 2025

  • Use actor_rollout_ref.model.use_rmpad=True + critic.model.use_rmpad=True + reward_model.model.use_rmpad=True to enable rmpad for the different models. Defaults to False.
  • Use AutoModelForTokenClassification for the value and reward models, instead of AutoModelForSequenceClassification.
  • Convert log-prob computation to log_probs_from_logits_response_rmpad.

Resolve: #53

Comparison using DeepSeek7b on GSM8k:
About 1.7x speedup compared to no rmpad (the original case).
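For reference, the three flags above can be combined in a launch command along these lines (the entrypoint name and the absence of other required arguments are assumptions for illustration, not taken from this PR):

```shell
# Hypothetical launch sketch: enable remove-padding (rmpad) for the actor,
# critic, and reward model. The entrypoint is assumed, not verl's actual CLI.
python3 -m verl.trainer.main_ppo \
    actor_rollout_ref.model.use_rmpad=True \
    critic.model.use_rmpad=True \
    reward_model.model.use_rmpad=True
```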

@vermouth1992
Collaborator

Shall we add a supported model list and raise an error if the model is not in the list?

@vermouth1992
Collaborator

Try to avoid using log_probs_from_logits_response_rmpad because there is an unpad op inside, and unpad is a CUDA-blocking op. Instead, we can directly unpad input_ids from the input.
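A minimal sketch of what unpadding input_ids directly from the attention mask could look like (the helper name and exact bookkeeping are illustrative, not verl's actual API; flash-attn's unpad_input computes the same metadata):

```python
import torch
import torch.nn.functional as F

def flatten_and_unpad(input_ids: torch.Tensor, attention_mask: torch.Tensor):
    """Remove padding from (batch, seqlen) input_ids using attention_mask.

    Returns the packed 1-D token tensor, the flat indices needed to re-pad
    outputs later, and cumulative sequence lengths (cu_seqlens) for the
    packed layout, mirroring flash-attn's unpad_input bookkeeping.
    """
    # Flat positions of all non-pad tokens across the whole batch
    indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()
    input_ids_rmpad = input_ids.flatten()[indices]
    # Per-sequence lengths -> cumulative offsets [0, len0, len0+len1, ...]
    seqlens = attention_mask.sum(dim=-1, dtype=torch.int32)
    cu_seqlens = F.pad(torch.cumsum(seqlens, dim=0, dtype=torch.int32), (1, 0))
    return input_ids_rmpad, indices, cu_seqlens
```

With the ids already unpadded, log-probs can be gathered on the packed layout without a second unpad inside the log-prob helper.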

@PeterSH6
Collaborator Author

I think this list depends on the transformers lib. Not sure where to get this list; I didn't find any doc about this feature in transformers.

@vermouth1992
Collaborator

vermouth1992 commented Jan 10, 2025

I think this list depends on the transformers lib. Not sure where to get this list; I didn't find any doc about this feature in transformers.

Simply add potential models in the CI. If a model passes CI, add it to the supported list. I guess we can target:

  • Llama
  • Mistral
  • QWen
  • Gemma
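A sketch of the proposed guard (the set contents, constant name, and error type are assumptions based on the list above, not verl's actual code):

```python
# Hypothetical supported-model check: model types verified by CI for rmpad.
SUPPORTED_RMPAD_MODEL_TYPES = {"llama", "mistral", "qwen2", "gemma"}

def check_rmpad_support(model_type: str) -> None:
    """Raise if remove-padding has not been verified for this model type."""
    if model_type.lower() not in SUPPORTED_RMPAD_MODEL_TYPES:
        raise NotImplementedError(
            f"remove-padding is not verified for model_type={model_type!r}; "
            f"supported types: {sorted(SUPPORTED_RMPAD_MODEL_TYPES)}"
        )
```

The check would run once at model-construction time, using the model_type field from the HF config.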

@PeterSH6
Collaborator Author

Try to avoid using log_probs_from_logits_response_rmpad because there is an unpad op inside, and unpad is a CUDA-blocking op. Instead, we can directly unpad input_ids from the input.

Sure, I will write a new API for unpad input_ids

@PeterSH6
Collaborator Author

Simply add potential models in the CI. If a model passes CI, add it to the supported list. I guess we can target

Shall we add test_transformers.py to CI? I didn't do it, as I think it only depends on the transformers and flash_attn versions.

So I guess the goal of the CI is to test whether the latest transformers + flash_attn would break our implementation.
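As a sketch of the invariant such a CI check could assert (with a toy position-wise module standing in for the transformer, since the real test depends on the transformers/flash_attn versions): outputs computed on the padded batch, restricted to valid positions, should match outputs computed on the packed (rmpad) sequence.

```python
import torch

# Toy stand-in for the model: any position-wise computation must give
# identical results at valid positions whether it runs on the padded
# batch or the packed (rmpad) tokens. That equality is what a rmpad CI
# test would check against the real model's logits/values.
torch.manual_seed(0)
proj = torch.nn.Linear(8, 8)

hidden = torch.randn(2, 5, 8)                      # (batch, seqlen, dim)
mask = torch.tensor([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]])

padded_out = proj(hidden)                          # forward on padded batch
indices = torch.nonzero(mask.flatten()).flatten()  # valid flat positions
packed = hidden.flatten(0, 1)[indices]             # rmpad: drop pad positions
packed_out = proj(packed)                          # forward on packed tokens

assert torch.allclose(padded_out.flatten(0, 1)[indices], packed_out, atol=1e-6)
```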

@vermouth1992
Collaborator

Simply add potential models in the CI. If a model passes CI, add it to the supported list. I guess we can target

Shall we add test_transformers.py to CI? I didn't do it, as I think it only depends on the transformers and flash_attn versions.

So I guess the goal of the CI is to test whether the latest transformers + flash_attn would break our implementation.

After this PR, we should set a minimum version of transformers

@vermouth1992 vermouth1992 merged commit 569210e into volcengine:main Jan 11, 2025
7 checks passed
@jeremyyx

When using hh-rlhf with rmpadding=True:

input_ids_rmpad, indices, cu_seqlens, max_seqlen_in_batch = unpad_input(
ValueError: too many values to unpack (expected 4)

@vermouth1992
Collaborator

Could you try flash-attn < 2.7? I guess flash-attn 2.7 changed the API. @PeterSH6 We should either enforce the version in setup.py or adapt the API for flash-attn >= 2.7.

@vermouth1992
Collaborator

Or we can move the unpad logic into verl in case of future breaks.

@jeremyyx

Or we can move the unpad logic into verl in case of future breaks.

Thanks for your help! :)
My flash_attn version is 2.7.2. I will try to downgrade it.

@PeterSH6
Collaborator Author

I think we should handle both cases: flash-attn >= 2.7 and < 2.7.

A utility function would be enough for the two cases.
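A sketch of such a utility (assuming, per the unpack error above, that newer flash-attn returns more values from unpad_input than the four verl expects; the wrapper name is hypothetical):

```python
def unpad_input_compat(hidden_states, attention_mask, unpad_fn):
    """Normalize unpad_fn's return across flash-attn versions.

    Older flash-attn returns (output, indices, cu_seqlens, max_seqlen);
    newer versions append extra values, which triggers
    'too many values to unpack (expected 4)'. Keep only the first four.
    `unpad_fn` would be flash_attn.bert_padding.unpad_input or compatible.
    """
    result = unpad_fn(hidden_states, attention_mask)
    return tuple(result)[:4]
```

Callers would then unpack four values regardless of the installed flash-attn version.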

@jeremyyx

from verl.utils.torch_functional import prepare_input_for_rmpad
ImportError: cannot import name 'prepare_input_for_rmpad' from 'verl.utils.torch_functional' (/verl_0111/verl/utils/torch_functional.py)

I find that this function is unused and not in the codebase. Maybe we should just delete this line?

@vermouth1992
Collaborator

Sure, simply delete it. Sorry for the inconvenience

@jeremyyx

It's OK. And there is another error I need help with:
File "/home/verl_0111/verl/single_controller/ray/base.py", line 399, in func
return getattr(self.worker_dict[key], name)(*args, **kwargs)
File "/home/verl_0111/verl/single_controller/base/decorator.py", line 404, in inner
return func(*args, **kwargs)
File "/home/verl_0111/verl/workers/fsdp_workers.py", line 853, in compute_rm_score
token_level_scores = self._expand_to_token_level(data, scores)
File "/home/verl_0111/verl/workers/fsdp_workers.py", line 775, in _expand_to_token_level
token_level_scores[torch.arange(batch_size), eos_mask_idx] = scores
RuntimeError: shape mismatch: value tensor of shape [128, 2048] cannot be broadcast to indexing result of shape [128]
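This traceback suggests the reward head returned per-token scores of shape (batch, seqlen) while the scatter in _expand_to_token_level expects one scalar per sequence of shape (batch,). A sketch of the mismatch and one possible fix, gathering the EOS-position score first (tensor names mirror the traceback; the fix itself is illustrative, not verl's actual patch):

```python
import torch

def expand_to_token_level(scores: torch.Tensor,
                          eos_mask_idx: torch.Tensor,
                          seqlen: int) -> torch.Tensor:
    """Place one scalar score per sequence at its EOS position.

    If `scores` arrives as per-token values (batch, seqlen) -- as with a
    token-classification head -- first gather the score at the EOS token
    so the scatter below sees a (batch,) value tensor.
    """
    batch_size = scores.shape[0]
    if scores.dim() == 2:
        # Gather the per-sequence score at the EOS position
        scores = scores[torch.arange(batch_size), eos_mask_idx]
    token_level_scores = torch.zeros(batch_size, seqlen)
    # Scatter: exactly one scalar lands at each sequence's EOS index
    token_level_scores[torch.arange(batch_size), eos_mask_idx] = scores
    return token_level_scores
```

Passing a (batch, seqlen) value tensor straight into the scatter is what produces the "[128, 2048] cannot be broadcast to indexing result of shape [128]" error above.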


set -e -x

python3 tests/e2e/arithmetic_sequence/rl/main_trainer.py \
Collaborator

Why this CI only contains critic rmpad but not the actor?

Collaborator Author

@PeterSH6 PeterSH6 Jan 11, 2025

This is due to the misalignment between the digit completion task and the Qwen model. It may have some bugs when using the CharTokenizer.

@vermouth1992
Collaborator

You are using an actual reward model?

@vermouth1992
Collaborator

Could you toggle off reward model remove-padding for now? I guess we lack CI for model-based reward function.

@jeremyyx

OK. Thanks~~~

@vermouth1992
Collaborator

Also, I guess you have to change AutoModelForTokenClassification to AutoModelForSequenceClassification in order to make non-rmpad version work. We will fix this as soon as possible and add a CI for this case.

@PeterSH6 PeterSH6 mentioned this pull request Jan 16, 2025
33 tasks
yuchenwang3 pushed a commit to yuchenwang3/verl that referenced this pull request Apr 25, 2025
…cengine#91)

* init commit of rmpad

* add rmpad test

* support rmpad in actor model

* add test for value model

* support rmpad in critic and rm

* fix actor return and fix num_labels and clean not used rmpad

* fix critic and benchmark

* update script

* fix critic

* lint

* fix util issue

* fix unnecessary unpad

* address issues

* fix args

* update test and update rmpad support model list

* fix typo

* fix typo and fix name

* rename rmpad to rename padding

* fix arch to model_type

* add ci for e2e rmpad and fix typo

* lint

* fix ci

* fix typo

* update tests for customize tokenizer in actor

* fix rmpad test

* update requirement of transformers as hf_rollout may have issue
histmeisah pushed a commit to SJTU-IAAR/verl that referenced this pull request Apr 27, 2025
kaiyliu pushed a commit to kaiyliu/knowl_verl that referenced this pull request Jun 27, 2025

Successfully merging this pull request may close these issues.

Do we have plans for data packing?
4 participants