
Conversation

jinqinn
Contributor

@jinqinn jinqinn commented Jun 4, 2025

I encountered an error when training DeepSeek V3 with the latest code due to the TransformerConfig not including q_lora_rank, which is required for DeepSeek V3.

#### Error Message

```
(TaskRunner pid=1256989)   File "/workspace/verl/verl/single_controller/base/megatron/worker.py", line 69, in _init_hf_config_and_tf_config
(TaskRunner pid=1256989)     tf_config = hf_to_mcore_config(hf_config, dtype)
(TaskRunner pid=1256989)   File "/workspace/verl/verl/models/mcore/registry.py", line 131, in hf_to_mcore_config
(TaskRunner pid=1256989)     return MODEL_CONFIG_CONVERTER_REGISTRY[model](hf_config, dtype)
(TaskRunner pid=1256989)   File "/workspace/verl/verl/models/mcore/config_converter.py", line 210, in hf_to_mcore_config_dpskv3
(TaskRunner pid=1256989)     args = _get_base_transformer_config(
(TaskRunner pid=1256989)   File "/workspace/verl/verl/models/mcore/config_converter.py", line 85, in _get_base_transformer_config
(TaskRunner pid=1256989)     return TransformerConfig(**base_config)
(TaskRunner pid=1256989) TypeError: TransformerConfig.__init__() got an unexpected keyword argument 'q_lora_rank'
```

#### Solution

The `hf_to_mcore_config_dpskv3` function should directly create an `MLATransformerConfig` instance instead of going through `_get_base_transformer_config()`, since DeepSeek V3 uses Multi-Latent Attention (MLA), which requires MLA-specific parameters.
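For illustration only, here is a minimal sketch of that direct construction (not the actual verl code): it assumes Megatron-Core exposes `MLATransformerConfig` with the MLA fields shown (the import path and exact field names may differ by version), and that `hf_config` uses the Hugging Face DeepSeek V3 attribute names.

```python
# Sketch only: map the HF DeepSeek V3 config straight into MLATransformerConfig,
# so MLA-only arguments such as q_lora_rank never reach plain TransformerConfig.
import torch
from megatron.core.transformer import MLATransformerConfig  # path may vary by Megatron-Core version


def hf_to_mcore_config_dpskv3_sketch(hf_config, dtype: torch.dtype) -> MLATransformerConfig:
    return MLATransformerConfig(
        # common transformer fields (abbreviated, illustrative)
        num_layers=hf_config.num_hidden_layers,
        hidden_size=hf_config.hidden_size,
        num_attention_heads=hf_config.num_attention_heads,
        ffn_hidden_size=hf_config.intermediate_size,
        params_dtype=dtype,
        # MLA-specific fields that TransformerConfig.__init__ rejects
        q_lora_rank=hf_config.q_lora_rank,
        kv_lora_rank=hf_config.kv_lora_rank,
        qk_head_dim=hf_config.qk_nope_head_dim,
        qk_pos_emb_head_dim=hf_config.qk_rope_head_dim,
        v_head_dim=hf_config.v_head_dim,
    )
```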

```python
        **override_transformer_config_kwargs,
    )
    transformer_config = MLATransformerConfig(**args)
    # Common parallel state parameters
```
Collaborator

@ETOgaosion ETOgaosion Jun 4, 2025

Thanks for fixing this~

What about abstracting this into a `_get_mla_transformer_config` base function for future models?
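Purely as an illustrative sketch of that suggestion (only the name `_get_mla_transformer_config` comes from this thread; the field mapping is abbreviated and mirrors the sketch above): a shared MLA builder that model-specific converters delegate to.

```python
from megatron.core.transformer import MLATransformerConfig  # path may vary by Megatron-Core version


def _get_mla_transformer_config(hf_config, dtype, **override_transformer_config_kwargs):
    # Proposed shared builder: assemble common and MLA-specific kwargs once so
    # future MLA architectures can reuse it instead of duplicating the mapping.
    args = dict(
        num_layers=hf_config.num_hidden_layers,
        hidden_size=hf_config.hidden_size,
        num_attention_heads=hf_config.num_attention_heads,
        params_dtype=dtype,
        q_lora_rank=hf_config.q_lora_rank,
        kv_lora_rank=hf_config.kv_lora_rank,
        v_head_dim=hf_config.v_head_dim,
        # ...remaining common/MLA fields as in the sketch above
    )
    args.update(override_transformer_config_kwargs)
    return MLATransformerConfig(**args)


def hf_to_mcore_config_dpskv3(hf_config, dtype, **override_transformer_config_kwargs):
    # The model-specific converter then becomes a thin wrapper (sketch).
    return _get_mla_transformer_config(hf_config, dtype, **override_transformer_config_kwargs)
```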

@ETOgaosion
Collaborator

Could you consider my PR to your repo?
jinqinn#1

@jinqinn
Contributor Author

jinqinn commented Jun 5, 2025

> Could you consider my PR to your repo?
> jinqinn#1

Done! Thanks for your updates.

@ETOgaosion
Collaborator

@jinqinn I'm so sorry, it seems it wasn't a good idea to rebase directly on your main branch. Could you accept this PR jinqinn#2 to restore our work? I will then reopen this~

@jinqinn
Contributor Author

jinqinn commented Jun 6, 2025

> @jinqinn I'm so sorry, it seems it wasn't a good idea to rebase directly on your main branch. Could you accept this PR jinqinn#2 to restore our work? I will then reopen this~

Done.

@ETOgaosion
Collaborator

ETOgaosion commented Jun 6, 2025

And maybe it needs a merge after approving? qaq

Also, maybe "Rebase and merge", not "Squash and merge"~

@jinqinn
Contributor Author

jinqinn commented Jun 6, 2025

> And maybe it needs a merge after approving? qaq
>
> Also, maybe "Rebase and merge", not "Squash and merge"~

Sorry, I'm not familiar with this. Is it OK now?

@ETOgaosion
Collaborator

Sure, it went through successfully, thanks a lot!

@ETOgaosion ETOgaosion reopened this Jun 6, 2025
@ETOgaosion
Collaborator

@ISEEKYAN Could you please help review this refactor and double-check it~

@ISEEKYAN
Contributor

ISEEKYAN commented Jun 9, 2025

> @ISEEKYAN Could you please help review this refactor and double-check it~

Sure.

@ISEEKYAN
Contributor

ISEEKYAN commented Jun 9, 2025

@jinqinn It seems that the error you encountered has already been fixed in the latest main branch.
Now this PR is only a refactor around `_get_mla_transformer_config`. The refactor looks good, but I am not sure it is necessary to abstract `_get_mla_transformer_config`, since DeepSeek V3 is the only model architecture that uses MLA right now.

@ETOgaosion ETOgaosion changed the title fix TransformerConfig doesn't support q_lora_rank for DeepSeek V3 [megatron] fix: TransformerConfig doesn't support q_lora_rank for DeepSeek V3 Jun 10, 2025
@ETOgaosion ETOgaosion changed the title [megatron] fix: TransformerConfig doesn't support q_lora_rank for DeepSeek V3 [megatron] refactor: support MLATransformerConfig abstraction for DeepSeek V3 Jun 10, 2025
@ETOgaosion ETOgaosion merged commit 2b5d66a into volcengine:main Jun 10, 2025
48 of 53 checks passed
yellowbee686 pushed a commit to yellowbee686/verl that referenced this pull request Jun 10, 2025
…pSeek V3 (volcengine#1836)

I encountered an error when training DeepSeek V3 with the latest code
due to the TransformerConfig not including q_lora_rank, which is
required for DeepSeek V3.

#### Error Message
```
(TaskRunner pid=1256989)   File "/workspace/verl/verl/single_controller/base/megatron/worker.py", line 69, in _init_hf_config_and_tf_config
(TaskRunner pid=1256989)     tf_config = hf_to_mcore_config(hf_config, dtype)
(TaskRunner pid=1256989)   File "/workspace/verl/verl/models/mcore/registry.py", line 131, in hf_to_mcore_config
(TaskRunner pid=1256989)     return MODEL_CONFIG_CONVERTER_REGISTRY[model](hf_config, dtype)
(TaskRunner pid=1256989)   File "/workspace/verl/verl/models/mcore/config_converter.py", line 210, in hf_to_mcore_config_dpskv3
(TaskRunner pid=1256989)     args = _get_base_transformer_config(
(TaskRunner pid=1256989)   File "/workspace/verl/verl/models/mcore/config_converter.py", line 85, in _get_base_transformer_config
(TaskRunner pid=1256989)     return TransformerConfig(**base_config)
(TaskRunner pid=1256989) TypeError: TransformerConfig.__init__() got an unexpected keyword argument 'q_lora_rank'
```

#### Solution
The `hf_to_mcore_config_dpskv3` function should directly create an
`MLATransformerConfig` instance instead of going through
`_get_base_transformer_config()`, since DeepSeek V3 uses Multi-Latent
Attention (MLA) which requires MLA-specific parameters.

---------

Co-authored-by: ETOgaosion <gaoziyuan19@mails.ucas.ac.cn>
whatadayG pushed a commit to whatadayG/verl that referenced this pull request Sep 5, 2025
…pSeek V3 (volcengine#1836)
