[megatron] refactor: support MLATransformerConfig abstraction for DeepSeek V3 #1836
Conversation
```python
    **override_transformer_config_kwargs,
)
transformer_config = MLATransformerConfig(**args)
# Common parallel state parameters
```
Thanks for fixing this~
What about adding an abstraction, e.g. a `_get_mla_transformer_config` basic function, for future models?
Could you consider my PR to your repo?
Done! Thanks for your updates.
And maybe we need a merge after approving? qaq Also, maybe "Rebase and merge", not "Squash and merge"~
Sorry, I am not familiar with this. Is it OK now?
Sure, successfully, thanks a lot!
@ISEEKYAN Could you please help review this refactor and make a double check~
sure
@jinqinn It seems that the error you encountered has been fixed in the latest main branch.
…pSeek V3 (volcengine#1836)

I encountered an error when training DeepSeek V3 with the latest code due to the TransformerConfig not including q_lora_rank, which is required for DeepSeek V3.

#### Error Message

```
(TaskRunner pid=1256989) File "/workspace/verl/verl/single_controller/base/megatron/worker.py", line 69, in _init_hf_config_and_tf_config
(TaskRunner pid=1256989) tf_config = hf_to_mcore_config(hf_config, dtype)
(TaskRunner pid=1256989) File "/workspace/verl/verl/models/mcore/registry.py", line 131, in hf_to_mcore_config
(TaskRunner pid=1256989) return MODEL_CONFIG_CONVERTER_REGISTRY[model](hf_config, dtype)
(TaskRunner pid=1256989) File "/workspace/verl/verl/models/mcore/config_converter.py", line 210, in hf_to_mcore_config_dpskv3
(TaskRunner pid=1256989) args = _get_base_transformer_config(
(TaskRunner pid=1256989) File "/workspace/verl/verl/models/mcore/config_converter.py", line 85, in _get_base_transformer_config
(TaskRunner pid=1256989) return TransformerConfig(**base_config)
(TaskRunner pid=1256989) TypeError: TransformerConfig.__init__() got an unexpected keyword argument 'q_lora_rank'
```

#### Solution

The `hf_to_mcore_config_dpskv3` function should directly create an `MLATransformerConfig` instance instead of going through `_get_base_transformer_config()`, since DeepSeek V3 uses Multi-Latent Attention (MLA), which requires MLA-specific parameters.

---------

Co-authored-by: ETOgaosion <gaoziyuan19@mails.ucas.ac.cn>
I encountered an error when training DeepSeek V3 with the latest code due to the TransformerConfig not including q_lora_rank, which is required for DeepSeek V3.
Error Message
Solution
The `hf_to_mcore_config_dpskv3` function should directly create an `MLATransformerConfig` instance instead of going through `_get_base_transformer_config()`, since DeepSeek V3 uses Multi-Latent Attention (MLA), which requires MLA-specific parameters.
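A minimal sketch of why the direct construction works, using stand-in dataclasses in place of Megatron-Core's real `TransformerConfig`/`MLATransformerConfig` (the field name `q_lora_rank` comes from the traceback above; all other names and the HF-config mapping are illustrative assumptions):

```python
from dataclasses import dataclass

@dataclass
class TransformerConfig:
    # Stand-in for the Megatron-Core base config: no MLA fields,
    # so passing q_lora_rank to it raises TypeError (the reported bug).
    num_layers: int
    hidden_size: int

@dataclass
class MLATransformerConfig(TransformerConfig):
    # MLA-specific fields; q_lora_rank is the one that triggered the error.
    q_lora_rank: int = 0
    kv_lora_rank: int = 0

def hf_to_mcore_config_dpskv3(hf_config: dict) -> MLATransformerConfig:
    # Build the MLA config directly instead of routing MLA-only keys
    # through the base TransformerConfig, which would reject them.
    return MLATransformerConfig(
        num_layers=hf_config["num_hidden_layers"],
        hidden_size=hf_config["hidden_size"],
        q_lora_rank=hf_config.get("q_lora_rank", 0),
        kv_lora_rank=hf_config.get("kv_lora_rank", 0),
    )

cfg = hf_to_mcore_config_dpskv3(
    {"num_hidden_layers": 61, "hidden_size": 7168,
     "q_lora_rank": 1536, "kv_lora_rank": 512}
)
print(cfg.q_lora_rank)  # → 1536
```

Because `MLATransformerConfig` subclasses `TransformerConfig`, any code typed against the base config still accepts the result, while the MLA-only keyword arguments are now valid.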