[megatron] refactor: support MLATransformerConfig abstraction for DeepSeek V3 #1836
Conversation
```python
    **override_transformer_config_kwargs,
)
transformer_config = MLATransformerConfig(**args)
# Common parallel state parameters
```
Thanks for fixing this~
What about adding an abstraction, e.g. a `_get_mla_transformer_config` basic function, for future models?
Could you consider my PR to your repo?
Done! Thanks for your updates.
And maybe we need a merge after approving? qaq Also, maybe "Rebase and merge", not "Squash and merge"~
Sorry, I am not familiar with this. Is it OK now?
Sure, successfully, thanks a lot!
@ISEEKYAN Could you please help review this refactor and make a double check~
sure
@jinqinn It seems that the error you encountered has been fixed in the latest main branch.
…pSeek V3 (volcengine#1836)

I encountered an error when training DeepSeek V3 with the latest code due to the TransformerConfig not including q_lora_rank, which is required for DeepSeek V3.

#### Error Message

```
(TaskRunner pid=1256989) File "/workspace/verl/verl/single_controller/base/megatron/worker.py", line 69, in _init_hf_config_and_tf_config
(TaskRunner pid=1256989) tf_config = hf_to_mcore_config(hf_config, dtype)
(TaskRunner pid=1256989) File "/workspace/verl/verl/models/mcore/registry.py", line 131, in hf_to_mcore_config
(TaskRunner pid=1256989) return MODEL_CONFIG_CONVERTER_REGISTRY[model](hf_config, dtype)
(TaskRunner pid=1256989) File "/workspace/verl/verl/models/mcore/config_converter.py", line 210, in hf_to_mcore_config_dpskv3
(TaskRunner pid=1256989) args = _get_base_transformer_config(
(TaskRunner pid=1256989) File "/workspace/verl/verl/models/mcore/config_converter.py", line 85, in _get_base_transformer_config
(TaskRunner pid=1256989) return TransformerConfig(**base_config)
(TaskRunner pid=1256989) TypeError: TransformerConfig.__init__() got an unexpected keyword argument 'q_lora_rank'
```

#### Solution

The `hf_to_mcore_config_dpskv3` function should directly create an `MLATransformerConfig` instance instead of going through `_get_base_transformer_config()`, since DeepSeek V3 uses Multi-Latent Attention (MLA), which requires MLA-specific parameters.

---------

Co-authored-by: ETOgaosion <gaoziyuan19@mails.ucas.ac.cn>
I encountered an error when training DeepSeek V3 with the latest code due to the TransformerConfig not including q_lora_rank, which is required for DeepSeek V3.
Error Message
Solution
The `hf_to_mcore_config_dpskv3` function should directly create an `MLATransformerConfig` instance instead of going through `_get_base_transformer_config()`, since DeepSeek V3 uses Multi-Latent Attention (MLA), which requires MLA-specific parameters.
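A minimal sketch of why the direct construction works, using stand-in dataclasses in place of Megatron-Core's real `TransformerConfig`/`MLATransformerConfig` (the field name `q_lora_rank` comes from the traceback above; all other names and the HF-config mapping are illustrative assumptions):

```python
from dataclasses import dataclass

@dataclass
class TransformerConfig:
    # Stand-in for the Megatron-Core base config: no MLA fields,
    # so passing q_lora_rank to it raises TypeError (the reported bug).
    num_layers: int
    hidden_size: int

@dataclass
class MLATransformerConfig(TransformerConfig):
    # MLA-specific fields; q_lora_rank is the one that triggered the error.
    q_lora_rank: int = 0
    kv_lora_rank: int = 0

def hf_to_mcore_config_dpskv3(hf_config: dict) -> MLATransformerConfig:
    # Build the MLA config directly instead of routing MLA-only keys
    # through the base TransformerConfig, which would reject them.
    return MLATransformerConfig(
        num_layers=hf_config["num_hidden_layers"],
        hidden_size=hf_config["hidden_size"],
        q_lora_rank=hf_config.get("q_lora_rank", 0),
        kv_lora_rank=hf_config.get("kv_lora_rank", 0),
    )

cfg = hf_to_mcore_config_dpskv3(
    {"num_hidden_layers": 61, "hidden_size": 7168,
     "q_lora_rank": 1536, "kv_lora_rank": 512}
)
print(cfg.q_lora_rank)  # → 1536
```

Because `MLATransformerConfig` subclasses `TransformerConfig`, any code typed against the base config still accepts the result, while the MLA-only keyword arguments are now valid.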