The model to consider.
https://huggingface.co/tencent/Tencent-Hunyuan-Large
Tencent has released a 389B-parameter MoE model with only 52B activated parameters, which reportedly beats Llama 3.1 405B.
There are three checkpoints in the model card: Pretrain, Instruct, and Instruct-FP8 (AutoFP8 format).
Some notable features of the model:
- High-Quality Synthetic Data: By enhancing training with synthetic data, Hunyuan-Large can learn richer representations, handle long-context inputs, and generalize better to unseen data.
- KV Cache Compression: Utilizes Grouped Query Attention (GQA) and Cross-Layer Attention (CLA) strategies to significantly reduce memory usage and computational overhead of KV caches, improving inference throughput.
- Expert-Specific Learning Rate Scaling: Sets different learning rates for different experts to ensure each sub-model effectively learns from the data and contributes to overall performance (a rough training-side sketch follows this list).
- Long-Context Processing Capability: The pre-trained model supports text sequences up to 256K tokens, and the Instruct model supports up to 128K, significantly enhancing the ability to handle long-context tasks.
- Extensive Benchmarking: Conducts extensive experiments across various languages and tasks to validate the practical effectiveness and safety of Hunyuan-Large.
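To make the expert-specific learning rate idea concrete, here is a minimal training-side sketch (not something vLLM itself would need for inference support). The `.experts.<id>.` naming pattern, the scaling rule, and the hyperparameters are my assumptions for illustration, not Tencent's actual recipe:

```python
import torch

def build_param_groups(model: torch.nn.Module, base_lr: float = 1e-4):
    """Group parameters so each expert gets its own learning rate.

    Assumes expert weights live under a module path containing ".experts.<id>.",
    which is a hypothetical naming convention used only for this sketch.
    """
    groups: dict[str, dict] = {}
    for name, param in model.named_parameters():
        if ".experts." in name:
            expert_id = int(name.split(".experts.")[1].split(".")[0])
            # Placeholder scaling rule: damp the LR for higher-index experts.
            key, lr = f"expert_{expert_id}", base_lr / (1.0 + 0.1 * expert_id)
        else:
            key, lr = "dense", base_lr
        groups.setdefault(key, {"params": [], "lr": lr})["params"].append(param)
    return list(groups.values())

# optimizer = torch.optim.AdamW(build_param_groups(model), betas=(0.9, 0.95))
```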
I think the inclusion of Cross-Layer Attention (CLA), described in https://arxiv.org/abs/2405.12981 and used by Character.AI, is the most interesting element; a rough sketch of the KV-sharing idea is below.
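The sketch below is a toy single-head attention stack that only shows the cross-layer sharing (the real model combines CLA with GQA); the layer count, hidden size, and sharing factor are illustrative, not Hunyuan-Large's actual configuration. Only the "producer" layers compute K/V, so only that fraction of layers would need a KV cache:

```python
import math
import torch
import torch.nn as nn

class CLAStack(nn.Module):
    """Toy single-head attention stack with cross-layer KV sharing."""

    def __init__(self, num_layers: int = 4, hidden: int = 256, cla_share_factor: int = 2):
        super().__init__()
        self.q_projs = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(num_layers))
        self.o_projs = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(num_layers))
        # Only every cla_share_factor-th layer owns a KV projection; the layers in
        # between reuse its K/V, so only these "producer" layers would need a KV cache.
        self.kv_projs = nn.ModuleDict({
            str(i): nn.Linear(hidden, 2 * hidden)
            for i in range(num_layers) if i % cla_share_factor == 0
        })

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, hidden)
        k = v = None
        for i, (q_proj, o_proj) in enumerate(zip(self.q_projs, self.o_projs)):
            if str(i) in self.kv_projs:
                # Producer layer: compute fresh K/V (this is what a serving engine caches).
                k, v = self.kv_projs[str(i)](x).chunk(2, dim=-1)
            q = q_proj(x)
            scores = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(q.size(-1)), dim=-1)
            x = x + o_proj(scores @ v)  # consumer layers attend over the shared K/V
        return x

# CLAStack()(torch.randn(2, 16, 256))
```

For vLLM, I expect the main consequence is that KV cache allocation and the attention metadata would have to map several layers onto a single cache allocation, which is where I expect most of the work to be.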
The closest model vllm already supports.
Since there is a shared expert in each MoE MLP, I think DeepSeekV2 is the closest comparison; a rough sketch of that pattern is below.
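A minimal sketch of the shared-expert MoE MLP pattern (DeepSeek-V2 style). The expert count, top-k, and dimensions are placeholders rather than Hunyuan-Large's actual configuration, and the per-token loop is written for clarity, not efficiency:

```python
import torch
import torch.nn as nn

def make_expert(hidden: int, ffn: int) -> nn.Module:
    return nn.Sequential(nn.Linear(hidden, ffn), nn.SiLU(), nn.Linear(ffn, hidden))

class SharedExpertMoE(nn.Module):
    """MoE MLP where every token passes through a shared expert plus its top-k routed experts."""

    def __init__(self, hidden: int = 256, ffn: int = 512, num_experts: int = 4, top_k: int = 1):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden, num_experts, bias=False)
        self.shared_expert = make_expert(hidden, ffn)  # always active for every token
        self.experts = nn.ModuleList(make_expert(hidden, ffn) for _ in range(num_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, hidden)
        weights = torch.softmax(self.router(x), dim=-1)
        top_w, top_i = weights.topk(self.top_k, dim=-1)
        routed = torch.zeros_like(x)
        for tok in range(x.size(0)):  # naive per-token dispatch, for clarity only
            for w, idx in zip(top_w[tok], top_i[tok]):
                routed[tok] += w * self.experts[int(idx)](x[tok])
        return self.shared_expert(x) + routed

# SharedExpertMoE()(torch.randn(8, 256))
```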
What's your difficulty of supporting the model you want?
Medium to high difficulty. I believe the main difficulty lies in supporting CLA; most other features should already be implementable.
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.