The model to consider.
https://huggingface.co/tencent/Tencent-Hunyuan-Large
Tencent has released a 389B-parameter MoE model with only 52B activated parameters, which reportedly beats Llama 3.1 405B.
There are three checkpoints in the model card: Pretrain, Instruct, and Instruct-FP8 (AutoFP8 format).
Some notable features of the model:
- High-Quality Synthetic Data: By enhancing training with synthetic data, Hunyuan-Large can learn richer representations, handle long-context inputs, and generalize better to unseen data.
- KV Cache Compression: Utilizes Grouped Query Attention (GQA) and Cross-Layer Attention (CLA) strategies to significantly reduce memory usage and computational overhead of KV caches, improving inference throughput.
- Expert-Specific Learning Rate Scaling: Sets different learning rates for different experts to ensure each sub-model effectively learns from the data and contributes to overall performance (a rough training-side sketch follows this list).
- Long-Context Processing Capability: The pre-trained model supports text sequences up to 256K tokens, and the Instruct model supports up to 128K, significantly enhancing the ability to handle long-context tasks.
- Extensive Benchmarking: Conducts extensive experiments across various languages and tasks to validate the practical effectiveness and safety of Hunyuan-Large.
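To make the expert-specific learning rate idea concrete, here is a minimal training-side sketch (not something vLLM itself would need for inference support). The `.experts.<id>.` naming pattern, the scaling rule, and the hyperparameters are my assumptions for illustration, not Tencent's actual recipe:

```python
import torch

def build_param_groups(model: torch.nn.Module, base_lr: float = 1e-4):
    """Group parameters so each expert gets its own learning rate.

    Assumes expert weights live under a module path containing ".experts.<id>.",
    which is a hypothetical naming convention used only for this sketch.
    """
    groups: dict[str, dict] = {}
    for name, param in model.named_parameters():
        if ".experts." in name:
            expert_id = int(name.split(".experts.")[1].split(".")[0])
            # Placeholder scaling rule: damp the LR for higher-index experts.
            key, lr = f"expert_{expert_id}", base_lr / (1.0 + 0.1 * expert_id)
        else:
            key, lr = "dense", base_lr
        groups.setdefault(key, {"params": [], "lr": lr})["params"].append(param)
    return list(groups.values())

# optimizer = torch.optim.AdamW(build_param_groups(model), betas=(0.9, 0.95))
```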
I think the inclusion of Cross-Layer Attention (CLA), described in https://arxiv.org/abs/2405.12981 and used by Character.AI, is the most interesting element; a rough sketch of the KV-sharing idea is below.
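The sketch below is a toy single-head attention stack that only shows the cross-layer sharing (the real model combines CLA with GQA); the layer count, hidden size, and sharing factor are illustrative, not Hunyuan-Large's actual configuration. Only the "producer" layers compute K/V, so only that fraction of layers would need a KV cache:

```python
import math
import torch
import torch.nn as nn

class CLAStack(nn.Module):
    """Toy single-head attention stack with cross-layer KV sharing."""

    def __init__(self, num_layers: int = 4, hidden: int = 256, cla_share_factor: int = 2):
        super().__init__()
        self.q_projs = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(num_layers))
        self.o_projs = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(num_layers))
        # Only every cla_share_factor-th layer owns a KV projection; the layers in
        # between reuse its K/V, so only these "producer" layers would need a KV cache.
        self.kv_projs = nn.ModuleDict({
            str(i): nn.Linear(hidden, 2 * hidden)
            for i in range(num_layers) if i % cla_share_factor == 0
        })

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, hidden)
        k = v = None
        for i, (q_proj, o_proj) in enumerate(zip(self.q_projs, self.o_projs)):
            if str(i) in self.kv_projs:
                # Producer layer: compute fresh K/V (this is what a serving engine caches).
                k, v = self.kv_projs[str(i)](x).chunk(2, dim=-1)
            q = q_proj(x)
            scores = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(q.size(-1)), dim=-1)
            x = x + o_proj(scores @ v)  # consumer layers attend over the shared K/V
        return x

# CLAStack()(torch.randn(2, 16, 256))
```

For vLLM, I expect the main consequence is that KV cache allocation and the attention metadata would have to map several layers onto a single cache allocation, which is where I expect most of the work to be.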
The closest model vllm already supports.
Since there is a shared expert in each MoE MLP, I think DeepSeekV2 is the closest comparison; a rough sketch of that pattern is below.
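A minimal sketch of the shared-expert MoE MLP pattern (DeepSeek-V2 style). The expert count, top-k, and dimensions are placeholders rather than Hunyuan-Large's actual configuration, and the per-token loop is written for clarity, not efficiency:

```python
import torch
import torch.nn as nn

def make_expert(hidden: int, ffn: int) -> nn.Module:
    return nn.Sequential(nn.Linear(hidden, ffn), nn.SiLU(), nn.Linear(ffn, hidden))

class SharedExpertMoE(nn.Module):
    """MoE MLP where every token passes through a shared expert plus its top-k routed experts."""

    def __init__(self, hidden: int = 256, ffn: int = 512, num_experts: int = 4, top_k: int = 1):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden, num_experts, bias=False)
        self.shared_expert = make_expert(hidden, ffn)  # always active for every token
        self.experts = nn.ModuleList(make_expert(hidden, ffn) for _ in range(num_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, hidden)
        weights = torch.softmax(self.router(x), dim=-1)
        top_w, top_i = weights.topk(self.top_k, dim=-1)
        routed = torch.zeros_like(x)
        for tok in range(x.size(0)):  # naive per-token dispatch, for clarity only
            for w, idx in zip(top_w[tok], top_i[tok]):
                routed[tok] += w * self.experts[int(idx)](x[tok])
        return self.shared_expert(x) + routed

# SharedExpertMoE()(torch.randn(8, 256))
```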
What's your difficulty of supporting the model you want?
Medium to high difficulty. I believe the main difficulty lies in supporting CLA; most other features should already be implementable.
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.