Skip to content

[New Model]: Support Tencent-Hunyuan-Large #10043

@mgoin

Description

@mgoin

The model to consider.

https://huggingface.co/tencent/Tencent-Hunyuan-Large

Tencent released a 389B MoE with only 52B activated parameters which beats the Llama 3.1 405B.
There are three checkpoints in the model card: Pretrain, Instruct, and Instruct-FP8 (AutoFP8 format)

Some notable features of the model:

  • High-Quality Synthetic Data: By enhancing training with synthetic data, Hunyuan-Large can learn richer representations, handle long-context inputs, and generalize better to unseen data.

  • KV Cache Compression: Utilizes Grouped Query Attention (GQA) and Cross-Layer Attention (CLA) strategies to significantly reduce memory usage and computational overhead of KV caches, improving inference throughput.

  • Expert-Specific Learning Rate Scaling: Sets different learning rates for different experts to ensure each sub-model effectively learns from the data and contributes to overall performance.

  • Long-Context Processing Capability: The pre-trained model supports text sequences up to 256K, and the Instruct model supports up to 128K, significantly enhancing the ability to handle long-context tasks.

  • Extensive Benchmarking: Conducts extensive experiments across various languages and tasks to validate the practical effectiveness and safety of Hunyuan-Large.

I think the inclusion of Cross-Layer Attention (CLA) described in https://arxiv.org/abs/2405.12981 and by Character.AI is the most interesting element.

The closest model vllm already supports.

Since there is a shared expert at each MoE MLP, I think DeepSeekV2 is the closest comparison.

What's your difficulty of supporting the model you want?

Medium to high difficulty. I believe the difficulty lies with supporting CLA, where most other feature should already be implementable.

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

Metadata

Metadata

Assignees

No one assigned

    Labels

    new-modelRequests to new modelsstaleOver 90 days of inactivity

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions