Tencent has released Hunyuan-Large, an open-source Transformer-based mixture-of-experts (MoE) model with 389 billion total parameters, of which 52 billion are activated for any given token.
Hunyuan-Large supports context lengths of up to 256K tokens and rivals other leading open models in performance.
The model incorporates several technical advances: pre-training on large-scale, diverse data (including synthetic data), a mixed expert routing strategy (sketched below), key-value (KV) cache compression, and expert-specific learning rates.
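To illustrate the routing idea, the following minimal PyTorch sketch shows one way a mixed routing strategy can work: a shared expert processes every token unconditionally, while a lightweight router activates only the top-k specialized experts per token. All class names, layer sizes, and the dense dispatch loop are illustrative assumptions, not Tencent's implementation.

```python
# Minimal sketch of a "mixed" MoE routing strategy, assuming one shared
# expert that sees every token plus a router that picks the top-k
# specialized experts per token. Illustrative only, not Tencent's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_expert(d_model, d_ff):
    # A standard two-layer feed-forward block.
    return nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                         nn.Linear(d_ff, d_model))

class MixedMoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=16, top_k=1):
        super().__init__()
        self.shared_expert = make_expert(d_model, d_ff)  # always active
        self.experts = nn.ModuleList(
            make_expert(d_model, d_ff) for _ in range(n_experts))
        self.router = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):  # x: (batch, seq, d_model)
        out = self.shared_expert(x)  # every token uses the shared expert
        probs = F.softmax(self.router(x), dim=-1)      # routing probabilities
        weights, indices = probs.topk(self.top_k, -1)  # top-k experts per token
        for k in range(self.top_k):
            idx = indices[..., k]                       # expert id per token
            w = weights[..., k].unsqueeze(-1)           # routing weight
            for e, expert in enumerate(self.experts):
                # Mask selects the tokens routed to expert e; computation is
                # dense here for clarity, whereas real systems dispatch only
                # the routed tokens to each expert.
                mask = (idx == e).unsqueeze(-1).float()
                out = out + mask * w * expert(x)
        return out

# Usage: route a toy batch through the layer.
layer = MixedMoELayer()
y = layer(torch.randn(2, 8, 512))  # -> shape (2, 8, 512)
```

According to the Hunyuan-Large paper, the model uses one shared expert alongside 16 specialized experts and activates a single specialized expert per token, which is what the top_k=1 default above is meant to mirror.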
In Tencent's reported benchmarks, Hunyuan-Large outperforms comparable open-source models such as Llama 3.1-70B across a range of NLP tasks and performs on par with the far larger Llama 3.1-405B, while its extended context window addresses the growing need for long-context understanding in AI applications.