Grove MoE: Towards Efficient and Superior MoE LLMs with Adjugate Experts
Abstract
Grove MoE, a novel architecture with heterogeneous experts of varying sizes, improves computational efficiency and performance in large language models by dynamically activating parameters based on input complexity.
The Mixture of Experts (MoE) architecture is a cornerstone of modern state-of-the-art (SOTA) large language models (LLMs). MoE models facilitate scalability by enabling sparse parameter activation. However, the traditional MoE architecture uses homogeneous experts of uniform size, activating a fixed number of parameters for every token regardless of input complexity and thus limiting computational efficiency. To overcome this limitation, we introduce Grove MoE, a novel architecture that incorporates experts of varying sizes, inspired by the heterogeneous big.LITTLE CPU architecture. Its core component is the adjugate expert, paired with a dynamic activation mechanism that expands model capacity while keeping computational overhead manageable. Building on this architecture, we present GroveMoE-Base and GroveMoE-Inst, 33B-parameter LLMs developed by applying an upcycling strategy to the Qwen3-30B-A3B-Base model during mid-training and post-training. GroveMoE models dynamically activate 3.14-3.28B parameters based on token complexity and achieve performance comparable to SOTA open-source models of similar or even larger size.
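To make the adjugate-expert idea concrete, below is a minimal PyTorch sketch of one plausible reading of the abstract: routed experts are partitioned into groups, each group shares a small adjugate expert, and a group's adjugate expert runs at most once per token, so the activated parameter count varies with how many distinct groups the router's top-k choices fall into. The class names, dimensions, grouping rule, and adjugate weighting are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch (not the authors' code) of a Grove-style MoE layer in
# which groups of routed experts share a small "adjugate" expert.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FFNExpert(nn.Module):
    """A SwiGLU-style feed-forward expert (hypothetical expert shape)."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_hidden, bias=False)
        self.up = nn.Linear(d_model, d_hidden, bias=False)
        self.down = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))


class GroveMoELayer(nn.Module):
    """Sketch: routed experts grouped, each group sharing an adjugate expert."""

    def __init__(self, d_model=2048, d_expert=768, d_adjugate=128,
                 n_experts=128, n_groups=8, top_k=8):
        super().__init__()
        assert n_experts % n_groups == 0
        self.top_k = top_k
        self.group_size = n_experts // n_groups
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            [FFNExpert(d_model, d_expert) for _ in range(n_experts)])
        # One small adjugate expert shared by each group of routed experts.
        self.adjugates = nn.ModuleList(
            [FFNExpert(d_model, d_adjugate) for _ in range(n_groups)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, d_model); token-level loops kept for clarity.
        probs = F.softmax(self.router(x), dim=-1)
        weights, idx = probs.topk(self.top_k, dim=-1)   # (n_tokens, top_k)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for t in range(x.size(0)):
            group_weight = {}
            for w, e in zip(weights[t], idx[t].tolist()):
                out[t] += w * self.experts[e](x[t])
                g = e // self.group_size
                group_weight[g] = group_weight.get(g, 0.0) + w
            # Each activated group's adjugate expert is computed exactly once
            # per token; weighting it by the group's total routing weight is
            # an assumption made only for illustration.
            for g, w in group_weight.items():
                out[t] += w * self.adjugates[g](x[t])
        return out


# Usage: the per-token compute depends on how many groups the top-k hits span.
layer = GroveMoELayer()
tokens = torch.randn(4, 2048)
print(layer(tokens).shape)  # torch.Size([4, 2048])
```

Under these assumptions, a token whose top-k experts all fall in one group activates only one adjugate expert, while a token whose experts span many groups activates several, which is one way the activated parameter count could range between fixed lower and upper bounds.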
Community
GroveMoE is an open-source family of large language models from the AGI Center at Ant Group Research. It introduces Grove MoE, a new sparse architecture that uses adjugate experts for dynamic computation allocation.
With 33B total parameters and 3.14–3.28B active parameters per token, GroveMoE delivers results on par with state-of-the-art open-source models across reasoning, mathematics, and code generation while keeping inference costs low.
The following papers were recommended by the Semantic Scholar API
- MoBE: Mixture-of-Basis-Experts for Compressing MoE-based LLMs (2025)
- SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment (2025)
- Unveiling Super Experts in Mixture-of-Experts Large Language Models (2025)
- InfiAlign: A Scalable and Sample-Efficient Framework for Aligning LLMs to Enhance Reasoning Capabilities (2025)
- Innovator: Scientific Continued Pretraining with Fine-grained MoE Upcycling (2025)
- MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models (2025)
- Dynamic-DINO: Fine-Grained Mixture of Experts Tuning for Real-time Open-Vocabulary Object Detection (2025)