Grove MoE: Towards Efficient and Superior MoE LLMs with Adjugate Experts
Abstract
Grove MoE, a novel architecture with heterogeneous experts of varying sizes, improves computational efficiency and performance in large language models by dynamically activating parameters based on input complexity.
The Mixture of Experts (MoE) architecture is a cornerstone of modern state-of-the-art (SOTA) large language models (LLMs). MoE models facilitate scalability by enabling sparse parameter activation. However, the traditional MoE architecture uses homogeneous experts of uniform size, activating a fixed number of parameters for every token regardless of input complexity and thus limiting computational efficiency. To overcome this limitation, we introduce Grove MoE, a novel architecture that incorporates experts of varying sizes, inspired by the heterogeneous big.LITTLE CPU architecture. Its core component is the adjugate expert, paired with a dynamic activation mechanism that expands model capacity while keeping computational overhead manageable. Building on this architecture, we present GroveMoE-Base and GroveMoE-Inst, 33B-parameter LLMs developed by applying an upcycling strategy to the Qwen3-30B-A3B-Base model during mid-training and post-training. GroveMoE models dynamically activate 3.14-3.28B parameters based on token complexity and achieve performance comparable to SOTA open-source models of similar or even larger size.
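To make the adjugate-expert idea concrete, below is a minimal PyTorch sketch of one plausible reading of the abstract: routed experts are partitioned into groups, each group shares a small adjugate expert, and a group's adjugate expert runs at most once per token, so the activated parameter count varies with how many distinct groups the router's top-k choices fall into. The class names, dimensions, grouping rule, and adjugate weighting are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch (not the authors' code) of a Grove-style MoE layer in
# which groups of routed experts share a small "adjugate" expert.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FFNExpert(nn.Module):
    """A SwiGLU-style feed-forward expert (hypothetical expert shape)."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_hidden, bias=False)
        self.up = nn.Linear(d_model, d_hidden, bias=False)
        self.down = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))


class GroveMoELayer(nn.Module):
    """Sketch: routed experts grouped, each group sharing an adjugate expert."""

    def __init__(self, d_model=2048, d_expert=768, d_adjugate=128,
                 n_experts=128, n_groups=8, top_k=8):
        super().__init__()
        assert n_experts % n_groups == 0
        self.top_k = top_k
        self.group_size = n_experts // n_groups
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            [FFNExpert(d_model, d_expert) for _ in range(n_experts)])
        # One small adjugate expert shared by each group of routed experts.
        self.adjugates = nn.ModuleList(
            [FFNExpert(d_model, d_adjugate) for _ in range(n_groups)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, d_model); token-level loops kept for clarity.
        probs = F.softmax(self.router(x), dim=-1)
        weights, idx = probs.topk(self.top_k, dim=-1)   # (n_tokens, top_k)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for t in range(x.size(0)):
            group_weight = {}
            for w, e in zip(weights[t], idx[t].tolist()):
                out[t] += w * self.experts[e](x[t])
                g = e // self.group_size
                group_weight[g] = group_weight.get(g, 0.0) + w
            # Each activated group's adjugate expert is computed exactly once
            # per token; weighting it by the group's total routing weight is
            # an assumption made only for illustration.
            for g, w in group_weight.items():
                out[t] += w * self.adjugates[g](x[t])
        return out


# Usage: the per-token compute depends on how many groups the top-k hits span.
layer = GroveMoELayer()
tokens = torch.randn(4, 2048)
print(layer(tokens).shape)  # torch.Size([4, 2048])
```

Under these assumptions, a token whose top-k experts all fall in one group activates only one adjugate expert, while a token whose experts span many groups activates several, which is one way the activated parameter count could range between fixed lower and upper bounds.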
Community
GroveMoE is an open-source family of large language models from the AGI Center at Ant Group Research. It introduces Grove MoE, a new sparse architecture that uses adjugate experts for dynamic computation allocation.
With 33B total parameters and 3.14–3.28B active parameters per token, GroveMoE delivers results on par with state-of-the-art open-source models across reasoning, mathematics, and code generation while keeping inference costs low.
The following papers were recommended by the Semantic Scholar API
- MoBE: Mixture-of-Basis-Experts for Compressing MoE-based LLMs (2025)
- SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment (2025)
- Unveiling Super Experts in Mixture-of-Experts Large Language Models (2025)
- InfiAlign: A Scalable and Sample-Efficient Framework for Aligning LLMs to Enhance Reasoning Capabilities (2025)
- Innovator: Scientific Continued Pretraining with Fine-grained MoE Upcycling (2025)
- MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models (2025)
- Dynamic-DINO: Fine-Grained Mixture of Experts Tuning for Real-time Open-Vocabulary Object Detection (2025)