arXiv:2508.07785

Grove MoE: Towards Efficient and Superior MoE LLMs with Adjugate Experts

Published on Aug 11 · Submitted by hywu on Aug 12
Abstract

AI-generated summary: Grove MoE, a novel architecture with heterogeneous experts of varying sizes, improves computational efficiency and performance in large language models by dynamically activating parameters based on input complexity.

The Mixture of Experts (MoE) architecture is a cornerstone of modern state-of-the-art (SOTA) large language models (LLMs). MoE models facilitate scalability by enabling sparse parameter activation. However, traditional MoE architectures use homogeneous experts of uniform size, activating a fixed number of parameters irrespective of input complexity and thus limiting computational efficiency. To overcome this limitation, we introduce Grove MoE, a novel architecture incorporating experts of varying sizes, inspired by the heterogeneous big.LITTLE CPU architecture. This architecture features novel adjugate experts with a dynamic activation mechanism, enabling model capacity expansion while maintaining manageable computational overhead. Building on this architecture, we present GroveMoE-Base and GroveMoE-Inst, 33B-parameter LLMs developed by applying an upcycling strategy to the Qwen3-30B-A3B-Base model during mid-training and post-training. GroveMoE models dynamically activate 3.14–3.28B parameters per token based on token complexity and achieve performance comparable to SOTA open-source models of similar or even larger size.
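
To make the dynamic-activation idea concrete, below is a minimal PyTorch sketch of a Grove-style MoE layer. It is not the paper's implementation: the softmax top-k router, the expert and group sizes, and the exact way adjugate outputs are combined are all assumptions made for illustration. What it shows is the mechanism the abstract describes: each group of routed experts shares a small adjugate expert that is computed once per group a token touches, so tokens whose top-k experts fall into fewer distinct groups activate fewer parameters.

```python
# Illustrative sketch of a Grove-style MoE layer (not the paper's implementation).
# Assumptions: standard softmax top-k routing; experts split into groups; each group
# owns a small "adjugate" expert added once per group whenever any expert in that
# group is selected for a token, so activated parameters vary per token.

import torch
import torch.nn as nn
import torch.nn.functional as F


class FeedForward(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.up = nn.Linear(d_model, d_hidden, bias=False)
        self.down = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.up(x)))


class GroveMoESketch(nn.Module):
    def __init__(self, d_model=512, n_experts=8, n_groups=4, top_k=2,
                 d_expert=1024, d_adjugate=256):
        super().__init__()
        assert n_experts % n_groups == 0
        self.top_k = top_k
        # group_of[e] = index of the group that expert e belongs to
        self.group_of = torch.arange(n_experts) // (n_experts // n_groups)
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(FeedForward(d_model, d_expert) for _ in range(n_experts))
        # One small adjugate expert per group, shared by the experts in that group.
        self.adjugates = nn.ModuleList(FeedForward(d_model, d_adjugate) for _ in range(n_groups))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)          # routing probabilities
        weights, idx = gate.topk(self.top_k, dim=-1)      # top-k experts per token
        out = torch.zeros_like(x)
        for t in range(x.size(0)):                        # token loop for clarity, not speed
            hit_groups = set()
            for w, e in zip(weights[t], idx[t]):
                out[t] += w * self.experts[int(e)](x[t])
                hit_groups.add(int(self.group_of[int(e)]))
            # Each touched group contributes its adjugate expert exactly once, so tokens
            # routed to fewer distinct groups activate fewer parameters.
            for g in hit_groups:
                out[t] += self.adjugates[g](x[t])
        return out
```

For example, `GroveMoESketch()(torch.randn(4, 512))` returns a `(4, 512)` tensor; with `top_k=2` and four groups, a token activates either one or two adjugate experts depending on whether its two selected experts share a group, which is the per-token variation in activated parameters the abstract refers to.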

Community

Paper author and submitter:

GroveMoE is an open-source family of large language models developed by the AGI Center, Ant Group Research, that introduces Grove MoE, a new sparse architecture using adjugate experts for dynamic computation allocation.
With 33B total parameters and 3.14–3.28B active parameters per token, GroveMoE delivers state-of-the-art results across reasoning, mathematics, and code generation while keeping inference costs low.

