Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
Abstract
While Low-Rank Adaptation (LoRA) enables parameter-efficient fine-tuning for Large Language Models (LLMs), its performance often falls short of Full Fine-Tuning (Full FT). Current methods optimize LoRA by initializing with static singular value decomposition (SVD) subsets, leading to suboptimal leveraging of pre-trained knowledge. Another path for improving LoRA is incorporating a Mixture-of-Experts (MoE) architecture. However, weight misalignment and complex gradient dynamics make it challenging to adopt the SVD prior in the LoRA MoE architecture. To mitigate these issues, we propose Great LoRA Mixture-of-Experts (GOAT), a framework that (1) adaptively integrates relevant priors using an SVD-structured MoE, and (2) aligns optimization with a fully fine-tuned MoE by deriving a theoretical scaling factor. We demonstrate that proper scaling, without modifying the architecture or training algorithms, boosts LoRA MoE's efficiency and performance. Experiments across 25 datasets, covering natural language understanding, commonsense reasoning, image classification, and natural language generation, demonstrate GOAT's state-of-the-art performance, closing the gap with Full FT.
Community
The paper proposes a framework called GOAT (Great LoRA Mixture-of-Experts), which combines LoRA with a Mixture-of-Experts (MoE) architecture. GOAT aims to improve performance by (1) adaptively integrating relevant priors using SVD-structured MoE and (2) aligning optimization with full fine-tuned MoE through a theoretical scaling factor. This approach enhances LoRA MoE's efficiency and performance without requiring changes to the architecture or training algorithms.
The paper reports experimental results across 25 datasets, covering tasks like natural language understanding, commonsense reasoning, image classification, and natural language generation, showing that GOAT achieves state-of-the-art performance and significantly reduces the gap with Full FT.
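As a rough illustration of the SVD-based initialization idea the paper builds on, the sketch below seeds a single LoRA adapter from the top singular components of a pre-trained weight matrix. This is a minimal, hypothetical example: GOAT actually partitions the SVD spectrum across MoE experts and applies its derived scaling factor, neither of which is shown here, and `svd_lora_init` is an illustrative helper, not the paper's code.

```python
import numpy as np

def svd_lora_init(W, rank):
    """Seed LoRA factors A, B from the top-`rank` singular directions of W.

    Illustrative only: GOAT adaptively assigns SVD segments to MoE experts
    and rescales them; this shows just the single-adapter base case.
    """
    U, S, Vh = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * np.sqrt(S[:rank])          # shape (out_dim, rank)
    B = np.sqrt(S[:rank])[:, None] * Vh[:rank]   # shape (rank, in_dim)
    return A, B

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))   # stand-in for a pre-trained weight
A, B = svd_lora_init(W, rank=8)
# A @ B is the best rank-8 approximation of W (Eckart-Young), so the
# adapter starts from the most informative directions of the pre-trained
# weight instead of a random or zero initialization.
```

By the Eckart-Young theorem, raising the rank can only tighten the approximation, which is why higher-rank (or multi-expert) adapters retain more of the pre-trained prior.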
That's not how acronyms work.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- NLoRA: Nyström-Initiated Low-Rank Adaptation for Large Language Models (2025)
- LoRA-GGPO: Mitigating Double Descent in LoRA Fine-Tuning via Gradient-Guided Perturbation Optimization (2025)
- A Stronger Mixture of Low-Rank Experts for Fine-Tuning Foundation Models (2025)
- BeamLoRA: Beam-Constraint Low-Rank Adaptation (2025)
- CLoQ: Enhancing Fine-Tuning of Quantized LLMs via Calibrated LoRA Initialization (2025)
- Sparsity May Be All You Need: Sparse Random Parameter Adaptation (2025)
- DiffoRA: Enabling Parameter-Efficient LLM Fine-Tuning via Differential Low-Rank Matrix Adaptation (2025)