---
base_model:
  - NousResearch/Llama-2-7b-hf
  - NousResearch/Meta-Llama-3-8B-Instruct
  - NousResearch/Llama-2-13b-hf
  - NousResearch/Meta-Llama-3.1-8B
language:
  - en
license: mit
pipeline_tag: other
library_name: transformers
---

# SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity

## Abstract

Fine-tuning LLMs is both computationally and memory-intensive. While parameter-efficient fine-tuning methods such as QLoRA and DoRA reduce the number of trainable parameters and lower memory usage, they do not decrease computational cost; in some cases, they may even slow down fine-tuning. In this paper, we introduce SparseLoRA, a method that accelerates LLM fine-tuning through contextual sparsity. We propose a lightweight, training-free SVD sparsity estimator that dynamically selects a sparse subset of weights for loss and gradient computation. We also systematically analyze and address sensitivity across layers, tokens, and training steps. Our experimental results show that SparseLoRA reduces computational cost by up to 2.2 times and delivers a measured speedup of up to 1.6 times while maintaining accuracy across various downstream tasks, including commonsense and arithmetic reasoning, code generation, and instruction following.
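The core idea can be illustrated with a small, self-contained sketch (not the paper's implementation): a rank-r SVD factorization of a weight matrix serves as a cheap predictor of which output channels matter for the current input, and only those channels are actually computed. The function names, `keep_ratio`, and tensor shapes below are illustrative assumptions.

```python
import torch

def svd_predictor(weight: torch.Tensor, rank: int = 8):
    # weight: (out_features, in_features). Keep the top-`rank` singular
    # directions as a low-rank proxy  W ≈ (U_r S_r) @ Vh_r.
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    return U[:, :rank] * S[:rank], Vh[:rank]        # (out, r), (r, in)

def sparse_linear(x, weight, predictor, keep_ratio: float = 0.25):
    US, Vh = predictor
    # Cheap low-rank estimate of output magnitudes for this input (the "context").
    approx = (x @ Vh.T) @ US.T                      # (batch, out_features)
    k = max(1, int(keep_ratio * weight.shape[0]))
    idx = approx.abs().mean(dim=0).topk(k).indices  # channels predicted to matter
    out = x.new_zeros(x.shape[0], weight.shape[0])
    out[:, idx] = x @ weight[idx].T                 # compute only the kept channels
    return out

x = torch.randn(4, 64)                              # (batch, in_features)
w = torch.randn(256, 64)                            # (out_features, in_features)
y = sparse_linear(x, w, svd_predictor(w, rank=8))
print(y.shape)                                      # torch.Size([4, 256])
```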

This repository contains the pre-computed SVD predictors for all 4 models used in our paper. By default, the required predictors are downloaded to your local machine when you first launch the training script.

We have precomputed SVD predictors at rank 8 for the following models, as used in the main paper:

  • "NousResearch/Llama-2-7b-hf"
  • "NousResearch/Llama-2-13b-hf"
  • "NousResearch/Meta-Llama-3-8B-Instruct"
  • "NousResearch/Meta-Llama-3.1-8B"

## Quick Start for SparseLoRA

```python
from transformers import AutoModelForCausalLM, Trainer
from peft import get_peft_model, LoraConfig
from spft.api import SPFTConfig, get_spft_model

##* Load LoRA + LLM (example base model from the list above)
model = AutoModelForCausalLM.from_pretrained("NousResearch/Meta-Llama-3-8B-Instruct")
lora_cfg = LoraConfig(task_type="CAUSAL_LM")
model = get_peft_model(model, lora_cfg)

##* Load Sparse Fine-Tuning Config
spft_config = SPFTConfig.from_file("configs/sparsity/llama3-8b-math10k.yaml")

##* Apply SparseLoRA Patches (SVD sparsity estimator & liger-kernel optimizations)
model = get_spft_model(model, spft_config)

##* Launch Sparse Training:
trainer = Trainer(
    model=model,
    ...  # plus the usual Trainer arguments (dataset, TrainingArguments, etc.)
)
trainer.train()
```
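Once patched, training proceeds as a standard `Trainer` run; the difference is that the pre-computed SVD predictors dynamically select a sparse subset of weights for the loss and gradient computation, which is where the compute savings reported above come from.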