Base Models for Fine-Tuning in "Sketch to Adapt: Fine-Tunable Sketches for Efficient LLM Adaptation" (ICML 2025)

This repository hosts the compressed base models used in the fine-tuning experiments from our ICML 2025 paper: Sketch to Adapt: Fine-Tunable Sketches for Efficient LLM Adaptation. The available models and formats are as follows.

| Model      | Bits | GPR (Groups Per Row) |
|------------|------|----------------------|
| Llama-3-8B | INT4 | 1 / 2 / 4 / 8        |
| Llama-2-7B | INT4 | 1 / 2 / 4 / 8        |
| Llama-7B   | INT4 | 1 / 2 / 4 / 8        |
| Llama-13B  | INT4 | 1 / 2 / 4 / 8        |
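
To pull one of these compressed base models locally, the standard Hugging Face Hub client can be used. The snippet below is a minimal sketch: the `repo_id` is a hypothetical placeholder (the exact repository names are not listed on this card), and the full fine-tuning workflow lives in the GitHub repository linked below.

```python
# Minimal sketch: download a compressed base model from the Hugging Face Hub.
# The repo_id below is a hypothetical placeholder; substitute the actual
# repository name for the model/GPR combination you need.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="LeanModels/Llama-3-8B-SketchTune-INT4",  # hypothetical repo id
)
print(f"Compressed base model downloaded to: {local_dir}")
```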

For full details on how to reproduce the experiments, please refer to our GitHub repository:

👉 https://github.com/LeanModels/SketchTune

What is SketchTune?

SketchTune is a method for adapting large language models (LLMs) that reduces memory usage and improves speed during fine-tuning. Instead of adding low-rank adapters such as LoRA or DoRA, it compresses the model's weights into compact, trainable "sketches" and performs downstream adaptation directly on them.

Key benefits:

  • Combines compression and adaptation – SketchTune trains directly on compressed representations, removing the need for separate adapters and saving memory while improving both performance and speed.
  • Avoids low-rank limits – low-rank adapters assume that weight updates have low-rank structure; SketchTune drops this assumption and uses sketching to capture more complex changes in the model weights (see the illustrative sketch after this list).
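
To make the idea concrete, here is a minimal, purely illustrative sketch of a "sketched" linear layer: each row of the weight matrix is split into a few groups, and each group stores only fixed integer indices into a small trainable lookup table. This is not the paper's actual implementation; the class name `SketchedLinear` and the `gpr` / `table_size` parameters are made up for illustration, and the real details (how the sketches are constructed and trained) are in the paper and the GitHub repository.

```python
import torch
import torch.nn as nn

class SketchedLinear(nn.Module):
    """Illustrative sketch (not the official SketchTune code): each row of the
    weight matrix is split into `gpr` groups, and every group keeps only fixed
    integer indices into a small, trainable lookup table."""

    def __init__(self, in_features, out_features, gpr=4, table_size=16):
        super().__init__()
        assert in_features % gpr == 0
        self.gpr, self.group_len = gpr, in_features // gpr
        # Fixed integer indices (the compressed representation) -- not trained.
        self.register_buffer(
            "indices",
            torch.randint(table_size, (out_features, gpr, self.group_len)),
        )
        # Small per-group lookup tables -- these are the trainable parameters.
        self.tables = nn.Parameter(torch.randn(out_features, gpr, table_size) * 0.02)

    def forward(self, x):
        # Reconstruct the dense weight from tables + indices, then apply it.
        w = torch.gather(self.tables, dim=2, index=self.indices)  # (out, gpr, group_len)
        w = w.reshape(w.shape[0], -1)                              # (out, in)
        return x @ w.t()
```

In this toy setup only the lookup tables receive gradients during fine-tuning, so the trainable parameter count is a small fraction of the dense weight count, while the fixed indices are what keep the base model compressed.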

Performance highlights:

  • Even with base models that are 2.6–3.5× smaller, SketchTune outperforms LoRA, DoRA, and S2FT on commonsense and math reasoning benchmarks.
  • On the GSM8K math dataset, SketchTune achieves 14.48% higher accuracy than LoftQ while training 7.3× fewer parameters.

For a deep dive into how sketching works, including math details and extensive test results, check out our full paper: https://arxiv.org/abs/2410.06364.

Citation

If you find this work helpful, please consider citing our paper:

@inproceedings{zhang2025sketch,
  title={Sketch to Adapt: Fine-Tunable Sketches for Efficient {LLM} Adaptation},
  author={Tianyi Zhang and Junda Su and Aditya Desai and Oscar Wu and Zhaozhuo Xu and Anshumali Shrivastava},
  booktitle={Forty-second International Conference on Machine Learning},
  year={2025},
  url={https://openreview.net/forum?id=zZXOXhxO6I}
}