---
pipeline_tag: text-generation
base_model: nota-ai/st-vicuna-v1.3-5.5b-ppl
library_name: transformers
tags:
  - llama
---

# QuantFactory/st-vicuna-v1.3-5.5b-ppl-GGUF

This is a quantized (GGUF) version of [nota-ai/st-vicuna-v1.3-5.5b-ppl](https://huggingface.co/nota-ai/st-vicuna-v1.3-5.5b-ppl), created using llama.cpp.
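
As a usage sketch (not part of the original card), a GGUF file from this repo can be downloaded with `huggingface_hub` and run with `llama-cpp-python`. The `.gguf` filename below is an assumption; replace it with one actually listed in this repo's files.

```python
# Minimal sketch: download a GGUF file from this repo and run it with llama-cpp-python.
# NOTE: the filename below is hypothetical; pick an actual .gguf file from the repo listing.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

gguf_path = hf_hub_download(
    repo_id="QuantFactory/st-vicuna-v1.3-5.5b-ppl-GGUF",
    filename="st-vicuna-v1.3-5.5b-ppl.Q4_K_M.gguf",  # hypothetical quant filename
)

llm = Llama(model_path=gguf_path, n_ctx=2048)
out = llm("USER: What is depth pruning?\nASSISTANT:", max_tokens=128)
print(out["choices"][0]["text"])
```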

## Model Description

### Shortened LLaMA Model Card

Shortened LLaMA is a depth-pruned version of LLaMA models & variants for efficient text generation.

### Compression Method

After identifying unimportant Transformer blocks, we perform one-shot pruning and light LoRA-based retraining.

*(Method overview figure omitted; see the original model card.)*
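
The sketch below illustrates the general idea (drop whole decoder blocks, then attach LoRA adapters for light retraining). It is not the authors' implementation: the pruned layer indices and LoRA hyperparameters are placeholders, whereas the released models select blocks using the PPL or Taylor+ criteria described in the paper.

```python
# Illustrative sketch of one-shot depth pruning + LoRA retraining (not the authors' code).
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("lmsys/vicuna-7b-v1.3", torch_dtype=torch.float16)

# One-shot pruning: remove the Transformer blocks judged unimportant.
# The real criterion scores each block (e.g. its impact on perplexity); indices here are placeholders.
unimportant = {21, 22, 23, 24, 25, 26}  # hypothetical block indices
kept = torch.nn.ModuleList(
    layer for i, layer in enumerate(model.model.layers) if i not in unimportant
)
model.model.layers = kept
model.config.num_hidden_layers = len(kept)

# Light LoRA-based retraining: attach low-rank adapters (hyperparameters are illustrative).
lora_cfg = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()
```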

### Model Links

| Source Model | Pruning Ratio | Pruning Criterion | HF Models Link |
|---|---|---|---|
| LLaMA-1-7B | 20% | PPL | [nota-ai/st-llama-1-5.5b-ppl](https://huggingface.co/nota-ai/st-llama-1-5.5b-ppl) |
| LLaMA-1-7B | 20% | Taylor+ | [nota-ai/st-llama-1-5.5b-taylor](https://huggingface.co/nota-ai/st-llama-1-5.5b-taylor) |
| Vicuna-v1.3-7B | 20% | PPL | [nota-ai/st-vicuna-v1.3-5.5b-ppl](https://huggingface.co/nota-ai/st-vicuna-v1.3-5.5b-ppl) |
| Vicuna-v1.3-7B | 20% | Taylor+ | [nota-ai/st-vicuna-v1.3-5.5b-taylor](https://huggingface.co/nota-ai/st-vicuna-v1.3-5.5b-taylor) |
| Vicuna-v1.3-13B | 21% | PPL | [nota-ai/st-vicuna-v1.3-10.5b-ppl](https://huggingface.co/nota-ai/st-vicuna-v1.3-10.5b-ppl) |
| Vicuna-v1.3-13B | 21% | Taylor+ | [nota-ai/st-vicuna-v1.3-10.5b-taylor](https://huggingface.co/nota-ai/st-vicuna-v1.3-10.5b-taylor) |
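
As a usage illustration (not from the original card), the pruned checkpoints above load with the standard `transformers` API; the prompt below is only an example.

```python
# Standard transformers usage sketch for one of the pruned checkpoints listed above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "nota-ai/st-vicuna-v1.3-5.5b-ppl"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.float16, device_map="auto")

prompt = "USER: Explain depth pruning in one sentence.\nASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```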

### Zero-shot Performance & Efficiency Results

- EleutherAI/lm-evaluation-harness version 3326c54

*(Results figure omitted; see the original model card.)*
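
For reproducing zero-shot numbers, a hedged sketch of the lm-evaluation-harness Python API is shown below. The interface matches recent harness releases and may differ at commit 3326c54; the task list is illustrative, not the exact set reported by the authors.

```python
# Hedged sketch: zero-shot evaluation with EleutherAI/lm-evaluation-harness.
# The API shown matches recent releases and may differ at commit 3326c54;
# the task selection is illustrative only.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf",
    model_args="pretrained=nota-ai/st-vicuna-v1.3-5.5b-ppl,dtype=float16",
    tasks=["arc_easy", "piqa", "hellaswag"],
)
print(results["results"])
```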

### License

- All rights related to this repository and the compressed models are reserved by Nota Inc.
- The intended use is strictly limited to research and non-commercial projects.

### Model Acknowledgments

### Original Model Citation

```bibtex
@article{kim2024shortened,
  title={Shortened LLaMA: A Simple Depth Pruning for Large Language Models},
  author={Kim, Bo-Kyeong and Kim, Geonmin and Kim, Tae-Ho and Castells, Thibault and Choi, Shinkook and Shin, Junho and Song, Hyoung-Kyu},
  journal={arXiv preprint arXiv:2402.02834},
  year={2024},
  url={https://arxiv.org/abs/2402.02834}
}

@article{kim2024mefomo,
  title={Shortened LLaMA: A Simple Depth Pruning for Large Language Models},
  author={Kim, Bo-Kyeong and Kim, Geonmin and Kim, Tae-Ho and Castells, Thibault and Choi, Shinkook and Shin, Junho and Song, Hyoung-Kyu},
  journal={ICLR Workshop on Mathematical and Empirical Understanding of Foundation Models (ME-FoMo)},
  year={2024},
  url={https://openreview.net/forum?id=18VGxuOdpu}
}
```