---
base_model: meta-llama/Meta-Llama-3-8B
inference: true
model_type: llama
pipeline_tag: text-generation
tags:
  - sparse
---

# SparseLlama-3-8B-pruned_50.2of4

This repo contains model files for a 2:4 (N:M) sparse Meta-Llama-3-8B model pruned in one-shot with SparseGPT, and then retrained with SquareHead knowledge distillation while maintaining the 2:4 sparsity mask.

**Note:** This is still a work in progress and subject to change. We expect to release new weights with even better accuracy soon.
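In the 2:4 pattern, every contiguous group of 4 weights contains at most 2 non-zero values, the structured-sparsity format that modern NVIDIA GPUs can accelerate. SparseGPT chooses which weights to drop using second-order (Hessian-based) information; the magnitude-based sketch below only illustrates the 2:4 constraint itself, not the actual pruning algorithm:

```python
import torch

def prune_2_of_4(weight: torch.Tensor) -> torch.Tensor:
    """Illustrative magnitude-based 2:4 pruning: keep the 2 largest-magnitude
    weights in every contiguous group of 4. (SparseGPT uses a smarter,
    Hessian-aware criterion; this only shows the resulting sparsity pattern.)"""
    out_features, in_features = weight.shape
    groups = weight.reshape(-1, 4)                       # group along the input dimension
    _, drop_idx = groups.abs().topk(2, dim=1, largest=False)  # 2 smallest per group
    mask = torch.ones_like(groups)
    mask.scatter_(1, drop_idx, 0.0)                      # zero out 2 of every 4 weights
    return (groups * mask).reshape(out_features, in_features)

w = torch.randn(8, 16)
w_sparse = prune_2_of_4(w)
print((w_sparse == 0).float().mean())  # ~0.5 overall sparsity
```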

## Running the model

```python
# pip install transformers accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("nm-testing/SparseLlama-3-8B-pruned_50.2of4")
model = AutoModelForCausalLM.from_pretrained("nm-testing/SparseLlama-3-8B-pruned_50.2of4", device_map="auto")

input_text = "A poem about Machine Learning goes as follows:"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```
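As a quick sanity check of the 2:4 pattern, you can inspect any pruned linear layer of the loaded model and confirm that each contiguous group of 4 weights has at most 2 non-zero values (a minimal sketch reusing `model` from the snippet above; the chosen layer is only an example):

```python
import torch

# Take one pruned projection matrix; any attention/MLP linear weight should follow the pattern.
weight = model.model.layers[0].self_attn.q_proj.weight

# Group weights along the input dimension into blocks of 4 and count non-zeros per block.
groups = weight.reshape(-1, 4)
nonzeros_per_group = (groups != 0).sum(dim=1)

print("overall sparsity:", (weight == 0).float().mean().item())
print("all groups have <= 2 non-zeros:", bool((nonzeros_per_group <= 2).all()))
```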

## Evaluation Benchmark Results

Model evaluation results obtained via lm-evaluation-harness, following the configuration of the Open LLM Leaderboard.
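For reference, a single-task run can be reproduced with lm-evaluation-harness's Python API roughly as follows (a sketch; dtype, batch size, and the exact task/few-shot settings are illustrative and should be matched to the Open LLM Leaderboard configuration):

```python
# pip install lm-eval
from lm_eval import evaluator

# ARC-Challenge with 25-shot prompting; other benchmarks follow the same pattern
# with their respective task names and few-shot counts.
results = evaluator.simple_evaluate(
    model="hf",
    model_args="pretrained=nm-testing/SparseLlama-3-8B-pruned_50.2of4,dtype=bfloat16",
    tasks=["arc_challenge"],
    num_fewshot=25,
    batch_size=8,
)
print(results["results"]["arc_challenge"])
```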

| Benchmark | Meta-Llama-3-8B | SparseLlama-3-8B-pruned_50.2of4<br>(this model) |
| :-- | :--: | :--: |
| ARC-c (25-shot) | 59.47% | 57.76% |
| MMLU (5-shot) | 65.29% | 60.44% |
| HellaSwag (10-shot) | 82.14% | 79.97% |
| WinoGrande (5-shot) | 77.27% | 77.19% |
| GSM8K (5-shot) | 44.81% | 47.92% |
| TruthfulQA (0-shot) | 43.96% | 41.02% |
| **Average Accuracy** | **62.16%** | **60.72%** |
| **Recovery** | **100%** | **97.68%** |
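Recovery here is the sparse model's average accuracy expressed as a fraction of the dense baseline's average, e.g.:

```python
# Recovery = sparse average accuracy / dense average accuracy
dense_avg = 62.16
sparse_avg = 60.72
print(f"{100 * sparse_avg / dense_avg:.2f}%")  # 97.68%
```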

Model evaluation results obtained via the Mosaic Eval Gauntlet, following the configuration of Eval Gauntlet v0.3.

| Benchmark | Meta-Llama-3-8B | SparseLlama-3-8B-pruned_50.2of4<br>(this model) |
| :-- | :--: | :--: |
| World Knowledge | 58.08% | 54.61% |
| Commonsense Reasoning | 47.66% | 47.62% |
| Language Understanding | 71.13% | 67.58% |
| Symbolic Problem Solving | 38.44% | 32.15% |
| Reading Comprehension | 57.48% | 55.76% |
| **Average Accuracy** | **54.70%** | **51.54%** |
| **Recovery** | **100%** | **94.22%** |

## Help

For further support, and for discussions on these models and AI in general, join Neural Magic's Slack Community.

## Acknowledgment

This model is built with Meta Llama 3. For more details on its license, please check the model card of Meta-Llama-3-8B.