---
tags:
- sparse
- fp8
- vllm
---

# Meta-Llama-3-8B-pruned_50.2of4-FP8

This repo contains model files for a 2:4 (N:M) sparse [Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) model, pruned in one-shot with [SparseGPT](https://arxiv.org/abs/2301.00774) and then retrained with [SquareHead](https://arxiv.org/abs/2310.06927) knowledge distillation while maintaining the 2:4 sparsity mask.
It was then quantized with [AutoFP8](https://github.com/neuralmagic/AutoFP8) to FP8 weights and activations with per-tensor scales, calibrated on UltraChat2k.
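
For reference, the quantization step described above looks roughly like the following. This is a minimal sketch assuming the AutoFP8 API from the linked repository (class and argument names may differ across AutoFP8 versions); the calibration dataset, subset, and sample count are placeholders standing in for the UltraChat2k data actually used.

```python
from datasets import load_dataset
from transformers import AutoTokenizer
from auto_fp8 import AutoFP8ForCausalLM, BaseQuantizeConfig

# Hypothetical paths: the source checkpoint is the unquantized 2:4 sparse model.
pretrained_model_dir = "nm-testing/SparseLlama-3-8B-pruned_50.2of4"
quantized_model_dir = "Meta-Llama-3-8B-pruned_50.2of4-FP8"

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir)
tokenizer.pad_token = tokenizer.eos_token

# Small calibration set; the exact UltraChat subset and sample count are assumptions.
ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft").select(range(512))
texts = [" ".join(m["content"] for m in sample["messages"]) for sample in ds]
examples = tokenizer(
    texts, padding=True, truncation=True, max_length=2048, return_tensors="pt"
).to("cuda")

# Static (per-tensor) scales for both weights and activations, as described above.
quantize_config = BaseQuantizeConfig(quant_method="fp8", activation_scheme="static")

model = AutoFP8ForCausalLM.from_pretrained(pretrained_model_dir, quantize_config=quantize_config)
model.quantize(examples)
model.save_quantized(quantized_model_dir)
```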

**Note:** The unquantized [Meta-Llama-3-8B-pruned_50.2of4](https://huggingface.co/nm-testing/SparseLlama-3-8B-pruned_50.2of4) model is still a work in progress and subject to change. This FP8 model will be updated whenever the unquantized model is updated.
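
The checkpoint is intended to be served with [vLLM](https://github.com/vllm-project/vllm) (see the `vllm` tag above). A minimal inference sketch, assuming a vLLM build with FP8 support; the model id below is a guess at this repository's hub path and should be adjusted as needed:

```python
from vllm import LLM, SamplingParams

# Model id is assumed; point it at this repository's actual hub path.
llm = LLM(model="nm-testing/Meta-Llama-3-8B-pruned_50.2of4-FP8")

sampling_params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=128)
prompts = ["The benefits of 2:4 structured sparsity for LLM inference are"]

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```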

## Evaluation Benchmark Results

Model evaluation results were obtained via [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) following the configuration of the [Open LLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard).

| Benchmark                                      | Meta-Llama-3-8B  | Meta-Llama-3-8B-pruned_50.2of4 | Meta-Llama-3-8B-pruned_50.2of4-FP8<br>(this model) |
|:----------------------------------------------:|:-----------:|:-----------------------------:|:-----------------------------:|
| [ARC-c](https://arxiv.org/abs/1911.01547)<br> 25-shot     | 59.47%       | 57.76%           | 58.02%            |
| [MMLU](https://arxiv.org/abs/2009.03300)<br> 5-shot       | 65.29%       | 60.44%           | 60.71%            |
| [HellaSwag](https://arxiv.org/abs/1905.07830)<br> 10-shot | 82.14%       | 79.97%           | 79.61%            |
| [WinoGrande](https://arxiv.org/abs/1907.10641)<br> 5-shot | 77.27%       | 77.19%           | 76.32%            |
| [GSM8K](https://arxiv.org/abs/2110.14168)<br> 5-shot      | 44.81%       | 47.92%           | 49.36%            |
| [TruthfulQA](https://arxiv.org/abs/2109.07958)<br> 0-shot | 43.96%       | 41.02%           | 40.82%            |
| **Average<br>Accuracy**                                   | **62.16%**   | **60.72%**       | **60.81%**        |
| **Recovery**                                              | **100%**     | **97.68%**       | **97.83%**        |

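As a rough guide to reproducing numbers like those above, recent versions of lm-evaluation-harness expose a `simple_evaluate` entry point in Python. The few-shot counts below follow the table, but the exact task identifiers, backend, and model id are assumptions about the harness version and hub path in use:

```python
import lm_eval

# Few-shot settings follow the table above (Open LLM Leaderboard configuration).
tasks_and_shots = [
    ("arc_challenge", 25),
    ("mmlu", 5),
    ("hellaswag", 10),
    ("winogrande", 5),
    ("gsm8k", 5),
    ("truthfulqa_mc2", 0),  # TruthfulQA MC2, reported 0-shot
]

for task, num_fewshot in tasks_and_shots:
    results = lm_eval.simple_evaluate(
        model="vllm",  # backend choice is an assumption; "hf" also works
        model_args="pretrained=nm-testing/Meta-Llama-3-8B-pruned_50.2of4-FP8",
        tasks=[task],
        num_fewshot=num_fewshot,
    )
    print(task, results["results"][task])
```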

## Help

For further support and discussions on these models and AI in general, join [Neural Magic's Slack Community](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ).