---
tags:
- sparse
- fp8
- vllm
---

# Meta-Llama-3-8B-pruned_50.2of4-FP8

This repo contains model files for a 2:4 (N:M) sparse [Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) model pruned in one-shot with [SparseGPT](https://arxiv.org/abs/2301.00774), then retrained with [SquareHead](https://arxiv.org/abs/2310.06927) knowledge distillation while maintaining the 2:4 sparsity mask. It was then quantized using [AutoFP8](https://github.com/neuralmagic/AutoFP8) to FP8 weights and activations with per-tensor scales, calibrated on UltraChat2k.

**Note:** The unquantized [Meta-Llama-3-8B-pruned_50.2of4](https://huggingface.co/nm-testing/SparseLlama-3-8B-pruned_50.2of4) is still a work in progress and subject to change. This FP8 model will be updated whenever the unquantized model is updated.
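Since the model is tagged for vLLM, a minimal inference sketch follows. The repository id shown is a placeholder (this repo's actual Hugging Face id may differ), the prompt and sampling parameters are purely illustrative, and a recent vLLM build with FP8 support is assumed.

```python
from vllm import LLM, SamplingParams

# Placeholder repository id (assumption): substitute this model's actual HF id.
MODEL_ID = "nm-testing/Meta-Llama-3-8B-pruned_50.2of4-FP8"

# Illustrative sampling settings, not a recommendation.
sampling_params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=128)

# vLLM picks up the FP8 quantization config stored in the checkpoint.
llm = LLM(model=MODEL_ID)

outputs = llm.generate(["Large language models are"], sampling_params)
print(outputs[0].outputs[0].text)
```

For reference, the quantization step described above can be approximated with AutoFP8's `BaseQuantizeConfig`/`AutoFP8ForCausalLM` API. This is a sketch under assumptions, not the exact production recipe: the precise calibration set behind "UltraChat2k" is not published here, so the 2,048-sample draw from `HuggingFaceH4/ultrachat_200k` and the preprocessing are illustrative.

```python
from datasets import load_dataset
from transformers import AutoTokenizer
from auto_fp8 import AutoFP8ForCausalLM, BaseQuantizeConfig

# Source checkpoint from the note above; the output directory name is arbitrary.
pretrained_model_dir = "nm-testing/SparseLlama-3-8B-pruned_50.2of4"
quantized_model_dir = "Meta-Llama-3-8B-pruned_50.2of4-FP8"

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir)
tokenizer.pad_token = tokenizer.eos_token

# Assumed reading of "UltraChat2k": 2,048 calibration samples from UltraChat,
# with the conversation turns flattened to plain text.
ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft").select(range(2048))
examples = ["\n".join(m["content"] for m in sample["messages"]) for sample in ds]
examples = tokenizer(
    examples, padding=True, truncation=True, max_length=2048, return_tensors="pt"
).to("cuda")

# FP8 weights and activations with static (per-tensor) scales, as stated above.
quantize_config = BaseQuantizeConfig(quant_method="fp8", activation_scheme="static")

model = AutoFP8ForCausalLM.from_pretrained(pretrained_model_dir, quantize_config)
model.quantize(examples)
model.save_quantized(quantized_model_dir)
```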
## Evaluation Benchmark Results

Model evaluation results obtained via [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) following the configuration of the [Open LLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard).

| Benchmark | Meta-Llama-3-8B | Meta-Llama-3-8B-pruned_50.2of4 | Meta-Llama-3-8B-pruned_50.2of4-FP8<br>(this model) |
|:----------------------------------------------:|:-----------:|:-----------------------------:|:-----------------------------:|
| [ARC-c](https://arxiv.org/abs/1911.01547)<br>25-shot | 59.47% | 57.76% | 58.02% |
| [MMLU](https://arxiv.org/abs/2009.03300)<br>5-shot | 65.29% | 60.44% | 60.71% |
| [HellaSwag](https://arxiv.org/abs/1905.07830)<br>10-shot | 82.14% | 79.97% | 79.61% |
| [WinoGrande](https://arxiv.org/abs/1907.10641)<br>5-shot | 77.27% | 77.19% | 76.32% |
| [GSM8K](https://arxiv.org/abs/2110.14168)<br>5-shot | 44.81% | 47.92% | 49.36% |
| [TruthfulQA](https://arxiv.org/abs/2109.07958)<br>0-shot | 43.96% | 41.02% | 40.82% |
| **Average Accuracy** | **62.16%** | **60.72%** | **60.81%** |
| **Recovery** | **100%** | **97.68%** | **97.83%** |
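As a hedged reproduction sketch, a single leaderboard task can be scored with the harness's Python API. The repository id is again a placeholder, the vLLM backend is an assumption chosen because the checkpoint is FP8-quantized, and only ARC-c is shown since the full leaderboard configuration uses task-specific few-shot counts.

```python
import lm_eval

# Hedged sketch: score one Open LLM Leaderboard task (ARC-c, 25-shot).
# The repository id below is a placeholder for this model's actual HF id.
results = lm_eval.simple_evaluate(
    model="vllm",  # vLLM backend (assumed), since the checkpoint is FP8
    model_args="pretrained=nm-testing/Meta-Llama-3-8B-pruned_50.2of4-FP8",
    tasks=["arc_challenge"],
    num_fewshot=25,
)
print(results["results"]["arc_challenge"])
```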
## Help

For further support and discussions on these models and AI in general, join [Neural Magic's Slack Community](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ).