- neuralmagic/granite-3.1-2b-instruct-quantized.w4a16 • Text Generation • Updated • 214
- neuralmagic/granite-3.1-2b-instruct-quantized.w8a8 • Text Generation • Updated • 179
- neuralmagic/granite-3.1-8b-instruct-quantized.w4a16 • Text Generation • Updated • 126 • 1
- neuralmagic/granite-3.1-8b-instruct-quantized.w8a8 • Text Generation • Updated • 139
Neural Magic • company • Verified
AI & ML interests
LLMs, optimization, compression, sparsification, quantization, pruning, distillation, NLP, CV
Organization Card
The Future of AI is Open
Neural Magic helps developers accelerate deep learning performance with automated model compression technologies and inference engines. Download our compression-aware inference engines and open-source tools for fast model inference.
- nm-vllm: Enterprise-ready inference system built on the open-source vLLM library for operationalizing performant open-source LLMs at scale
- LLM Compressor: Hugging Face-native library for applying quantization and sparsity algorithms to LLMs for optimized deployment with vLLM (a minimal usage sketch follows this list)
- DeepSparse: Inference runtime offering accelerated performance on CPUs, with APIs for integrating ML into your application
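As a rough illustration of the LLM Compressor workflow mentioned above, here is a minimal sketch of one-shot W4A16 quantization. The base model, calibration dataset, and output directory are illustrative, and the exact module paths and arguments may differ between llm-compressor releases; see the repository linked below for the current API.

```python
# Minimal sketch: one-shot W4A16 quantization with LLM Compressor.
# NOTE: model, dataset, and output_dir are illustrative choices; the API may
# differ across llm-compressor versions.
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

oneshot(
    model="ibm-granite/granite-3.1-2b-instruct",  # any Hugging Face causal LM
    dataset="open_platypus",                      # small calibration dataset
    recipe=GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
    output_dir="granite-3.1-2b-instruct-W4A16",   # saved in a vLLM-ready format
    max_seq_length=2048,
    num_calibration_samples=512,
)
```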
In this profile we provide accurate model checkpoints compressed with SOTA methods such as W4A16, W8A16, and W8A8 (INT8 and FP8), ready to run in vLLM. If you would like help quantizing a model or want to request a new checkpoint, please open an issue at https://github.com/vllm-project/llm-compressor.
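For example, the quantized checkpoints below can be loaded directly with vLLM; a minimal sketch (the prompt and sampling settings are only illustrative):

```python
# Minimal sketch: offline generation with a quantized checkpoint in vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="neuralmagic/granite-3.1-8b-instruct-quantized.w4a16")
params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(["What is 2:4 structured sparsity?"], params)
print(outputs[0].outputs[0].text)
```

The same checkpoints also work with vLLM's OpenAI-compatible server, e.g. `vllm serve neuralmagic/granite-3.1-8b-instruct-quantized.w4a16`.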
Collections • 13

2:4 sparse versions of Llama-3.1, including transfer learning
- neuralmagic/Sparse-Llama-3.1-8B-ultrachat_200k-2of4-FP8-dynamic • Text Generation • Updated • 32 • 1
- neuralmagic/Sparse-Llama-3.1-8B-gsm8k-2of4 • Text Generation • Updated • 56 • 1
- neuralmagic/Sparse-Llama-3.1-8B-2of4 • Text Generation • Updated • 900 • 61
- neuralmagic/Sparse-Llama-3.1-8B-ultrachat_200k-2of4 • Text Generation • Updated • 22 • 1
Spaces • 8
- 🔥 Quant Llms Text Generation (Quantized vs. Unquantized LLM: Text Generation Comparison) • Running • 2
- 🏃 Llama 3 8B Chat Deepsparse • Running on CPU Upgrade
- 🏃 Llama 2 Sparse Transfer Chat Deepsparse • Sleeping
- ⚡ DeepSparse Sentiment Analysis • Runtime error • 1
- 🏢 DeepSparse Named Entity Recognition • Runtime error • 6
- 📚 Sparse Llama Gsm8k • Running on CPU Upgrade • 16
Models • 263
- neuralmagic/granite-3.1-2b-instruct-FP8-dynamic • Text Generation • Updated • 34
- neuralmagic/granite-3.1-2b-base-quantized.w8a8 • Text Generation • Updated • 15
- neuralmagic/granite-3.1-2b-base-FP8-dynamic • Text Generation • Updated • 14
- neuralmagic/granite-3.1-8b-base-quantized.w8a8 • Text Generation • Updated • 7
- neuralmagic/granite-3.1-8b-base-FP8-dynamic • Updated
- neuralmagic/granite-3.1-8b-instruct-quantized.w8a8 • Text Generation • Updated • 139
- neuralmagic/granite-3.1-8b-base-quantized.w4a16 • Text Generation • Updated • 24
- neuralmagic/granite-3.1-2b-base-quantized.w4a16 • Text Generation • Updated • 23
- neuralmagic/granite-3.1-8b-instruct-FP8-dynamic • Text Generation • Updated • 40
- neuralmagic/granite-3.1-8b-instruct-quantized.w4a16 • Text Generation • Updated • 126 • 1
Datasets • 12
- neuralmagic/mmlu_it • Viewer • Updated • 14k • 114
- neuralmagic/mmlu_fr • Viewer • Updated • 14k • 314
- neuralmagic/mmlu_th • Viewer • Updated • 14k • 203
- neuralmagic/mmlu_de • Viewer • Updated • 14k • 146
- neuralmagic/mmlu_es • Viewer • Updated • 14k • 116
- neuralmagic/mmlu_hi • Viewer • Updated • 14k • 122
- neuralmagic/mmlu_pt • Viewer • Updated • 14k • 156
- neuralmagic/quantized-llama-3.1-leaderboard-v2-evals • Viewer • Updated • 247k • 212
- neuralmagic/quantized-llama-3.1-humaneval-evals • Viewer • Updated • 73.8k • 54
- neuralmagic/quantized-llama-3.1-arena-hard-evals • Viewer • Updated • 6k • 110