🚀 Optimized Models: torchao & Pruna Quantization
Collection • 6 items
Quantized models using torchao & Pruna for efficient inference and deployment.
SmolLM3‑3B • Quantized
This is an int8 quantized version of SmolLM3-3B, a highly efficient, open-source 3B-parameter LLM. It delivers near state-of-the-art multilingual reasoning and long-context performance (up to 128k tokens) with drastically reduced memory usage and inference cost, enabling fast deployment on mid-range GPUs and edge devices.
Ideal for:
Base model: HuggingFaceTB/SmolLM3-3B-Base
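The memory savings of int8 quantization come from storing each weight as an 8-bit integer plus a shared scale factor, instead of a 16- or 32-bit float. A minimal, dependency-free sketch of symmetric per-tensor int8 weight quantization (illustrative only; the actual torchao and Pruna kernels used for this collection are more sophisticated, with per-channel scales and fused dequantize-matmul ops):

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ≈ q * scale.

    Maps the largest-magnitude weight to ±127 and rounds the rest
    onto the int8 grid, so each value needs 1 byte instead of 4.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]


weights = [0.12, -0.5, 0.33, 1.27, -1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# q is [12, -50, 33, 127, -100]; restored matches weights to within one
# quantization step (scale ≈ 0.01 here).
```

Roughly 4x smaller weight storage, at the cost of quantization error bounded by half a step (scale / 2) per weight, which is why quality degrades only slightly for weight-only int8 schemes like this one.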