---
library_name: vllm
language:
- en
- fr
- de
- es
- pt
- it
- ja
- ko
- ru
- zh
- ar
- fa
- id
- ms
- ne
- pl
- ro
- sr
- sv
- tr
- uk
- vi
- hi
- bn
license: apache-2.0
inference: false
base_model:
- mistralai/Mistral-Small-3.1-24B-Base-2503
- togethercomputer/mistral-3.2-instruct-2506
extra_gated_description: >-
  If you want to learn more about how we process your personal data, please read
  our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
tags:
- mistral-common
- quantized
model_type: mistral
quantization: bitsandbytes
---
# Mistral-Small-3.2-24B-Instruct-2506 (Quantized)
This is a quantized version of togethercomputer/mistral-3.2-instruct-2506, optimized for reduced memory usage while maintaining performance.
Mistral-Small-3.2-24B-Instruct-2506 is a minor update of Mistral-Small-3.1-24B-Instruct-2503.
## Quantization Details
This model has been quantized to reduce memory requirements while preserving model quality. The quantization reduces the model size significantly compared to the original fp16/bf16 version.
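The card's metadata lists `bitsandbytes` as the quantization backend. As a minimal sketch, assuming a standard 4-bit bitsandbytes setup via `transformers` (the exact settings used for this checkpoint are not published, so the values below are illustrative defaults, not this model's confirmed configuration):

```python
# Hypothetical 4-bit load sketch using transformers + bitsandbytes.
# NF4 with bf16 compute is a common default; treat it as an assumption,
# not the documented configuration of this checkpoint.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4 bits
    bnb_4bit_quant_type="nf4",              # NF4 quantization format
    bnb_4bit_compute_dtype=torch.bfloat16,  # run matmuls in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    "togethercomputer/mistral-3.2-instruct-2506",
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across available GPUs
)
```

Loading with a quantization config like this trades a small amount of quality for a roughly 4x reduction in weight memory relative to bf16.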
## Base Model Improvements
Small-3.2 improves in the following categories:
- Instruction following: Small-3.2 is better at following precise instructions
- Repetition errors: Small-3.2 produces fewer infinite generations and repetitive answers
- Function calling: Small-3.2's function calling template is more robust
In all other categories Small-3.2 should match or slightly improve compared to Mistral-Small-3.1-24B-Instruct-2503.
## Key Features
- Same feature set as Mistral-Small-3.1-24B-Instruct-2503
- Reduced memory footprint through quantization
- Optimized for inference with maintained quality
## Usage

The quantized model can be used with frameworks such as vLLM.

Note 1: We recommend using a relatively low temperature, such as `temperature=0.15`.

Note 2: Make sure to add a system prompt to the model to best tailor it to your needs.
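The two notes above can be sketched as a framework-agnostic request. The helper name and prompt strings here are hypothetical examples; the `role`/`content` schema is the standard chat-message format accepted by vLLM-style chat interfaces:

```python
# Illustrative request construction following the two usage notes:
# a low sampling temperature and an explicit system prompt.
# build_chat and the prompt text are examples, not a library API.

SAMPLING = {"temperature": 0.15, "max_tokens": 512}

def build_chat(system_prompt: str, user_message: str) -> list[dict]:
    """Return a chat message list with the system prompt first."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]

messages = build_chat(
    "You are a concise technical assistant.",
    "Summarize the trade-offs of 4-bit quantization.",
)
```

Placing the system message first ensures the model is conditioned on your instructions before the user turn.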
## Memory Requirements
This quantized version requires significantly less GPU memory than the original model:
- Original: ~55 GB of GPU RAM in bf16 or fp16
- Quantized: Reduced memory footprint (exact requirements depend on quantization method used)
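A back-of-envelope estimate of weight memory for a 24B-parameter model, assuming common quantization widths (real usage is higher due to KV cache, activations, and quantization overhead, which is why the bf16 figure above is quoted as ~55 GB rather than 48 GB):

```python
# Rough weight-only memory estimate for a 24B-parameter model.
PARAMS = 24e9

def weight_gb(bits_per_param: float) -> float:
    """Gigabytes needed to store the weights at a given bit width."""
    return PARAMS * bits_per_param / 8 / 1e9

bf16 = weight_gb(16)   # 48.0 GB for weights alone
int8 = weight_gb(8)    # 24.0 GB
nf4  = weight_gb(4.5)  # 13.5 GB (4-bit weights plus ~0.5 bit of scales)
```

These are lower bounds; budget extra GPU memory for the KV cache, which grows with context length and batch size.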
## License
This model inherits the same license as the base model: Apache-2.0
## Original Model
For benchmark results and detailed usage examples, please refer to the original model: togethercomputer/mistral-3.2-instruct-2506