---
library_name: vllm
language:
- en
- fr
- de
- es
- pt
- it
- ja
- ko
- ru
- zh
- ar
- fa
- id
- ms
- ne
- pl
- ro
- sr
- sv
- tr
- uk
- vi
- hi
- bn
license: apache-2.0
inference: false
base_model:
- mistralai/Mistral-Small-3.1-24B-Base-2503
- togethercomputer/mistral-3.2-instruct-2506
extra_gated_description: >-
  If you want to learn more about how we process your personal data, please read
  our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
tags:
- mistral-common
- quantized
model_type: mistral
quantization: bitsandbytes
---
# Mistral-Small-3.2-24B-Instruct-2506 (Quantized)
This is a quantized version of togethercomputer/mistral-3.2-instruct-2506, optimized for reduced memory usage while maintaining performance.
Mistral-Small-3.2-24B-Instruct-2506 is a minor update of Mistral-Small-3.1-24B-Instruct-2503.
## Quantization Details
This model has been quantized to reduce memory requirements while preserving model quality. The quantization reduces the model size significantly compared to the original fp16/bf16 version.
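The card's metadata lists `bitsandbytes` as the quantization backend. As a minimal sketch, assuming a standard 4-bit bitsandbytes setup via `transformers` (the exact settings used for this checkpoint are not published, so the values below are illustrative defaults, not this model's confirmed configuration):

```python
# Hypothetical 4-bit load sketch using transformers + bitsandbytes.
# NF4 with bf16 compute is a common default; treat it as an assumption,
# not the documented configuration of this checkpoint.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4 bits
    bnb_4bit_quant_type="nf4",              # NF4 quantization format
    bnb_4bit_compute_dtype=torch.bfloat16,  # run matmuls in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    "togethercomputer/mistral-3.2-instruct-2506",
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across available GPUs
)
```

Loading with a quantization config like this trades a small amount of quality for a roughly 4x reduction in weight memory relative to bf16.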
## Base Model Improvements
Small-3.2 improves in the following categories:
- Instruction following: Small-3.2 is better at following precise instructions
- Repetition errors: Small-3.2 produces fewer infinite generations and repetitive answers
- Function calling: Small-3.2's function calling template is more robust
In all other categories Small-3.2 should match or slightly improve compared to Mistral-Small-3.1-24B-Instruct-2503.
## Key Features
- Same feature set as Mistral-Small-3.1-24B-Instruct-2503
- Reduced memory footprint through quantization
- Optimized for inference with maintained quality
## Usage

The quantized model can be used with frameworks such as vLLM.

Note 1: We recommend using a relatively low temperature, such as `temperature=0.15`.

Note 2: Make sure to add a system prompt to the model to best tailor it to your needs.
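The two notes above can be sketched as a framework-agnostic request. The helper name and prompt strings here are hypothetical examples; the `role`/`content` schema is the standard chat-message format accepted by vLLM-style chat interfaces:

```python
# Illustrative request construction following the two usage notes:
# a low sampling temperature and an explicit system prompt.
# build_chat and the prompt text are examples, not a library API.

SAMPLING = {"temperature": 0.15, "max_tokens": 512}

def build_chat(system_prompt: str, user_message: str) -> list[dict]:
    """Return a chat message list with the system prompt first."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]

messages = build_chat(
    "You are a concise technical assistant.",
    "Summarize the trade-offs of 4-bit quantization.",
)
```

Placing the system message first ensures the model is conditioned on your instructions before the user turn.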
## Memory Requirements
This quantized version requires significantly less GPU memory than the original model:
- Original: ~55 GB of GPU RAM in bf16 or fp16
- Quantized: Reduced memory footprint (exact requirements depend on quantization method used)
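A back-of-envelope estimate of weight memory for a 24B-parameter model, assuming common quantization widths (real usage is higher due to KV cache, activations, and quantization overhead, which is why the bf16 figure above is quoted as ~55 GB rather than 48 GB):

```python
# Rough weight-only memory estimate for a 24B-parameter model.
PARAMS = 24e9

def weight_gb(bits_per_param: float) -> float:
    """Gigabytes needed to store the weights at a given bit width."""
    return PARAMS * bits_per_param / 8 / 1e9

bf16 = weight_gb(16)   # 48.0 GB for weights alone
int8 = weight_gb(8)    # 24.0 GB
nf4  = weight_gb(4.5)  # 13.5 GB (4-bit weights plus ~0.5 bit of scales)
```

These are lower bounds; budget extra GPU memory for the KV cache, which grows with context length and batch size.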
## License
This model inherits the same license as the base model: Apache-2.0
## Original Model
For benchmark results and detailed usage examples, please refer to the original model: togethercomputer/mistral-3.2-instruct-2506