VincentGOURBIN/voxtral-small-8bit-mixed

This is an 8-bit quantized version of the mistralai/Voxtral-Small-24B-2507 model.
It is provided in standard Hugging Face Transformers format (safetensors) and is compatible with mlx-voxtral.

🔧 About this model

  • Base model: mistralai/Voxtral-Small-24B-2507
  • Quantization: 8-bit mixed precision
  • Format: Transformers-compatible (safetensors), usable with MLX and Hugging Face
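
To build intuition for what "8-bit quantization" means, here is a minimal sketch of absmax quantization in plain Python. This is illustrative only: the actual mlx-voxtral scheme quantizes per group and keeps some sensitive tensors in higher precision (the "mixed" part), and the function names below are invented for this example.

```python
def quantize_8bit(weights):
    """Map float weights to int8 values in [-127, 127] plus a shared scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_8bit(q, scale):
    """Recover approximate float weights from the int8 values and scale."""
    return [v * scale for v in q]

weights = [0.03, -1.27, 0.5, 0.9981]
q, scale = quantize_8bit(weights)
restored = dequantize_8bit(q, scale)

# Each weight is recovered to within half a quantization step (scale / 2),
# while storage drops from 16 bits per weight to 8 bits plus one scale.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

The per-weight error is bounded by half the quantization step, which is why 8-bit weights usually preserve model quality while halving memory relative to F16/BF16.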

🙏 Acknowledgments

Huge thanks to:

  • Mistral AI for releasing the original Voxtral-Small model
  • mlx-voxtral for the quantization tooling and MLX support

This work is a quantized derivative of mistralai/Voxtral-Small-24B-2507, made easier by the excellent work of the mlx-voxtral project.

🚀 Usage

馃 With Hugging Face Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "VincentGOURBIN/voxtral-small-8bit-mixed"

# Load the tokenizer and the quantized model, spreading weights
# across available devices automatically.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```