# VincentGOURBIN/voxtral-small-8bit-mixed
This is an 8-bit quantized version of the mistralai/Voxtral-Small-24B-2507 model. It is provided in standard Hugging Face Transformers format (safetensors) and is compatible with mlx-voxtral.
## 🔧 About this model

- Base model: mistralai/Voxtral-Small-24B-2507
- Quantization: 8-bit mixed precision
- Format: Transformers-compatible (safetensors), usable with MLX and Hugging Face
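The general idea behind 8-bit weight quantization can be illustrated with a minimal per-tensor sketch (symmetric scaling to int8). This is purely illustrative: the actual mixed-precision recipe used for this checkpoint may quantize tensors in groups and keep some layers at higher precision.

```python
# Illustrative symmetric 8-bit quantization: each float weight is mapped to an
# integer in [-127, 127] plus one shared float scale. NOT the exact recipe
# used for this checkpoint.

def quantize_8bit(weights):
    """Map float weights to int8-range values plus a single float scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_8bit(q, scale):
    """Recover approximate float weights from the quantized values."""
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.003, 0.9]
q, scale = quantize_8bit(weights)
approx = dequantize_8bit(q, scale)

# Rounding error per weight is bounded by half a quantization step.
max_err = max(abs(w - a) for w, a in zip(weights, approx))
assert max_err <= scale / 2
```

"Mixed" precision means this rounding is not applied uniformly: sensitive tensors (for example embeddings or normalization weights) are typically stored at higher precision to limit quality loss.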
## 🙏 Acknowledgments

Huge thanks to:

- Mistral AI for releasing the original Voxtral-Small model
- mlx-voxtral for the quantization tooling and MLX support

This work is a quantized derivative of mistralai/Voxtral-Small-24B-2507, made possible by the excellent work of the mlx-voxtral project.
## 🚀 Usage

### 🤗 With Hugging Face Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "VincentGOURBIN/voxtral-small-8bit-mixed"

# Load the tokenizer and the quantized model, sharding across available devices
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Text-only generation example
prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Model tree

- Base model: mistralai/Mistral-Small-24B-Base-2501
- Finetuned: mistralai/Voxtral-Small-24B-2507