---
license: apache-2.0
base_model:
- mistralai/Voxtral-Small-24B-2507
tags:
- mistral
- quantized
- 4bit
- llm
- language-model
- transformers
- mlx
---

# VincentGOURBIN/voxtral-small-4bit-mixed

This is a 4-bit quantized version of the [mistralai/Voxtral-Small-24B-2507](https://huggingface.co/mistralai/Voxtral-Small-24B-2507) language model.

It is provided in the standard Hugging Face Transformers format and is compatible with [mlx.voxtral](https://github.com/mzbac/mlx.voxtral).
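
Because the repo ships standard safetensors, you can fetch it once with `huggingface_hub` and point whichever loader you use (Transformers or MLX tooling) at the resulting local directory. A minimal sketch; the exact mlx.voxtral invocation is not shown here:

```python
from huggingface_hub import snapshot_download

# Download the full repository (weights, config, tokenizer files) into the local HF cache.
local_dir = snapshot_download("VincentGOURBIN/voxtral-small-4bit-mixed")
print(local_dir)  # hand this directory to your MLX or Transformers loading code
```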

## 🧠 About this model

- **Base model**: [`mistralai/Voxtral-Small-24B-2507`](https://huggingface.co/mistralai/Voxtral-Small-24B-2507)
- **Quantization**: 4-bit mixed precision (a quick way to inspect the shipped settings is sketched below)
- **Format**: Transformers-compatible (safetensors), usable with MLX and Hugging Face
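
To see what the 4-bit mixed setup looks like on disk, you can pull just the repo's `config.json` and print its quantization section, if one is present. A minimal sketch using `huggingface_hub`; the key name is the usual Transformers convention and is not guaranteed for this repo:

```python
import json

from huggingface_hub import hf_hub_download

# Fetch only config.json from the Hub (cached locally after the first call).
config_path = hf_hub_download("VincentGOURBIN/voxtral-small-4bit-mixed", "config.json")
with open(config_path) as f:
    config = json.load(f)

# "quantization_config" is the usual Transformers key; it may be absent or named differently here.
print(config.get("quantization_config", "no quantization_config key found"))
```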

## 🙏 Acknowledgments

Huge thanks to:

- **[Mistral AI](https://mistral.ai/)** for releasing the original Voxtral-Small model
- **[mlx.voxtral](https://github.com/mzbac/mlx.voxtral)** for the quantization tooling and MLX support

This work is a quantized derivative of [mistralai/Voxtral-Small-24B-2507](https://huggingface.co/mistralai/Voxtral-Small-24B-2507), made easier by the amazing work of the `mlx.voxtral` project.

## 🚀 Usage

### 🤗 With Hugging Face Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "VincentGOURBIN/voxtral-small-4bit-mixed"

# Load the tokenizer and the 4-bit quantized weights; device_map="auto"
# lets Transformers place the layers on the available device(s).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Run a short text-only generation as a smoke test.
prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))