	---
	license: apache-2.0
	base_model:
	- mistralai/Voxtral-Small-24B-2507
	tags:
	- mistral
	- quantized
	- 4bit
	- llm
	- language-model
	- transformers
	- mlx
	---

	# VincentGOURBIN/voxtral-small-4bit-mixed

	This is a 4-bit quantized version of [mistralai/Voxtral-Small-24B-2507](https://huggingface.co/mistralai/Voxtral-Small-24B-2507), Mistral AI's audio-capable language model.
	The weights are provided in standard Hugging Face Transformers format and are compatible with [mlx.voxtral](https://github.com/mzbac/mlx.voxtral).

	## 🔧 About this model

	- Base model: [`mistralai/Voxtral-Small-24B-2507`](https://huggingface.co/mistralai/Voxtral-Small-24B-2507)
	- Quantization: 4-bit mixed precision
	- Format: Transformers-compatible (safetensors), usable with MLX and Hugging Face
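
To give a rough sense of what 4-bit quantization means for memory, the sketch below estimates weight storage for a model of this size. It is a back-of-the-envelope calculation, not a measurement of this repository: the 24e9 parameter count is read off the base model's name, and the group size of 64 with a 16-bit scale and 16-bit bias per group is an assumption based on common MLX quantization defaults, which this card does not confirm. Because the quantization is "mixed", some layers may be stored at higher precision, so the real footprint is likely somewhat larger. The helper `estimate_weight_gib` is hypothetical and written only for this illustration.

```python
from typing import Optional


def estimate_weight_gib(
    n_params: float,
    bits: int,
    group_size: Optional[int] = None,
    overhead_bits_per_group: int = 32,
) -> float:
    """Estimate weight storage in GiB.

    group_size and overhead_bits_per_group model group-wise quantization,
    where each group of weights shares a scale and a bias (assumed here to be
    16 bits each). Pass group_size=None for unquantized weights.
    """
    bits_per_weight = float(bits)
    if group_size is not None:
        # Amortize the per-group scale and bias over the weights in the group.
        bits_per_weight += overhead_bits_per_group / group_size
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


N_PARAMS = 24e9  # assumption: taken from the "24B" in the base model name

bf16_gib = estimate_weight_gib(N_PARAMS, bits=16)
q4_gib = estimate_weight_gib(N_PARAMS, bits=4, group_size=64)  # assumed group size

print(f"16-bit weights: ~{bf16_gib:.1f} GiB")
print(f"4-bit weights (group size 64): ~{q4_gib:.1f} GiB")
```

Under these assumptions the quantized weights need roughly 4.5 bits per parameter instead of 16, so storage falls to a bit under a third of the 16-bit size. Activations and the KV cache are not included in this estimate.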

	## 🙏 Acknowledgments

	Huge thanks to:

	- [Mistral AI](https://mistral.ai/) for releasing the original Voxtral-Small model
	- [mlx.voxtral](https://github.com/mzbac/mlx.voxtral) for the quantization tooling and MLX support

	This work is a quantized derivative of [mistralai/Voxtral-Small-24B-2507](https://huggingface.co/mistralai/Voxtral-Small-24B-2507), made easier by the work of the `mlx.voxtral` project.

	## 🚀 Usage

	### 🤗 With Hugging Face Transformers

	```python
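	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_id = "VincentGOURBIN/voxtral-small-4bit-mixed"

	# Download (or load from the local cache) the tokenizer and quantized weights.
	# device_map="auto" requires the `accelerate` package to be installed.
	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

	# Text-only prompt: tokenize, move the tensors to the model's device, generate.
	prompt = "What is the capital of France?"
	inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
	outputs = model.generate(**inputs, max_new_tokens=50)

	# Decode the first (and only) sequence, dropping special tokens.
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))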