---
library_name: vllm
language:
- en
- fr
- de
- es
- pt
- it
- ja
- ko
- ru
- zh
- ar
- fa
- id
- ms
- ne
- pl
- ro
- sr
- sv
- tr
- uk
- vi
- hi
- bn
license: apache-2.0
inference: false
base_model:
- mistralai/Mistral-Small-3.1-24B-Base-2503
- togethercomputer/mistral-3.2-instruct-2506
extra_gated_description: >-
If you want to learn more about how we process your personal data, please read
our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
tags:
- mistral-common
- quantized
model_type: mistral
quantization: bitsandbytes
---
# Mistral-Small-3.2-24B-Instruct-2506 (Quantized)
This is a quantized version of [togethercomputer/mistral-3.2-instruct-2506](https://huggingface.co/togethercomputer/mistral-3.2-instruct-2506), optimized for reduced memory usage while maintaining performance.
Mistral-Small-3.2-24B-Instruct-2506 is a minor update of [Mistral-Small-3.1-24B-Instruct-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503).
## Quantization Details
This model has been quantized with bitsandbytes to reduce memory requirements while preserving model quality. Quantization significantly reduces the model size compared to the original fp16/bf16 weights.
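As a rough illustration of what that means in practice, the metadata above lists `bitsandbytes` as the quantization method, so loading in 4-bit with `transformers` might look like the sketch below. The repository id and the 4-bit settings are assumptions, and the sketch assumes a text-only export; if this checkpoint keeps the base model's vision tower, the corresponding multimodal model class would be needed instead.

```python
# Minimal sketch: loading a bitsandbytes-quantized checkpoint with transformers.
# The repo id is a placeholder; the 4-bit settings are assumptions about how this
# checkpoint was exported.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "path/to/this-quantized-repo"  # placeholder: replace with this repository's id

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                    # assumed 4-bit; use load_in_8bit=True for 8-bit instead
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```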
## Base Model Improvements
Small-3.2 improves in the following categories:
- **Instruction following**: Small-3.2 is better at following precise instructions
- **Repetition errors**: Small-3.2 produces fewer infinite generations and repetitive answers
- **Function calling**: Small-3.2's function calling template is more robust
In all other categories Small-3.2 should match or slightly improve compared to [Mistral-Small-3.1-24B-Instruct-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503).
## Key Features
- Same as [Mistral-Small-3.1-24B-Instruct-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503#key-features)
- Reduced memory footprint through quantization
- Optimized for inference with maintained quality
## Usage
The quantized model can be used with the following frameworks:
- [`vllm`](https://github.com/vllm-project/vllm) (recommended)
- [`transformers`](https://github.com/huggingface/transformers)
**Note 1**: We recommend using a relatively low temperature, such as `temperature=0.15`.
**Note 2**: Make sure to add a system prompt to the model to best tailor it to your needs.
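For example, a minimal offline-inference sketch with vLLM that applies the low temperature and a system prompt as recommended above might look like this. The repository id is a placeholder and `quantization="bitsandbytes"` assumes vLLM's bitsandbytes support; check the vLLM documentation for the options your version accepts.

```python
# Minimal sketch: offline inference with vLLM using the recommended sampling settings.
# The repo id is a placeholder; quantization="bitsandbytes" is an assumption based on
# the quantization method listed in this card's metadata.
from vllm import LLM, SamplingParams

model_id = "path/to/this-quantized-repo"  # placeholder: replace with this repository's id

llm = LLM(model=model_id, quantization="bitsandbytes")

sampling_params = SamplingParams(temperature=0.15, max_tokens=512)

messages = [
    {"role": "system", "content": "You are a concise, helpful assistant."},  # tailor to your needs
    {"role": "user", "content": "Summarize the benefits of quantization in two sentences."},
]

outputs = llm.chat(messages, sampling_params)
print(outputs[0].outputs[0].text)
```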
### Memory Requirements
This quantized version requires significantly less GPU memory than the original model:
- Original: ~55 GB of GPU RAM in bf16 or fp16
- Quantized: substantially lower (the exact footprint depends on the bitsandbytes configuration used, e.g. 4-bit vs. 8-bit)
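As a back-of-the-envelope estimate, 24B parameters occupy roughly 24B × 2 bytes ≈ 48 GB in bf16 (plus activation and KV-cache overhead, hence the ~55 GB figure above), while 4-bit weights take roughly 24B × 0.5 bytes ≈ 12 GB and 8-bit weights roughly 24 GB before overhead. Treat these as rough estimates and measure on your own hardware.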
## License
This model inherits the same license as the base model: Apache-2.0
## Original Model
For benchmark results and detailed usage examples, please refer to the original model: [togethercomputer/mistral-3.2-instruct-2506](https://huggingface.co/togethercomputer/mistral-3.2-instruct-2506)