cmarkea/Qwen2.5-32B-Instruct-4bit

A 4-bit quantized version of Qwen2.5-32B-Instruct, converted with bitsandbytes. For more information about the model, refer to the original model's page.
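A minimal loading sketch, assuming the `transformers`, `accelerate`, and `bitsandbytes` packages are installed and a CUDA GPU with enough memory is available. The helper name `load_model` is hypothetical, and the sketch assumes the 4-bit settings are stored with the checkpoint, so no explicit quantization config is passed.

```python
MODEL_ID = "cmarkea/Qwen2.5-32B-Instruct-4bit"


def load_model(model_id: str = MODEL_ID):
    """Return (tokenizer, model) for the quantized checkpoint.

    Hypothetical helper: requires transformers, accelerate and bitsandbytes.
    """
    # Imported lazily so that defining this helper has no heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Assumption: the checkpoint already carries its 4-bit quantization
    # settings, so only device placement is specified here.
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    return tokenizer, model
```

Calling `tokenizer, model = load_model()` downloads the weights on first use, so it is best run once and reused.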

Impact on performance

Impact of quantization on a set of models.

The model was evaluated with the PoLL (Pool of LLM) technique on 100 French questions. Each score aggregates six evaluations, two from each of three evaluators: GPT-4o, Gemini-1.5-pro, and Claude3.5-sonnet.
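The aggregation step can be sketched as follows for a single question. This is an illustration only: the text does not say how the six evaluations are combined, so a plain arithmetic mean is assumed, and the scores and the function name `poll_score` are made up for the example.

```python
from statistics import mean
from typing import Dict, List


def poll_score(evaluations: Dict[str, List[float]]) -> float:
    """Average every individual evaluation across all evaluators.

    Assumption: a plain mean; the actual PoLL aggregation may differ.
    """
    all_scores = [s for scores in evaluations.values() for s in scores]
    if not all_scores:
        raise ValueError("at least one evaluation is required")
    return mean(all_scores)


# Hypothetical scores for one question: two evaluations per evaluator.
example = {
    "gpt-4o": [4.0, 4.5],
    "gemini-1.5-pro": [3.5, 4.0],
    "claude-3.5-sonnet": [4.0, 4.0],
}
question_score = poll_score(example)
```

Under the same assumption, a model's overall score would be the mean of these per-question scores over the 100 questions.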

Performance Scores (on a scale of 5):

Model	Score (out of 5)	Parameters (billions)	Size (GB)
gpt-4o	4.13	N/A	N/A
gpt-4o-mini	4.02	N/A	N/A
Qwen/Qwen2.5-32B-Instruct	3.99	32.8	65.6
cmarkea/Qwen2.5-32B-Instruct-4bit	3.98	32.8	16.4
mistralai/Mixtral-8x7B-Instruct-v0.1	3.71	46.7	93.4
cmarkea/Mixtral-8x7B-Instruct-v0.1-4bit	3.68	46.7	23.35
meta-llama/Meta-Llama-3.1-70B-Instruct	3.68	70.06	140.12
gpt-3.5-turbo	3.66	175	350
cmarkea/Meta-Llama-3.1-70B-Instruct-4bit	3.64	70.06	35.3
TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ	3.56	46.7	46.7
meta-llama/Meta-Llama-3.1-8B-Instruct	3.25	8.03	16.06
mistralai/Mistral-7B-Instruct-v0.2	1.98	7.25	14.5
cmarkea/bloomz-7b1-mt-sft-chat	1.69	7.07	14.14
cmarkea/bloomz-3b-dpo-chat	1.68	3	6
cmarkea/bloomz-3b-sft-chat	1.51	3	6
croissantllm/CroissantLLMChat-v0.1	1.19	1.3	2.7
cmarkea/bloomz-560m-sft-chat	1.04	0.56	1.12
OpenLLM-France/Claire-Mistral-7B-0.1	0.38	7.25	14.5

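The impact of quantization is negligible: the 4-bit model scores 3.98 against 3.99 for the original Qwen/Qwen2.5-32B-Instruct, while its size drops from 65.6 GB to 16.4 GB.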
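The sizes reported for this model are consistent with a simple estimate: number of parameters times bits per parameter. The sketch below assumes 16 bits per parameter for the original weights and exactly 4 bits for the quantized ones, ignoring the small overhead of quantization constants; the function name is hypothetical.

```python
def weights_size_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate weight size in GB (1 GB taken as 10**9 bytes)."""
    return params_billion * bits_per_param / 8


# Qwen2.5-32B-Instruct has 32.8 billion parameters (see the table).
size_16bit = weights_size_gb(32.8, 16)  # original weights
size_4bit = weights_size_gb(32.8, 4)    # bitsandbytes 4-bit weights
reduction = size_16bit / size_4bit      # memory reduction factor
```

With these assumptions the estimate gives 65.6 GB and 16.4 GB, a fourfold reduction.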

Prompt Pattern

As a reminder, here is the prompt pattern for interacting with the model:

<|im_start|>user\n{user_prompt_1}<|im_end|>\n<|im_start|>assistant\n{model_answer_1}...
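A small sketch of how this pattern might be assembled for a multi-turn conversation. The function name `build_prompt` is hypothetical, and closing each assistant answer with `<|im_end|>` followed by a newline is an assumption, since the pattern above is elided after the first answer. No system message is added.

```python
from typing import List, Tuple


def build_prompt(history: List[Tuple[str, str]], user_prompt: str) -> str:
    """Format past (user, assistant) turns plus a new user prompt."""
    parts = []
    for past_user, past_answer in history:
        parts.append(f"<|im_start|>user\n{past_user}<|im_end|>\n")
        # Assumption: completed assistant turns end with <|im_end|>.
        parts.append(f"<|im_start|>assistant\n{past_answer}<|im_end|>\n")
    parts.append(f"<|im_start|>user\n{user_prompt}<|im_end|>\n")
    # Leave the assistant turn open so the model generates the answer.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_prompt([("Bonjour", "Bonjour !")], "Quelle est la capitale de la France ?")
```

When the tokenizer is loaded with `transformers`, its `apply_chat_template` method is usually a safer way to produce the same format.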
