How did you use auto-round to quantize?
Hi!
First of all, thanks for your work supporting the AWQ community. ;-)
With the release of Mistral Small 3.2, I tried to auto-round that model myself, but failed with an error:
ValueError: Unrecognized configuration class <class 'transformers.models.mistral3.configuration_mistral3.Mistral3Config'> for this kind of AutoModel: AutoModelForCausalLM.
Did you just use the (at that time) current transformers library with auto-round, or is something more needed?
Maybe you could share the script you used for quantizing the Small 3.1.
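In case it helps, my own attempt simply followed the plain text-model recipe, which is presumably where the AutoModelForCausalLM error above comes from. Reconstructed from memory, so details may be slightly off, it was roughly:

# Roughly what I ran (standard auto-round text-LLM recipe, from memory).
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "mistralai/Mistral-Small-3.2-24B-Instruct-2506"
# This is the line that raises the ValueError above: Mistral3Config is a
# vision-language config, not something AutoModelForCausalLM can load.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True)
autoround.quantize()
autoround.save_quantized("./quants", format="auto_awq")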
Thanks in advance (for any hint)!
Did you use the same CLI command, auto-round-mllm,
as shown on this model page: https://huggingface.co/OPEA/Mistral-Small-3.1-24B-Instruct-2503-int4-AutoRound-awq-sym#generate-the-model? I believe AutoRound 0.5.1 should work fine for this model.
At the time we started supporting VLMs, automatic detection of VLM models was challenging, so we used different APIs depending on the model type.
We will try to merge these two APIs later.
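For reference, the Python side of the VLM path looks roughly like the sketch below; exact argument names can differ a little between releases, and it assumes a transformers version that already knows the Mistral3 architecture:

# Rough sketch of the VLM (mllm) entry point; argument names may vary by release.
from transformers import AutoProcessor, AutoTokenizer, Mistral3ForConditionalGeneration
from auto_round import AutoRoundMLLM

model_name = "mistralai/Mistral-Small-3.2-24B-Instruct-2506"
model = Mistral3ForConditionalGeneration.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)
processor = AutoProcessor.from_pretrained(model_name)

autoround = AutoRoundMLLM(model, tokenizer, processor, bits=4, group_size=128, sym=True)
autoround.quantize()
autoround.save_quantized("./quants", format="auto_awq", inplace=True)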
As far as I recall, I didn't use that particular CLI call. But if I use it as described on the model card, I get the following error with version 0.5.1:
$ auto-round-mllm \
--model mistralai/Mistral-Small-3.2-24B-Instruct-2506 \
--device 0 \
--bits 4 \
--format 'auto_awq,auto_gptq' \
--output_dir "./quants"
'torch.use_deterministic_algorithms' is turned on by default for reproducibility, and can be turned off by setting the '--disable_deterministic_algorithms' parameter.
2025-07-08 16:24:36 INFO mllm.py L319: start to quantize mistralai/Mistral-Small-3.2-24B-Instruct-2506
Traceback (most recent call last):
File "/data/build/.venv/bin/auto-round-mllm", line 8, in <module>
sys.exit(run_mllm())
File "/data/build/.venv/lib/python3.10/site-packages/auto_round/__main__.py", line 69, in run_mllm
tune(args)
File "/data/build/.venv/lib/python3.10/site-packages/auto_round/script/mllm.py", line 326, in tune
model, processor, tokenizer, image_processor = mllm_load_model(
File "/data/build/.venv/lib/python3.10/site-packages/auto_round/utils.py", line 1322, in mllm_load_model
config = json.load(hf_file.open(pretrained_model_name_or_path + "/config.json"))
File "/usr/lib/python3.10/json/__init__.py", line 293, in load
return loads(fp.read(),
File "/usr/lib/python3.10/json/__init__.py", line 341, in loads
s = s.decode(detect_encoding(s), 'surrogatepass')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
Seems there's something fishy in the config.json:
$ hexdump -C config.json
00000000 7b 0a 20 20 22 61 72 63 68 69 74 65 63 74 75 72 |{. "architectur|
00000010 65 73 22 3a 20 5b 0a 20 20 20 20 22 4d 69 73 74 |es": [. "Mist|
00000020 72 61 6c 33 46 6f 72 43 6f 6e 64 69 74 69 6f 6e |ral3ForCondition|
00000030 61 6c 47 65 6e 65 72 61 74 69 6f 6e 22 0a 20 20 |alGeneration". |
But I don't see a problem when inspecting the config.json, unless I'm misinterpreting the error message.
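One thing I noticed: the offending byte 0x8b at position 1 happens to be the second byte of the gzip magic number (1f 8b), so maybe the file object in mllm_load_model is handing back gzip-compressed bytes from the Hub instead of plain JSON. A quick check along these lines (just my own diagnostic, not auto-round code):

# Quick diagnostic: see whether the bytes coming back from the Hub filesystem
# are gzip-compressed, which would explain the 0x8b in the UnicodeDecodeError.
import gzip
import json
from huggingface_hub import HfFileSystem

fs = HfFileSystem()
path = "mistralai/Mistral-Small-3.2-24B-Instruct-2506/config.json"
with fs.open(path, "rb") as f:
    raw = f.read()

if raw[:2] == b"\x1f\x8b":  # gzip magic number
    print("Hub returned gzip-compressed bytes")
    config = json.loads(gzip.decompress(raw))
else:
    print("Plain JSON")
    config = json.loads(raw)

print(config["architectures"])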
Sorry, please use the main branch. Hugging Face Hub has had some updates recently, and we fixed this issue a few days ago. We will release a new version of auto-round soon.