How did you use auto-round to quantize?

#4
by stelterlab - opened

Hi!

First of all, thanks for your work supporting the AWQ community. ;-)

With the release of Mistral Small 3.2 I tried to auto-round that model myself, but failed with an error:

ValueError: Unrecognized configuration class <class 'transformers.models.mistral3.configuration_mistral3.Mistral3Config'> for this kind of AutoModel: AutoModelForCausalLM.
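
The error itself can be reproduced with plain transformers, without auto-round involved (a minimal sketch; the model name is the one from the release, everything else is just the default loading path):

from transformers import AutoModelForCausalLM

# Mistral-Small-3.2's config.json declares "Mistral3ForConditionalGeneration",
# a vision-language architecture, so the text-only auto class rejects it:
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-Small-3.2-24B-Instruct-2506"
)
# ValueError: Unrecognized configuration class ... Mistral3Config ...

So it looks like the model has to go through a VLM-aware loading path rather than AutoModelForCausalLM.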

Did you just use the (at that time) current transformers library with auto-round, or is something more needed?

Maybe you could share the script used for quantizing the Small 3.1.

Thanks in advance (for any hint)!

Open Platform for Enterprise AI org

Did you use the same auto-round-mllm CLI command as shown on this model page: https://huggingface.co/OPEA/Mistral-Small-3.1-24B-Instruct-2503-int4-AutoRound-awq-sym#generate-the-model? I believe AutoRound 0.5.1 should work fine for this model.

At the time we started supporting VLMs, automatic detection of VLM models was challenging, so we used different APIs depending on the model type.

We will try to merge these two APIs later.
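
For reference, here is a rough Python-API equivalent of the auto-round-mllm CLI call (a sketch only, not the exact script we ran; the transformers loader class and the group_size value below are illustrative and may need adjusting for your versions):

from transformers import AutoModelForImageTextToText, AutoProcessor, AutoTokenizer
from auto_round import AutoRoundMLLM

model_name = "mistralai/Mistral-Small-3.1-24B-Instruct-2503"
model = AutoModelForImageTextToText.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)
processor = AutoProcessor.from_pretrained(model_name)

# 4-bit symmetric quantization; group_size=128 is the usual default, adjust as needed
autoround = AutoRoundMLLM(model, tokenizer, processor=processor,
                          bits=4, group_size=128, sym=True)
autoround.quantize()
autoround.save_quantized("./quants", format="auto_awq")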

As far as I recall, I didn't use that particular CLI call. But if I use it as described on the model card, I get the following error with version 0.5.1:

$ auto-round-mllm \
--model mistralai/Mistral-Small-3.2-24B-Instruct-2506 \
--device 0 \
--bits 4 \
--format 'auto_awq,auto_gptq' \
--output_dir "./quants"
'torch.use_deterministic_algorithms' is turned on by default for reproducibility, and can be turned off by setting the '--disable_deterministic_algorithms' parameter.
2025-07-08 16:24:36 INFO mllm.py L319: start to quantize mistralai/Mistral-Small-3.2-24B-Instruct-2506
Traceback (most recent call last):
  File "/data/build/.venv/bin/auto-round-mllm", line 8, in <module>
    sys.exit(run_mllm())
  File "/data/build/.venv/lib/python3.10/site-packages/auto_round/__main__.py", line 69, in run_mllm
    tune(args)
  File "/data/build/.venv/lib/python3.10/site-packages/auto_round/script/mllm.py", line 326, in tune
    model, processor, tokenizer, image_processor = mllm_load_model(
  File "/data/build/.venv/lib/python3.10/site-packages/auto_round/utils.py", line 1322, in mllm_load_model
    config = json.load(hf_file.open(pretrained_model_name_or_path + "/config.json"))
  File "/usr/lib/python3.10/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/usr/lib/python3.10/json/__init__.py", line 341, in loads
    s = s.decode(detect_encoding(s), 'surrogatepass')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

Seems like something is fishy in the config.json:

$ hexdump -C config.json
00000000  7b 0a 20 20 22 61 72 63  68 69 74 65 63 74 75 72  |{.  "architectur|
00000010  65 73 22 3a 20 5b 0a 20  20 20 20 22 4d 69 73 74  |es": [.    "Mist|
00000020  72 61 6c 33 46 6f 72 43  6f 6e 64 69 74 69 6f 6e  |ral3ForCondition|
00000030  61 6c 47 65 6e 65 72 61  74 69 6f 6e 22 0a 20 20  |alGeneration".  |

But I don't see the problem when checking the config.json, or maybe I'm misinterpreting the error message.
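
One thing I notice: the 0x8b in the UnicodeDecodeError looks like the second byte of the gzip magic (1f 8b), so maybe the bytes auto-round gets back from the Hub are gzip-compressed rather than corrupted, while my local config.json is fine. A quick way to check what the Hub filesystem actually returns (assumes huggingface_hub is installed; this mirrors the hf_file.open call in the traceback):

from huggingface_hub import HfFileSystem

fs = HfFileSystem()
# Read the first two bytes of the remote config.json
with fs.open("mistralai/Mistral-Small-3.2-24B-Instruct-2506/config.json", "rb") as f:
    head = f.read(2)
print(head)  # b'{\n' means plain JSON; b'\x1f\x8b' would mean gzip-compressed content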

Open Platform for Enterprise AI org
• edited Jul 9

Sorry, please use the main branch. Hugging Face Hub has had some updates recently, and we fixed this issue a few days ago. We will release a new version of auto-round soon.
