Qwen-32B overflow issue
Hi, Qwen-32B and its variants are special models that can easily cause int4 kernel overflow when the chat template is applied. Besides checking accuracy, you may want to check the generations themselves, or directly follow our recipe in the OPEA space when using AutoRound.
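For a quick generation check, something like the sketch below is usually enough (a sketch only, not the OPEA recipe itself; the model path is a placeholder for your own quantized checkpoint):

```python
# Rough sketch of a post-quantization generation check (model ID is a placeholder).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/your-qwen-32b-int4"  # placeholder for the quantized checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Apply the chat template, since the overflow tends to show up with templated prompts.
messages = [{"role": "user", "content": "Give me a short introduction to large language models."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
# A broken int4 kernel typically shows up here as repeated tokens or gibberish.
```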
Hi,
Yes, I observed it with some models. My current strategy is to check accuracy on IFEval to monitor generation issues. If generation is broken, it should result in a significant drop in accuracy on this benchmark.
Do you think this is correct, or did you observe cases where the model performed well on generative benchmarks (like IFEval) while having generation issues? I don't see how that would be possible, but I may be missing something.
Yes, I believe so, if I remember correctly. By default, lm-eval sets apply_chat_template to False.
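For reference, this is roughly how I enable it through the Python API (a sketch; whether simple_evaluate exposes apply_chat_template depends on your lm-eval version):

```python
# Sketch: running IFEval with the chat template enabled via lm-eval's Python API.
# Availability of the apply_chat_template argument depends on the lm-eval version.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=path/to/quantized-model,dtype=auto",  # placeholder path
    tasks=["leaderboard_ifeval"],
    apply_chat_template=True,  # defaults to False, so templated generation is not exercised
)
print(results["results"]["leaderboard_ifeval"])
```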
Another question: I noticed you mentioned that AutoRound produced unstable IFEval results for Qwen2.5-72B. Could you share the exact task name? We tested several hyperparameter settings a long time ago, including auto-round-best, auto-round, and auto-round-light, and all yielded satisfactory results for leaderboard_ifeval.
I didn't observe issues with the chat template in this case, but I don't systematically test for it. I'll work on this.
As for Qwen2.5-72B Instruct, I also get very good results when quantizing it, except with this (very) specific configuration:
- nsamples = 512
- iterations = 500
- model_dtype = float16
- symmetric quantization
- auto_gptq export
- group size = 128
This configuration produced a bad quantization for the 4-bit and 8-bit versions, but worked well for 2-bit with a group size of 32.
Other hyperparameter values, as you suggested, performed well.
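For reference, that configuration corresponds roughly to the following AutoRound call (a sketch; argument names may differ slightly across AutoRound versions, and the 8-bit run only changed the bit width):

```python
# Rough reconstruction of the problematic run (argument names may vary by AutoRound version).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_id = "Qwen/Qwen2.5-72B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

autoround = AutoRound(
    model,
    tokenizer,
    bits=4,          # the 8-bit run used bits=8 with the same remaining settings
    group_size=128,
    sym=True,        # symmetric quantization
    nsamples=512,
    iters=500,
)
autoround.quantize()
autoround.save_quantized("Qwen2.5-72B-Instruct-AutoRoundGPTQ-4bit", format="auto_gptq")
```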
@bnjmnmarie Hi, do you still have the 72B int4 model whose IFEval accuracy was notably low? I couldn't reproduce the issue in either of my two environments. I'd like to check whether it's related to lm-eval or something else.
Yes, they are here:
kaitchup/Qwen2.5-72B-Instruct-AutoRoundGPTQ-8bit
kaitchup/Qwen2.5-72B-Instruct-AutoRoundGPTQ-4bit
Thanks for the quick reply!
(not sure it matters but for evaluation with IFEval, I use the vLLM backend)
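Concretely, the run looks roughly like this (a sketch; the vLLM settings such as tensor_parallel_size are placeholders, not my exact values):

```python
# Sketch of the IFEval run through lm-eval's vLLM backend (model_args values are placeholders).
import lm_eval

results = lm_eval.simple_evaluate(
    model="vllm",
    model_args=(
        "pretrained=kaitchup/Qwen2.5-72B-Instruct-AutoRoundGPTQ-4bit,"
        "tensor_parallel_size=2,gpu_memory_utilization=0.9"
    ),
    tasks=["leaderboard_ifeval"],
)
print(results["results"]["leaderboard_ifeval"])
```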
Ok, thanks for the information! We'll evaluate it using both the HF and vLLM backends.