4-bit Q4_K_M scored 63.57 on MMLU-Pro, single shot
#4 by xbruce22
The same Q4_K_M quant from Ollama scored 65.71.
Seed = 42
Ollama logs:
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| Model                 | Dataset   | Metric          | Subset           |   Num |   Score | Cat.0   |
+=======================+===========+=================+==================+=======+=========+=========+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | computer science |    10 |     0.5 | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | math             |    10 |     0.9 | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | chemistry        |    10 |       1 | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | engineering      |    10 |     0.6 | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | law              |    10 |     0.3 | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | biology          |    10 |     0.9 | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | health           |    10 |     0.7 | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | physics          |    10 |     0.4 | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | business         |    10 |     0.7 | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | philosophy       |    10 |     0.7 | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | economics        |    10 |     0.8 | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | other            |    10 |     0.7 | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | psychology       |    10 |     0.6 | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | history          |    10 |     0.4 | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | OVERALL          |   140 |  0.6571 | -       |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
Unsloth logs:
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| Model                 | Dataset   | Metric          | Subset           |   Num |   Score | Cat.0   |
+=======================+===========+=================+==================+=======+=========+=========+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | computer science |    10 |     0.5 | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | math             |    10 |     0.7 | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | chemistry        |    10 |     0.9 | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | engineering      |    10 |     0.6 | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | law              |    10 |       0 | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | biology          |    10 |     0.9 | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | health           |    10 |     0.8 | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | physics          |    10 |     0.7 | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | business         |    10 |     0.6 | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | philosophy       |    10 |     0.6 | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | economics        |    10 |     0.9 | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | other            |    10 |     0.7 | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | psychology       |    10 |     0.5 | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | history          |    10 |     0.5 | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | OVERALL          |   140 |  0.6357 | -       |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
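
For anyone who wants to sanity-check the two OVERALL rows, here is a minimal sketch that recomputes them from the per-subset scores in the logs. It assumes every subset contributes exactly 10 questions (as the Num column shows) and, as a rough simplification that is not part of the evaluation tool, treats each question as an independent Bernoulli trial to estimate a standard error. The helper names (`correct_counts`, `overall`, `binomial_se`) are made up for illustration.

```python
import math

# Per-subset scores copied from the two logs, in the same row order.
SUBSETS = ["computer science", "math", "chemistry", "engineering", "law",
           "biology", "health", "physics", "business", "philosophy",
           "economics", "other", "psychology", "history"]
OLLAMA = [0.5, 0.9, 1.0, 0.6, 0.3, 0.9, 0.7, 0.4, 0.7, 0.7, 0.8, 0.7, 0.6, 0.4]
UNSLOTH = [0.5, 0.7, 0.9, 0.6, 0.0, 0.9, 0.8, 0.7, 0.6, 0.6, 0.9, 0.7, 0.5, 0.5]
N_PER_SUBSET = 10  # assumption: taken from the Num column


def correct_counts(scores):
    """Convert per-subset accuracies into counts of correct answers."""
    return [round(s * N_PER_SUBSET) for s in scores]


def overall(scores):
    """Overall accuracy across all subsets (equal subset sizes)."""
    return sum(correct_counts(scores)) / (N_PER_SUBSET * len(scores))


def binomial_se(p, n):
    """Standard error of an accuracy p over n questions (simplified model)."""
    return math.sqrt(p * (1.0 - p) / n)


n_total = N_PER_SUBSET * len(SUBSETS)
p_ollama, p_unsloth = overall(OLLAMA), overall(UNSLOTH)
gap_questions = sum(correct_counts(OLLAMA)) - sum(correct_counts(UNSLOTH))

print(f"Ollama : {p_ollama:.4f} +/- {binomial_se(p_ollama, n_total):.4f}")
print(f"Unsloth: {p_unsloth:.4f} +/- {binomial_se(p_unsloth, n_total):.4f}")
print(f"Gap    : {gap_questions} questions out of {n_total}")

# Per-subset differences (Unsloth minus Ollama), largest swings first.
deltas = sorted(zip(SUBSETS, (u - o for u, o in zip(UNSLOTH, OLLAMA))),
                key=lambda item: abs(item[1]), reverse=True)
for name, delta in deltas[:4]:
    print(f"{name:<18}{delta:+.1f}")
```

The recomputed totals match the logs (0.6571 and 0.6357), and the gap works out to 3 questions out of 140. Under this simplified model the standard error on each overall score is about 0.04, so a 0.02 difference on this sample size may not be enough on its own to separate the two builds.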