4-bit Q4_K_M scored 63.57 on MMLU-Pro, single shot

#4
by xbruce22 - opened

The same Q4_K_M quantization from Ollama scored 65.71.

Seed = 42

Ollama logs

+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| Model                 | Dataset   | Metric          | Subset           |   Num |   Score | Cat.0   |
+=======================+===========+=================+==================+=======+=========+=========+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | computer science |    10 |  0.5    | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | math             |    10 |  0.9    | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | chemistry        |    10 |  1      | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | engineering      |    10 |  0.6    | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | law              |    10 |  0.3    | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | biology          |    10 |  0.9    | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | health           |    10 |  0.7    | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | physics          |    10 |  0.4    | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | business         |    10 |  0.7    | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | philosophy       |    10 |  0.7    | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | economics        |    10 |  0.8    | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | other            |    10 |  0.7    | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | psychology       |    10 |  0.6    | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | history          |    10 |  0.4    | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | OVERALL          |   140 |  0.6571 | -       |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+

Unsloth logs

+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| Model                 | Dataset   | Metric          | Subset           |   Num |   Score | Cat.0   |
+=======================+===========+=================+==================+=======+=========+=========+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | computer science |    10 |  0.5    | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | math             |    10 |  0.7    | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | chemistry        |    10 |  0.9    | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | engineering      |    10 |  0.6    | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | law              |    10 |  0      | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | biology          |    10 |  0.9    | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | health           |    10 |  0.8    | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | physics          |    10 |  0.7    | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | business         |    10 |  0.6    | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | philosophy       |    10 |  0.6    | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | economics        |    10 |  0.9    | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | other            |    10 |  0.7    | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | psychology       |    10 |  0.5    | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | history          |    10 |  0.5    | default |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
| qwen-coder-30b-q4_K_M | mmlu_pro  | AverageAccuracy | OVERALL          |   140 |  0.6357 | -       |
+-----------------------+-----------+-----------------+------------------+-------+---------+---------+
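For anyone comparing the two runs, here is a rough sketch that recomputes the OVERALL rows from the per-subset scores in the tables above and puts the gap in terms of question counts. The helper names (`correct_count`, `overall`, `std_error`) are made up for this example and are not part of any eval tool. The binomial standard error is only a back-of-the-envelope assumption that treats the 140 questions as independent.

```python
import math

# Per-subset accuracies copied from the two tables above, in table order:
# computer science, math, chemistry, engineering, law, biology, health,
# physics, business, philosophy, economics, other, psychology, history.
ollama = [0.5, 0.9, 1.0, 0.6, 0.3, 0.9, 0.7, 0.4, 0.7, 0.7, 0.8, 0.7, 0.6, 0.4]
unsloth = [0.5, 0.7, 0.9, 0.6, 0.0, 0.9, 0.8, 0.7, 0.6, 0.6, 0.9, 0.7, 0.5, 0.5]

N_PER_SUBSET = 10  # the "Num" column: 10 questions per subset


def correct_count(scores):
    """Number of correctly answered questions across all subsets."""
    return sum(round(s * N_PER_SUBSET) for s in scores)


def overall(scores):
    """Overall accuracy, i.e. total correct / total questions."""
    return correct_count(scores) / (N_PER_SUBSET * len(scores))


def std_error(p, n):
    """Binomial standard error of an accuracy p measured on n questions
    (hypothetical simplification: assumes independent questions)."""
    return math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    n_total = N_PER_SUBSET * len(ollama)
    for name, scores in (("ollama", ollama), ("unsloth", unsloth)):
        p = overall(scores)
        print(f"{name}: {correct_count(scores)}/{n_total} correct, "
              f"accuracy {p:.4f}, std error {std_error(p, n_total):.4f}")
    print("gap in questions:", correct_count(ollama) - correct_count(unsloth))
```

This gives 92/140 for the Ollama run and 89/140 for the Unsloth run, so the 65.71 vs 63.57 gap is 3 questions. With only 140 questions the standard error on each score is about 4 points, so under that assumption the difference is within sampling noise. A larger sample per subset would be needed to say which quant is actually better.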
