# Qwen3-30B-A3B-YOYO-V2-dwq5-mlx
Here's a comparison of YOYO-V2-dwq5's benchmark performance against the other quantized variants of YOYO-V2 (dwq3, dwq4, q6).
## Comparison Table (YOYO-V2 Quantized Variants)

| Task          | YOYO-V2-dwq5 | YOYO-V2-dwq4 | YOYO-V2-dwq3 | YOYO-V2-q6 |
|---------------|--------------|--------------|--------------|------------|
| arc_challenge | 0.523        | 0.511        | 0.497        | 0.532      |
| arc_easy      | 0.682        | 0.655        | 0.657        | 0.685      |
| boolq         | 0.883        | 0.879        | 0.876        | 0.886      |
| hellaswag     | 0.676        | 0.673        | 0.686        | 0.683      |
| openbookqa    | 0.436        | 0.450        | 0.414        | 0.456      |
| piqa          | 0.778        | 0.772        | 0.785        | 0.782      |
| winogrande    | 0.626        | 0.643        | 0.640        | 0.639      |

YOYO-V2-q6 posts the highest score on four of the seven tasks (arc_challenge, arc_easy, boolq, openbookqa); dwq3 leads on hellaswag and piqa, and dwq4 on winogrande.
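The per-task deltas discussed in this README can be recomputed directly from the table above. A minimal sketch (the short variant names are just a convenience for this snippet, not official identifiers):

```python
# Benchmark scores transcribed from the comparison table above.
scores = {
    "arc_challenge": {"dwq5": 0.523, "dwq4": 0.511, "dwq3": 0.497, "q6": 0.532},
    "arc_easy":      {"dwq5": 0.682, "dwq4": 0.655, "dwq3": 0.657, "q6": 0.685},
    "boolq":         {"dwq5": 0.883, "dwq4": 0.879, "dwq3": 0.876, "q6": 0.886},
    "hellaswag":     {"dwq5": 0.676, "dwq4": 0.673, "dwq3": 0.686, "q6": 0.683},
    "openbookqa":    {"dwq5": 0.436, "dwq4": 0.450, "dwq3": 0.414, "q6": 0.456},
    "piqa":          {"dwq5": 0.778, "dwq4": 0.772, "dwq3": 0.785, "q6": 0.782},
    "winogrande":    {"dwq5": 0.626, "dwq4": 0.643, "dwq3": 0.640, "q6": 0.639},
}

def delta(task: str, a: str, b: str) -> float:
    """Score difference (variant a minus variant b), rounded to 3 decimals."""
    return round(scores[task][a] - scores[task][b], 3)

print(delta("boolq", "dwq5", "dwq4"))     # 0.004
print(delta("arc_easy", "dwq5", "dwq3"))  # 0.025
print(delta("winogrande", "dwq5", "q6"))  # -0.013
```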
## Critical Insights from YOYO-V2's Internal Quantization Comparison

### YOYO-V2-dwq5 Generally Improves Over Lower-DWQ Variants

- dwq5 beats dwq4 on 5 of 7 tasks (e.g., +0.027 on arc_easy, +0.004 on boolq); dwq4 keeps the lead on openbookqa and winogrande.
- dwq5 beats dwq3 on 4 of 7 tasks (e.g., +0.026 on arc_challenge, +0.007 on boolq); dwq3 stays ahead on hellaswag, piqa, and winogrande.

This shows a broadly upward trend as DWQ precision increases from 3-bit → 4-bit → 5-bit, though not a strictly monotonic one.
### YOYO-V2-dwq5 Is Closest to YOYO-V2-q6

On 3 of 7 tasks, dwq5 is within 0.005 of q6 (arc_easy: 0.682 vs 0.685, boolq: 0.883 vs 0.886, piqa: 0.778 vs 0.782).

On the remaining 4 tasks, dwq5 trails q6 by a slightly larger margin:

- arc_challenge (0.523 vs 0.532): -0.009
- hellaswag (0.676 vs 0.683): -0.007
- winogrande (0.626 vs 0.639): -0.013
- openbookqa (0.436 vs 0.456): -0.020

This suggests q6 retains slightly more precision on tasks that are sensitive to small weight perturbations (e.g., winogrande).
### Why the Q6 Gap Persists

DWQ (dynamic) quantization and fixed Q6 quantization both preserve most of the base model's quality, but q6 keeps marginal gains on several tasks:

- boolq: q6's score (0.886) is the highest absolute value in this benchmark.
- piqa: q6's lead over dwq5 (0.782 vs 0.778) is about 0.5% relative, which can matter for logic-reasoning tasks.
## Practical Takeaways for Model Selection

| Quantization | Best For | Why |
|--------------|----------|-----|
| YOYO-V2-dwq5 | Hardware with moderate resources | Best balance of memory and accuracy among the DWQ variants (5-bit DWQ) |
| YOYO-V2-q6   | High-precision tasks (e.g., reasoning) | Ahead of dwq5 on all 7 tasks, if only slightly; the most stable choice |
For maximum accuracy, YOYO-V2-q6 is still the top performer against dwq5, with a small edge on every task (from 0.003 on boolq up to 0.020 on openbookqa).

YOYO-V2-dwq5 is ideal when you need to reduce the memory footprint while staying near q6-level quality (e.g., on memory-constrained machines or edge devices).

YOYO-V2-dwq5 outperforms the lower-bit DWQ quantizations (dwq3, dwq4) on most tasks, showing a clear overall progression in quality as the DWQ bit-width increases from 3 to 5 bits. However, it does not surpass YOYO-V2-q6: q6 keeps a small but consistent lead (0.003–0.020) across all seven tasks.

This suggests that YOYO-V2's quality steadily improves with higher quantization fidelity within its own variants, but the fixed Q6 quantization still delivers marginal gains for tasks where small precision losses are unacceptable.

In short: dwq5 > dwq4 > dwq3 on balance, but q6 remains the most reliable choice for high-stakes applications. For your deployment: choose dwq5 when memory is constrained; use q6 for maximum accuracy.
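The memory trade-off behind that advice can be made concrete with back-of-envelope arithmetic. This sketch counts weight bits only and ignores the per-group scale/bias overhead that real MLX quantized checkpoints store, so actual files are somewhat larger than these figures:

```python
# Rough weight-memory estimate for a ~30B-parameter model at several
# quantization bit-widths. Weight bits only: per-group scale/bias
# overhead is ignored, so these are lower bounds, not exact file sizes.
PARAMS = 30e9  # approximate total parameter count (assumption)

def weight_gb(bits: int) -> float:
    """Gigabytes needed to store PARAMS weights at `bits` bits each."""
    return PARAMS * bits / 8 / 1e9

for bits in (3, 4, 5, 6):
    print(f"{bits}-bit: ~{weight_gb(bits):.1f} GB of weights")
```

At this scale the step from 5-bit to 6-bit costs roughly 3.75 GB of extra weight memory, which is the trade the comparison above is pricing.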
This model [Qwen3-30B-A3B-YOYO-V2-dwq5-mlx](https://huggingface.co/Qwen3-30B-A3B-YOYO-V2-dwq5-mlx) was
converted to MLX format from [YOYO-AI/Qwen3-30B-A3B-YOYO-V2](https://huggingface.co/YOYO-AI/Qwen3-30B-A3B-YOYO-V2)
using mlx-lm version **0.26.4**.
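For completeness, the usual mlx-lm loading pattern applies to this conversion. A sketch, assuming `mlx-lm` is installed on Apple silicon and that the short model id below resolves (you may need the full Hugging Face path or a local directory); the import guard simply lets the snippet degrade gracefully where MLX is unavailable:

```python
# Standard mlx-lm usage pattern, guarded so it runs even without mlx-lm.
from importlib.util import find_spec

MODEL_ID = "Qwen3-30B-A3B-YOYO-V2-dwq5-mlx"  # may need the full HF repo path
print("model id:", MODEL_ID)

if find_spec("mlx_lm") is not None:
    from mlx_lm import load, generate

    # Downloads/loads the quantized weights, then runs a short generation.
    model, tokenizer = load(MODEL_ID)
    text = generate(model, tokenizer, prompt="Explain DWQ quantization briefly.")
    print(text)
else:
    print("mlx-lm is not installed; skipping generation.")
```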