---
license: apache-2.0
language:
- en
- zh
base_model: YOYO-AI/Qwen3-30B-A3B-YOYO-V2
pipeline_tag: text-generation
tags:
- merge
- mlx
library_name: mlx
---

# Qwen3-30B-A3B-YOYO-V2-dwq5-mlx

Here is an analysis of YOYO-V2-dwq5's performance compared to the other quantized variants of YOYO-V2 itself (dwq3, dwq4, and q6).

## Comparison Table (YOYO-V2 Quantized Variants)

| Task          | dwq5  | dwq4  | dwq3  | q6    |
|---------------|-------|-------|-------|-------|
| arc_challenge | 0.523 | 0.511 | 0.497 | 0.532 |
| arc_easy      | 0.682 | 0.655 | 0.657 | 0.685 |
| boolq         | 0.883 | 0.879 | 0.876 | 0.886 |
| hellaswag     | 0.676 | 0.673 | 0.686 | 0.683 |
| openbookqa    | 0.436 | 0.450 | 0.414 | 0.456 |
| piqa          | 0.778 | 0.772 | 0.785 | 0.782 |
| winogrande    | 0.626 | 0.643 | 0.640 | 0.639 |
YOYO-V2-q6 posts the highest score on four of the seven tasks (arc_challenge, arc_easy, boolq, openbookqa); dwq3 is slightly ahead on hellaswag and piqa, and dwq4 on winogrande.


## 📊 Critical Insights from YOYO-V2's Internal Quantization Comparison

YOYO-V2-dwq5 Generally Improves Over the Lower-Bit DWQ Variants

- dwq5 beats dwq4 on 5 of 7 tasks (e.g., +0.027 on arc_easy, +0.004 on boolq), trailing only on openbookqa and winogrande.
- dwq5 beats dwq3 on 4 of 7 tasks (e.g., +0.025 on arc_easy, +0.007 on boolq), trailing on hellaswag, piqa, and winogrande.

This shows a broadly upward trend as DWQ precision increases from 3-bit → 4-bit → 5-bit, although a few tasks (notably winogrande and openbookqa) run counter to it.
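To make the arithmetic behind these deltas easy to check, here is a minimal sketch in plain Python with the scores copied from the table above (the variant names are just dictionary keys, not part of any API):

```python
# Scores copied from the comparison table above.
scores = {
    "arc_challenge": {"dwq5": 0.523, "dwq4": 0.511, "dwq3": 0.497, "q6": 0.532},
    "arc_easy":      {"dwq5": 0.682, "dwq4": 0.655, "dwq3": 0.657, "q6": 0.685},
    "boolq":         {"dwq5": 0.883, "dwq4": 0.879, "dwq3": 0.876, "q6": 0.886},
    "hellaswag":     {"dwq5": 0.676, "dwq4": 0.673, "dwq3": 0.686, "q6": 0.683},
    "openbookqa":    {"dwq5": 0.436, "dwq4": 0.450, "dwq3": 0.414, "q6": 0.456},
    "piqa":          {"dwq5": 0.778, "dwq4": 0.772, "dwq3": 0.785, "q6": 0.782},
    "winogrande":    {"dwq5": 0.626, "dwq4": 0.643, "dwq3": 0.640, "q6": 0.639},
}

def deltas(a: str, b: str) -> dict:
    """Per-task difference: score of variant `a` minus score of variant `b`."""
    return {task: round(row[a] - row[b], 3) for task, row in scores.items()}

# Positive values mean the first variant is ahead on that task.
print("dwq5 - dwq4:", deltas("dwq5", "dwq4"))
print("dwq5 - dwq3:", deltas("dwq5", "dwq3"))
print("dwq5 - q6:  ", deltas("dwq5", "q6"))
```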

YOYO-V2-dwq5 Is Closest to YOYO-V2-q6

On 3 of 7 tasks (arc_easy, boolq, piqa), dwq5 scores within 0.005 of q6 (e.g., boolq: 0.883 vs 0.886, piqa: 0.778 vs 0.782).

On the remaining 4 tasks, dwq5 trails q6 by a slightly larger margin:

- arc_challenge: 0.523 vs 0.532 (-0.009)
- hellaswag: 0.676 vs 0.683 (-0.007)
- openbookqa: 0.436 vs 0.456 (-0.020)
- winogrande: 0.626 vs 0.639 (-0.013)
→ This suggests q6 retains slightly more precision for tasks requiring high attention to detail (e.g., winogrande, openbookqa).

Why the Q6 Gap Persists

DWQ quantization (dynamic) and fixed Q6 quantization both retain most of the base model's quality, but q6 keeps a marginal edge on precision-sensitive tasks:

- boolq: q6's score (0.886) is the highest single value anywhere in this comparison.
- piqa: q6's lead over dwq5 (0.782 vs 0.778) is 0.004, roughly a 0.5% relative gain, small but consistent with its edge on the other tasks.

## 🎯 Practical Takeaways for Model Selection

| Quant | Best For | Why |
|-------|----------|-----|
| dwq5  | Hardware with moderate resources | Best balance of size, speed, and accuracy among the DWQ variants (5-bit) |
| q6    | High-precision tasks (e.g., reasoning) | Slightly ahead of dwq5 on every task here; the most stable choice |

For most use cases, q6 is still the top performer, with a 0.003–0.020 edge over dwq5 across the tasks above (largest on openbookqa and winogrande).

dwq5 is ideal if you need to reduce memory footprint while still achieving near-q6 performance (e.g., in edge devices).
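As a rough illustration of that memory trade-off, here is a back-of-envelope sketch; it assumes roughly 30.5B total parameters for this model and ignores per-group quantization scales, unquantized layers, activations, and the KV cache, so real sizes will be somewhat larger:

```python
# Back-of-envelope weight-memory estimate for a ~30.5B-parameter model at
# different quantization bit widths. Actual MLX files are somewhat larger
# because of per-group scales/biases and any layers kept in higher precision.
PARAMS = 30.5e9  # approximate total parameter count of Qwen3-30B-A3B

for bits in (3, 4, 5, 6, 16):
    gib = PARAMS * bits / 8 / 2**30
    print(f"{bits:>2}-bit weights: ~{gib:.1f} GiB")
```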

dwq5 outperforms the lower-bit DWQ quantizations (dwq3, dwq4) on most tasks, showing a general progression in quality as the DWQ bitwidth increases from 3 → 5 bits. However, it does not surpass YOYO-V2-q6: q6 maintains a small but consistent lead (0.003–0.020) on every task in this benchmark.

This confirms that YOYO-V2's quality steadily improves with higher quantization fidelity within its own variants, but the fixed Q6 quantization still delivers a slight edge on tasks where minor precision losses are unacceptable.

✅ In short: dwq5 > dwq4 > dwq3 on most tasks, but q6 remains the most reliable choice for high-stakes applications. For deployment: choose dwq5 when memory is constrained; use q6 for maximum accuracy.


This model [Qwen3-30B-A3B-YOYO-V2-dwq5-mlx](https://huggingface.co/Qwen3-30B-A3B-YOYO-V2-dwq5-mlx) was
converted to MLX format from [YOYO-AI/Qwen3-30B-A3B-YOYO-V2](https://huggingface.co/YOYO-AI/Qwen3-30B-A3B-YOYO-V2)
using mlx-lm version **0.26.4**.
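For reference, a plain group-wise quantized conversion with mlx-lm looks roughly like the sketch below. This is not the DWQ recipe used for this checkpoint, only the generic conversion entry point, and the output path is hypothetical:

```python
from mlx_lm import convert

# Generic 5-bit group-wise quantization of the base model into MLX format.
# Note: this is NOT the DWQ recipe used for this repo; it only illustrates
# the standard mlx-lm conversion entry point. The output path is hypothetical.
convert(
    "YOYO-AI/Qwen3-30B-A3B-YOYO-V2",
    mlx_path="Qwen3-30B-A3B-YOYO-V2-5bit-mlx",
    quantize=True,
    q_bits=5,
    q_group_size=64,
)
```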

## Use with mlx

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("Qwen3-30B-A3B-YOYO-V2-dwq5-mlx")

prompt = "hello"

if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
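
For token-by-token output, mlx-lm also provides a streaming generator. A minimal sketch follows; the exact fields on the yielded response objects can vary between mlx-lm versions:

```python
from mlx_lm import load, stream_generate

model, tokenizer = load("Qwen3-30B-A3B-YOYO-V2-dwq5-mlx")

messages = [{"role": "user", "content": "hello"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Print the response as it is generated instead of waiting for the full text.
for chunk in stream_generate(model, tokenizer, prompt, max_tokens=256):
    print(chunk.text, end="", flush=True)
print()
```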