---
license: apache-2.0
language:
- en
- zh
base_model: YOYO-AI/Qwen3-30B-A3B-YOYO-V2
pipeline_tag: text-generation
tags:
- merge
- mlx
library_name: mlx
---
# Qwen3-30B-A3B-YOYO-V2-dwq5-mlx
Below is an analysis of YOYO-V2-dwq5's performance compared to the other quantized variants of YOYO-V2 itself (dwq3, dwq4, q6).
## Comparison Table (YOYO-V2 Quantized Variants)

| Task | dwq5 | dwq4 | dwq3 | q6 |
|---|---|---|---|---|
| arc_challenge | 0.523 | 0.511 | 0.497 | 0.532 |
| arc_easy | 0.682 | 0.655 | 0.657 | 0.685 |
| boolq | 0.883 | 0.879 | 0.876 | 0.886 |
| hellaswag | 0.676 | 0.673 | 0.686 | 0.683 |
| openbookqa | 0.436 | 0.450 | 0.414 | 0.456 |
| piqa | 0.778 | 0.772 | 0.785 | 0.782 |
| winogrande | 0.626 | 0.643 | 0.640 | 0.639 |
YOYO-V2-q6 posts the highest score on most tasks in this benchmark set.
## Critical Insights from YOYO-V2's Internal Quantization Comparison
### YOYO-V2-dwq5 Generally Improves Over Lower-DWQ Variants
- dwq5 surpasses dwq4 on most tasks (e.g., +0.027 on arc_easy, +0.012 on arc_challenge, +0.004 on boolq); dwq4 keeps a small edge only on openbookqa and winogrande.
- dwq5 surpasses dwq3 on most tasks (e.g., +0.025 on arc_easy, +0.026 on arc_challenge, +0.022 on openbookqa); dwq3 stays slightly ahead on hellaswag, piqa, and winogrande.
- This shows an overall upward trend as DWQ precision increases from 3-bit → 4-bit → 5-bit.
### YOYO-V2-dwq5 Is Closest to YOYO-V2-q6
- On 3 of 7 tasks, dwq5 is within 0.005 of q6 (arc_easy: 0.682 vs 0.685, boolq: 0.883 vs 0.886, piqa: 0.778 vs 0.782).
- On the remaining four tasks, dwq5 trails q6 by a somewhat larger margin:
  - arc_challenge (0.523 vs 0.532): -0.009
  - hellaswag (0.676 vs 0.683): -0.007
  - winogrande (0.626 vs 0.639): -0.013
  - openbookqa (0.436 vs 0.456): -0.020
- This suggests q6 retains slightly more precision for tasks requiring high attention to detail (e.g., winogrande, openbookqa). All of these per-task deltas can be recomputed from the comparison table, as in the sketch below.
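The gaps quoted above follow directly from the comparison table. The snippet below is a minimal sketch in plain Python: the score dictionary is copied from the table, and the task/variant names are just labels for this card, not an API.

```python
# Scores copied from the comparison table above.
scores = {
    "arc_challenge": {"dwq5": 0.523, "dwq4": 0.511, "dwq3": 0.497, "q6": 0.532},
    "arc_easy":      {"dwq5": 0.682, "dwq4": 0.655, "dwq3": 0.657, "q6": 0.685},
    "boolq":         {"dwq5": 0.883, "dwq4": 0.879, "dwq3": 0.876, "q6": 0.886},
    "hellaswag":     {"dwq5": 0.676, "dwq4": 0.673, "dwq3": 0.686, "q6": 0.683},
    "openbookqa":    {"dwq5": 0.436, "dwq4": 0.450, "dwq3": 0.414, "q6": 0.456},
    "piqa":          {"dwq5": 0.778, "dwq4": 0.772, "dwq3": 0.785, "q6": 0.782},
    "winogrande":    {"dwq5": 0.626, "dwq4": 0.643, "dwq3": 0.640, "q6": 0.639},
}

# Delta of dwq5 relative to each other variant, per task
# (positive means dwq5 scores higher).
for task, row in scores.items():
    deltas = {v: round(row["dwq5"] - row[v], 3) for v in ("dwq4", "dwq3", "q6")}
    print(f"{task:13s} {deltas}")
```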
### Why the Q6 Gap Persists
DWQ quantization (dynamic) and fixed Q6 quantization both retain most of the base model's quality, but q6 keeps marginal gains on high-precision tasks:

- boolq: q6's score (0.886) is the highest absolute value in this benchmark set.
- piqa: q6's lead over dwq5 (0.782 vs 0.778) is about 0.5% relative; small, but still relevant for logic-reasoning tasks.
## Practical Takeaways for Model Selection
| Quant | Best For | Why |
|---|---|---|
| dwq5 | Hardware with moderate resources | Best balance between speed and accuracy (5-bit DWQ) |
| q6 | High-precision tasks (e.g., reasoning) | Slightly better than dwq5 on most tasks; optimal for stability |
- For most use cases, q6 is still the top performer, holding a small but consistent edge over dwq5 (roughly 0.003–0.020 absolute across the seven tasks).
- dwq5 is ideal if you need to reduce the memory footprint while staying near q6 performance (e.g., on memory-constrained devices); see the rough size estimate sketched below.
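As a rough illustration of that trade-off, here is a back-of-the-envelope estimate of weight memory. It assumes ~30B total parameters stored at the nominal bits per weight and ignores quantization group scales, higher-precision embeddings, the KV cache, and activations, so the real on-disk and runtime sizes will differ.

```python
# Back-of-the-envelope weight-memory estimate for a ~30B-parameter model.
# Assumption: all weights stored at the nominal bit width; group scales,
# embeddings, KV cache, and activations are ignored.
PARAMS = 30e9

def weight_gib(bits_per_weight: float) -> float:
    return PARAMS * bits_per_weight / 8 / 2**30

for name, bits in [("dwq5 (5-bit)", 5), ("q6 (6-bit)", 6), ("bf16 baseline", 16)]:
    print(f"{name:14s} ~{weight_gib(bits):5.1f} GiB")
# dwq5 ~17.5 GiB, q6 ~21.0 GiB, bf16 ~55.9 GiB (weights only)
```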
dwq5 outperforms the lower-DWQ quantizations (dwq3, dwq4) on most tasks, showing a clear progression in quality as the DWQ bit width increases from 3 to 5 bits. However, it does not surpass YOYO-V2-q6: q6 maintains a small but consistent lead (0.003–0.020) on every task, with the largest gaps on openbookqa and winogrande.
This confirms that YOYO-V2's quality steadily improves with higher quantization fidelity within its own DWQ variants, but the fixed Q6 quantization still delivers marginal gains for critical tasks where even small precision losses are unacceptable.
In short: dwq5 > dwq4 > dwq3 on most tasks, but q6 remains the most reliable choice for high-stakes applications. For deployment: choose dwq5 when memory is constrained; use q6 for maximum accuracy.
This model **Qwen3-30B-A3B-YOYO-V2-dwq5-mlx** was converted to MLX format from [YOYO-AI/Qwen3-30B-A3B-YOYO-V2](https://huggingface.co/YOYO-AI/Qwen3-30B-A3B-YOYO-V2) using mlx-lm version **0.26.4**.
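For reference, a plain groupwise 5-bit conversion with mlx-lm looks roughly like the sketch below. This shows only the generic `mlx_lm.convert` flow; it is not the exact command used for this release, since the dwq5 weights additionally go through DWQ (distilled weight quantization) calibration, which is not shown, and the output path is just an illustrative name.

```bash
# Generic 5-bit quantized conversion with mlx-lm (sketch only).
# The published dwq5 weights involve an extra DWQ distillation step not shown here.
mlx_lm.convert \
    --hf-path YOYO-AI/Qwen3-30B-A3B-YOYO-V2 \
    --mlx-path Qwen3-30B-A3B-YOYO-V2-5bit-mlx \
    -q --q-bits 5
```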
## Use with mlx
```bash
pip install mlx-lm
```
```python
from mlx_lm import load, generate

# Load the quantized model and its tokenizer.
model, tokenizer = load("Qwen3-30B-A3B-YOYO-V2-dwq5-mlx")

prompt = "hello"

# Apply the chat template when the tokenizer provides one.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
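The model can also be queried from the command line via the `mlx_lm.generate` entry point shipped with mlx-lm; the model argument below is assumed to be whatever local path or repo id you loaded above.

```bash
# Command-line generation with mlx-lm (model path/repo id is illustrative).
mlx_lm.generate --model Qwen3-30B-A3B-YOYO-V2-dwq5-mlx \
    --prompt "hello" --max-tokens 256
```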