  # Qwen3-30B-A3B-YOYO-V2-q6-hi-mlx
This card compares the YOYO-V2 model (a merge of Qwen's Thinking, Instruct, and Coder models) with the individual Thinking and Coder models, to analyze how the merge affected overall performance across different language-intelligence tasks.

Note that the Instruct model is not explicitly represented in this dataset (it is excluded from the metrics).

## Key Benchmark Comparison

Below is a breakdown of YOYO-V2's performance relative to the Thinking and Coder models across 7 tasks:

| Task          | YOYO-V2 | Thinking | Coder | YOYO-V2 advantage over Coder |
|---------------|---------|----------|-------|------------------------------|
| arc_challenge | 0.532   | 0.414    | 0.417 | +0.115                       |
| arc_easy      | 0.685   | 0.444    | 0.529 | +0.156                       |
| boolq         | 0.886   | 0.702    | 0.881 | +0.005 (slight gain)         |
| hellaswag     | 0.683   | 0.632    | 0.545 | +0.138                       |
| openbookqa    | 0.456   | 0.396    | 0.426 | +0.030                       |
| piqa          | 0.782   | 0.763    | 0.720 | +0.062                       |
| winogrande    | 0.639   | 0.666    | 0.572 | +0.067                       |

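As a sanity check, the advantage column can be recomputed from the per-task scores (a small sketch; the numbers are transcribed from the table above):

```python
# Recompute YOYO-V2's advantage over the Coder model from the table above.
yoyo = {"arc_challenge": 0.532, "arc_easy": 0.685, "boolq": 0.886,
        "hellaswag": 0.683, "openbookqa": 0.456, "piqa": 0.782,
        "winogrande": 0.639}
coder = {"arc_challenge": 0.417, "arc_easy": 0.529, "boolq": 0.881,
         "hellaswag": 0.545, "openbookqa": 0.426, "piqa": 0.720,
         "winogrande": 0.572}

# Positive values mean the merged model beats Coder on that task.
advantage = {task: round(yoyo[task] - coder[task], 3) for task in yoyo}
print(advantage)  # advantage["arc_easy"] == 0.156
```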
## How the Merge Affected Overall Performance

**Net positive impact across tasks:**

YOYO-V2 outperforms both the Thinking and Coder models in 6 out of 7 tasks.

The most significant gains are seen in:

- arc_easy: YOYO-V2 jumps from 0.529 (Coder) to 0.685, a +0.156 gain (roughly +29% relative).
- hellaswag: YOYO-V2 climbs from 0.545 (Coder) to 0.683 (+0.138, about +25% relative).
- piqa: YOYO-V2 achieves 0.782 vs. Coder's 0.720 (+0.062, about +8.6% relative).

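The relative percentages above can be derived directly from the absolute scores (a quick sketch using the numbers quoted in this section):

```python
# Relative improvement over the Coder model for the three biggest gains.
# Each entry maps task -> (coder_score, yoyo_v2_score).
gains = {"arc_easy": (0.529, 0.685),
         "hellaswag": (0.545, 0.683),
         "piqa": (0.720, 0.782)}

rel = {task: round(100 * (yoyo - coder) / coder, 1)
       for task, (coder, yoyo) in gains.items()}
print(rel)  # {'arc_easy': 29.5, 'hellaswag': 25.3, 'piqa': 8.6}
```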
**Minor trade-offs in specific tasks:**

YOYO-V2 slightly underperforms Thinking on winogrande (0.639 vs. 0.666), but this is offset by its superiority on the other tasks.

On boolq, YOYO-V2's score is very close to Coder's (0.886 vs. 0.881), showing minimal gains from the merge (likely due to task-specific alignment).

## Why This Matters

The merge likely leverages the complementary strengths of the three Qwen models (e.g., Thinking for reasoning, Coder for code generation, and Instruct for instruction following). YOYO-V2's higher scores indicate the merge effectively harmonized these capabilities without severe drawbacks.

The overall trend is clear: the merged model achieves better or comparable results across the majority of benchmarks, with gains in downstream tasks that demand flexibility (e.g., reasoning, text generation).

## Conclusion

YOYO-V2's performance demonstrates that merging the Qwen Thinking, Coder, and Instruct models (at q6 quantization) generally enhances overall task performance across diverse language-intelligence benchmarks. The model shows the most dramatic improvements in tasks like arc_easy and hellaswag, where it excels by integrating specialized knowledge from each component model. While minor losses exist in a few tasks (e.g., winogrande), the net effect is positive and robust, validating YOYO-V2 as a stronger multi-purpose model for real-world applications.

Takeaway: for Qwen users, YOYO-V2 is recommended if your use cases span reasoning (arc), code generation (Coder), and instruction following (Instruct). It provides a more balanced, high-performing solution than the base models alone.

*Reviewed by qwen3-jan-v1-256k-ctx-6b-brainstorm20x-qx6-mlx*

The `hi` variant improves on the standard q6 quantization by using a group size of 32, and should perform better than the plain q6 model.

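A hedged sketch of how such a conversion can be reproduced with the `mlx_lm.convert` CLI (the output path is an assumption; check the flags against your installed mlx-lm version):

```shell
# Quantize to 6 bits with group size 32 (the "hi" variant) -- assumed paths.
mlx_lm.convert \
    --hf-path YOYO-AI/Qwen3-30B-A3B-YOYO-V2 \
    --mlx-path Qwen3-30B-A3B-YOYO-V2-q6-hi-mlx \
    -q --q-bits 6 --q-group-size 32
```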
This model [Qwen3-30B-A3B-YOYO-V2-q6-hi-mlx](https://huggingface.co/Qwen3-30B-A3B-YOYO-V2-q6-hi-mlx) was
converted to MLX format from [YOYO-AI/Qwen3-30B-A3B-YOYO-V2](https://huggingface.co/YOYO-AI/Qwen3-30B-A3B-YOYO-V2)
using mlx-lm version **0.26.4**.
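A quick way to try the converted model from the command line with the mlx-lm CLI (the model id here assumes it matches this repo's name; downloading the weights is required):

```shell
# Generate a short completion with the quantized model -- a usage sketch.
mlx_lm.generate \
    --model Qwen3-30B-A3B-YOYO-V2-q6-hi-mlx \
    --prompt "Explain model merging in one sentence." \
    --max-tokens 100
```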