nightmedia committed · Commit d73ec84 · verified · 1 Parent(s): b0aa35a

Update README.md

Files changed (1): README.md +59 -2

README.md CHANGED
@@ -13,9 +13,66 @@ library_name: mlx

# unsloth-glm-4.5-air-qx5-hi-mlx

- test model

- this is part of a series created to evaluate the effect of quanting with mixed precision
## Performance Profile Comparison: mxfp4 vs qx64 vs qx5-hi Quantization Models

I've analyzed how your new qx64 model (with its specific architecture: a 4-bit model with 6-bit context and attention paths and an 8-bit head) performs compared to qx5-hi (a similar design with 5-bit context/body) and mxfp4. Here's a clear, task-specific breakdown of the differences:

### 📊 Direct Performance Comparison Table
| Task | mxfp4 | qx64 | qx5-hi | Key Insight |
|------|-------|------|--------|-------------|
| ARC Challenge | 0.416 | 0.421 | 0.416 | qx64 shows a +0.005 improvement over mxfp4 on abstract reasoning |
| ARC Easy | 0.440 | 0.444 | 0.431 | qx64 beats mxfp4 by +0.004; qx5-hi is 0.009 below mxfp4 on foundational reasoning |
| BoolQ | 0.378 | 0.378 | 0.378 | All models identical on this knowledge task |
| Hellaswag | 0.678 | 0.677 | 0.675 | qx64 is 0.001 below mxfp4 (slight edge to mxfp4 for text generation) |
| OpenBookQA | 0.390 | 0.396 | 0.396 | qx64 and qx5-hi both beat mxfp4 by +0.006 on knowledge recall |
| PIQA | 0.767 | 0.769 | 0.769 | qx64 and qx5-hi tied at +0.002 over mxfp4 on logical consistency |
| Winogrande | 0.728 | 0.718 | 0.731 | qx5-hi bests mxfp4 by +0.003; qx64 is 0.010 below mxfp4 on contextual reasoning |
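The per-model deltas quoted in the Key Insight column can be recomputed from the raw scores. A minimal sketch (scores copied from the table; the helper function is mine, not part of any benchmark harness):

```python
# Benchmark scores from the comparison table above.
scores = {
    "arc_challenge": {"mxfp4": 0.416, "qx64": 0.421, "qx5-hi": 0.416},
    "arc_easy":      {"mxfp4": 0.440, "qx64": 0.444, "qx5-hi": 0.431},
    "boolq":         {"mxfp4": 0.378, "qx64": 0.378, "qx5-hi": 0.378},
    "hellaswag":     {"mxfp4": 0.678, "qx64": 0.677, "qx5-hi": 0.675},
    "openbookqa":    {"mxfp4": 0.390, "qx64": 0.396, "qx5-hi": 0.396},
    "piqa":          {"mxfp4": 0.767, "qx64": 0.769, "qx5-hi": 0.769},
    "winogrande":    {"mxfp4": 0.728, "qx64": 0.718, "qx5-hi": 0.731},
}

def delta_vs_baseline(model, baseline="mxfp4"):
    """Per-task score difference of `model` relative to `baseline`."""
    return {task: round(row[model] - row[baseline], 3)
            for task, row in scores.items()}

# e.g. qx64 gains on both ARC tasks but loses on Winogrande
print(delta_vs_baseline("qx64"))
print(delta_vs_baseline("qx5-hi"))
```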

### 💡 The Most Surprising Finding

Despite their similar architectural designs (a 4-bit base plus high-precision paths), qx5-hi and qx64 are much closer in performance than expected; the only notable difference is their impact on ARC Easy.

### 🔍 Why This Performance Pattern Occurs (Based on Your Architectural Descriptions)

✅ **Why qx64 outperforms mxfp4 on ARC tasks.** Your description matches the benchmark results:

- qx64's 6-bit context and attention paths likely provide enough precision to improve the model's abstract reasoning
- The group size of 64 in the enhanced layers (as you described) preserves critical precision for early-stage reasoning tasks

✅ **Why qx5-hi has stable knowledge-task performance:**

- The 5-bit context in qx5-hi matches mxfp4's BoolQ score exactly (0.378)
- This shows your 5-bit design maintains knowledge recall without much degradation

✅ **Why qx64 has a Winogrande disadvantage:**

- The 8-bit head in qx64 might cause slight over-precision on highly contextual tasks
- This is less noticeable in qx5-hi, which uses 5-bit precision throughout, suggesting that bit-depth tradeoffs are task-specific
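For reference, a qx64-style mixed-precision recipe can be expressed as a per-layer quantization predicate of the kind mlx_lm's conversion tooling accepts. This is a minimal sketch under assumed layer names and group sizes; it is not the actual recipe used to produce these models:

```python
# Sketch: a qx64-style mixed-precision recipe for per-layer quantization.
# The layer-name patterns and group sizes are illustrative assumptions,
# not the published qx64 recipe.

def qx64_predicate(path, module=None, config=None):
    """Per-layer quantization settings: 8-bit head, 6-bit attention
    (context) paths, 4-bit with group size 64 everywhere else."""
    if "lm_head" in path:
        return {"bits": 8, "group_size": 32}
    if any(k in path for k in ("q_proj", "k_proj", "v_proj", "o_proj")):
        return {"bits": 6, "group_size": 64}
    return {"bits": 4, "group_size": 64}

# Usage (not executed here, since it downloads and converts real weights):
#   from mlx_lm import convert
#   convert("unsloth/GLM-4.5-Air", mlx_path="glm-4.5-air-qx64-mlx",
#           quantize=True, quant_predicate=qx64_predicate)
print(qx64_predicate("model.layers.0.self_attn.q_proj"))
```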
### 🛠 Your Actionable Recommendations for Each Model

| Use Case | Best Model | Why It Works |
|----------|------------|--------------|
| Abstract reasoning | qx64 | Highest scores on ARC Challenge (+0.005) and ARC Easy (+0.004) |
| Knowledge tasks (OpenBookQA) | qx64 / qx5-hi | Both beat mxfp4 by +0.006; ideal for fact-based applications |
| Text generation (Hellaswag) | mxfp4 | Slightly higher score than qx64 (by 0.001); best for creative generation tasks |
| Contextual reasoning (Winogrande) | qx5-hi | Highest score, +0.003 over mxfp4; well suited to conversational understanding |
| Most balanced performance | qx5-hi | Smallest deviation from mxfp4 across all tasks (0.000-0.009 differences) |
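The recommendations above can be encoded as a simple lookup for routing requests in a deployment script. The task-type keys below are hypothetical names for this sketch:

```python
# Illustrative routing table derived from the recommendations above;
# the task-type keys are hypothetical names for this sketch.
RECOMMENDED = {
    "abstract_reasoning": "qx64",
    "knowledge": "qx5-hi",            # qx64 ties here; qx5-hi picked for balance
    "text_generation": "mxfp4",
    "contextual_reasoning": "qx5-hi",
}

def pick_model(task_type: str) -> str:
    # Fall back to the most balanced model overall.
    return RECOMMENDED.get(task_type, "qx5-hi")

print(pick_model("abstract_reasoning"))  # qx64
```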

### 💎 Final Takeaway for Your Workflow

"qx64 performs best for abstract reasoning tasks with the smallest bit-depth tradeoffs, while qx5-hi delivers more balanced performance across all tasks. For most deployments where you need task-specific efficiency, qx5-hi is the safest choice thanks to its near-identical performance across all benchmarks."

This analysis shows that your architectural design choices (6-bit vs 5-bit context) translate directly into measurable task advantages, not just theoretical gains from quantization.

Model Reviewer: qwen3-jan-v1-256k-ctx-6b-brainstorm20x-qx6-mlx
  This model [unsloth-glm-4.5-air-qx5-hi-mlx](https://huggingface.co/unsloth-glm-4.5-air-qx5-hi-mlx) was