sebastavar committed
Commit 257edfb · verified · 1 Parent(s): 721298f

Update README.md

Files changed (1)
  1. README.md +7 -3
README.md CHANGED
@@ -70,13 +70,17 @@ Perplexity (PPL) streaming evaluation on WikiText-2; window=stride=4096, ~100k t
  <tbody>
  <tr><td>MLX 8-bit (reference)</td><td>10.75</td></tr>
  <tr><td>MLX 6-bit (gs=32)</td><td>10.46 (−2.7% vs 8-bit/gs64)</td></tr>
+ <tr><td><strong>MLX 5-bit (gs=32)</strong></td><td><strong>11.11 (+3.3% vs 8-bit/gs64, +6.2% vs 6-bit/gs32)</strong></td></tr>
  <tr><td>MLX 4-bit (gs=32)</td><td>13.70 (+27.4% vs 8-bit/gs64, +31.0% vs 6-bit/gs32)</td></tr>
  </tbody>
  </table>
 
- Interpretation:
- - MLX 6-bit/gs32 edges out MLX 8-bit/gs64 slightly (better quality at lower footprint).
- - MLX 4-bit/gs32 shows a meaningful drop in quality; fine for tight memory, but expect more errors.
+ **Interpretation**
+ - MLX 6-bit/gs32: Best of the group; edges out 8-bit/gs64 slightly at a smaller
+ footprint.
+ - MLX 5-bit/gs32: Small, consistent drop vs 6-bit/gs32 and 8-bit/gs64 (~3–6% PPL); strong “fits-16GB” option when GPU buffer limits matter.
+ - MLX 8-bit/gs64: Solid reference; near‑FP16 quality at a larger footprint.
+ - MLX 4-bit/gs32: Trades accuracy for footprint; use when RAM is constrained or throughput is the priority.
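
Reviewer note: the percentages in the table rows above can be re-derived from the raw perplexities as a relative change against each baseline. The following is a minimal sketch of that arithmetic, not part of the README or of this commit. The names `ppl`, `relative_change`, and `report` are hypothetical and chosen only for illustration; the perplexity values are copied from the table.

```python
# Perplexities copied from the table in the diff (WikiText-2; lower is better).
ppl = {
    "8-bit/gs64": 10.75,
    "6-bit/gs32": 10.46,
    "5-bit/gs32": 11.11,
    "4-bit/gs32": 13.70,
}


def relative_change(value: float, baseline: float) -> float:
    """Percent change of `value` relative to `baseline` (positive = worse PPL)."""
    return (value - baseline) / baseline * 100.0


def report(name: str, baselines=("8-bit/gs64", "6-bit/gs32")) -> dict:
    """Relative PPL change of one variant against each baseline, to one decimal."""
    return {
        base: round(relative_change(ppl[name], ppl[base]), 1)
        for base in baselines
        if base != name  # skip comparing a variant against itself
    }


if __name__ == "__main__":
    for variant in ppl:
        print(variant, report(variant))
```

Rounded to one decimal, these results agree with the figures quoted in the table, including the new 5-bit row (+3.3% vs 8-bit/gs64, +6.2% vs 6-bit/gs32).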
 
  ## Conversion details (provenance)