Update README.md
README.md
CHANGED
@@ -62,7 +62,7 @@ Throughput varies with Mac model, context, and sampler settings.
 
 ## Evaluation
 
-Perplexity (PPL) on a small internal text corpus using the base tokenizer.
+Perplexity (PPL) streaming evaluation on WikiText-2; window=stride=4096, ~100k tokens, EOS inserted between docs.
 <table>
 <thead>
 <tr><th>Variant</th><th>PPL (ctx=4096)</th></tr>
@@ -73,7 +73,9 @@ Perplexity (PPL) on a small internal text corpus using the base tokenizer.
 <tr><td>MLX 4-bit (gs=32)</td><td>13.70 (+27.4% vs 8-bit/gs64, +31.0% vs 6-bit/gs32)</td></tr>
 </tbody>
 </table>
-
+Interpretation:
+- MLX 6-bit/gs32 edges out MLX 8-bit/gs64 slightly (better quality at lower footprint).
+- MLX 4-bit/gs32 shows a meaningful drop in quality; fine for tight memory, but expect more errors.
 
 ## Conversion details (provenance)
 
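The evaluation setup named in the new README line (a single token stream, window=stride=4096, EOS inserted between documents) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the script that produced the reported numbers: `build_stream`, `perplexity`, and the `nll_fn` callback are hypothetical names, and `nll_fn` stands in for whatever model call returns per-token negative log-likelihoods (in nats) for a window.

```python
import math
from typing import Callable, Iterable, List, Sequence


def build_stream(docs: Iterable[Sequence[int]], eos_id: int, max_tokens: int) -> List[int]:
    """Concatenate tokenized documents with EOS between them, capped at max_tokens."""
    stream: List[int] = []
    for i, doc in enumerate(docs):
        if i > 0:
            stream.append(eos_id)  # EOS goes between documents, not after the last one
        stream.extend(doc)
        if len(stream) >= max_tokens:
            break
    return stream[:max_tokens]


def perplexity(
    stream: Sequence[int],
    nll_fn: Callable[[Sequence[int]], Sequence[float]],
    window: int = 4096,
) -> float:
    """Perplexity over non-overlapping windows (stride equals window).

    nll_fn(chunk) is assumed to return len(chunk) - 1 values: the negative
    log-likelihood of each token given the tokens before it in the same
    window. The first token of every window has no context and is not scored.
    """
    total_nll = 0.0
    total_scored = 0
    for start in range(0, len(stream), window):
        chunk = stream[start:start + window]
        if len(chunk) < 2:
            continue  # nothing to predict in a one-token tail
        nlls = nll_fn(chunk)
        total_nll += sum(nlls)
        total_scored += len(nlls)
    if total_scored == 0:
        raise ValueError("stream too short to score any token")
    return math.exp(total_nll / total_scored)
```

A quick sanity check for any implementation of this kind: a stand-in model that assigns uniform probability over a vocabulary of size V yields a perplexity of V, regardless of the window size.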