Update README.md

README.md CHANGED

@@ -62,7 +62,7 @@ Throughput varies with Mac model, context, and sampler settings.
 ## Evaluation
-Perplexity (PPL) on a small internal text corpus using the base tokenizer.
+Perplexity (PPL) streaming evaluation on WikiText-2; window=stride=4096, ~100k tokens, EOS inserted between docs.
 <table>
   <thead>
     <tr><th>Variant</th><th>PPL (ctx=4096)</th></tr>
@@ -73,7 +73,9 @@ Perplexity (PPL) on a small internal text corpus using the base tokenizer.
     <tr><td>MLX 4-bit (gs=32)</td><td>13.70 (+27.4% vs 8-bit/gs64, +31.0% vs 6-bit/gs32)</td></tr>
   </tbody>
 </table>
-Note: Small, domain-specific eval for quick sanity; not a benchmark suite.
+Interpretation:
+- MLX 6-bit/gs32 edges out MLX 8-bit/gs64 slightly (better quality at lower footprint).
+- MLX 4-bit/gs32 shows a meaningful drop in quality; fine for tight memory, but expect more errors.
 ## Conversion details (provenance)
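The new evaluation line packs several choices into one sentence: non-overlapping 4096-token windows (window = stride), a budget of roughly 100k tokens, and EOS inserted between documents. Below is a minimal Python sketch of that procedure. It rests on several assumptions. The `score` callable and `uniform_scorer` are hypothetical stand-ins for the real MLX model. `max_tokens` is one reading of "~100k tokens". The first token of each window is treated as context only. The README does not say how window boundaries, tokenization, or the WikiText-2 split were handled, so this is an illustration, not the script that produced the numbers.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical scorer interface (not an MLX or mlx-lm API): given a window of
# token ids, return log p(window[t] | window[:t]) for t = 1 .. len(window) - 1.
LogProbFn = Callable[[Sequence[int]], List[float]]


def streaming_ppl(
    docs: Sequence[Sequence[int]],
    score: LogProbFn,
    eos_id: int,
    window: int = 4096,
    max_tokens: int = 100_000,
) -> float:
    """Perplexity over non-overlapping windows (window == stride)."""
    # Join documents into one token stream with EOS between consecutive docs.
    stream: List[int] = []
    for i, doc in enumerate(docs):
        if i > 0:
            stream.append(eos_id)
        stream.extend(doc)
    stream = stream[:max_tokens]  # evaluation budget (assumed reading of "~100k tokens")

    total_nll = 0.0
    n_scored = 0
    # stride == window: windows never overlap, so no token is scored twice.
    # Assumption: the first token of each window is context only, not scored.
    for start in range(0, len(stream), window):
        chunk = stream[start:start + window]
        if len(chunk) < 2:
            continue  # a 1-token tail has nothing to predict
        logprobs = score(chunk)
        total_nll -= sum(logprobs)
        n_scored += len(logprobs)
    if n_scored == 0:
        raise ValueError("need at least 2 tokens to compute perplexity")
    # PPL = exp(mean negative log-likelihood per scored token)
    return math.exp(total_nll / n_scored)


def uniform_scorer(vocab_size: int) -> LogProbFn:
    """Hypothetical toy model: every token has probability 1 / vocab_size."""
    def score(chunk: Sequence[int]) -> List[float]:
        return [-math.log(vocab_size)] * (len(chunk) - 1)
    return score


if __name__ == "__main__":
    docs = [[1, 2, 3] * 1000, [4, 5] * 2000]
    # A uniform model's perplexity equals its vocabulary size (up to float rounding).
    print(streaming_ppl(docs, uniform_scorer(50), eos_id=0))
```

Because window equals stride, tokens near the start of each window see little context. This tends to push PPL above what an overlapping sliding window would report. Comparisons between the quantized variants stay fair as long as all of them use the same setup.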
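The first interpretation bullet makes two claims: 6-bit/gs32 has lower PPL than 8-bit/gs64, and it has a smaller footprint. Only the 4-bit row is visible in this diff, but its relative deltas are enough to back out the other two PPLs. A rough bits-per-weight figure covers the footprint side. The sketch below makes several assumptions. Each quantization group stores one scale and one bias, each 16 bits wide (an fp16/bf16 source model). Layers left unquantized are ignored. The back-computed PPLs are approximate because the percentages are rounded to 0.1%.

```python
# Inputs copied from the 4-bit row of the README table.
PPL_4BIT = 13.70
DELTA_VS_8BIT_GS64 = 0.274  # "+27.4% vs 8-bit/gs64"
DELTA_VS_6BIT_GS32 = 0.310  # "+31.0% vs 6-bit/gs32"


def implied_baseline(ppl: float, rel_increase: float) -> float:
    """Invert 'ppl is rel_increase higher than baseline': baseline = ppl / (1 + rel)."""
    return ppl / (1.0 + rel_increase)


def bits_per_weight(bits: int, group_size: int, param_bits: int = 16) -> float:
    """Effective storage per quantized weight.

    Assumption: one scale and one bias per group, each param_bits wide,
    amortized over group_size weights. Unquantized layers are not counted.
    """
    return bits + 2 * param_bits / group_size


if __name__ == "__main__":
    ppl_8 = implied_baseline(PPL_4BIT, DELTA_VS_8BIT_GS64)
    ppl_6 = implied_baseline(PPL_4BIT, DELTA_VS_6BIT_GS32)
    print(f"8-bit/gs64: PPL ~ {ppl_8:.2f}, {bits_per_weight(8, 64):.1f} bits/weight")
    print(f"6-bit/gs32: PPL ~ {ppl_6:.2f}, {bits_per_weight(6, 32):.1f} bits/weight")
    print(f"4-bit/gs32: PPL = {PPL_4BIT:.2f}, {bits_per_weight(4, 32):.1f} bits/weight")
```

Under these assumptions the back-computed values are roughly 10.75 for 8-bit/gs64 (about 8.5 bits per weight) and 10.46 for 6-bit/gs32 (about 7.0 bits per weight). For comparison, 4-bit/gs32 comes out at about 5.0 bits per weight. That matches the bullet: the 6-bit variant is both smaller and slightly lower in PPL.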