sebastavar committed
Commit 6792d12 · verified · Parent: afdab5e

Update README.md

Files changed (1): README.md (+6 −6)
README.md CHANGED
@@ -45,9 +45,9 @@ Once set up, you can proceed to run the model by running the snippet below:
 from mlx_lm import load, generate
 from transformers import AutoTokenizer
 
-model, tokenizer = load("HalleyAI/gpt-oss-20b-4bit-gs32")
+model, tokenizer = load("HalleyAI/gpt-oss-20b-6bit-gs32")
 
-model, tokenizer = load("HalleyAI/gpt-oss-20b-4bit-gs32")
+model, tokenizer = load("HalleyAI/gpt-oss-20b-6bit-gs32")
 print(generate(
     model, tokenizer,
     prompt="Explain the Chudnovsky algorithm to compute π.",
@@ -70,7 +70,7 @@ We report perplexity (PPL) on a small internal text corpus using the same tokeni
 </thead>
 <tbody>
 <tr><td>MLX Q8 (reference)</td><td>2.4986</td></tr>
-<tr><td>MLX Q4 (gs=32)</td><td>2.5438 (+1.81% vs Q8)</td></tr>
+<tr><td>MLX Q4 (gs=32)</td><td>2.4858 (~-0.51% vs Q8)</td></tr>
 </tbody>
 </table>
 Note: This is a small, domain-specific eval for quick sanity; not a benchmark suite.
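The updated delta is consistent with the quoted values: (2.4858 − 2.4986) / 2.4986 ≈ −0.0051, i.e. roughly 0.51% below the Q8 reference. A one-line check:

```python
# Relative PPL change vs. the Q8 reference, using the values in the table.
q8, q_new = 2.4986, 2.4858
print(f"{(q_new - q8) / q8:+.2%}")  # prints -0.51%
```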
@@ -80,8 +80,8 @@ Note: This is a small, domain-specific eval for quick sanity; not a benchmark su
 ```bash
 python -m mlx_lm convert \
   --hf-path openai/gpt-oss-20b \
-  --mlx-path gpt-oss-20b-mlx-q4-gs32 \
-  --q-bits 4 --q-group-size 32 -q
+  --mlx-path gpt-oss-20b-mlx-q6-gs32 \
+  --q-bits 6 --q-group-size 32 -q
 ```
 - Some non-expert tensors (embeddings, norms, router) remain FP16.
 
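Once the conversion finishes, a quick smoke test can confirm the quantized weights load and generate. This is a sketch assuming the command above wrote its output to `./gpt-oss-20b-mlx-q6-gs32` in the current directory:

```python
# Sketch: load the locally converted 6-bit model and run a tiny generation.
from mlx_lm import load, generate

model, tokenizer = load("./gpt-oss-20b-mlx-q6-gs32")
print(generate(model, tokenizer, prompt="Say hello in one sentence.", max_tokens=32))
```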
@@ -94,5 +94,5 @@ MoE models can be sensitive to prompt wording; prefer explicit instructions and
 
 - License: Apache-2.0 (inherits from base model).
 - Base model: OpenAI gpt-oss-20B.
-- Quantization: Halley AI Lab (MLX Q4, gs=32).
+- Quantization: Halley AI Lab (MLX Q6, gs=32).
 - Please cite both the base model and this repository when you use the weights.