Update README.md

README.md
@@ -45,9 +45,9 @@ Once set up, you can proceed to run the model by running the snippet below:
 from mlx_lm import load, generate
 from transformers import AutoTokenizer
 
-model, tokenizer = load("HalleyAI/gpt-oss-20b-
+model, tokenizer = load("HalleyAI/gpt-oss-20b-6bit-gs32")
 
-model, tokenizer = load("HalleyAI/gpt-oss-20b-
+model, tokenizer = load("HalleyAI/gpt-oss-20b-6bit-gs32")
 print(generate(
     model, tokenizer,
     prompt="Explain the Chudnovsky algorithm to compute π.",
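For context, the snippet around this hunk is cut off in the diff view. A minimal runnable version, assuming mlx_lm's standard `load`/`generate` API, might look like the sketch below; the single `load` call and the `max_tokens` value are my assumptions, not the verbatim README code, and the unused `AutoTokenizer` import is omitted:

```python
# Minimal sketch of the updated snippet (assumed, not the verbatim README code).
from mlx_lm import load, generate

# One load call is enough; the diff shows the same call twice.
model, tokenizer = load("HalleyAI/gpt-oss-20b-6bit-gs32")

# max_tokens is an assumed value; the diff truncates before this point.
text = generate(
    model, tokenizer,
    prompt="Explain the Chudnovsky algorithm to compute π.",
    max_tokens=256,
)
print(text)
```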
@@ -70,7 +70,7 @@ We report perplexity (PPL) on a small internal text corpus using the same tokeni
 </thead>
 <tbody>
 <tr><td>MLX Q8 (reference)</td><td>2.4986</td></tr>
-<tr><td>MLX Q4 (gs=32)</td><td>2.
+<tr><td>MLX Q6 (gs=32)</td><td>2.4858 (~-0.51% vs Q8)</td></tr>
 </tbody>
 </table>
 Note: This is a small, domain-specific eval for quick sanity; not a benchmark suite.
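The diff does not include the eval script behind these numbers. As a rough sketch of how a perplexity figure like this could be reproduced with mlx_lm, under assumptions of my own (the corpus path, window size, and the direct `model(...)` forward call are not from the commit):

```python
# Hedged sketch of a next-token perplexity eval; not the authors' script.
import math
import mlx.core as mx
import mlx.nn as nn
from mlx_lm import load

model, tokenizer = load("HalleyAI/gpt-oss-20b-6bit-gs32")
tokens = tokenizer.encode(open("corpus.txt").read())  # assumed corpus file

window = 2048  # assumed chunk length
total_nll, total_count = 0.0, 0
for start in range(0, len(tokens) - 1, window):
    chunk = mx.array(tokens[start : start + window + 1])[None]
    logits = model(chunk[:, :-1])  # (1, T, vocab) next-token logits
    nll = nn.losses.cross_entropy(
        logits.reshape(-1, logits.shape[-1]),
        chunk[:, 1:].reshape(-1),
        reduction="sum",
    )
    total_nll += nll.item()
    total_count += chunk.shape[1] - 1

print(f"PPL: {math.exp(total_nll / total_count):.4f}")
```

A comparison like the table's only holds if both quantizations are scored with the same tokenizer and chunking, which is what the section's note about using the same tokenization is getting at.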
@@ -80,8 +80,8 @@ Note: This is a small, domain-specific eval for quick sanity; not a benchmark su
 ```bash
 python -m mlx_lm convert \
   --hf-path openai/gpt-oss-20b \
-  --mlx-path gpt-oss-20b-mlx-
-  --q-bits
+  --mlx-path gpt-oss-20b-mlx-q6-gs32 \
+  --q-bits 6 --q-group-size 32 -q
 ```
 - Some non-expert tensors (embeddings, norms, router) remain FP16.
 
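The FP16 note can be spot-checked after conversion by listing parameter dtypes. A sketch, assuming the local `--mlx-path` output from the command above; note that quantization scales and biases will also show up in FP16 alongside the embeddings, norms, and router weights:

```python
# Sketch: list tensors left in FP16 after quantization (assumed local path).
import mlx.core as mx
from mlx.utils import tree_flatten
from mlx_lm import load

model, _ = load("gpt-oss-20b-mlx-q6-gs32")
for name, param in tree_flatten(model.parameters()):
    if param.dtype == mx.float16:
        print(name, tuple(param.shape))
```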
@@ -94,5 +94,5 @@ MoE models can be sensitive to prompt wording; prefer explicit instructions and
 
 - License: Apache-2.0 (inherits from base model).
 - Base model: OpenAI gpt-oss-20B.
-- Quantization: Halley AI Lab (MLX
+- Quantization: Halley AI Lab (MLX Q6, gs=32).
 - Please cite both the base model and this repository when you use the weights.