sebastavar committed
Commit 6792d12 · verified · Parent: afdab5e

Update README.md

Files changed (1): README.md (+6 −6)
README.md CHANGED
@@ -45,9 +45,9 @@ Once set up, you can proceed to run the model by running the snippet below:
 from mlx_lm import load, generate
 from transformers import AutoTokenizer
 
-model, tokenizer = load("HalleyAI/gpt-oss-20b-4bit-gs32")
+model, tokenizer = load("HalleyAI/gpt-oss-20b-6bit-gs32")
 
-model, tokenizer = load("HalleyAI/gpt-oss-20b-4bit-gs32")
+model, tokenizer = load("HalleyAI/gpt-oss-20b-6bit-gs32")
 print(generate(
     model, tokenizer,
     prompt="Explain the Chudnovsky algorithm to compute π.",
@@ -70,7 +70,7 @@ We report perplexity (PPL) on a small internal text corpus using the same tokeni
 </thead>
 <tbody>
 <tr><td>MLX Q8 (reference)</td><td>2.4986</td></tr>
-<tr><td>MLX Q4 (gs=32)</td><td>2.5438 (+1.81% vs Q8)</td></tr>
+<tr><td>MLX Q4 (gs=32)</td><td>2.4858 (~-0.51% vs Q8)</td></tr>
 </tbody>
 </table>
 Note: This is a small, domain-specific eval for quick sanity; not a benchmark suite.
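The updated delta is consistent with the quoted values: (2.4858 − 2.4986) / 2.4986 ≈ −0.0051, i.e. roughly 0.51% below the Q8 reference. A one-line check:

```python
# Relative PPL change vs. the Q8 reference, using the values in the table.
q8, q_new = 2.4986, 2.4858
print(f"{(q_new - q8) / q8:+.2%}")  # prints -0.51%
```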
@@ -80,8 +80,8 @@ Note: This is a small, domain-specific eval for quick sanity; not a benchmark su
 ```bash
 python -m mlx_lm convert \
   --hf-path openai/gpt-oss-20b \
-  --mlx-path gpt-oss-20b-mlx-q4-gs32 \
-  --q-bits 4 --q-group-size 32 -q
+  --mlx-path gpt-oss-20b-mlx-q6-gs32 \
+  --q-bits 6 --q-group-size 32 -q
 ```
 - Some non-expert tensors (embeddings, norms, router) remain FP16.
 
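Once the conversion finishes, a quick smoke test can confirm the quantized weights load and generate. This is a sketch assuming the command above wrote its output to `./gpt-oss-20b-mlx-q6-gs32` in the current directory:

```python
# Sketch: load the locally converted 6-bit model and run a tiny generation.
from mlx_lm import load, generate

model, tokenizer = load("./gpt-oss-20b-mlx-q6-gs32")
print(generate(model, tokenizer, prompt="Say hello in one sentence.", max_tokens=32))
```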
@@ -94,5 +94,5 @@ MoE models can be sensitive to prompt wording; prefer explicit instructions and
 
 - License: Apache-2.0 (inherits from base model).
 - Base model: OpenAI gpt-oss-20B.
-- Quantization: Halley AI Lab (MLX Q4, gs=32).
+- Quantization: Halley AI Lab (MLX Q6, gs=32).
 - Please cite both the base model and this repository when you use the weights.