sebastavar committed · Commit 88377b9 · verified · 1 Parent(s): 5568a63

Update README.md

Files changed (1):
  1. README.md +3 -3
README.md CHANGED
@@ -15,7 +15,7 @@ tags:
 - openai
 - halleyai
 ---
-# gpt-oss-20b — MLX 4-bit (group size 32)
+# gpt-oss-20b — MLX 6-bit (group size 32)
 
 **Summary.** This is a 6-bit (Q6) **MLX** quantization of **gpt-oss-20B** (sparse Mixture-of-Experts, MXFP4). Group size is **32**.
 Built for **Apple Silicon** with Metal acceleration.
@@ -23,14 +23,14 @@ Built for **Apple Silicon** with Metal acceleration.
 - **Base model:** `openai/gpt-oss-20b` (Apache-2.0)
 - **Quantization:** MLX Q6, `q_group_size=32` (some tensors remain FP16 for stability)
 - **Files:** MLX weight shards + `config.json`; includes tokenizer files & chat template for drop-in use
-- **Footprint:** ~**13.11 GB** on disk
+- **Footprint:** ~**18.38 GB** on disk
 - **Intended use:** local inference / research on M-series Macs
 - **Not intended for:** safety-critical decisions; outputs may be inaccurate or biased
 
 ## Requirements
 Runs on: Apple Silicon (M1 and higher) with macOS ≥ 13.5 via MLX (Metal).
 Not supported: Intel macOS / Linux / Windows (use a GGUF build + llama.cpp instead).
-Suggested RAM: 24 GB for Q4 gs=32 (works on 16 GB with smaller KV cache).
+Suggested RAM: 48 GB for Q6 gs=32 (works on 32 GB with smaller KV cache).
 
 ## How to use (MLX)

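For context on the settings in the diff, a build like this can be reproduced with mlx-lm's `convert` API. The commit does not include the author's actual command, so the following is only a sketch under that assumption; the output directory name is illustrative:

```python
# Sketch: producing a Q6, group-size-32 MLX quantization of gpt-oss-20b.
# The author's exact invocation is not part of this commit; the output
# directory name is a placeholder.
from mlx_lm import convert

convert(
    hf_path="openai/gpt-oss-20b",    # base model named in the README
    mlx_path="gpt-oss-20b-mlx-q6",   # hypothetical output directory
    quantize=True,
    q_bits=6,                        # "6-bit (Q6)"
    q_group_size=32,                 # "group size 32"
)
```

The ~18.38 GB footprint is plausible for these settings: MLX stores 16-bit scales and biases per 32-weight group, adding roughly 1 bit per weight, so effective storage is about 6 + 1 = 7 bits per weight, and ~21B parameters × 7/8 byte ≈ 18 GB, with the remainder from the tensors kept in FP16.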
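The diff truncates at the "## How to use (MLX)" heading. For completeness, here is a minimal load-and-generate sketch using the standard `mlx_lm` Python API; the repo id is a placeholder, since the commit does not show it:

```python
# Minimal sketch: load the quantized model and generate with mlx-lm.
# "halleyai/gpt-oss-20b-mlx-q6" is a placeholder repo id, not confirmed
# by this commit.
from mlx_lm import load, generate

model, tokenizer = load("halleyai/gpt-oss-20b-mlx-q6")

# Use the bundled chat template mentioned in the README's "Files" bullet.
messages = [{"role": "user", "content": "In one sentence, what is a Mixture-of-Experts model?"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

text = generate(model, tokenizer, prompt=prompt, max_tokens=128, verbose=True)
print(text)
```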