Update README.md
README.md CHANGED
@@ -15,7 +15,7 @@ tags:
 - openai
 - halleyai
 ---
-# gpt-oss-20b — MLX
+# gpt-oss-20b — MLX 6-bit (group size 32)
 
 **Summary.** This is a 6-bit (Q6) **MLX** quantization of **gpt-oss-20b** (sparse Mixture-of-Experts, MXFP4). Group size is **32**.
 Built for **Apple Silicon** with Metal acceleration.
@@ -23,14 +23,14 @@ Built for **Apple Silicon** with Metal acceleration.
 - **Base model:** `openai/gpt-oss-20b` (Apache-2.0)
 - **Quantization:** MLX Q6, `q_group_size=32` (some tensors remain FP16 for stability)
 - **Files:** MLX weight shards + `config.json`; includes tokenizer files & chat template for drop-in use
-- **Footprint:** ~**
+- **Footprint:** ~**18.38 GB** on disk
 - **Intended use:** local inference / research on M-series Macs
 - **Not intended for:** safety-critical decisions; outputs may be inaccurate or biased
 
 ## Requirements
 Runs on: Apple Silicon (M1 and higher) with macOS ≥ 13.5 via MLX (Metal).
 Not supported: Intel macOS / Linux / Windows (use a GGUF build + llama.cpp instead).
-Suggested RAM:
+Suggested RAM: 48 GB for Q6 gs=32 (works on 32 GB with smaller KV cache).
 
 ## How to use (MLX)
 
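The "How to use (MLX)" section is still empty at this point. A minimal usage sketch with the `mlx-lm` package, assuming its standard `load`/`generate` API; the repo id below is a placeholder for wherever these shards are published:

```python
# pip install mlx-lm  (Apple Silicon, macOS >= 13.5)
from mlx_lm import load, generate

# Placeholder repo id: substitute the actual Hub path of these Q6 shards.
model, tokenizer = load("halleyai/gpt-oss-20b-mlx-q6-gs32")

prompt = "Explain mixture-of-experts routing in two sentences."
# The bundled chat template formats the prompt the way gpt-oss expects.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# verbose=True streams tokens and prints generation statistics.
text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
```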
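For the 32 GB case mentioned under "Suggested RAM", the cache can be bounded at generation time. A sketch assuming the `max_kv_size` rotating-cache parameter exposed by recent mlx-lm releases; the cap below is illustrative and the repo id is the same placeholder as above:

```python
from mlx_lm import load, generate

# Same placeholder repo id as in the usage example above.
model, tokenizer = load("halleyai/gpt-oss-20b-mlx-q6-gs32")

# max_kv_size caps the KV cache at a fixed number of entries (rotating cache),
# trading long-context fidelity for a bounded memory footprint.
text = generate(
    model,
    tokenizer,
    prompt="Summarize mixture-of-experts inference in one paragraph.",
    max_tokens=256,
    max_kv_size=1024,  # illustrative cap; tune for your machine
)
```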
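For reference, a quantization matching the description above (6-bit, `q_group_size=32`, FP16 base dtype) would typically be produced with mlx-lm's `convert` helper. A sketch assuming current mlx-lm parameter names, with a hypothetical output directory:

```python
from mlx_lm import convert

# Download the FP base weights and quantize to 6 bits with 32-element groups.
convert(
    hf_path="openai/gpt-oss-20b",
    mlx_path="./gpt-oss-20b-mlx-q6-gs32",  # hypothetical output directory
    quantize=True,
    q_bits=6,
    q_group_size=32,
)
```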