Update README.md
README.md CHANGED
@@ -15,7 +15,7 @@ tags:
 - openai
 - halleyai
 ---
-# gpt-oss-20b — MLX
+# gpt-oss-20b — MLX 6-bit (group size 32)
 
 **Summary.** This is a 6-bit (Q6) **MLX** quantization of **gpt-oss-20b** (sparse Mixture-of-Experts, MXFP4). Group size is **32**.
 Built for **Apple Silicon** with Metal acceleration.
@@ -23,14 +23,14 @@ Built for **Apple Silicon** with Metal acceleration.
 - **Base model:** `openai/gpt-oss-20b` (Apache-2.0)
 - **Quantization:** MLX Q6, `q_group_size=32` (some tensors remain FP16 for stability)
 - **Files:** MLX weight shards + `config.json`; includes tokenizer files & chat template for drop-in use
-- **Footprint:** ~**
+- **Footprint:** ~**18.38 GB** on disk
 - **Intended use:** local inference / research on M-series Macs
 - **Not intended for:** safety-critical decisions; outputs may be inaccurate or biased
 
 ## Requirements
 Runs on: Apple Silicon (M1 and higher) with macOS ≥ 13.5 via MLX (Metal).
 Not supported: Intel macOS / Linux / Windows (use a GGUF build + llama.cpp instead).
-Suggested RAM:
+Suggested RAM: 48 GB for Q6 gs=32 (works on 32 GB with smaller KV cache).
 
 ## How to use (MLX)
 
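The "How to use (MLX)" section is still empty at this point. A minimal usage sketch with the `mlx-lm` package, assuming its standard `load`/`generate` API; the repo id below is a placeholder for wherever these shards are published:

```python
# pip install mlx-lm  (Apple Silicon, macOS >= 13.5)
from mlx_lm import load, generate

# Placeholder repo id: substitute the actual Hub path of these Q6 shards.
model, tokenizer = load("halleyai/gpt-oss-20b-mlx-q6-gs32")

prompt = "Explain mixture-of-experts routing in two sentences."
# The bundled chat template formats the prompt the way gpt-oss expects.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# verbose=True streams tokens and prints generation statistics.
text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
```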
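For the 32 GB case mentioned under "Suggested RAM", the cache can be bounded at generation time. A sketch assuming the `max_kv_size` rotating-cache parameter exposed by recent mlx-lm releases; the cap below is illustrative and the repo id is the same placeholder as above:

```python
from mlx_lm import load, generate

# Same placeholder repo id as in the usage example above.
model, tokenizer = load("halleyai/gpt-oss-20b-mlx-q6-gs32")

# max_kv_size caps the KV cache at a fixed number of entries (rotating cache),
# trading long-context fidelity for a bounded memory footprint.
text = generate(
    model,
    tokenizer,
    prompt="Summarize mixture-of-experts inference in one paragraph.",
    max_tokens=256,
    max_kv_size=1024,  # illustrative cap; tune for your machine
)
```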
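For reference, a quantization matching the description above (6-bit, `q_group_size=32`, FP16 base dtype) would typically be produced with mlx-lm's `convert` helper. A sketch assuming current mlx-lm parameter names, with a hypothetical output directory:

```python
from mlx_lm import convert

# Download the FP base weights and quantize to 6 bits with 32-element groups.
convert(
    hf_path="openai/gpt-oss-20b",
    mlx_path="./gpt-oss-20b-mlx-q6-gs32",  # hypothetical output directory
    quantize=True,
    q_bits=6,
    q_group_size=32,
)
```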