---
license: apache-2.0
base_model: Jinx-org/Jinx-gpt-oss-20b
base_model_relation: quantized
---
In these quantized versions of the model, most of the layers were shrunk to save space using **MXFP4**. The only difference between the two variants is the gate projections inside each expert's feed-forward network (`ffn_gate_exps.weight`):

- **Q4_1 version (≈12 GB):** These gate tensors are stored at 4 bits, so the file is smaller and inference a bit faster, but their outputs are slightly less precise.
- **Q8_0 version (≈15 GB):** These gate tensors keep 8 bits of detail, so the model is slightly more accurate, but the file is bigger and a bit slower.

All other layers are treated the same in both versions.
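
If you want to confirm which variant you have, or see exactly which tensors differ, the `gguf` Python package published from the llama.cpp repo can read a GGUF header without loading the weights. A minimal sketch; the filename is a placeholder for whichever file you downloaded:

```python
from collections import Counter

from gguf import GGUFReader  # pip install gguf

# Placeholder filename: point this at the Q4_1 or Q8_0 file you downloaded.
reader = GGUFReader("jinx-gpt-oss-20b-q8_0.gguf")

counts = Counter()
for tensor in reader.tensors:
    counts[tensor.tensor_type.name] += 1
    # Only the per-expert gate tensors differ between the two variants.
    if "ffn_gate_exps" in tensor.name:
        print(f"{tensor.name}: {tensor.tensor_type.name}")

# Overview of how many tensors use each quantization type.
for qtype, n in counts.most_common():
    print(f"{qtype}: {n} tensors")
```

On the Q4_1 file the `ffn_gate_exps` tensors should report `Q4_1`, and on the Q8_0 file `Q8_0`, while the rest of the per-type counts should match between the two files.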