marcelone committed · verified · Commit 142b987 · Parent(s): a7d16ab

Update README.md

Files changed (1): README.md (+6 -0)
README.md CHANGED
@@ -3,3 +3,9 @@ license: apache-2.0
 base_model: Jinx-org/Jinx-gpt-oss-20b
 base_model_relation: quantized
 ---
+ In these quantized versions of the model, most layers are stored in **MXFP4** to save space. The two builds differ only in the "gate" tensors (`ffn_gate_exps.weight`) that route tokens to experts:
+
+ - **Q4_1 version (≈12 GB):** the gate tensors are quantized more aggressively, so they are smaller and faster to read, but their routing decisions can be slightly less precise.
+ - **Q8_0 version (≈15 GB):** the gate tensors keep more detail, so expert routing is more accurate, but the file is larger and a bit slower to load.
+
+ All other layers are quantized identically in both versions.
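
The size gap between the two builds can be sketched with simple bits-per-weight arithmetic. The block layouts below are assumptions based on ggml's published block structs (each block covers 32 weights); they are illustrative, not a statement about this repository's exact files:

```python
# Rough bits-per-weight arithmetic for the GGUF quant types named above.
# Assumed block layouts (32 weights per block), per ggml's block structs:
#   MXFP4: 16 bytes of 4-bit values + 1-byte shared exponent  -> 17 bytes
#   Q4_1 : 16 bytes of 4-bit values + two fp16 scalars        -> 20 bytes
#   Q8_0 : 32 one-byte values + one fp16 scale                -> 34 bytes

def bits_per_weight(block_bytes: int, weights_per_block: int = 32) -> float:
    """Convert a block's byte size into average bits stored per weight."""
    return block_bytes * 8 / weights_per_block

bpw = {
    "MXFP4": bits_per_weight(16 + 1),  # shared-exponent 4-bit float
    "Q4_1":  bits_per_weight(16 + 4),  # 4-bit ints + scale and min
    "Q8_0":  bits_per_weight(32 + 2),  # 8-bit ints + scale
}
print(bpw)  # MXFP4: 4.25, Q4_1: 5.0, Q8_0: 8.5
```

Under these assumptions, moving only the large `ffn_gate_exps.weight` tensors from 5.0 to 8.5 bits per weight is what would account for most of the ≈3 GB difference between the two files.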