ubergarm committed on
Commit 6eb7b85 · 1 Parent(s): 5a9bb54

Upload IQ2_KL and add perplexity values

Files changed (1)
  1. README.md +16 -47
README.md CHANGED
@@ -59,57 +59,26 @@ Final estimate: PPL = 4.7056 +/- 0.02909
  <summary>👈 Secret Recipe</summary>

  ```bash
- #!/usr/bin/env bash
-
- custom="
- # 47 Repeating Layers [0-46]
- # Note: All ffn_down.* layers are not divisible by 256 so have limited quantization options.
-
- # Attention
- blk\.(0|1)\.attn_q.*=q8_0
- blk\.(0|1)\.attn_k.*=q8_0
- blk\.(0|1)\.attn_v.*=q8_0
- blk\.(0|1)\.attn_output.*=q8_0
-
- blk\..*\.attn_q.*=iq5_ks
- blk\..*\.attn_k.*=iq5_ks
- blk\..*\.attn_v.*=iq5_ks
- blk\..*\.attn_output.*=iq5_ks
-
- # First 1 Dense Layers [0]
- blk\..*\.ffn_down\.weight=q6_0
- blk\..*\.ffn_(gate|up)\.weight=iq5_ks
-
- # Shared Expert Layers [1-46]
- blk\..*\.ffn_down_shexp\.weight=q6_0
- blk\..*\.ffn_(gate|up)_shexp\.weight=iq5_ks
-
- # Routed Experts Layers [1-46]
- blk\..*\.ffn_down_exps\.weight=iq4_nl
- blk\..*\.ffn_(gate|up)_exps\.weight=iq4_kss
-
- # Non-Repeating Layers
- token_embd\.weight=iq4_k
- output\.weight=iq6_k
- "
-
- custom=$(
- echo "$custom" | grep -v '^#' | \
- sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
- )
-
- numactl -N 0 -m 0 \
- ./build/bin/llama-quantize \
- --custom-q "$custom" \
- --imatrix /mnt/raid/models/ubergarm/GLM-4.5-Air-GGUF/imatrix-GLM-4.5-Air-BF16.dat \
- /mnt/raid/models/ubergarm/GLM-4.5-Air-GGUF/GLM-4.5-Air-128x9.4B-BF16-00001-of-00005.gguf \
- /mnt/raid/models/ubergarm/GLM-4.5-Air-GGUF/GLM-4.5-Air-IQ4_KSS.gguf \
- IQ4_KSS \
- 192
+ echo TODO
  ```

  </details>

+ ## IQ2_KL 43.870 GiB (3.411 BPW)
+ Final estimate: PPL = 5.0697 +/- 0.03166
+
+ <details>
+
+ <summary>👈 Secret Recipe</summary>
+
+ ```bash
+ echo TODO
+ ```
+
+ </summary>
+
+
+
  ## Quick Start
  If you want to disable thinking, add `/nothink` (correct, no underscore) at the *end* of your prompt.
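
The recipe removed in this commit builds its `--custom-q` argument by dropping the comment lines from a multi-line string and joining the remaining `regex=type` rules with commas. The sketch below isolates that step. It uses a hypothetical two-rule recipe shortened from the removed script, and it assumes GNU `sed`, since `-z` is a GNU extension. It illustrates the string handling only and does not call `llama-quantize`.

```bash
#!/usr/bin/env bash

# Hypothetical, shortened recipe: two rules taken from the removed script.
custom="
# Attention
blk\..*\.attn_q.*=iq5_ks

# Non-Repeating Layers
token_embd\.weight=iq4_k
"

# 1. grep -v '^#' drops the comment lines.
# 2. sed -z reads the whole input as one record, so newlines can be matched:
#      s:\n+:,:g  turns each run of newlines into a single comma
#      s:,$::     strips the trailing comma
#      s:^,::     strips the leading comma
custom=$(
  printf '%s\n' "$custom" | grep -v '^#' | \
  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
)

# Print the single-line, comma-separated rule list.
printf '%s\n' "$custom"
```

The result is one line of comma-separated rules, the form in which the removed script passed them to `--custom-q`. Keeping the rules one per line with comments in the source string makes the recipe easier to read and edit than a hand-written single-line argument.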