Model card side-bar missing?

#1
by ubergarm - opened

Heya mradermacher, thanks for all the amazing quants. I was doing a comparison across the most likely top four V3-0324 quants in the ~230GiB size class and wanted to add your info to the table. However, for some reason the gguf details are not showing up?

I might be able to download the Q2_K and use gguf-py/gguf/gguf_reader.py or a similar tool to print out the tensor data to add to my table.

I have some discussion on it here on the ik_llama.cpp fork.

Cheers and happy cookin'!

hf only supports non-split quants; that's probably why they don't show up. that sidebar is not provided by us, we have no influence over it.

Thanks, yeah, makes sense. I wonder if it is because the split names are DeepSeek-V3-0324.i1-Q2_K.gguf.part1of5 instead of DeepSeek-V3-0324.i1-Q2_K.part1of5.gguf, as maybe the huggingface webapp is looking at the file extension...

I think I can get the info with:

$ python gguf-py/scripts/gguf_dump.py --markdown some_model.gguf

Cheers!

ubergarm changed discussion status to closed

I don't think hf supports split gguf files at all. Would probably be easy to improve on their side - since they only parse the header, they could go with the .gguf.part1of\d+ file (which contains the header). Clearly, it's not a priority for them, and that's fine with me, too. I don't think it's the file extension alone, though, as they do not get confused by the multi-part llama format (which also uses .gguf).

PS: part1of5.gguf wouldn't be correct

Ahh, thanks for the details. I guess I don't fully understand what a "split gguf" is exactly. In my testing I can run this:

$ du -h DeepSeek-V3-0324-IQ2_K_R4.gguf
227G    DeepSeek-V3-0324-IQ2_K_R4.gguf

$ ./build/bin/llama-gguf-split \
    --split \
    --split-max-size 50G \
    ./DeepSeek-V3-0324-IQ2_K_R4.gguf \
    /models/DeepSeek-V3-0324-IQ2_K_R4/DeepSeek-V3-0324-IQ2_K_R4

$ du -hc /models/DeepSeek-V3-0324-IQ2_K_R4/DeepSeek-V3-0324-IQ2_K_R4/*.gguf
46G     DeepSeek-V3-0324-IQ2_K_R4-00001-of-00005.gguf
47G     DeepSeek-V3-0324-IQ2_K_R4-00002-of-00005.gguf
47G     DeepSeek-V3-0324-IQ2_K_R4-00003-of-00005.gguf
47G     DeepSeek-V3-0324-IQ2_K_R4-00004-of-00005.gguf
43G     DeepSeek-V3-0324-IQ2_K_R4-00005-of-00005.gguf
227G    total

And huggingface works fine with this, as seen in ubergarm/DeepSeek-V3-0324-GGUF as well as bartowski's and unsloth's repos.

Anyway, I'm just trying to see what exactly is in your quant before downloading the entire thing. I downloaded just the first part and gguf-py/scripts/gguf_dump.py isn't working, so I'll try to hexedit or find another tool that can print out the header information at least.

Or if you have the time to run this on the folder containing your splits and copy paste it here. No pressure at all, I know y'all keeping busy!

pip install 'numpy<2.0.0'
python llama.cpp/gguf-py/scripts/gguf_dump.py \
    --markdown \
    DeepSeek-V3-0324-i1-GGUF/DeepSeek-V3-0324.i1-Q2_K.gguf.part1of5

Thanks again!

@ubergarm You only need the first few megabytes of the first part to get the metadata. Just use gguf-parser-windows-amd64.exe as I did under https://huggingface.co/mradermacher/model_requests/discussions/797#67e2c2975baf8e70d6e63d99
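For anyone who wants to avoid extra tooling: the fixed-size part of a GGUF header is small enough to parse by hand from the first bytes of the first part. A minimal stdlib-only sketch, assuming a little-endian v3 file; the `read_gguf_header` name and the synthetic header are just for illustration:

```python
import struct

def read_gguf_header(buf: bytes):
    """Parse the fixed-size GGUF header: magic, version, tensor_count, kv_count.

    Per the GGUF spec, the file starts with the 4-byte magic b"GGUF",
    a uint32 version, then uint64 tensor and key-value counts
    (little-endian in all current files).
    """
    magic, version, tensor_count, kv_count = struct.unpack_from("<4sIQQ", buf, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file (or not the first part of a split)")
    return version, tensor_count, kv_count

# Synthetic header matching the IQ2_S dump further down:
# version 3, 1025 tensors, 55 KV pairs.
fake = struct.pack("<4sIQQ", b"GGUF", 3, 1025, 55)
print(read_gguf_header(fake))  # (3, 1025, 55)
```

In practice you would pass the first 24 bytes of `DeepSeek-V3-0324.i1-Q2_K.gguf.part1of5`; parsing the key-value strings that follow takes more work, which is what gguf_dump.py and gguf-parser do for you.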

I guess I don't fully understand what a "split gguf" is exactly.

Right - a split gguf is simply a single gguf file split into multiple parts. That is opposed to a model in multi-part format, which is multiple complete gguf files. Unfortunately, that is also, in effect, a gguf split into multiple gguf files; it's just not one split gguf file, but multiple ones.

It is very confusing.
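The mechanical difference is easy to check, though: only a file that carries its own header begins with the GGUF magic bytes. A minimal sketch (`has_gguf_magic` and `file_has_gguf_magic` are hypothetical helpers, not part of gguf-py):

```python
GGUF_MAGIC = b"GGUF"

def has_gguf_magic(head: bytes) -> bool:
    """True if a buffer starts with the GGUF magic, i.e. carries its own header."""
    return head[:4] == GGUF_MAGIC

def file_has_gguf_magic(path: str) -> bool:
    """Check the first four bytes of a file on disk.

    Every file of a multi-part model (-00001-of-00005.gguf, ...) has its
    own header and passes; parts 2..N of a raw byte split
    (.gguf.part2of5, ...) are headerless slices and fail.
    """
    with open(path, "rb") as f:
        return has_gguf_magic(f.read(4))

print(has_gguf_magic(b"GGUF\x03\x00\x00\x00"), has_gguf_magic(b"\xde\xad\xbe\xef"))  # True False
```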

Or if you have the time to run this on the folder containing your splits and copy paste it here.

I don't have such a folder, but if you tell me what info you need, I might be able to provide it, if you can't get nico's method to work.

No pressure to look at this, I know u busy cooking! haha...

It is very confusing.

Ahh, I see now. It is literally the original .gguf binary data cut into pieces of somewhat arbitrary length. Perhaps this precedent was set by TheBloke, as suggested by this gist script.

I would just download it and merge it myself but have to check with my server guy on bandwidth usage haha...
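Since a raw byte split is just the original file cut at arbitrary offsets, merging is plain concatenation, e.g. `cat DeepSeek-V3-0324.i1-Q2_K.gguf.part*of5 > DeepSeek-V3-0324.i1-Q2_K.gguf`. A Python sketch of the same idea (`merge_raw_split` is a hypothetical helper; this does not apply to multi-part -0000X-of-0000Y.gguf files, which each have their own header):

```python
import shutil
from pathlib import Path

def merge_raw_split(first_part: Path, out: Path) -> None:
    """Reassemble a raw byte split (.gguf.part1ofN ... .gguf.partNofN).

    Must be called with the part1of file; the parts are concatenated in
    order, which reproduces the original gguf byte for byte.
    """
    stem, n = first_part.name.rsplit("part1of", 1)
    with open(out, "wb") as dst:
        for i in range(1, int(n) + 1):
            part = first_part.with_name(f"{stem}part{i}of{n}")
            with open(part, "rb") as src:
                shutil.copyfileobj(src, dst)
```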

Basically I want to find out if you use different quantizations for different layers, similar to how unsloth is doing it, e.g. Q6_0 for attention and Q2_K for routed expert layers. As of now I believe bartowski is using the same smaller quantization across all the layers, except keeping the token embedding at Q8_0.
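To answer that kind of question quickly from a dump, one can strip the block numbers from the tensor names and collect the set of quant types per tensor class. A sketch (`quant_mix` is a hypothetical helper; the sample rows are copied from the IQ2_S dump further down):

```python
import re
from collections import defaultdict

def quant_mix(tensors):
    """Group quantization types by tensor class.

    `tensors` is a list of (name, ggml_type) pairs, e.g. as reported by
    gguf_dump.py or gguf.GGUFReader; block numbers are replaced with a
    wildcard so all layers of the same kind land in one bucket.
    """
    mix = defaultdict(set)
    for name, qtype in tensors:
        cls = re.sub(r"^blk\.\d+\.", "blk.*.", name)
        mix[cls].add(qtype)
    return dict(mix)

# Sample rows taken from the IQ2_S dump in this thread.
sample = [
    ("token_embd.weight", "IQ3_S"),
    ("output.weight", "Q5_K"),
    ("blk.3.ffn_down_exps.weight", "IQ3_S"),
    ("blk.5.ffn_down_exps.weight", "IQ2_XS"),
    ("blk.3.ffn_gate_exps.weight", "IQ2_XS"),
]
# Prints one line per tensor class with its set of quant types,
# making mixed per-layer quantization immediately visible.
for cls, types in sorted(quant_mix(sample).items()):
    print(cls, sorted(types))
```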

If you have a merged gguf, this would show the information missing from the huggingface model card side-bar (as it can't handle the "split gguf"); assumes a WSL or Linux shell:

git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
# curl -LsSf https://astral.sh/uv/install.sh | sh # install uv if needed
uv venv ./venv --python 3.12 --python-preference=only-managed
source venv/bin/activate
uv pip install 'numpy<2.0.0' sentencepiece pyyaml
python gguf-py/gguf/scripts/gguf_dump.py \
    --markdown \
    DeepSeek-V3-0324-i1-GGUF/DeepSeek-V3-0324.i1-Q2_K.gguf

It would print something like this (taken from unsloth/DeepSeek-R1-GGUF/DeepSeek-R1-UD-Q2_K_XL-00001-of-00005.gguf) to show the various quantizations. The first output and token embedding part, block 0 (dense layers), and for example block 14 (experts) are plenty.

T_ID Tensor Layer Name Human Friendly Tensor Layer Name Elements Shape Type
0 output.weight Output (W) (~927M) 926679040 7168 x 129280 x 1 x 1 Q6_K
1 output_norm.weight Output Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
2 token_embd.weight Token Embedding (W) (~927M) 926679040 7168 x 129280 x 1 x 1 Q4_K
T_ID Tensor Layer Name Human Friendly Tensor Layer Name Elements Shape Type
3 blk.0.attn_kv_a_mqa.weight Block 0 Attn_Kv_A_Mqa (W) ( ~4M) 4128768 7168 x 576 x 1 x 1 Q6_K
4 blk.0.attn_kv_a_norm.weight Block 0 Attn_Kv_A_Norm (W) ( 512) 512 512 x 1 x 1 x 1 F32
5 blk.0.attn_kv_b.weight Block 0 Attn_Kv_B (W) ( ~17M) 16777216 512 x 32768 x 1 x 1 Q6_K
6 blk.0.attn_norm.weight Block 0 Attention Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
7 blk.0.attn_output.weight Block 0 Attention Output (W) (~117M) 117440512 16384 x 7168 x 1 x 1 Q4_K
8 blk.0.attn_q_a.weight Block 0 Attn_Q_A (W) ( ~11M) 11010048 7168 x 1536 x 1 x 1 Q4_K
9 blk.0.attn_q_a_norm.weight Block 0 Attn_Q_A_Norm (W) ( ~2K) 1536 1536 x 1 x 1 x 1 F32
10 blk.0.attn_q_b.weight Block 0 Attn_Q_B (W) ( ~38M) 37748736 1536 x 24576 x 1 x 1 Q4_K
11 blk.0.ffn_down.weight Block 0 Feed-Forward Network "Down" (W) (~132M) 132120576 18432 x 7168 x 1 x 1 Q6_K
12 blk.0.ffn_gate.weight Block 0 Feed-Forward Network "Gate" (W) (~132M) 132120576 7168 x 18432 x 1 x 1 Q4_K
13 blk.0.ffn_norm.weight Block 0 Feed-Forward Network Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
14 blk.0.ffn_up.weight Block 0 Feed-Forward Network "Up" (W) (~132M) 132120576 7168 x 18432 x 1 x 1 Q4_K
T_ID Tensor Layer Name Human Friendly Tensor Layer Name Elements Shape Type
226 blk.14.attn_kv_a_mqa.weight Block 14 Attn_Kv_A_Mqa (W) ( ~4M) 4128768 7168 x 576 x 1 x 1 Q6_K
227 blk.14.attn_kv_a_norm.weight Block 14 Attn_Kv_A_Norm (W) ( 512) 512 512 x 1 x 1 x 1 F32
228 blk.14.attn_kv_b.weight Block 14 Attn_Kv_B (W) ( ~17M) 16777216 512 x 32768 x 1 x 1 Q6_K
229 blk.14.attn_norm.weight Block 14 Attention Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
230 blk.14.attn_output.weight Block 14 Attention Output (W) (~117M) 117440512 16384 x 7168 x 1 x 1 Q4_K
231 blk.14.attn_q_a.weight Block 14 Attn_Q_A (W) ( ~11M) 11010048 7168 x 1536 x 1 x 1 Q4_K
232 blk.14.attn_q_a_norm.weight Block 14 Attn_Q_A_Norm (W) ( ~2K) 1536 1536 x 1 x 1 x 1 F32
233 blk.14.attn_q_b.weight Block 14 Attn_Q_B (W) ( ~38M) 37748736 1536 x 24576 x 1 x 1 Q4_K
234 blk.14.exp_probs_b.bias Block 14 Exp_Probs_B (B) ( 256) 256 256 x 1 x 1 x 1 F32
235 blk.14.ffn_down_exps.weight Block 14 Ffn_Down_Exps (W) ( ~4B) 3758096384 2048 x 7168 x 256 x 1 Q2_K
236 blk.14.ffn_down_shexp.weight Block 14 Ffn_Down_Shexp (W) ( ~15M) 14680064 2048 x 7168 x 1 x 1 Q6_K
237 blk.14.ffn_gate_exps.weight Block 14 Ffn_Gate_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 Q2_K
238 blk.14.ffn_gate_inp.weight Block 14 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) ( ~2M) 1835008 7168 x 256 x 1 x 1 F32
239 blk.14.ffn_gate_shexp.weight Block 14 Ffn_Gate_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 Q4_K
240 blk.14.ffn_norm.weight Block 14 Feed-Forward Network Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
241 blk.14.ffn_up_exps.weight Block 14 Ffn_Up_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 Q2_K
242 blk.14.ffn_up_shexp.weight Block 14 Ffn_Up_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 Q4_K

@ubergarm I ran it for DeepSeek-V3-0324.i1-IQ2_S.gguf

(venv) root@AI:/apool/Meta/llama.cpp# python gguf-py/gguf/scripts/gguf_dump.py \
    --markdown \
    /mradermacher/root/DeepSeek-V3-0324.i1-IQ2_S.gguf

/mradermacher/root/DeepSeek-V3-0324.i1-IQ2_S.gguf - GGUF Internal File Dump

  • Endian: LITTLE endian

Key Value Metadata Store

There are 58 key-value pairs in this file

POS TYPE Count Key Value
1 UINT32 1 GGUF.version 3
2 UINT64 1 GGUF.tensor_count 1025
3 UINT64 1 GGUF.kv_count 55
4 STRING 1 general.architecture deepseek2
5 STRING 1 general.type model
6 STRING 1 general.name DeepSeek V3 0324 Bf16
7 STRING 1 general.size_label 256x20B
8 STRING 1 general.license mit
9 UINT32 1 deepseek2.block_count 61
10 UINT32 1 deepseek2.context_length 163840
11 UINT32 1 deepseek2.embedding_length 7168
12 UINT32 1 deepseek2.feed_forward_length 18432
13 UINT32 1 deepseek2.attention.head_count 128
14 UINT32 1 deepseek2.attention.head_count_kv 128
15 FLOAT32 1 deepseek2.rope.freq_base 10000.0
16 FLOAT32 1 deepseek2.attention.layer_norm_rms_epsilon 1e-06
17 UINT32 1 deepseek2.expert_used_count 8
18 UINT32 1 deepseek2.leading_dense_block_count 3
19 UINT32 1 deepseek2.vocab_size 129280
20 UINT32 1 deepseek2.attention.q_lora_rank 1536
21 UINT32 1 deepseek2.attention.kv_lora_rank 512
22 UINT32 1 deepseek2.attention.key_length 192
23 UINT32 1 deepseek2.attention.value_length 128
24 UINT32 1 deepseek2.expert_feed_forward_length 2048
25 UINT32 1 deepseek2.expert_count 256
26 UINT32 1 deepseek2.expert_shared_count 1
27 FLOAT32 1 deepseek2.expert_weights_scale 2.5
28 BOOL 1 deepseek2.expert_weights_norm True
29 UINT32 1 deepseek2.expert_gating_func 2
30 UINT32 1 deepseek2.rope.dimension_count 64
31 STRING 1 deepseek2.rope.scaling.type yarn
32 FLOAT32 1 deepseek2.rope.scaling.factor 40.0
33 UINT32 1 deepseek2.rope.scaling.original_context_length 4096
34 FLOAT32 1 deepseek2.rope.scaling.yarn_log_multiplier 0.1
35 STRING 1 tokenizer.ggml.model gpt2
36 STRING 1 tokenizer.ggml.pre deepseek-v3
37 [STRING] 129280 tokenizer.ggml.tokens [ <|begin▁of▁sentence|>, <|end▁of▁sentence|>, <|▁pad▁|>, !, ", ... ]
38 [INT32] 129280 tokenizer.ggml.token_type [ 3, 3, 3, 1, 1, 1, 1, ... ]
39 [STRING] 127741 tokenizer.ggml.merges [ Ġ t, Ġ a, i n, Ġ Ġ, h e, ... ]
40 UINT32 1 tokenizer.ggml.bos_token_id 0
41 UINT32 1 tokenizer.ggml.eos_token_id 1
42 UINT32 1 tokenizer.ggml.padding_token_id 1
43 BOOL 1 tokenizer.ggml.add_bos_token True
44 BOOL 1 tokenizer.ggml.add_eos_token False
45 STRING 1 tokenizer.chat_template {% if not add_generation_promp...{{'<|Assistant|>'}}{% endif %}
46 UINT32 1 general.quantization_version 2
47 UINT32 1 general.file_type 28
48 STRING 1 general.url https://huggingface.co/mradermacher/DeepSeek-V3-0324-i1-GGUF
49 STRING 1 mradermacher.quantize_version 2
50 STRING 1 mradermacher.quantized_by mradermacher
51 STRING 1 mradermacher.quantized_at 2025-03-31T15:31:56+02:00
52 STRING 1 mradermacher.quantized_on nico1
53 STRING 1 general.source.url https://huggingface.co/deepseek-ai/DeepSeek-V3-0324
54 STRING 1 mradermacher.convert_type hf
55 STRING 1 quantize.imatrix.file DeepSeek-V3-0324-i1-GGUF/imatrix.dat
56 STRING 1 quantize.imatrix.dataset imatrix-training-full-3
57 INT32 1 quantize.imatrix.entries_count 720
58 INT32 1 quantize.imatrix.chunks_count 315

Base Tensor Group : ~2B Elements

T_ID Tensor Layer Name Human Friendly Tensor Layer Name Elements Shape Type
0 output.weight Output (W) (~927M) 926679040 7168 x 129280 x 1 x 1 Q5_K
1 output_norm.weight Output Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
2 token_embd.weight Token Embedding (W) (~927M) 926679040 7168 x 129280 x 1 x 1 IQ3_S
  • Total elements in base: ( ~2B) 1853365248
  • Percentage of total elements: 0.28%

Block 0 Tensor Group : ~583M Elements

T_ID Tensor Layer Name Human Friendly Tensor Layer Name Elements Shape Type
3 blk.0.attn_kv_a_mqa.weight Block 0 Attn_Kv_A_Mqa (W) ( ~4M) 4128768 7168 x 576 x 1 x 1 IQ2_XS
4 blk.0.attn_kv_a_norm.weight Block 0 Attn_Kv_A_Norm (W) ( 512) 512 512 x 1 x 1 x 1 F32
5 blk.0.attn_kv_b.weight Block 0 Attn_Kv_B (W) ( ~17M) 16777216 512 x 32768 x 1 x 1 IQ2_XS
6 blk.0.attn_norm.weight Block 0 Attention Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
7 blk.0.attn_output.weight Block 0 Attention Output (W) (~117M) 117440512 16384 x 7168 x 1 x 1 IQ3_S
8 blk.0.attn_q_a.weight Block 0 Attn_Q_A (W) ( ~11M) 11010048 7168 x 1536 x 1 x 1 IQ2_XS
9 blk.0.attn_q_a_norm.weight Block 0 Attn_Q_A_Norm (W) ( ~2K) 1536 1536 x 1 x 1 x 1 F32
10 blk.0.attn_q_b.weight Block 0 Attn_Q_B (W) ( ~38M) 37748736 1536 x 24576 x 1 x 1 IQ2_XS
11 blk.0.ffn_down.weight Block 0 Feed-Forward Network "Down" (W) (~132M) 132120576 18432 x 7168 x 1 x 1 IQ3_S
12 blk.0.ffn_gate.weight Block 0 Feed-Forward Network "Gate" (W) (~132M) 132120576 7168 x 18432 x 1 x 1 IQ2_XS
13 blk.0.ffn_norm.weight Block 0 Feed-Forward Network Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
14 blk.0.ffn_up.weight Block 0 Feed-Forward Network "Up" (W) (~132M) 132120576 7168 x 18432 x 1 x 1 IQ2_XS
  • Total elements in blk.0: (~583M) 583483392
  • Percentage of total elements: 0.09%

Block 1 Tensor Group : ~583M Elements

T_ID Tensor Layer Name Human Friendly Tensor Layer Name Elements Shape Type
15 blk.1.attn_kv_a_mqa.weight Block 1 Attn_Kv_A_Mqa (W) ( ~4M) 4128768 7168 x 576 x 1 x 1 IQ2_XS
16 blk.1.attn_kv_a_norm.weight Block 1 Attn_Kv_A_Norm (W) ( 512) 512 512 x 1 x 1 x 1 F32
17 blk.1.attn_kv_b.weight Block 1 Attn_Kv_B (W) ( ~17M) 16777216 512 x 32768 x 1 x 1 IQ2_XS
18 blk.1.attn_norm.weight Block 1 Attention Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
19 blk.1.attn_output.weight Block 1 Attention Output (W) (~117M) 117440512 16384 x 7168 x 1 x 1 IQ3_S
20 blk.1.attn_q_a.weight Block 1 Attn_Q_A (W) ( ~11M) 11010048 7168 x 1536 x 1 x 1 IQ2_XS
21 blk.1.attn_q_a_norm.weight Block 1 Attn_Q_A_Norm (W) ( ~2K) 1536 1536 x 1 x 1 x 1 F32
22 blk.1.attn_q_b.weight Block 1 Attn_Q_B (W) ( ~38M) 37748736 1536 x 24576 x 1 x 1 IQ2_XS
23 blk.1.ffn_down.weight Block 1 Feed-Forward Network "Down" (W) (~132M) 132120576 18432 x 7168 x 1 x 1 IQ3_S
24 blk.1.ffn_gate.weight Block 1 Feed-Forward Network "Gate" (W) (~132M) 132120576 7168 x 18432 x 1 x 1 IQ2_XS
25 blk.1.ffn_norm.weight Block 1 Feed-Forward Network Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
26 blk.1.ffn_up.weight Block 1 Feed-Forward Network "Up" (W) (~132M) 132120576 7168 x 18432 x 1 x 1 IQ2_XS
  • Total elements in blk.1: (~583M) 583483392
  • Percentage of total elements: 0.09%

Block 2 Tensor Group : ~583M Elements

T_ID Tensor Layer Name Human Friendly Tensor Layer Name Elements Shape Type
27 blk.2.attn_kv_a_mqa.weight Block 2 Attn_Kv_A_Mqa (W) ( ~4M) 4128768 7168 x 576 x 1 x 1 IQ2_XS
28 blk.2.attn_kv_a_norm.weight Block 2 Attn_Kv_A_Norm (W) ( 512) 512 512 x 1 x 1 x 1 F32
29 blk.2.attn_kv_b.weight Block 2 Attn_Kv_B (W) ( ~17M) 16777216 512 x 32768 x 1 x 1 IQ2_XS
30 blk.2.attn_norm.weight Block 2 Attention Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
31 blk.2.attn_output.weight Block 2 Attention Output (W) (~117M) 117440512 16384 x 7168 x 1 x 1 IQ3_S
32 blk.2.attn_q_a.weight Block 2 Attn_Q_A (W) ( ~11M) 11010048 7168 x 1536 x 1 x 1 IQ2_XS
33 blk.2.attn_q_a_norm.weight Block 2 Attn_Q_A_Norm (W) ( ~2K) 1536 1536 x 1 x 1 x 1 F32
34 blk.2.attn_q_b.weight Block 2 Attn_Q_B (W) ( ~38M) 37748736 1536 x 24576 x 1 x 1 IQ2_XS
35 blk.2.ffn_down.weight Block 2 Feed-Forward Network "Down" (W) (~132M) 132120576 18432 x 7168 x 1 x 1 IQ3_S
36 blk.2.ffn_gate.weight Block 2 Feed-Forward Network "Gate" (W) (~132M) 132120576 7168 x 18432 x 1 x 1 IQ2_XS
37 blk.2.ffn_norm.weight Block 2 Feed-Forward Network Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
38 blk.2.ffn_up.weight Block 2 Feed-Forward Network "Up" (W) (~132M) 132120576 7168 x 18432 x 1 x 1 IQ2_XS
  • Total elements in blk.2: (~583M) 583483392
  • Percentage of total elements: 0.09%

Block 3 Tensor Group : ~12B Elements

T_ID Tensor Layer Name Human Friendly Tensor Layer Name Elements Shape Type
39 blk.3.attn_kv_a_mqa.weight Block 3 Attn_Kv_A_Mqa (W) ( ~4M) 4128768 7168 x 576 x 1 x 1 IQ2_XS
40 blk.3.attn_kv_a_norm.weight Block 3 Attn_Kv_A_Norm (W) ( 512) 512 512 x 1 x 1 x 1 F32
41 blk.3.attn_kv_b.weight Block 3 Attn_Kv_B (W) ( ~17M) 16777216 512 x 32768 x 1 x 1 IQ2_XS
42 blk.3.attn_norm.weight Block 3 Attention Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
43 blk.3.attn_output.weight Block 3 Attention Output (W) (~117M) 117440512 16384 x 7168 x 1 x 1 IQ3_S
44 blk.3.attn_q_a.weight Block 3 Attn_Q_A (W) ( ~11M) 11010048 7168 x 1536 x 1 x 1 IQ2_XS
45 blk.3.attn_q_a_norm.weight Block 3 Attn_Q_A_Norm (W) ( ~2K) 1536 1536 x 1 x 1 x 1 F32
46 blk.3.attn_q_b.weight Block 3 Attn_Q_B (W) ( ~38M) 37748736 1536 x 24576 x 1 x 1 IQ2_XS
47 blk.3.exp_probs_b.bias Block 3 Exp_Probs_B (B) ( 256) 256 256 x 1 x 1 x 1 F32
48 blk.3.ffn_down_exps.weight Block 3 Ffn_Down_Exps (W) ( ~4B) 3758096384 2048 x 7168 x 256 x 1 IQ3_S
49 blk.3.ffn_down_shexp.weight Block 3 Ffn_Down_Shexp (W) ( ~15M) 14680064 2048 x 7168 x 1 x 1 IQ3_S
50 blk.3.ffn_gate_exps.weight Block 3 Ffn_Gate_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
51 blk.3.ffn_gate_inp.weight Block 3 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) ( ~2M) 1835008 7168 x 256 x 1 x 1 F32
52 blk.3.ffn_gate_shexp.weight Block 3 Ffn_Gate_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
53 blk.3.ffn_norm.weight Block 3 Feed-Forward Network Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
54 blk.3.ffn_up_exps.weight Block 3 Ffn_Up_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
55 blk.3.ffn_up_shexp.weight Block 3 Ffn_Up_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
  • Total elements in blk.3: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 4 Tensor Group : ~12B Elements

T_ID Tensor Layer Name Human Friendly Tensor Layer Name Elements Shape Type
56 blk.4.attn_kv_a_mqa.weight Block 4 Attn_Kv_A_Mqa (W) ( ~4M) 4128768 7168 x 576 x 1 x 1 IQ2_XS
57 blk.4.attn_kv_a_norm.weight Block 4 Attn_Kv_A_Norm (W) ( 512) 512 512 x 1 x 1 x 1 F32
58 blk.4.attn_kv_b.weight Block 4 Attn_Kv_B (W) ( ~17M) 16777216 512 x 32768 x 1 x 1 IQ2_XS
59 blk.4.attn_norm.weight Block 4 Attention Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
60 blk.4.attn_output.weight Block 4 Attention Output (W) (~117M) 117440512 16384 x 7168 x 1 x 1 IQ3_S
61 blk.4.attn_q_a.weight Block 4 Attn_Q_A (W) ( ~11M) 11010048 7168 x 1536 x 1 x 1 IQ2_XS
62 blk.4.attn_q_a_norm.weight Block 4 Attn_Q_A_Norm (W) ( ~2K) 1536 1536 x 1 x 1 x 1 F32
63 blk.4.attn_q_b.weight Block 4 Attn_Q_B (W) ( ~38M) 37748736 1536 x 24576 x 1 x 1 IQ2_XS
64 blk.4.exp_probs_b.bias Block 4 Exp_Probs_B (B) ( 256) 256 256 x 1 x 1 x 1 F32
65 blk.4.ffn_down_exps.weight Block 4 Ffn_Down_Exps (W) ( ~4B) 3758096384 2048 x 7168 x 256 x 1 IQ3_S
66 blk.4.ffn_down_shexp.weight Block 4 Ffn_Down_Shexp (W) ( ~15M) 14680064 2048 x 7168 x 1 x 1 IQ3_S
67 blk.4.ffn_gate_exps.weight Block 4 Ffn_Gate_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
68 blk.4.ffn_gate_inp.weight Block 4 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) ( ~2M) 1835008 7168 x 256 x 1 x 1 F32
69 blk.4.ffn_gate_shexp.weight Block 4 Ffn_Gate_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
70 blk.4.ffn_norm.weight Block 4 Feed-Forward Network Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
71 blk.4.ffn_up_exps.weight Block 4 Ffn_Up_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
72 blk.4.ffn_up_shexp.weight Block 4 Ffn_Up_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
  • Total elements in blk.4: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 5 Tensor Group : ~12B Elements

T_ID Tensor Layer Name Human Friendly Tensor Layer Name Elements Shape Type
73 blk.5.attn_kv_a_mqa.weight Block 5 Attn_Kv_A_Mqa (W) ( ~4M) 4128768 7168 x 576 x 1 x 1 IQ2_XS
74 blk.5.attn_kv_a_norm.weight Block 5 Attn_Kv_A_Norm (W) ( 512) 512 512 x 1 x 1 x 1 F32
75 blk.5.attn_kv_b.weight Block 5 Attn_Kv_B (W) ( ~17M) 16777216 512 x 32768 x 1 x 1 IQ2_XS
76 blk.5.attn_norm.weight Block 5 Attention Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
77 blk.5.attn_output.weight Block 5 Attention Output (W) (~117M) 117440512 16384 x 7168 x 1 x 1 IQ3_S
78 blk.5.attn_q_a.weight Block 5 Attn_Q_A (W) ( ~11M) 11010048 7168 x 1536 x 1 x 1 IQ2_XS
79 blk.5.attn_q_a_norm.weight Block 5 Attn_Q_A_Norm (W) ( ~2K) 1536 1536 x 1 x 1 x 1 F32
80 blk.5.attn_q_b.weight Block 5 Attn_Q_B (W) ( ~38M) 37748736 1536 x 24576 x 1 x 1 IQ2_XS
81 blk.5.exp_probs_b.bias Block 5 Exp_Probs_B (B) ( 256) 256 256 x 1 x 1 x 1 F32
82 blk.5.ffn_down_exps.weight Block 5 Ffn_Down_Exps (W) ( ~4B) 3758096384 2048 x 7168 x 256 x 1 IQ2_XS
83 blk.5.ffn_down_shexp.weight Block 5 Ffn_Down_Shexp (W) ( ~15M) 14680064 2048 x 7168 x 1 x 1 IQ2_XS
84 blk.5.ffn_gate_exps.weight Block 5 Ffn_Gate_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
85 blk.5.ffn_gate_inp.weight Block 5 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) ( ~2M) 1835008 7168 x 256 x 1 x 1 F32
86 blk.5.ffn_gate_shexp.weight Block 5 Ffn_Gate_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
87 blk.5.ffn_norm.weight Block 5 Feed-Forward Network Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
88 blk.5.ffn_up_exps.weight Block 5 Ffn_Up_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
89 blk.5.ffn_up_shexp.weight Block 5 Ffn_Up_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
  • Total elements in blk.5: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 6 Tensor Group : ~12B Elements

T_ID Tensor Layer Name Human Friendly Tensor Layer Name Elements Shape Type
90 blk.6.attn_kv_a_mqa.weight Block 6 Attn_Kv_A_Mqa (W) ( ~4M) 4128768 7168 x 576 x 1 x 1 IQ2_XS
91 blk.6.attn_kv_a_norm.weight Block 6 Attn_Kv_A_Norm (W) ( 512) 512 512 x 1 x 1 x 1 F32
92 blk.6.attn_kv_b.weight Block 6 Attn_Kv_B (W) ( ~17M) 16777216 512 x 32768 x 1 x 1 IQ2_XS
93 blk.6.attn_norm.weight Block 6 Attention Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
94 blk.6.attn_output.weight Block 6 Attention Output (W) (~117M) 117440512 16384 x 7168 x 1 x 1 IQ3_S
95 blk.6.attn_q_a.weight Block 6 Attn_Q_A (W) ( ~11M) 11010048 7168 x 1536 x 1 x 1 IQ2_XS
96 blk.6.attn_q_a_norm.weight Block 6 Attn_Q_A_Norm (W) ( ~2K) 1536 1536 x 1 x 1 x 1 F32
97 blk.6.attn_q_b.weight Block 6 Attn_Q_B (W) ( ~38M) 37748736 1536 x 24576 x 1 x 1 IQ2_XS
98 blk.6.exp_probs_b.bias Block 6 Exp_Probs_B (B) ( 256) 256 256 x 1 x 1 x 1 F32
99 blk.6.ffn_down_exps.weight Block 6 Ffn_Down_Exps (W) ( ~4B) 3758096384 2048 x 7168 x 256 x 1 IQ2_XS
100 blk.6.ffn_down_shexp.weight Block 6 Ffn_Down_Shexp (W) ( ~15M) 14680064 2048 x 7168 x 1 x 1 IQ2_XS
101 blk.6.ffn_gate_exps.weight Block 6 Ffn_Gate_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
102 blk.6.ffn_gate_inp.weight Block 6 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) ( ~2M) 1835008 7168 x 256 x 1 x 1 F32
103 blk.6.ffn_gate_shexp.weight Block 6 Ffn_Gate_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
104 blk.6.ffn_norm.weight Block 6 Feed-Forward Network Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
105 blk.6.ffn_up_exps.weight Block 6 Ffn_Up_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
106 blk.6.ffn_up_shexp.weight Block 6 Ffn_Up_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
  • Total elements in blk.6: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 7 Tensor Group : ~12B Elements

T_ID Tensor Layer Name Human Friendly Tensor Layer Name Elements Shape Type
107 blk.7.attn_kv_a_mqa.weight Block 7 Attn_Kv_A_Mqa (W) ( ~4M) 4128768 7168 x 576 x 1 x 1 IQ2_XS
108 blk.7.attn_kv_a_norm.weight Block 7 Attn_Kv_A_Norm (W) ( 512) 512 512 x 1 x 1 x 1 F32
109 blk.7.attn_kv_b.weight Block 7 Attn_Kv_B (W) ( ~17M) 16777216 512 x 32768 x 1 x 1 IQ2_XS
110 blk.7.attn_norm.weight Block 7 Attention Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
111 blk.7.attn_output.weight Block 7 Attention Output (W) (~117M) 117440512 16384 x 7168 x 1 x 1 IQ3_S
112 blk.7.attn_q_a.weight Block 7 Attn_Q_A (W) ( ~11M) 11010048 7168 x 1536 x 1 x 1 IQ2_XS
113 blk.7.attn_q_a_norm.weight Block 7 Attn_Q_A_Norm (W) ( ~2K) 1536 1536 x 1 x 1 x 1 F32
114 blk.7.attn_q_b.weight Block 7 Attn_Q_B (W) ( ~38M) 37748736 1536 x 24576 x 1 x 1 IQ2_XS
115 blk.7.exp_probs_b.bias Block 7 Exp_Probs_B (B) ( 256) 256 256 x 1 x 1 x 1 F32
116 blk.7.ffn_down_exps.weight Block 7 Ffn_Down_Exps (W) ( ~4B) 3758096384 2048 x 7168 x 256 x 1 IQ2_XS
117 blk.7.ffn_down_shexp.weight Block 7 Ffn_Down_Shexp (W) ( ~15M) 14680064 2048 x 7168 x 1 x 1 IQ2_XS
118 blk.7.ffn_gate_exps.weight Block 7 Ffn_Gate_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
119 blk.7.ffn_gate_inp.weight Block 7 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) ( ~2M) 1835008 7168 x 256 x 1 x 1 F32
120 blk.7.ffn_gate_shexp.weight Block 7 Ffn_Gate_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
121 blk.7.ffn_norm.weight Block 7 Feed-Forward Network Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
122 blk.7.ffn_up_exps.weight Block 7 Ffn_Up_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
123 blk.7.ffn_up_shexp.weight Block 7 Ffn_Up_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
  • Total elements in blk.7: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 8 Tensor Group : ~12B Elements

T_ID Tensor Layer Name Human Friendly Tensor Layer Name Elements Shape Type
124 blk.8.attn_kv_a_mqa.weight Block 8 Attn_Kv_A_Mqa (W) ( ~4M) 4128768 7168 x 576 x 1 x 1 IQ2_XS
125 blk.8.attn_kv_a_norm.weight Block 8 Attn_Kv_A_Norm (W) ( 512) 512 512 x 1 x 1 x 1 F32
126 blk.8.attn_kv_b.weight Block 8 Attn_Kv_B (W) ( ~17M) 16777216 512 x 32768 x 1 x 1 IQ2_XS
127 blk.8.attn_norm.weight Block 8 Attention Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
128 blk.8.attn_output.weight Block 8 Attention Output (W) (~117M) 117440512 16384 x 7168 x 1 x 1 IQ3_S
129 blk.8.attn_q_a.weight Block 8 Attn_Q_A (W) ( ~11M) 11010048 7168 x 1536 x 1 x 1 IQ2_XS
130 blk.8.attn_q_a_norm.weight Block 8 Attn_Q_A_Norm (W) ( ~2K) 1536 1536 x 1 x 1 x 1 F32
131 blk.8.attn_q_b.weight Block 8 Attn_Q_B (W) ( ~38M) 37748736 1536 x 24576 x 1 x 1 IQ2_XS
132 blk.8.exp_probs_b.bias Block 8 Exp_Probs_B (B) ( 256) 256 256 x 1 x 1 x 1 F32
133 blk.8.ffn_down_exps.weight Block 8 Ffn_Down_Exps (W) ( ~4B) 3758096384 2048 x 7168 x 256 x 1 IQ2_XS
134 blk.8.ffn_down_shexp.weight Block 8 Ffn_Down_Shexp (W) ( ~15M) 14680064 2048 x 7168 x 1 x 1 IQ2_XS
135 blk.8.ffn_gate_exps.weight Block 8 Ffn_Gate_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
136 blk.8.ffn_gate_inp.weight Block 8 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) ( ~2M) 1835008 7168 x 256 x 1 x 1 F32
137 blk.8.ffn_gate_shexp.weight Block 8 Ffn_Gate_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
138 blk.8.ffn_norm.weight Block 8 Feed-Forward Network Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
139 blk.8.ffn_up_exps.weight Block 8 Ffn_Up_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
140 blk.8.ffn_up_shexp.weight Block 8 Ffn_Up_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
  • Total elements in blk.8: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 9 Tensor Group : ~12B Elements

T_ID Tensor Layer Name Human Friendly Tensor Layer Name Elements Shape Type
141 blk.9.attn_kv_a_mqa.weight Block 9 Attn_Kv_A_Mqa (W) ( ~4M) 4128768 7168 x 576 x 1 x 1 IQ2_XS
142 blk.9.attn_kv_a_norm.weight Block 9 Attn_Kv_A_Norm (W) ( 512) 512 512 x 1 x 1 x 1 F32
143 blk.9.attn_kv_b.weight Block 9 Attn_Kv_B (W) ( ~17M) 16777216 512 x 32768 x 1 x 1 IQ2_XS
144 blk.9.attn_norm.weight Block 9 Attention Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
145 blk.9.attn_output.weight Block 9 Attention Output (W) (~117M) 117440512 16384 x 7168 x 1 x 1 IQ3_S
146 blk.9.attn_q_a.weight Block 9 Attn_Q_A (W) ( ~11M) 11010048 7168 x 1536 x 1 x 1 IQ2_XS
147 blk.9.attn_q_a_norm.weight Block 9 Attn_Q_A_Norm (W) ( ~2K) 1536 1536 x 1 x 1 x 1 F32
148 blk.9.attn_q_b.weight Block 9 Attn_Q_B (W) ( ~38M) 37748736 1536 x 24576 x 1 x 1 IQ2_XS
149 blk.9.exp_probs_b.bias Block 9 Exp_Probs_B (B) ( 256) 256 256 x 1 x 1 x 1 F32
150 blk.9.ffn_down_exps.weight Block 9 Ffn_Down_Exps (W) ( ~4B) 3758096384 2048 x 7168 x 256 x 1 IQ2_XS
151 blk.9.ffn_down_shexp.weight Block 9 Ffn_Down_Shexp (W) ( ~15M) 14680064 2048 x 7168 x 1 x 1 IQ2_XS
152 blk.9.ffn_gate_exps.weight Block 9 Ffn_Gate_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
153 blk.9.ffn_gate_inp.weight Block 9 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) ( ~2M) 1835008 7168 x 256 x 1 x 1 F32
154 blk.9.ffn_gate_shexp.weight Block 9 Ffn_Gate_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
155 blk.9.ffn_norm.weight Block 9 Feed-Forward Network Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
156 blk.9.ffn_up_exps.weight Block 9 Ffn_Up_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
157 blk.9.ffn_up_shexp.weight Block 9 Ffn_Up_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
  • Total elements in blk.9: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 10 Tensor Group : ~12B Elements

T_ID Tensor Layer Name Human Friendly Tensor Layer Name Elements Shape Type
158 blk.10.attn_kv_a_mqa.weight Block 10 Attn_Kv_A_Mqa (W) ( ~4M) 4128768 7168 x 576 x 1 x 1 IQ2_XS
159 blk.10.attn_kv_a_norm.weight Block 10 Attn_Kv_A_Norm (W) ( 512) 512 512 x 1 x 1 x 1 F32
160 blk.10.attn_kv_b.weight Block 10 Attn_Kv_B (W) ( ~17M) 16777216 512 x 32768 x 1 x 1 IQ2_XS
161 blk.10.attn_norm.weight Block 10 Attention Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
162 blk.10.attn_output.weight Block 10 Attention Output (W) (~117M) 117440512 16384 x 7168 x 1 x 1 IQ3_S
163 blk.10.attn_q_a.weight Block 10 Attn_Q_A (W) ( ~11M) 11010048 7168 x 1536 x 1 x 1 IQ2_XS
164 blk.10.attn_q_a_norm.weight Block 10 Attn_Q_A_Norm (W) ( ~2K) 1536 1536 x 1 x 1 x 1 F32
165 blk.10.attn_q_b.weight Block 10 Attn_Q_B (W) ( ~38M) 37748736 1536 x 24576 x 1 x 1 IQ2_XS
166 blk.10.exp_probs_b.bias Block 10 Exp_Probs_B (B) ( 256) 256 256 x 1 x 1 x 1 F32
167 blk.10.ffn_down_exps.weight Block 10 Ffn_Down_Exps (W) ( ~4B) 3758096384 2048 x 7168 x 256 x 1 IQ2_XS
168 blk.10.ffn_down_shexp.weight Block 10 Ffn_Down_Shexp (W) ( ~15M) 14680064 2048 x 7168 x 1 x 1 IQ2_XS
169 blk.10.ffn_gate_exps.weight Block 10 Ffn_Gate_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
170 blk.10.ffn_gate_inp.weight Block 10 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) ( ~2M) 1835008 7168 x 256 x 1 x 1 F32
171 blk.10.ffn_gate_shexp.weight Block 10 Ffn_Gate_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
172 blk.10.ffn_norm.weight Block 10 Feed-Forward Network Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
173 blk.10.ffn_up_exps.weight Block 10 Ffn_Up_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
174 blk.10.ffn_up_shexp.weight Block 10 Ffn_Up_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
  • Total elements in blk.10: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 11 Tensor Group : ~12B Elements

| T_ID | Tensor Layer Name | Human Friendly Tensor Layer Name | Elements | Shape | Type |
|------|-------------------|----------------------------------|----------|-------|------|
| 175 | blk.11.attn_kv_a_mqa.weight | Block 11 Attn_Kv_A_Mqa (W) | (~4M) 4128768 | 7168 x 576 x 1 x 1 | IQ2_XS |
| 176 | blk.11.attn_kv_a_norm.weight | Block 11 Attn_Kv_A_Norm (W) | (512) 512 | 512 x 1 x 1 x 1 | F32 |
| 177 | blk.11.attn_kv_b.weight | Block 11 Attn_Kv_B (W) | (~17M) 16777216 | 512 x 32768 x 1 x 1 | IQ2_XS |
| 178 | blk.11.attn_norm.weight | Block 11 Attention Normalization (W) | (~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 179 | blk.11.attn_output.weight | Block 11 Attention Output (W) | (~117M) 117440512 | 16384 x 7168 x 1 x 1 | IQ3_S |
| 180 | blk.11.attn_q_a.weight | Block 11 Attn_Q_A (W) | (~11M) 11010048 | 7168 x 1536 x 1 x 1 | IQ2_XS |
| 181 | blk.11.attn_q_a_norm.weight | Block 11 Attn_Q_A_Norm (W) | (~2K) 1536 | 1536 x 1 x 1 x 1 | F32 |
| 182 | blk.11.attn_q_b.weight | Block 11 Attn_Q_B (W) | (~38M) 37748736 | 1536 x 24576 x 1 x 1 | IQ2_XS |
| 183 | blk.11.exp_probs_b.bias | Block 11 Exp_Probs_B (B) | (256) 256 | 256 x 1 x 1 x 1 | F32 |
| 184 | blk.11.ffn_down_exps.weight | Block 11 Ffn_Down_Exps (W) | (~4B) 3758096384 | 2048 x 7168 x 256 x 1 | IQ2_XS |
| 185 | blk.11.ffn_down_shexp.weight | Block 11 Ffn_Down_Shexp (W) | (~15M) 14680064 | 2048 x 7168 x 1 x 1 | IQ2_XS |
| 186 | blk.11.ffn_gate_exps.weight | Block 11 Ffn_Gate_Exps (W) | (~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 187 | blk.11.ffn_gate_inp.weight | Block 11 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) | (~2M) 1835008 | 7168 x 256 x 1 x 1 | F32 |
| 188 | blk.11.ffn_gate_shexp.weight | Block 11 Ffn_Gate_Shexp (W) | (~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
| 189 | blk.11.ffn_norm.weight | Block 11 Feed-Forward Network Normalization (W) | (~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 190 | blk.11.ffn_up_exps.weight | Block 11 Ffn_Up_Exps (W) | (~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 191 | blk.11.ffn_up_shexp.weight | Block 11 Ffn_Up_Shexp (W) | (~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
  • Total elements in blk.11: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 12 Tensor Group : ~12B Elements

| T_ID | Tensor Layer Name | Human Friendly Tensor Layer Name | Elements | Shape | Type |
|------|-------------------|----------------------------------|----------|-------|------|
| 192 | blk.12.attn_kv_a_mqa.weight | Block 12 Attn_Kv_A_Mqa (W) | (~4M) 4128768 | 7168 x 576 x 1 x 1 | IQ2_XS |
| 193 | blk.12.attn_kv_a_norm.weight | Block 12 Attn_Kv_A_Norm (W) | (512) 512 | 512 x 1 x 1 x 1 | F32 |
| 194 | blk.12.attn_kv_b.weight | Block 12 Attn_Kv_B (W) | (~17M) 16777216 | 512 x 32768 x 1 x 1 | IQ2_XS |
| 195 | blk.12.attn_norm.weight | Block 12 Attention Normalization (W) | (~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 196 | blk.12.attn_output.weight | Block 12 Attention Output (W) | (~117M) 117440512 | 16384 x 7168 x 1 x 1 | IQ3_S |
| 197 | blk.12.attn_q_a.weight | Block 12 Attn_Q_A (W) | (~11M) 11010048 | 7168 x 1536 x 1 x 1 | IQ2_XS |
| 198 | blk.12.attn_q_a_norm.weight | Block 12 Attn_Q_A_Norm (W) | (~2K) 1536 | 1536 x 1 x 1 x 1 | F32 |
| 199 | blk.12.attn_q_b.weight | Block 12 Attn_Q_B (W) | (~38M) 37748736 | 1536 x 24576 x 1 x 1 | IQ2_XS |
| 200 | blk.12.exp_probs_b.bias | Block 12 Exp_Probs_B (B) | (256) 256 | 256 x 1 x 1 x 1 | F32 |
| 201 | blk.12.ffn_down_exps.weight | Block 12 Ffn_Down_Exps (W) | (~4B) 3758096384 | 2048 x 7168 x 256 x 1 | IQ2_XS |
| 202 | blk.12.ffn_down_shexp.weight | Block 12 Ffn_Down_Shexp (W) | (~15M) 14680064 | 2048 x 7168 x 1 x 1 | IQ2_XS |
| 203 | blk.12.ffn_gate_exps.weight | Block 12 Ffn_Gate_Exps (W) | (~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 204 | blk.12.ffn_gate_inp.weight | Block 12 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) | (~2M) 1835008 | 7168 x 256 x 1 x 1 | F32 |
| 205 | blk.12.ffn_gate_shexp.weight | Block 12 Ffn_Gate_Shexp (W) | (~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
| 206 | blk.12.ffn_norm.weight | Block 12 Feed-Forward Network Normalization (W) | (~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 207 | blk.12.ffn_up_exps.weight | Block 12 Ffn_Up_Exps (W) | (~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 208 | blk.12.ffn_up_shexp.weight | Block 12 Ffn_Up_Shexp (W) | (~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
  • Total elements in blk.12: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 13 Tensor Group : ~12B Elements

| T_ID | Tensor Layer Name | Human Friendly Tensor Layer Name | Elements | Shape | Type |
|------|-------------------|----------------------------------|----------|-------|------|
| 209 | blk.13.attn_kv_a_mqa.weight | Block 13 Attn_Kv_A_Mqa (W) | (~4M) 4128768 | 7168 x 576 x 1 x 1 | IQ2_XS |
| 210 | blk.13.attn_kv_a_norm.weight | Block 13 Attn_Kv_A_Norm (W) | (512) 512 | 512 x 1 x 1 x 1 | F32 |
| 211 | blk.13.attn_kv_b.weight | Block 13 Attn_Kv_B (W) | (~17M) 16777216 | 512 x 32768 x 1 x 1 | IQ2_XS |
| 212 | blk.13.attn_norm.weight | Block 13 Attention Normalization (W) | (~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 213 | blk.13.attn_output.weight | Block 13 Attention Output (W) | (~117M) 117440512 | 16384 x 7168 x 1 x 1 | IQ3_S |
| 214 | blk.13.attn_q_a.weight | Block 13 Attn_Q_A (W) | (~11M) 11010048 | 7168 x 1536 x 1 x 1 | IQ2_XS |
| 215 | blk.13.attn_q_a_norm.weight | Block 13 Attn_Q_A_Norm (W) | (~2K) 1536 | 1536 x 1 x 1 x 1 | F32 |
| 216 | blk.13.attn_q_b.weight | Block 13 Attn_Q_B (W) | (~38M) 37748736 | 1536 x 24576 x 1 x 1 | IQ2_XS |
| 217 | blk.13.exp_probs_b.bias | Block 13 Exp_Probs_B (B) | (256) 256 | 256 x 1 x 1 x 1 | F32 |
| 218 | blk.13.ffn_down_exps.weight | Block 13 Ffn_Down_Exps (W) | (~4B) 3758096384 | 2048 x 7168 x 256 x 1 | IQ2_XS |
| 219 | blk.13.ffn_down_shexp.weight | Block 13 Ffn_Down_Shexp (W) | (~15M) 14680064 | 2048 x 7168 x 1 x 1 | IQ2_XS |
| 220 | blk.13.ffn_gate_exps.weight | Block 13 Ffn_Gate_Exps (W) | (~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 221 | blk.13.ffn_gate_inp.weight | Block 13 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) | (~2M) 1835008 | 7168 x 256 x 1 x 1 | F32 |
| 222 | blk.13.ffn_gate_shexp.weight | Block 13 Ffn_Gate_Shexp (W) | (~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
| 223 | blk.13.ffn_norm.weight | Block 13 Feed-Forward Network Normalization (W) | (~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 224 | blk.13.ffn_up_exps.weight | Block 13 Ffn_Up_Exps (W) | (~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 225 | blk.13.ffn_up_shexp.weight | Block 13 Ffn_Up_Shexp (W) | (~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
  • Total elements in blk.13: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 14 Tensor Group : ~12B Elements

| T_ID | Tensor Layer Name | Human Friendly Tensor Layer Name | Elements | Shape | Type |
|------|-------------------|----------------------------------|----------|-------|------|
| 226 | blk.14.attn_kv_a_mqa.weight | Block 14 Attn_Kv_A_Mqa (W) | (~4M) 4128768 | 7168 x 576 x 1 x 1 | IQ2_XS |
| 227 | blk.14.attn_kv_a_norm.weight | Block 14 Attn_Kv_A_Norm (W) | (512) 512 | 512 x 1 x 1 x 1 | F32 |
| 228 | blk.14.attn_kv_b.weight | Block 14 Attn_Kv_B (W) | (~17M) 16777216 | 512 x 32768 x 1 x 1 | IQ2_XS |
| 229 | blk.14.attn_norm.weight | Block 14 Attention Normalization (W) | (~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 230 | blk.14.attn_output.weight | Block 14 Attention Output (W) | (~117M) 117440512 | 16384 x 7168 x 1 x 1 | IQ3_S |
| 231 | blk.14.attn_q_a.weight | Block 14 Attn_Q_A (W) | (~11M) 11010048 | 7168 x 1536 x 1 x 1 | IQ2_XS |
| 232 | blk.14.attn_q_a_norm.weight | Block 14 Attn_Q_A_Norm (W) | (~2K) 1536 | 1536 x 1 x 1 x 1 | F32 |
| 233 | blk.14.attn_q_b.weight | Block 14 Attn_Q_B (W) | (~38M) 37748736 | 1536 x 24576 x 1 x 1 | IQ2_XS |
| 234 | blk.14.exp_probs_b.bias | Block 14 Exp_Probs_B (B) | (256) 256 | 256 x 1 x 1 x 1 | F32 |
| 235 | blk.14.ffn_down_exps.weight | Block 14 Ffn_Down_Exps (W) | (~4B) 3758096384 | 2048 x 7168 x 256 x 1 | IQ2_XS |
| 236 | blk.14.ffn_down_shexp.weight | Block 14 Ffn_Down_Shexp (W) | (~15M) 14680064 | 2048 x 7168 x 1 x 1 | IQ2_XS |
| 237 | blk.14.ffn_gate_exps.weight | Block 14 Ffn_Gate_Exps (W) | (~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 238 | blk.14.ffn_gate_inp.weight | Block 14 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) | (~2M) 1835008 | 7168 x 256 x 1 x 1 | F32 |
| 239 | blk.14.ffn_gate_shexp.weight | Block 14 Ffn_Gate_Shexp (W) | (~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
| 240 | blk.14.ffn_norm.weight | Block 14 Feed-Forward Network Normalization (W) | (~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 241 | blk.14.ffn_up_exps.weight | Block 14 Ffn_Up_Exps (W) | (~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 242 | blk.14.ffn_up_shexp.weight | Block 14 Ffn_Up_Shexp (W) | (~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
  • Total elements in blk.14: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 15 Tensor Group : ~12B Elements

| T_ID | Tensor Layer Name | Human Friendly Tensor Layer Name | Elements | Shape | Type |
|------|-------------------|----------------------------------|----------|-------|------|
| 243 | blk.15.attn_kv_a_mqa.weight | Block 15 Attn_Kv_A_Mqa (W) | (~4M) 4128768 | 7168 x 576 x 1 x 1 | IQ2_XS |
| 244 | blk.15.attn_kv_a_norm.weight | Block 15 Attn_Kv_A_Norm (W) | (512) 512 | 512 x 1 x 1 x 1 | F32 |
| 245 | blk.15.attn_kv_b.weight | Block 15 Attn_Kv_B (W) | (~17M) 16777216 | 512 x 32768 x 1 x 1 | IQ2_XS |
| 246 | blk.15.attn_norm.weight | Block 15 Attention Normalization (W) | (~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 247 | blk.15.attn_output.weight | Block 15 Attention Output (W) | (~117M) 117440512 | 16384 x 7168 x 1 x 1 | IQ3_S |
| 248 | blk.15.attn_q_a.weight | Block 15 Attn_Q_A (W) | (~11M) 11010048 | 7168 x 1536 x 1 x 1 | IQ2_XS |
| 249 | blk.15.attn_q_a_norm.weight | Block 15 Attn_Q_A_Norm (W) | (~2K) 1536 | 1536 x 1 x 1 x 1 | F32 |
| 250 | blk.15.attn_q_b.weight | Block 15 Attn_Q_B (W) | (~38M) 37748736 | 1536 x 24576 x 1 x 1 | IQ2_XS |
| 251 | blk.15.exp_probs_b.bias | Block 15 Exp_Probs_B (B) | (256) 256 | 256 x 1 x 1 x 1 | F32 |
| 252 | blk.15.ffn_down_exps.weight | Block 15 Ffn_Down_Exps (W) | (~4B) 3758096384 | 2048 x 7168 x 256 x 1 | IQ2_XS |
| 253 | blk.15.ffn_down_shexp.weight | Block 15 Ffn_Down_Shexp (W) | (~15M) 14680064 | 2048 x 7168 x 1 x 1 | IQ2_XS |
| 254 | blk.15.ffn_gate_exps.weight | Block 15 Ffn_Gate_Exps (W) | (~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 255 | blk.15.ffn_gate_inp.weight | Block 15 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) | (~2M) 1835008 | 7168 x 256 x 1 x 1 | F32 |
| 256 | blk.15.ffn_gate_shexp.weight | Block 15 Ffn_Gate_Shexp (W) | (~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
| 257 | blk.15.ffn_norm.weight | Block 15 Feed-Forward Network Normalization (W) | (~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 258 | blk.15.ffn_up_exps.weight | Block 15 Ffn_Up_Exps (W) | (~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 259 | blk.15.ffn_up_shexp.weight | Block 15 Ffn_Up_Shexp (W) | (~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
  • Total elements in blk.15: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 16 Tensor Group : ~12B Elements

| T_ID | Tensor Layer Name | Human Friendly Tensor Layer Name | Elements | Shape | Type |
|------|-------------------|----------------------------------|----------|-------|------|
| 260 | blk.16.attn_kv_a_mqa.weight | Block 16 Attn_Kv_A_Mqa (W) | (~4M) 4128768 | 7168 x 576 x 1 x 1 | IQ2_XS |
| 261 | blk.16.attn_kv_a_norm.weight | Block 16 Attn_Kv_A_Norm (W) | (512) 512 | 512 x 1 x 1 x 1 | F32 |
| 262 | blk.16.attn_kv_b.weight | Block 16 Attn_Kv_B (W) | (~17M) 16777216 | 512 x 32768 x 1 x 1 | IQ2_XS |
| 263 | blk.16.attn_norm.weight | Block 16 Attention Normalization (W) | (~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 264 | blk.16.attn_output.weight | Block 16 Attention Output (W) | (~117M) 117440512 | 16384 x 7168 x 1 x 1 | IQ3_S |
| 265 | blk.16.attn_q_a.weight | Block 16 Attn_Q_A (W) | (~11M) 11010048 | 7168 x 1536 x 1 x 1 | IQ2_XS |
| 266 | blk.16.attn_q_a_norm.weight | Block 16 Attn_Q_A_Norm (W) | (~2K) 1536 | 1536 x 1 x 1 x 1 | F32 |
| 267 | blk.16.attn_q_b.weight | Block 16 Attn_Q_B (W) | (~38M) 37748736 | 1536 x 24576 x 1 x 1 | IQ2_XS |
| 268 | blk.16.exp_probs_b.bias | Block 16 Exp_Probs_B (B) | (256) 256 | 256 x 1 x 1 x 1 | F32 |
| 269 | blk.16.ffn_down_exps.weight | Block 16 Ffn_Down_Exps (W) | (~4B) 3758096384 | 2048 x 7168 x 256 x 1 | IQ2_XS |
| 270 | blk.16.ffn_down_shexp.weight | Block 16 Ffn_Down_Shexp (W) | (~15M) 14680064 | 2048 x 7168 x 1 x 1 | IQ2_XS |
| 271 | blk.16.ffn_gate_exps.weight | Block 16 Ffn_Gate_Exps (W) | (~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 272 | blk.16.ffn_gate_inp.weight | Block 16 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) | (~2M) 1835008 | 7168 x 256 x 1 x 1 | F32 |
| 273 | blk.16.ffn_gate_shexp.weight | Block 16 Ffn_Gate_Shexp (W) | (~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
| 274 | blk.16.ffn_norm.weight | Block 16 Feed-Forward Network Normalization (W) | (~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 275 | blk.16.ffn_up_exps.weight | Block 16 Ffn_Up_Exps (W) | (~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 276 | blk.16.ffn_up_shexp.weight | Block 16 Ffn_Up_Shexp (W) | (~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
  • Total elements in blk.16: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 17 Tensor Group : ~12B Elements

| T_ID | Tensor Layer Name | Human Friendly Tensor Layer Name | Elements | Shape | Type |
|------|-------------------|----------------------------------|----------|-------|------|
| 277 | blk.17.attn_kv_a_mqa.weight | Block 17 Attn_Kv_A_Mqa (W) | (~4M) 4128768 | 7168 x 576 x 1 x 1 | IQ2_XS |
| 278 | blk.17.attn_kv_a_norm.weight | Block 17 Attn_Kv_A_Norm (W) | (512) 512 | 512 x 1 x 1 x 1 | F32 |
| 279 | blk.17.attn_kv_b.weight | Block 17 Attn_Kv_B (W) | (~17M) 16777216 | 512 x 32768 x 1 x 1 | IQ2_XS |
| 280 | blk.17.attn_norm.weight | Block 17 Attention Normalization (W) | (~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 281 | blk.17.attn_output.weight | Block 17 Attention Output (W) | (~117M) 117440512 | 16384 x 7168 x 1 x 1 | IQ3_S |
| 282 | blk.17.attn_q_a.weight | Block 17 Attn_Q_A (W) | (~11M) 11010048 | 7168 x 1536 x 1 x 1 | IQ2_XS |
| 283 | blk.17.attn_q_a_norm.weight | Block 17 Attn_Q_A_Norm (W) | (~2K) 1536 | 1536 x 1 x 1 x 1 | F32 |
| 284 | blk.17.attn_q_b.weight | Block 17 Attn_Q_B (W) | (~38M) 37748736 | 1536 x 24576 x 1 x 1 | IQ2_XS |
| 285 | blk.17.exp_probs_b.bias | Block 17 Exp_Probs_B (B) | (256) 256 | 256 x 1 x 1 x 1 | F32 |
| 286 | blk.17.ffn_down_exps.weight | Block 17 Ffn_Down_Exps (W) | (~4B) 3758096384 | 2048 x 7168 x 256 x 1 | IQ2_XS |
| 287 | blk.17.ffn_down_shexp.weight | Block 17 Ffn_Down_Shexp (W) | (~15M) 14680064 | 2048 x 7168 x 1 x 1 | IQ2_XS |
| 288 | blk.17.ffn_gate_exps.weight | Block 17 Ffn_Gate_Exps (W) | (~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 289 | blk.17.ffn_gate_inp.weight | Block 17 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) | (~2M) 1835008 | 7168 x 256 x 1 x 1 | F32 |
| 290 | blk.17.ffn_gate_shexp.weight | Block 17 Ffn_Gate_Shexp (W) | (~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
| 291 | blk.17.ffn_norm.weight | Block 17 Feed-Forward Network Normalization (W) | (~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 292 | blk.17.ffn_up_exps.weight | Block 17 Ffn_Up_Exps (W) | (~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 293 | blk.17.ffn_up_shexp.weight | Block 17 Ffn_Up_Shexp (W) | (~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
  • Total elements in blk.17: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 18 Tensor Group : ~12B Elements

| T_ID | Tensor Layer Name | Human Friendly Tensor Layer Name | Elements | Shape | Type |
|------|-------------------|----------------------------------|----------|-------|------|
| 294 | blk.18.attn_kv_a_mqa.weight | Block 18 Attn_Kv_A_Mqa (W) | (~4M) 4128768 | 7168 x 576 x 1 x 1 | IQ2_XS |
| 295 | blk.18.attn_kv_a_norm.weight | Block 18 Attn_Kv_A_Norm (W) | (512) 512 | 512 x 1 x 1 x 1 | F32 |
| 296 | blk.18.attn_kv_b.weight | Block 18 Attn_Kv_B (W) | (~17M) 16777216 | 512 x 32768 x 1 x 1 | IQ2_XS |
| 297 | blk.18.attn_norm.weight | Block 18 Attention Normalization (W) | (~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 298 | blk.18.attn_output.weight | Block 18 Attention Output (W) | (~117M) 117440512 | 16384 x 7168 x 1 x 1 | IQ3_S |
| 299 | blk.18.attn_q_a.weight | Block 18 Attn_Q_A (W) | (~11M) 11010048 | 7168 x 1536 x 1 x 1 | IQ2_XS |
| 300 | blk.18.attn_q_a_norm.weight | Block 18 Attn_Q_A_Norm (W) | (~2K) 1536 | 1536 x 1 x 1 x 1 | F32 |
| 301 | blk.18.attn_q_b.weight | Block 18 Attn_Q_B (W) | (~38M) 37748736 | 1536 x 24576 x 1 x 1 | IQ2_XS |
| 302 | blk.18.exp_probs_b.bias | Block 18 Exp_Probs_B (B) | (256) 256 | 256 x 1 x 1 x 1 | F32 |
| 303 | blk.18.ffn_down_exps.weight | Block 18 Ffn_Down_Exps (W) | (~4B) 3758096384 | 2048 x 7168 x 256 x 1 | IQ2_XS |
| 304 | blk.18.ffn_down_shexp.weight | Block 18 Ffn_Down_Shexp (W) | (~15M) 14680064 | 2048 x 7168 x 1 x 1 | IQ2_XS |
| 305 | blk.18.ffn_gate_exps.weight | Block 18 Ffn_Gate_Exps (W) | (~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 306 | blk.18.ffn_gate_inp.weight | Block 18 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) | (~2M) 1835008 | 7168 x 256 x 1 x 1 | F32 |
| 307 | blk.18.ffn_gate_shexp.weight | Block 18 Ffn_Gate_Shexp (W) | (~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
| 308 | blk.18.ffn_norm.weight | Block 18 Feed-Forward Network Normalization (W) | (~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 309 | blk.18.ffn_up_exps.weight | Block 18 Ffn_Up_Exps (W) | (~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 310 | blk.18.ffn_up_shexp.weight | Block 18 Ffn_Up_Shexp (W) | (~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
  • Total elements in blk.18: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 19 Tensor Group : ~12B Elements

| T_ID | Tensor Layer Name | Human Friendly Tensor Layer Name | Elements | Shape | Type |
|------|-------------------|----------------------------------|----------|-------|------|
| 311 | blk.19.attn_kv_a_mqa.weight | Block 19 Attn_Kv_A_Mqa (W) | (~4M) 4128768 | 7168 x 576 x 1 x 1 | IQ2_XS |
| 312 | blk.19.attn_kv_a_norm.weight | Block 19 Attn_Kv_A_Norm (W) | (512) 512 | 512 x 1 x 1 x 1 | F32 |
| 313 | blk.19.attn_kv_b.weight | Block 19 Attn_Kv_B (W) | (~17M) 16777216 | 512 x 32768 x 1 x 1 | IQ2_XS |
| 314 | blk.19.attn_norm.weight | Block 19 Attention Normalization (W) | (~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 315 | blk.19.attn_output.weight | Block 19 Attention Output (W) | (~117M) 117440512 | 16384 x 7168 x 1 x 1 | IQ3_S |
| 316 | blk.19.attn_q_a.weight | Block 19 Attn_Q_A (W) | (~11M) 11010048 | 7168 x 1536 x 1 x 1 | IQ2_XS |
| 317 | blk.19.attn_q_a_norm.weight | Block 19 Attn_Q_A_Norm (W) | (~2K) 1536 | 1536 x 1 x 1 x 1 | F32 |
| 318 | blk.19.attn_q_b.weight | Block 19 Attn_Q_B (W) | (~38M) 37748736 | 1536 x 24576 x 1 x 1 | IQ2_XS |
| 319 | blk.19.exp_probs_b.bias | Block 19 Exp_Probs_B (B) | (256) 256 | 256 x 1 x 1 x 1 | F32 |
| 320 | blk.19.ffn_down_exps.weight | Block 19 Ffn_Down_Exps (W) | (~4B) 3758096384 | 2048 x 7168 x 256 x 1 | IQ2_XS |
| 321 | blk.19.ffn_down_shexp.weight | Block 19 Ffn_Down_Shexp (W) | (~15M) 14680064 | 2048 x 7168 x 1 x 1 | IQ2_XS |
| 322 | blk.19.ffn_gate_exps.weight | Block 19 Ffn_Gate_Exps (W) | (~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 323 | blk.19.ffn_gate_inp.weight | Block 19 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) | (~2M) 1835008 | 7168 x 256 x 1 x 1 | F32 |
| 324 | blk.19.ffn_gate_shexp.weight | Block 19 Ffn_Gate_Shexp (W) | (~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
| 325 | blk.19.ffn_norm.weight | Block 19 Feed-Forward Network Normalization (W) | (~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 326 | blk.19.ffn_up_exps.weight | Block 19 Ffn_Up_Exps (W) | (~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 327 | blk.19.ffn_up_shexp.weight | Block 19 Ffn_Up_Shexp (W) | (~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
  • Total elements in blk.19: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 20 Tensor Group : ~12B Elements

| T_ID | Tensor Layer Name | Human Friendly Tensor Layer Name | Elements | Shape | Type |
|------|-------------------|----------------------------------|----------|-------|------|
| 328 | blk.20.attn_kv_a_mqa.weight | Block 20 Attn_Kv_A_Mqa (W) | (~4M) 4128768 | 7168 x 576 x 1 x 1 | IQ2_XS |
| 329 | blk.20.attn_kv_a_norm.weight | Block 20 Attn_Kv_A_Norm (W) | (512) 512 | 512 x 1 x 1 x 1 | F32 |
| 330 | blk.20.attn_kv_b.weight | Block 20 Attn_Kv_B (W) | (~17M) 16777216 | 512 x 32768 x 1 x 1 | IQ2_XS |
| 331 | blk.20.attn_norm.weight | Block 20 Attention Normalization (W) | (~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 332 | blk.20.attn_output.weight | Block 20 Attention Output (W) | (~117M) 117440512 | 16384 x 7168 x 1 x 1 | IQ3_S |
| 333 | blk.20.attn_q_a.weight | Block 20 Attn_Q_A (W) | (~11M) 11010048 | 7168 x 1536 x 1 x 1 | IQ2_XS |
| 334 | blk.20.attn_q_a_norm.weight | Block 20 Attn_Q_A_Norm (W) | (~2K) 1536 | 1536 x 1 x 1 x 1 | F32 |
| 335 | blk.20.attn_q_b.weight | Block 20 Attn_Q_B (W) | (~38M) 37748736 | 1536 x 24576 x 1 x 1 | IQ2_XS |
| 336 | blk.20.exp_probs_b.bias | Block 20 Exp_Probs_B (B) | (256) 256 | 256 x 1 x 1 x 1 | F32 |
| 337 | blk.20.ffn_down_exps.weight | Block 20 Ffn_Down_Exps (W) | (~4B) 3758096384 | 2048 x 7168 x 256 x 1 | IQ2_XS |
| 338 | blk.20.ffn_down_shexp.weight | Block 20 Ffn_Down_Shexp (W) | (~15M) 14680064 | 2048 x 7168 x 1 x 1 | IQ2_XS |
| 339 | blk.20.ffn_gate_exps.weight | Block 20 Ffn_Gate_Exps (W) | (~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 340 | blk.20.ffn_gate_inp.weight | Block 20 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) | (~2M) 1835008 | 7168 x 256 x 1 x 1 | F32 |
| 341 | blk.20.ffn_gate_shexp.weight | Block 20 Ffn_Gate_Shexp (W) | (~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
| 342 | blk.20.ffn_norm.weight | Block 20 Feed-Forward Network Normalization (W) | (~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 343 | blk.20.ffn_up_exps.weight | Block 20 Ffn_Up_Exps (W) | (~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 344 | blk.20.ffn_up_shexp.weight | Block 20 Ffn_Up_Shexp (W) | (~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
  • Total elements in blk.20: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 21 Tensor Group : ~12B Elements

| T_ID | Tensor Layer Name | Human Friendly Tensor Layer Name | Elements | Shape | Type |
|------|-------------------|----------------------------------|----------|-------|------|
| 345 | blk.21.attn_kv_a_mqa.weight | Block 21 Attn_Kv_A_Mqa (W) | (~4M) 4128768 | 7168 x 576 x 1 x 1 | IQ2_XS |
| 346 | blk.21.attn_kv_a_norm.weight | Block 21 Attn_Kv_A_Norm (W) | (512) 512 | 512 x 1 x 1 x 1 | F32 |
| 347 | blk.21.attn_kv_b.weight | Block 21 Attn_Kv_B (W) | (~17M) 16777216 | 512 x 32768 x 1 x 1 | IQ2_XS |
| 348 | blk.21.attn_norm.weight | Block 21 Attention Normalization (W) | (~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 349 | blk.21.attn_output.weight | Block 21 Attention Output (W) | (~117M) 117440512 | 16384 x 7168 x 1 x 1 | IQ3_S |
| 350 | blk.21.attn_q_a.weight | Block 21 Attn_Q_A (W) | (~11M) 11010048 | 7168 x 1536 x 1 x 1 | IQ2_XS |
| 351 | blk.21.attn_q_a_norm.weight | Block 21 Attn_Q_A_Norm (W) | (~2K) 1536 | 1536 x 1 x 1 x 1 | F32 |
| 352 | blk.21.attn_q_b.weight | Block 21 Attn_Q_B (W) | (~38M) 37748736 | 1536 x 24576 x 1 x 1 | IQ2_XS |
| 353 | blk.21.exp_probs_b.bias | Block 21 Exp_Probs_B (B) | (256) 256 | 256 x 1 x 1 x 1 | F32 |
| 354 | blk.21.ffn_down_exps.weight | Block 21 Ffn_Down_Exps (W) | (~4B) 3758096384 | 2048 x 7168 x 256 x 1 | IQ2_XS |
| 355 | blk.21.ffn_down_shexp.weight | Block 21 Ffn_Down_Shexp (W) | (~15M) 14680064 | 2048 x 7168 x 1 x 1 | IQ2_XS |
| 356 | blk.21.ffn_gate_exps.weight | Block 21 Ffn_Gate_Exps (W) | (~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 357 | blk.21.ffn_gate_inp.weight | Block 21 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) | (~2M) 1835008 | 7168 x 256 x 1 x 1 | F32 |
| 358 | blk.21.ffn_gate_shexp.weight | Block 21 Ffn_Gate_Shexp (W) | (~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
| 359 | blk.21.ffn_norm.weight | Block 21 Feed-Forward Network Normalization (W) | (~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 360 | blk.21.ffn_up_exps.weight | Block 21 Ffn_Up_Exps (W) | (~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 361 | blk.21.ffn_up_shexp.weight | Block 21 Ffn_Up_Shexp (W) | (~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
  • Total elements in blk.21: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 22 Tensor Group : ~12B Elements

| T_ID | Tensor Layer Name | Human Friendly Tensor Layer Name | Elements | Shape | Type |
|------|-------------------|----------------------------------|----------|-------|------|
| 362 | blk.22.attn_kv_a_mqa.weight | Block 22 Attn_Kv_A_Mqa (W) | (~4M) 4128768 | 7168 x 576 x 1 x 1 | IQ2_XS |
| 363 | blk.22.attn_kv_a_norm.weight | Block 22 Attn_Kv_A_Norm (W) | (512) 512 | 512 x 1 x 1 x 1 | F32 |
| 364 | blk.22.attn_kv_b.weight | Block 22 Attn_Kv_B (W) | (~17M) 16777216 | 512 x 32768 x 1 x 1 | IQ2_XS |
| 365 | blk.22.attn_norm.weight | Block 22 Attention Normalization (W) | (~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 366 | blk.22.attn_output.weight | Block 22 Attention Output (W) | (~117M) 117440512 | 16384 x 7168 x 1 x 1 | IQ3_S |
| 367 | blk.22.attn_q_a.weight | Block 22 Attn_Q_A (W) | (~11M) 11010048 | 7168 x 1536 x 1 x 1 | IQ2_XS |
| 368 | blk.22.attn_q_a_norm.weight | Block 22 Attn_Q_A_Norm (W) | (~2K) 1536 | 1536 x 1 x 1 x 1 | F32 |
| 369 | blk.22.attn_q_b.weight | Block 22 Attn_Q_B (W) | (~38M) 37748736 | 1536 x 24576 x 1 x 1 | IQ2_XS |
| 370 | blk.22.exp_probs_b.bias | Block 22 Exp_Probs_B (B) | (256) 256 | 256 x 1 x 1 x 1 | F32 |
| 371 | blk.22.ffn_down_exps.weight | Block 22 Ffn_Down_Exps (W) | (~4B) 3758096384 | 2048 x 7168 x 256 x 1 | IQ2_XS |
| 372 | blk.22.ffn_down_shexp.weight | Block 22 Ffn_Down_Shexp (W) | (~15M) 14680064 | 2048 x 7168 x 1 x 1 | IQ2_XS |
| 373 | blk.22.ffn_gate_exps.weight | Block 22 Ffn_Gate_Exps (W) | (~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 374 | blk.22.ffn_gate_inp.weight | Block 22 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) | (~2M) 1835008 | 7168 x 256 x 1 x 1 | F32 |
| 375 | blk.22.ffn_gate_shexp.weight | Block 22 Ffn_Gate_Shexp (W) | (~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
| 376 | blk.22.ffn_norm.weight | Block 22 Feed-Forward Network Normalization (W) | (~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 377 | blk.22.ffn_up_exps.weight | Block 22 Ffn_Up_Exps (W) | (~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 378 | blk.22.ffn_up_shexp.weight | Block 22 Ffn_Up_Shexp (W) | (~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
  • Total elements in blk.22: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 23 Tensor Group : ~12B Elements

| T_ID | Tensor Layer Name | Human Friendly Tensor Layer Name | Elements | Shape | Type |
|------|-------------------|----------------------------------|----------|-------|------|
| 379 | blk.23.attn_kv_a_mqa.weight | Block 23 Attn_Kv_A_Mqa (W) | (~4M) 4128768 | 7168 x 576 x 1 x 1 | IQ2_XS |
| 380 | blk.23.attn_kv_a_norm.weight | Block 23 Attn_Kv_A_Norm (W) | (512) 512 | 512 x 1 x 1 x 1 | F32 |
| 381 | blk.23.attn_kv_b.weight | Block 23 Attn_Kv_B (W) | (~17M) 16777216 | 512 x 32768 x 1 x 1 | IQ2_XS |
| 382 | blk.23.attn_norm.weight | Block 23 Attention Normalization (W) | (~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 383 | blk.23.attn_output.weight | Block 23 Attention Output (W) | (~117M) 117440512 | 16384 x 7168 x 1 x 1 | IQ3_S |
| 384 | blk.23.attn_q_a.weight | Block 23 Attn_Q_A (W) | (~11M) 11010048 | 7168 x 1536 x 1 x 1 | IQ2_XS |
| 385 | blk.23.attn_q_a_norm.weight | Block 23 Attn_Q_A_Norm (W) | (~2K) 1536 | 1536 x 1 x 1 x 1 | F32 |
| 386 | blk.23.attn_q_b.weight | Block 23 Attn_Q_B (W) | (~38M) 37748736 | 1536 x 24576 x 1 x 1 | IQ2_XS |
| 387 | blk.23.exp_probs_b.bias | Block 23 Exp_Probs_B (B) | (256) 256 | 256 x 1 x 1 x 1 | F32 |
| 388 | blk.23.ffn_down_exps.weight | Block 23 Ffn_Down_Exps (W) | (~4B) 3758096384 | 2048 x 7168 x 256 x 1 | IQ2_XS |
| 389 | blk.23.ffn_down_shexp.weight | Block 23 Ffn_Down_Shexp (W) | (~15M) 14680064 | 2048 x 7168 x 1 x 1 | IQ2_XS |
| 390 | blk.23.ffn_gate_exps.weight | Block 23 Ffn_Gate_Exps (W) | (~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 391 | blk.23.ffn_gate_inp.weight | Block 23 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) | (~2M) 1835008 | 7168 x 256 x 1 x 1 | F32 |
| 392 | blk.23.ffn_gate_shexp.weight | Block 23 Ffn_Gate_Shexp (W) | (~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
| 393 | blk.23.ffn_norm.weight | Block 23 Feed-Forward Network Normalization (W) | (~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 394 | blk.23.ffn_up_exps.weight | Block 23 Ffn_Up_Exps (W) | (~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 395 | blk.23.ffn_up_shexp.weight | Block 23 Ffn_Up_Shexp (W) | (~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
  • Total elements in blk.23: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 24 Tensor Group : ~12B Elements

| T_ID | Tensor Layer Name | Human Friendly Tensor Layer Name | Elements | Shape | Type |
|------|-------------------|----------------------------------|----------|-------|------|
| 396 | blk.24.attn_kv_a_mqa.weight | Block 24 Attn_Kv_A_Mqa (W) | (~4M) 4128768 | 7168 x 576 x 1 x 1 | IQ2_XS |
| 397 | blk.24.attn_kv_a_norm.weight | Block 24 Attn_Kv_A_Norm (W) | (512) 512 | 512 x 1 x 1 x 1 | F32 |
| 398 | blk.24.attn_kv_b.weight | Block 24 Attn_Kv_B (W) | (~17M) 16777216 | 512 x 32768 x 1 x 1 | IQ2_XS |
| 399 | blk.24.attn_norm.weight | Block 24 Attention Normalization (W) | (~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 400 | blk.24.attn_output.weight | Block 24 Attention Output (W) | (~117M) 117440512 | 16384 x 7168 x 1 x 1 | IQ3_S |
| 401 | blk.24.attn_q_a.weight | Block 24 Attn_Q_A (W) | (~11M) 11010048 | 7168 x 1536 x 1 x 1 | IQ2_XS |
| 402 | blk.24.attn_q_a_norm.weight | Block 24 Attn_Q_A_Norm (W) | (~2K) 1536 | 1536 x 1 x 1 x 1 | F32 |
| 403 | blk.24.attn_q_b.weight | Block 24 Attn_Q_B (W) | (~38M) 37748736 | 1536 x 24576 x 1 x 1 | IQ2_XS |
| 404 | blk.24.exp_probs_b.bias | Block 24 Exp_Probs_B (B) | (256) 256 | 256 x 1 x 1 x 1 | F32 |
| 405 | blk.24.ffn_down_exps.weight | Block 24 Ffn_Down_Exps (W) | (~4B) 3758096384 | 2048 x 7168 x 256 x 1 | IQ2_XS |
| 406 | blk.24.ffn_down_shexp.weight | Block 24 Ffn_Down_Shexp (W) | (~15M) 14680064 | 2048 x 7168 x 1 x 1 | IQ2_XS |
| 407 | blk.24.ffn_gate_exps.weight | Block 24 Ffn_Gate_Exps (W) | (~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 408 | blk.24.ffn_gate_inp.weight | Block 24 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) | (~2M) 1835008 | 7168 x 256 x 1 x 1 | F32 |
| 409 | blk.24.ffn_gate_shexp.weight | Block 24 Ffn_Gate_Shexp (W) | (~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
| 410 | blk.24.ffn_norm.weight | Block 24 Feed-Forward Network Normalization (W) | (~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 411 | blk.24.ffn_up_exps.weight | Block 24 Ffn_Up_Exps (W) | (~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 412 | blk.24.ffn_up_shexp.weight | Block 24 Ffn_Up_Shexp (W) | (~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
  • Total elements in blk.24: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 25 Tensor Group : ~12B Elements

| T_ID | Tensor Layer Name | Human Friendly Tensor Layer Name | Elements | Shape | Type |
|------|-------------------|----------------------------------|----------|-------|------|
| 413 | blk.25.attn_kv_a_mqa.weight | Block 25 Attn_Kv_A_Mqa (W) | (~4M) 4128768 | 7168 x 576 x 1 x 1 | IQ2_XS |
| 414 | blk.25.attn_kv_a_norm.weight | Block 25 Attn_Kv_A_Norm (W) | (512) 512 | 512 x 1 x 1 x 1 | F32 |
| 415 | blk.25.attn_kv_b.weight | Block 25 Attn_Kv_B (W) | (~17M) 16777216 | 512 x 32768 x 1 x 1 | IQ2_XS |
| 416 | blk.25.attn_norm.weight | Block 25 Attention Normalization (W) | (~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 417 | blk.25.attn_output.weight | Block 25 Attention Output (W) | (~117M) 117440512 | 16384 x 7168 x 1 x 1 | IQ3_S |
| 418 | blk.25.attn_q_a.weight | Block 25 Attn_Q_A (W) | (~11M) 11010048 | 7168 x 1536 x 1 x 1 | IQ2_XS |
| 419 | blk.25.attn_q_a_norm.weight | Block 25 Attn_Q_A_Norm (W) | (~2K) 1536 | 1536 x 1 x 1 x 1 | F32 |
| 420 | blk.25.attn_q_b.weight | Block 25 Attn_Q_B (W) | (~38M) 37748736 | 1536 x 24576 x 1 x 1 | IQ2_XS |
| 421 | blk.25.exp_probs_b.bias | Block 25 Exp_Probs_B (B) | (256) 256 | 256 x 1 x 1 x 1 | F32 |
| 422 | blk.25.ffn_down_exps.weight | Block 25 Ffn_Down_Exps (W) | (~4B) 3758096384 | 2048 x 7168 x 256 x 1 | IQ2_XS |
| 423 | blk.25.ffn_down_shexp.weight | Block 25 Ffn_Down_Shexp (W) | (~15M) 14680064 | 2048 x 7168 x 1 x 1 | IQ2_XS |
| 424 | blk.25.ffn_gate_exps.weight | Block 25 Ffn_Gate_Exps (W) | (~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 425 | blk.25.ffn_gate_inp.weight | Block 25 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) | (~2M) 1835008 | 7168 x 256 x 1 x 1 | F32 |
| 426 | blk.25.ffn_gate_shexp.weight | Block 25 Ffn_Gate_Shexp (W) | (~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
| 427 | blk.25.ffn_norm.weight | Block 25 Feed-Forward Network Normalization (W) | (~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 428 | blk.25.ffn_up_exps.weight | Block 25 Ffn_Up_Exps (W) | (~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 429 | blk.25.ffn_up_shexp.weight | Block 25 Ffn_Up_Shexp (W) | (~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
  • Total elements in blk.25: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 26 Tensor Group : ~12B Elements

| T_ID | Tensor Layer Name | Human Friendly Tensor Layer Name | Elements | Shape | Type |
|------|-------------------|----------------------------------|----------|-------|------|
| 430 | blk.26.attn_kv_a_mqa.weight | Block 26 Attn_Kv_A_Mqa (W) | (~4M) 4128768 | 7168 x 576 x 1 x 1 | IQ2_XS |
| 431 | blk.26.attn_kv_a_norm.weight | Block 26 Attn_Kv_A_Norm (W) | (512) 512 | 512 x 1 x 1 x 1 | F32 |
| 432 | blk.26.attn_kv_b.weight | Block 26 Attn_Kv_B (W) | (~17M) 16777216 | 512 x 32768 x 1 x 1 | IQ2_XS |
| 433 | blk.26.attn_norm.weight | Block 26 Attention Normalization (W) | (~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 434 | blk.26.attn_output.weight | Block 26 Attention Output (W) | (~117M) 117440512 | 16384 x 7168 x 1 x 1 | IQ3_S |
| 435 | blk.26.attn_q_a.weight | Block 26 Attn_Q_A (W) | (~11M) 11010048 | 7168 x 1536 x 1 x 1 | IQ2_XS |
| 436 | blk.26.attn_q_a_norm.weight | Block 26 Attn_Q_A_Norm (W) | (~2K) 1536 | 1536 x 1 x 1 x 1 | F32 |
| 437 | blk.26.attn_q_b.weight | Block 26 Attn_Q_B (W) | (~38M) 37748736 | 1536 x 24576 x 1 x 1 | IQ2_XS |
| 438 | blk.26.exp_probs_b.bias | Block 26 Exp_Probs_B (B) | (256) 256 | 256 x 1 x 1 x 1 | F32 |
| 439 | blk.26.ffn_down_exps.weight | Block 26 Ffn_Down_Exps (W) | (~4B) 3758096384 | 2048 x 7168 x 256 x 1 | IQ2_XS |
| 440 | blk.26.ffn_down_shexp.weight | Block 26 Ffn_Down_Shexp (W) | (~15M) 14680064 | 2048 x 7168 x 1 x 1 | IQ2_XS |
| 441 | blk.26.ffn_gate_exps.weight | Block 26 Ffn_Gate_Exps (W) | (~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 442 | blk.26.ffn_gate_inp.weight | Block 26 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) | (~2M) 1835008 | 7168 x 256 x 1 x 1 | F32 |
| 443 | blk.26.ffn_gate_shexp.weight | Block 26 Ffn_Gate_Shexp (W) | (~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
| 444 | blk.26.ffn_norm.weight | Block 26 Feed-Forward Network Normalization (W) | (~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 445 | blk.26.ffn_up_exps.weight | Block 26 Ffn_Up_Exps (W) | (~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 446 | blk.26.ffn_up_shexp.weight | Block 26 Ffn_Up_Shexp (W) | (~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
  • Total elements in blk.26: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 27 Tensor Group : ~12B Elements

| T_ID | Tensor Layer Name | Human Friendly Tensor Layer Name | Elements | Shape | Type |
|------|-------------------|----------------------------------|----------|-------|------|
| 447 | blk.27.attn_kv_a_mqa.weight | Block 27 Attn_Kv_A_Mqa (W) | (~4M) 4128768 | 7168 x 576 x 1 x 1 | IQ2_XS |
| 448 | blk.27.attn_kv_a_norm.weight | Block 27 Attn_Kv_A_Norm (W) | (512) 512 | 512 x 1 x 1 x 1 | F32 |
| 449 | blk.27.attn_kv_b.weight | Block 27 Attn_Kv_B (W) | (~17M) 16777216 | 512 x 32768 x 1 x 1 | IQ2_XS |
| 450 | blk.27.attn_norm.weight | Block 27 Attention Normalization (W) | (~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 451 | blk.27.attn_output.weight | Block 27 Attention Output (W) | (~117M) 117440512 | 16384 x 7168 x 1 x 1 | IQ3_S |
| 452 | blk.27.attn_q_a.weight | Block 27 Attn_Q_A (W) | (~11M) 11010048 | 7168 x 1536 x 1 x 1 | IQ2_XS |
| 453 | blk.27.attn_q_a_norm.weight | Block 27 Attn_Q_A_Norm (W) | (~2K) 1536 | 1536 x 1 x 1 x 1 | F32 |
| 454 | blk.27.attn_q_b.weight | Block 27 Attn_Q_B (W) | (~38M) 37748736 | 1536 x 24576 x 1 x 1 | IQ2_XS |
| 455 | blk.27.exp_probs_b.bias | Block 27 Exp_Probs_B (B) | (256) 256 | 256 x 1 x 1 x 1 | F32 |
| 456 | blk.27.ffn_down_exps.weight | Block 27 Ffn_Down_Exps (W) | (~4B) 3758096384 | 2048 x 7168 x 256 x 1 | IQ2_XS |
| 457 | blk.27.ffn_down_shexp.weight | Block 27 Ffn_Down_Shexp (W) | (~15M) 14680064 | 2048 x 7168 x 1 x 1 | IQ2_XS |
| 458 | blk.27.ffn_gate_exps.weight | Block 27 Ffn_Gate_Exps (W) | (~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 459 | blk.27.ffn_gate_inp.weight | Block 27 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) | (~2M) 1835008 | 7168 x 256 x 1 x 1 | F32 |
| 460 | blk.27.ffn_gate_shexp.weight | Block 27 Ffn_Gate_Shexp (W) | (~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
| 461 | blk.27.ffn_norm.weight | Block 27 Feed-Forward Network Normalization (W) | (~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 462 | blk.27.ffn_up_exps.weight | Block 27 Ffn_Up_Exps (W) | (~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 463 | blk.27.ffn_up_shexp.weight | Block 27 Ffn_Up_Shexp (W) | (~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
  • Total elements in blk.27: (~12B) 11507286272
  • Percentage of total elements: 1.71%
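As a sanity check, a block's reported total is just the sum of its per-tensor element counts. Below is a small sketch that verifies this for blk.27 using the figures transcribed from the table above (the counts are hard-coded here rather than read from the GGUF; with gguf-py installed, the same aggregation could be done over a real file by summing `t.n_elements` of `GGUFReader(path).tensors` grouped by `blk.<n>` prefix):

```python
# Per-tensor element counts for blk.27, transcribed from the dump above.
blk27_elements = {
    "attn_kv_a_mqa.weight":  4_128_768,
    "attn_kv_a_norm.weight": 512,
    "attn_kv_b.weight":      16_777_216,
    "attn_norm.weight":      7_168,
    "attn_output.weight":    117_440_512,
    "attn_q_a.weight":       11_010_048,
    "attn_q_a_norm.weight":  1_536,
    "attn_q_b.weight":       37_748_736,
    "exp_probs_b.bias":      256,
    "ffn_down_exps.weight":  3_758_096_384,
    "ffn_down_shexp.weight": 14_680_064,
    "ffn_gate_exps.weight":  3_758_096_384,
    "ffn_gate_inp.weight":   1_835_008,
    "ffn_gate_shexp.weight": 14_680_064,
    "ffn_norm.weight":       7_168,
    "ffn_up_exps.weight":    3_758_096_384,
    "ffn_up_shexp.weight":   14_680_064,
}

# Summing the 17 tensors reproduces the reported block total.
total = sum(blk27_elements.values())
print(total)  # 11507286272, matching "Total elements in blk.27"
```

The routed-expert tensors (`ffn_down_exps`, `ffn_gate_exps`, `ffn_up_exps`) dominate: the three of them account for ~98% of the block's elements, which is why the per-expert quantization type matters most for the final file size.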

Block 28 Tensor Group : ~12B Elements

| T_ID | Tensor Layer Name | Human Friendly Tensor Layer Name | Elements | Shape | Type |
|------|-------------------|----------------------------------|----------|-------|------|
| 464 | blk.28.attn_kv_a_mqa.weight | Block 28 Attn_Kv_A_Mqa (W) | ( ~4M) 4128768 | 7168 x 576 x 1 x 1 | IQ2_XS |
| 465 | blk.28.attn_kv_a_norm.weight | Block 28 Attn_Kv_A_Norm (W) | ( 512) 512 | 512 x 1 x 1 x 1 | F32 |
| 466 | blk.28.attn_kv_b.weight | Block 28 Attn_Kv_B (W) | ( ~17M) 16777216 | 512 x 32768 x 1 x 1 | IQ2_XS |
| 467 | blk.28.attn_norm.weight | Block 28 Attention Normalization (W) | ( ~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 468 | blk.28.attn_output.weight | Block 28 Attention Output (W) | (~117M) 117440512 | 16384 x 7168 x 1 x 1 | IQ3_S |
| 469 | blk.28.attn_q_a.weight | Block 28 Attn_Q_A (W) | ( ~11M) 11010048 | 7168 x 1536 x 1 x 1 | IQ2_XS |
| 470 | blk.28.attn_q_a_norm.weight | Block 28 Attn_Q_A_Norm (W) | ( ~2K) 1536 | 1536 x 1 x 1 x 1 | F32 |
| 471 | blk.28.attn_q_b.weight | Block 28 Attn_Q_B (W) | ( ~38M) 37748736 | 1536 x 24576 x 1 x 1 | IQ2_XS |
| 472 | blk.28.exp_probs_b.bias | Block 28 Exp_Probs_B (B) | ( 256) 256 | 256 x 1 x 1 x 1 | F32 |
| 473 | blk.28.ffn_down_exps.weight | Block 28 Ffn_Down_Exps (W) | ( ~4B) 3758096384 | 2048 x 7168 x 256 x 1 | IQ2_XS |
| 474 | blk.28.ffn_down_shexp.weight | Block 28 Ffn_Down_Shexp (W) | ( ~15M) 14680064 | 2048 x 7168 x 1 x 1 | IQ2_XS |
| 475 | blk.28.ffn_gate_exps.weight | Block 28 Ffn_Gate_Exps (W) | ( ~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 476 | blk.28.ffn_gate_inp.weight | Block 28 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) | ( ~2M) 1835008 | 7168 x 256 x 1 x 1 | F32 |
| 477 | blk.28.ffn_gate_shexp.weight | Block 28 Ffn_Gate_Shexp (W) | ( ~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
| 478 | blk.28.ffn_norm.weight | Block 28 Feed-Forward Network Normalization (W) | ( ~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 479 | blk.28.ffn_up_exps.weight | Block 28 Ffn_Up_Exps (W) | ( ~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 480 | blk.28.ffn_up_shexp.weight | Block 28 Ffn_Up_Shexp (W) | ( ~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |

- Total elements in blk.28: (~12B) 11507286272
- Percentage of total elements: 1.71%

Block 29 Tensor Group : ~12B Elements

| T_ID | Tensor Layer Name | Human Friendly Tensor Layer Name | Elements | Shape | Type |
|------|-------------------|----------------------------------|----------|-------|------|
| 481 | blk.29.attn_kv_a_mqa.weight | Block 29 Attn_Kv_A_Mqa (W) | ( ~4M) 4128768 | 7168 x 576 x 1 x 1 | IQ2_XS |
| 482 | blk.29.attn_kv_a_norm.weight | Block 29 Attn_Kv_A_Norm (W) | ( 512) 512 | 512 x 1 x 1 x 1 | F32 |
| 483 | blk.29.attn_kv_b.weight | Block 29 Attn_Kv_B (W) | ( ~17M) 16777216 | 512 x 32768 x 1 x 1 | IQ2_XS |
| 484 | blk.29.attn_norm.weight | Block 29 Attention Normalization (W) | ( ~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 485 | blk.29.attn_output.weight | Block 29 Attention Output (W) | (~117M) 117440512 | 16384 x 7168 x 1 x 1 | IQ3_S |
| 486 | blk.29.attn_q_a.weight | Block 29 Attn_Q_A (W) | ( ~11M) 11010048 | 7168 x 1536 x 1 x 1 | IQ2_XS |
| 487 | blk.29.attn_q_a_norm.weight | Block 29 Attn_Q_A_Norm (W) | ( ~2K) 1536 | 1536 x 1 x 1 x 1 | F32 |
| 488 | blk.29.attn_q_b.weight | Block 29 Attn_Q_B (W) | ( ~38M) 37748736 | 1536 x 24576 x 1 x 1 | IQ2_XS |
| 489 | blk.29.exp_probs_b.bias | Block 29 Exp_Probs_B (B) | ( 256) 256 | 256 x 1 x 1 x 1 | F32 |
| 490 | blk.29.ffn_down_exps.weight | Block 29 Ffn_Down_Exps (W) | ( ~4B) 3758096384 | 2048 x 7168 x 256 x 1 | IQ2_XS |
| 491 | blk.29.ffn_down_shexp.weight | Block 29 Ffn_Down_Shexp (W) | ( ~15M) 14680064 | 2048 x 7168 x 1 x 1 | IQ2_XS |
| 492 | blk.29.ffn_gate_exps.weight | Block 29 Ffn_Gate_Exps (W) | ( ~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 493 | blk.29.ffn_gate_inp.weight | Block 29 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) | ( ~2M) 1835008 | 7168 x 256 x 1 x 1 | F32 |
| 494 | blk.29.ffn_gate_shexp.weight | Block 29 Ffn_Gate_Shexp (W) | ( ~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
| 495 | blk.29.ffn_norm.weight | Block 29 Feed-Forward Network Normalization (W) | ( ~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 496 | blk.29.ffn_up_exps.weight | Block 29 Ffn_Up_Exps (W) | ( ~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 497 | blk.29.ffn_up_shexp.weight | Block 29 Ffn_Up_Shexp (W) | ( ~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |

- Total elements in blk.29: (~12B) 11507286272
- Percentage of total elements: 1.71%

Block 30 Tensor Group : ~12B Elements

| T_ID | Tensor Layer Name | Human Friendly Tensor Layer Name | Elements | Shape | Type |
|------|-------------------|----------------------------------|----------|-------|------|
| 498 | blk.30.attn_kv_a_mqa.weight | Block 30 Attn_Kv_A_Mqa (W) | ( ~4M) 4128768 | 7168 x 576 x 1 x 1 | IQ2_XS |
| 499 | blk.30.attn_kv_a_norm.weight | Block 30 Attn_Kv_A_Norm (W) | ( 512) 512 | 512 x 1 x 1 x 1 | F32 |
| 500 | blk.30.attn_kv_b.weight | Block 30 Attn_Kv_B (W) | ( ~17M) 16777216 | 512 x 32768 x 1 x 1 | IQ2_XS |
| 501 | blk.30.attn_norm.weight | Block 30 Attention Normalization (W) | ( ~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 502 | blk.30.attn_output.weight | Block 30 Attention Output (W) | (~117M) 117440512 | 16384 x 7168 x 1 x 1 | IQ3_S |
| 503 | blk.30.attn_q_a.weight | Block 30 Attn_Q_A (W) | ( ~11M) 11010048 | 7168 x 1536 x 1 x 1 | IQ2_XS |
| 504 | blk.30.attn_q_a_norm.weight | Block 30 Attn_Q_A_Norm (W) | ( ~2K) 1536 | 1536 x 1 x 1 x 1 | F32 |
| 505 | blk.30.attn_q_b.weight | Block 30 Attn_Q_B (W) | ( ~38M) 37748736 | 1536 x 24576 x 1 x 1 | IQ2_XS |
| 506 | blk.30.exp_probs_b.bias | Block 30 Exp_Probs_B (B) | ( 256) 256 | 256 x 1 x 1 x 1 | F32 |
| 507 | blk.30.ffn_down_exps.weight | Block 30 Ffn_Down_Exps (W) | ( ~4B) 3758096384 | 2048 x 7168 x 256 x 1 | IQ2_XS |
| 508 | blk.30.ffn_down_shexp.weight | Block 30 Ffn_Down_Shexp (W) | ( ~15M) 14680064 | 2048 x 7168 x 1 x 1 | IQ2_XS |
| 509 | blk.30.ffn_gate_exps.weight | Block 30 Ffn_Gate_Exps (W) | ( ~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 510 | blk.30.ffn_gate_inp.weight | Block 30 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) | ( ~2M) 1835008 | 7168 x 256 x 1 x 1 | F32 |
| 511 | blk.30.ffn_gate_shexp.weight | Block 30 Ffn_Gate_Shexp (W) | ( ~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
| 512 | blk.30.ffn_norm.weight | Block 30 Feed-Forward Network Normalization (W) | ( ~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 513 | blk.30.ffn_up_exps.weight | Block 30 Ffn_Up_Exps (W) | ( ~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 514 | blk.30.ffn_up_shexp.weight | Block 30 Ffn_Up_Shexp (W) | ( ~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |

- Total elements in blk.30: (~12B) 11507286272
- Percentage of total elements: 1.71%

Block 31 Tensor Group : ~12B Elements

| T_ID | Tensor Layer Name | Human Friendly Tensor Layer Name | Elements | Shape | Type |
|------|-------------------|----------------------------------|----------|-------|------|
| 515 | blk.31.attn_kv_a_mqa.weight | Block 31 Attn_Kv_A_Mqa (W) | ( ~4M) 4128768 | 7168 x 576 x 1 x 1 | IQ2_XS |
| 516 | blk.31.attn_kv_a_norm.weight | Block 31 Attn_Kv_A_Norm (W) | ( 512) 512 | 512 x 1 x 1 x 1 | F32 |
| 517 | blk.31.attn_kv_b.weight | Block 31 Attn_Kv_B (W) | ( ~17M) 16777216 | 512 x 32768 x 1 x 1 | IQ2_XS |
| 518 | blk.31.attn_norm.weight | Block 31 Attention Normalization (W) | ( ~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 519 | blk.31.attn_output.weight | Block 31 Attention Output (W) | (~117M) 117440512 | 16384 x 7168 x 1 x 1 | IQ3_S |
| 520 | blk.31.attn_q_a.weight | Block 31 Attn_Q_A (W) | ( ~11M) 11010048 | 7168 x 1536 x 1 x 1 | IQ2_XS |
| 521 | blk.31.attn_q_a_norm.weight | Block 31 Attn_Q_A_Norm (W) | ( ~2K) 1536 | 1536 x 1 x 1 x 1 | F32 |
| 522 | blk.31.attn_q_b.weight | Block 31 Attn_Q_B (W) | ( ~38M) 37748736 | 1536 x 24576 x 1 x 1 | IQ2_XS |
| 523 | blk.31.exp_probs_b.bias | Block 31 Exp_Probs_B (B) | ( 256) 256 | 256 x 1 x 1 x 1 | F32 |
| 524 | blk.31.ffn_down_exps.weight | Block 31 Ffn_Down_Exps (W) | ( ~4B) 3758096384 | 2048 x 7168 x 256 x 1 | IQ2_XS |
| 525 | blk.31.ffn_down_shexp.weight | Block 31 Ffn_Down_Shexp (W) | ( ~15M) 14680064 | 2048 x 7168 x 1 x 1 | IQ2_XS |
| 526 | blk.31.ffn_gate_exps.weight | Block 31 Ffn_Gate_Exps (W) | ( ~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 527 | blk.31.ffn_gate_inp.weight | Block 31 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) | ( ~2M) 1835008 | 7168 x 256 x 1 x 1 | F32 |
| 528 | blk.31.ffn_gate_shexp.weight | Block 31 Ffn_Gate_Shexp (W) | ( ~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
| 529 | blk.31.ffn_norm.weight | Block 31 Feed-Forward Network Normalization (W) | ( ~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 530 | blk.31.ffn_up_exps.weight | Block 31 Ffn_Up_Exps (W) | ( ~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 531 | blk.31.ffn_up_shexp.weight | Block 31 Ffn_Up_Shexp (W) | ( ~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |

- Total elements in blk.31: (~12B) 11507286272
- Percentage of total elements: 1.71%

Block 32 Tensor Group : ~12B Elements

| T_ID | Tensor Layer Name | Human Friendly Tensor Layer Name | Elements | Shape | Type |
|------|-------------------|----------------------------------|----------|-------|------|
| 532 | blk.32.attn_kv_a_mqa.weight | Block 32 Attn_Kv_A_Mqa (W) | ( ~4M) 4128768 | 7168 x 576 x 1 x 1 | IQ2_XS |
| 533 | blk.32.attn_kv_a_norm.weight | Block 32 Attn_Kv_A_Norm (W) | ( 512) 512 | 512 x 1 x 1 x 1 | F32 |
| 534 | blk.32.attn_kv_b.weight | Block 32 Attn_Kv_B (W) | ( ~17M) 16777216 | 512 x 32768 x 1 x 1 | IQ2_XS |
| 535 | blk.32.attn_norm.weight | Block 32 Attention Normalization (W) | ( ~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 536 | blk.32.attn_output.weight | Block 32 Attention Output (W) | (~117M) 117440512 | 16384 x 7168 x 1 x 1 | IQ3_S |
| 537 | blk.32.attn_q_a.weight | Block 32 Attn_Q_A (W) | ( ~11M) 11010048 | 7168 x 1536 x 1 x 1 | IQ2_XS |
| 538 | blk.32.attn_q_a_norm.weight | Block 32 Attn_Q_A_Norm (W) | ( ~2K) 1536 | 1536 x 1 x 1 x 1 | F32 |
| 539 | blk.32.attn_q_b.weight | Block 32 Attn_Q_B (W) | ( ~38M) 37748736 | 1536 x 24576 x 1 x 1 | IQ2_XS |
| 540 | blk.32.exp_probs_b.bias | Block 32 Exp_Probs_B (B) | ( 256) 256 | 256 x 1 x 1 x 1 | F32 |
| 541 | blk.32.ffn_down_exps.weight | Block 32 Ffn_Down_Exps (W) | ( ~4B) 3758096384 | 2048 x 7168 x 256 x 1 | IQ2_XS |
| 542 | blk.32.ffn_down_shexp.weight | Block 32 Ffn_Down_Shexp (W) | ( ~15M) 14680064 | 2048 x 7168 x 1 x 1 | IQ2_XS |
| 543 | blk.32.ffn_gate_exps.weight | Block 32 Ffn_Gate_Exps (W) | ( ~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 544 | blk.32.ffn_gate_inp.weight | Block 32 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) | ( ~2M) 1835008 | 7168 x 256 x 1 x 1 | F32 |
| 545 | blk.32.ffn_gate_shexp.weight | Block 32 Ffn_Gate_Shexp (W) | ( ~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
| 546 | blk.32.ffn_norm.weight | Block 32 Feed-Forward Network Normalization (W) | ( ~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 547 | blk.32.ffn_up_exps.weight | Block 32 Ffn_Up_Exps (W) | ( ~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 548 | blk.32.ffn_up_shexp.weight | Block 32 Ffn_Up_Shexp (W) | ( ~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |

- Total elements in blk.32: (~12B) 11507286272
- Percentage of total elements: 1.71%

Block 33 Tensor Group : ~12B Elements

| T_ID | Tensor Layer Name | Human Friendly Tensor Layer Name | Elements | Shape | Type |
|------|-------------------|----------------------------------|----------|-------|------|
| 549 | blk.33.attn_kv_a_mqa.weight | Block 33 Attn_Kv_A_Mqa (W) | ( ~4M) 4128768 | 7168 x 576 x 1 x 1 | IQ2_XS |
| 550 | blk.33.attn_kv_a_norm.weight | Block 33 Attn_Kv_A_Norm (W) | ( 512) 512 | 512 x 1 x 1 x 1 | F32 |
| 551 | blk.33.attn_kv_b.weight | Block 33 Attn_Kv_B (W) | ( ~17M) 16777216 | 512 x 32768 x 1 x 1 | IQ2_XS |
| 552 | blk.33.attn_norm.weight | Block 33 Attention Normalization (W) | ( ~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 553 | blk.33.attn_output.weight | Block 33 Attention Output (W) | (~117M) 117440512 | 16384 x 7168 x 1 x 1 | IQ3_S |
| 554 | blk.33.attn_q_a.weight | Block 33 Attn_Q_A (W) | ( ~11M) 11010048 | 7168 x 1536 x 1 x 1 | IQ2_XS |
| 555 | blk.33.attn_q_a_norm.weight | Block 33 Attn_Q_A_Norm (W) | ( ~2K) 1536 | 1536 x 1 x 1 x 1 | F32 |
| 556 | blk.33.attn_q_b.weight | Block 33 Attn_Q_B (W) | ( ~38M) 37748736 | 1536 x 24576 x 1 x 1 | IQ2_XS |
| 557 | blk.33.exp_probs_b.bias | Block 33 Exp_Probs_B (B) | ( 256) 256 | 256 x 1 x 1 x 1 | F32 |
| 558 | blk.33.ffn_down_exps.weight | Block 33 Ffn_Down_Exps (W) | ( ~4B) 3758096384 | 2048 x 7168 x 256 x 1 | IQ2_XS |
| 559 | blk.33.ffn_down_shexp.weight | Block 33 Ffn_Down_Shexp (W) | ( ~15M) 14680064 | 2048 x 7168 x 1 x 1 | IQ2_XS |
| 560 | blk.33.ffn_gate_exps.weight | Block 33 Ffn_Gate_Exps (W) | ( ~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 561 | blk.33.ffn_gate_inp.weight | Block 33 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) | ( ~2M) 1835008 | 7168 x 256 x 1 x 1 | F32 |
| 562 | blk.33.ffn_gate_shexp.weight | Block 33 Ffn_Gate_Shexp (W) | ( ~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
| 563 | blk.33.ffn_norm.weight | Block 33 Feed-Forward Network Normalization (W) | ( ~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 564 | blk.33.ffn_up_exps.weight | Block 33 Ffn_Up_Exps (W) | ( ~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 565 | blk.33.ffn_up_shexp.weight | Block 33 Ffn_Up_Shexp (W) | ( ~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |

- Total elements in blk.33: (~12B) 11507286272
- Percentage of total elements: 1.71%

Block 34 Tensor Group : ~12B Elements

| T_ID | Tensor Layer Name | Human Friendly Tensor Layer Name | Elements | Shape | Type |
|------|-------------------|----------------------------------|----------|-------|------|
| 566 | blk.34.attn_kv_a_mqa.weight | Block 34 Attn_Kv_A_Mqa (W) | ( ~4M) 4128768 | 7168 x 576 x 1 x 1 | IQ2_XS |
| 567 | blk.34.attn_kv_a_norm.weight | Block 34 Attn_Kv_A_Norm (W) | ( 512) 512 | 512 x 1 x 1 x 1 | F32 |
| 568 | blk.34.attn_kv_b.weight | Block 34 Attn_Kv_B (W) | ( ~17M) 16777216 | 512 x 32768 x 1 x 1 | IQ2_XS |
| 569 | blk.34.attn_norm.weight | Block 34 Attention Normalization (W) | ( ~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 570 | blk.34.attn_output.weight | Block 34 Attention Output (W) | (~117M) 117440512 | 16384 x 7168 x 1 x 1 | IQ3_S |
| 571 | blk.34.attn_q_a.weight | Block 34 Attn_Q_A (W) | ( ~11M) 11010048 | 7168 x 1536 x 1 x 1 | IQ2_XS |
| 572 | blk.34.attn_q_a_norm.weight | Block 34 Attn_Q_A_Norm (W) | ( ~2K) 1536 | 1536 x 1 x 1 x 1 | F32 |
| 573 | blk.34.attn_q_b.weight | Block 34 Attn_Q_B (W) | ( ~38M) 37748736 | 1536 x 24576 x 1 x 1 | IQ2_XS |
| 574 | blk.34.exp_probs_b.bias | Block 34 Exp_Probs_B (B) | ( 256) 256 | 256 x 1 x 1 x 1 | F32 |
| 575 | blk.34.ffn_down_exps.weight | Block 34 Ffn_Down_Exps (W) | ( ~4B) 3758096384 | 2048 x 7168 x 256 x 1 | IQ2_XS |
| 576 | blk.34.ffn_down_shexp.weight | Block 34 Ffn_Down_Shexp (W) | ( ~15M) 14680064 | 2048 x 7168 x 1 x 1 | IQ2_XS |
| 577 | blk.34.ffn_gate_exps.weight | Block 34 Ffn_Gate_Exps (W) | ( ~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 578 | blk.34.ffn_gate_inp.weight | Block 34 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) | ( ~2M) 1835008 | 7168 x 256 x 1 x 1 | F32 |
| 579 | blk.34.ffn_gate_shexp.weight | Block 34 Ffn_Gate_Shexp (W) | ( ~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
| 580 | blk.34.ffn_norm.weight | Block 34 Feed-Forward Network Normalization (W) | ( ~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 581 | blk.34.ffn_up_exps.weight | Block 34 Ffn_Up_Exps (W) | ( ~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 582 | blk.34.ffn_up_shexp.weight | Block 34 Ffn_Up_Shexp (W) | ( ~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |

- Total elements in blk.34: (~12B) 11507286272
- Percentage of total elements: 1.71%

Block 35 Tensor Group : ~12B Elements

| T_ID | Tensor Layer Name | Human Friendly Tensor Layer Name | Elements | Shape | Type |
|------|-------------------|----------------------------------|----------|-------|------|
| 583 | blk.35.attn_kv_a_mqa.weight | Block 35 Attn_Kv_A_Mqa (W) | ( ~4M) 4128768 | 7168 x 576 x 1 x 1 | IQ2_XS |
| 584 | blk.35.attn_kv_a_norm.weight | Block 35 Attn_Kv_A_Norm (W) | ( 512) 512 | 512 x 1 x 1 x 1 | F32 |
| 585 | blk.35.attn_kv_b.weight | Block 35 Attn_Kv_B (W) | ( ~17M) 16777216 | 512 x 32768 x 1 x 1 | IQ2_XS |
| 586 | blk.35.attn_norm.weight | Block 35 Attention Normalization (W) | ( ~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 587 | blk.35.attn_output.weight | Block 35 Attention Output (W) | (~117M) 117440512 | 16384 x 7168 x 1 x 1 | IQ3_S |
| 588 | blk.35.attn_q_a.weight | Block 35 Attn_Q_A (W) | ( ~11M) 11010048 | 7168 x 1536 x 1 x 1 | IQ2_XS |
| 589 | blk.35.attn_q_a_norm.weight | Block 35 Attn_Q_A_Norm (W) | ( ~2K) 1536 | 1536 x 1 x 1 x 1 | F32 |
| 590 | blk.35.attn_q_b.weight | Block 35 Attn_Q_B (W) | ( ~38M) 37748736 | 1536 x 24576 x 1 x 1 | IQ2_XS |
| 591 | blk.35.exp_probs_b.bias | Block 35 Exp_Probs_B (B) | ( 256) 256 | 256 x 1 x 1 x 1 | F32 |
| 592 | blk.35.ffn_down_exps.weight | Block 35 Ffn_Down_Exps (W) | ( ~4B) 3758096384 | 2048 x 7168 x 256 x 1 | IQ2_XS |
| 593 | blk.35.ffn_down_shexp.weight | Block 35 Ffn_Down_Shexp (W) | ( ~15M) 14680064 | 2048 x 7168 x 1 x 1 | IQ2_XS |
| 594 | blk.35.ffn_gate_exps.weight | Block 35 Ffn_Gate_Exps (W) | ( ~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 595 | blk.35.ffn_gate_inp.weight | Block 35 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) | ( ~2M) 1835008 | 7168 x 256 x 1 x 1 | F32 |
| 596 | blk.35.ffn_gate_shexp.weight | Block 35 Ffn_Gate_Shexp (W) | ( ~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
| 597 | blk.35.ffn_norm.weight | Block 35 Feed-Forward Network Normalization (W) | ( ~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 598 | blk.35.ffn_up_exps.weight | Block 35 Ffn_Up_Exps (W) | ( ~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 599 | blk.35.ffn_up_shexp.weight | Block 35 Ffn_Up_Shexp (W) | ( ~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |

- Total elements in blk.35: (~12B) 11507286272
- Percentage of total elements: 1.71%

Block 36 Tensor Group : ~12B Elements

| T_ID | Tensor Layer Name | Human Friendly Tensor Layer Name | Elements | Shape | Type |
|------|-------------------|----------------------------------|----------|-------|------|
| 600 | blk.36.attn_kv_a_mqa.weight | Block 36 Attn_Kv_A_Mqa (W) | ( ~4M) 4128768 | 7168 x 576 x 1 x 1 | IQ2_XS |
| 601 | blk.36.attn_kv_a_norm.weight | Block 36 Attn_Kv_A_Norm (W) | ( 512) 512 | 512 x 1 x 1 x 1 | F32 |
| 602 | blk.36.attn_kv_b.weight | Block 36 Attn_Kv_B (W) | ( ~17M) 16777216 | 512 x 32768 x 1 x 1 | IQ2_XS |
| 603 | blk.36.attn_norm.weight | Block 36 Attention Normalization (W) | ( ~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 604 | blk.36.attn_output.weight | Block 36 Attention Output (W) | (~117M) 117440512 | 16384 x 7168 x 1 x 1 | IQ3_S |
| 605 | blk.36.attn_q_a.weight | Block 36 Attn_Q_A (W) | ( ~11M) 11010048 | 7168 x 1536 x 1 x 1 | IQ2_XS |
| 606 | blk.36.attn_q_a_norm.weight | Block 36 Attn_Q_A_Norm (W) | ( ~2K) 1536 | 1536 x 1 x 1 x 1 | F32 |
| 607 | blk.36.attn_q_b.weight | Block 36 Attn_Q_B (W) | ( ~38M) 37748736 | 1536 x 24576 x 1 x 1 | IQ2_XS |
| 608 | blk.36.exp_probs_b.bias | Block 36 Exp_Probs_B (B) | ( 256) 256 | 256 x 1 x 1 x 1 | F32 |
| 609 | blk.36.ffn_down_exps.weight | Block 36 Ffn_Down_Exps (W) | ( ~4B) 3758096384 | 2048 x 7168 x 256 x 1 | IQ2_XS |
| 610 | blk.36.ffn_down_shexp.weight | Block 36 Ffn_Down_Shexp (W) | ( ~15M) 14680064 | 2048 x 7168 x 1 x 1 | IQ2_XS |
| 611 | blk.36.ffn_gate_exps.weight | Block 36 Ffn_Gate_Exps (W) | ( ~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 612 | blk.36.ffn_gate_inp.weight | Block 36 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) | ( ~2M) 1835008 | 7168 x 256 x 1 x 1 | F32 |
| 613 | blk.36.ffn_gate_shexp.weight | Block 36 Ffn_Gate_Shexp (W) | ( ~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
| 614 | blk.36.ffn_norm.weight | Block 36 Feed-Forward Network Normalization (W) | ( ~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 615 | blk.36.ffn_up_exps.weight | Block 36 Ffn_Up_Exps (W) | ( ~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 616 | blk.36.ffn_up_shexp.weight | Block 36 Ffn_Up_Shexp (W) | ( ~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |

- Total elements in blk.36: (~12B) 11507286272
- Percentage of total elements: 1.71%

Block 37 Tensor Group : ~12B Elements

| T_ID | Tensor Layer Name | Human Friendly Tensor Layer Name | Elements | Shape | Type |
|------|-------------------|----------------------------------|----------|-------|------|
| 617 | blk.37.attn_kv_a_mqa.weight | Block 37 Attn_Kv_A_Mqa (W) | ( ~4M) 4128768 | 7168 x 576 x 1 x 1 | IQ2_XS |
| 618 | blk.37.attn_kv_a_norm.weight | Block 37 Attn_Kv_A_Norm (W) | ( 512) 512 | 512 x 1 x 1 x 1 | F32 |
| 619 | blk.37.attn_kv_b.weight | Block 37 Attn_Kv_B (W) | ( ~17M) 16777216 | 512 x 32768 x 1 x 1 | IQ2_XS |
| 620 | blk.37.attn_norm.weight | Block 37 Attention Normalization (W) | ( ~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 621 | blk.37.attn_output.weight | Block 37 Attention Output (W) | (~117M) 117440512 | 16384 x 7168 x 1 x 1 | IQ3_S |
| 622 | blk.37.attn_q_a.weight | Block 37 Attn_Q_A (W) | ( ~11M) 11010048 | 7168 x 1536 x 1 x 1 | IQ2_XS |
| 623 | blk.37.attn_q_a_norm.weight | Block 37 Attn_Q_A_Norm (W) | ( ~2K) 1536 | 1536 x 1 x 1 x 1 | F32 |
| 624 | blk.37.attn_q_b.weight | Block 37 Attn_Q_B (W) | ( ~38M) 37748736 | 1536 x 24576 x 1 x 1 | IQ2_XS |
| 625 | blk.37.exp_probs_b.bias | Block 37 Exp_Probs_B (B) | ( 256) 256 | 256 x 1 x 1 x 1 | F32 |
| 626 | blk.37.ffn_down_exps.weight | Block 37 Ffn_Down_Exps (W) | ( ~4B) 3758096384 | 2048 x 7168 x 256 x 1 | IQ2_XS |
| 627 | blk.37.ffn_down_shexp.weight | Block 37 Ffn_Down_Shexp (W) | ( ~15M) 14680064 | 2048 x 7168 x 1 x 1 | IQ2_XS |
| 628 | blk.37.ffn_gate_exps.weight | Block 37 Ffn_Gate_Exps (W) | ( ~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 629 | blk.37.ffn_gate_inp.weight | Block 37 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) | ( ~2M) 1835008 | 7168 x 256 x 1 x 1 | F32 |
| 630 | blk.37.ffn_gate_shexp.weight | Block 37 Ffn_Gate_Shexp (W) | ( ~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
| 631 | blk.37.ffn_norm.weight | Block 37 Feed-Forward Network Normalization (W) | ( ~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 632 | blk.37.ffn_up_exps.weight | Block 37 Ffn_Up_Exps (W) | ( ~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 633 | blk.37.ffn_up_shexp.weight | Block 37 Ffn_Up_Shexp (W) | ( ~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |

- Total elements in blk.37: (~12B) 11507286272
- Percentage of total elements: 1.71%

Block 38 Tensor Group : ~12B Elements

| T_ID | Tensor Layer Name | Human Friendly Tensor Layer Name | Elements | Shape | Type |
|------|-------------------|----------------------------------|----------|-------|------|
| 634 | blk.38.attn_kv_a_mqa.weight | Block 38 Attn_Kv_A_Mqa (W) | ( ~4M) 4128768 | 7168 x 576 x 1 x 1 | IQ2_XS |
| 635 | blk.38.attn_kv_a_norm.weight | Block 38 Attn_Kv_A_Norm (W) | ( 512) 512 | 512 x 1 x 1 x 1 | F32 |
| 636 | blk.38.attn_kv_b.weight | Block 38 Attn_Kv_B (W) | ( ~17M) 16777216 | 512 x 32768 x 1 x 1 | IQ2_XS |
| 637 | blk.38.attn_norm.weight | Block 38 Attention Normalization (W) | ( ~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 638 | blk.38.attn_output.weight | Block 38 Attention Output (W) | (~117M) 117440512 | 16384 x 7168 x 1 x 1 | IQ3_S |
| 639 | blk.38.attn_q_a.weight | Block 38 Attn_Q_A (W) | ( ~11M) 11010048 | 7168 x 1536 x 1 x 1 | IQ2_XS |
| 640 | blk.38.attn_q_a_norm.weight | Block 38 Attn_Q_A_Norm (W) | ( ~2K) 1536 | 1536 x 1 x 1 x 1 | F32 |
| 641 | blk.38.attn_q_b.weight | Block 38 Attn_Q_B (W) | ( ~38M) 37748736 | 1536 x 24576 x 1 x 1 | IQ2_XS |
| 642 | blk.38.exp_probs_b.bias | Block 38 Exp_Probs_B (B) | ( 256) 256 | 256 x 1 x 1 x 1 | F32 |
| 643 | blk.38.ffn_down_exps.weight | Block 38 Ffn_Down_Exps (W) | ( ~4B) 3758096384 | 2048 x 7168 x 256 x 1 | IQ2_XS |
| 644 | blk.38.ffn_down_shexp.weight | Block 38 Ffn_Down_Shexp (W) | ( ~15M) 14680064 | 2048 x 7168 x 1 x 1 | IQ2_XS |
| 645 | blk.38.ffn_gate_exps.weight | Block 38 Ffn_Gate_Exps (W) | ( ~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 646 | blk.38.ffn_gate_inp.weight | Block 38 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) | ( ~2M) 1835008 | 7168 x 256 x 1 x 1 | F32 |
| 647 | blk.38.ffn_gate_shexp.weight | Block 38 Ffn_Gate_Shexp (W) | ( ~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
| 648 | blk.38.ffn_norm.weight | Block 38 Feed-Forward Network Normalization (W) | ( ~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 649 | blk.38.ffn_up_exps.weight | Block 38 Ffn_Up_Exps (W) | ( ~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 650 | blk.38.ffn_up_shexp.weight | Block 38 Ffn_Up_Shexp (W) | ( ~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |

- Total elements in blk.38: (~12B) 11507286272
- Percentage of total elements: 1.71%

Block 39 Tensor Group : ~12B Elements

| T_ID | Tensor Layer Name | Human Friendly Tensor Layer Name | Elements | Shape | Type |
|------|-------------------|----------------------------------|----------|-------|------|
| 651 | blk.39.attn_kv_a_mqa.weight | Block 39 Attn_Kv_A_Mqa (W) | ( ~4M) 4128768 | 7168 x 576 x 1 x 1 | IQ2_XS |
| 652 | blk.39.attn_kv_a_norm.weight | Block 39 Attn_Kv_A_Norm (W) | ( 512) 512 | 512 x 1 x 1 x 1 | F32 |
| 653 | blk.39.attn_kv_b.weight | Block 39 Attn_Kv_B (W) | ( ~17M) 16777216 | 512 x 32768 x 1 x 1 | IQ2_XS |
| 654 | blk.39.attn_norm.weight | Block 39 Attention Normalization (W) | ( ~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 655 | blk.39.attn_output.weight | Block 39 Attention Output (W) | (~117M) 117440512 | 16384 x 7168 x 1 x 1 | IQ3_S |
| 656 | blk.39.attn_q_a.weight | Block 39 Attn_Q_A (W) | ( ~11M) 11010048 | 7168 x 1536 x 1 x 1 | IQ2_XS |
| 657 | blk.39.attn_q_a_norm.weight | Block 39 Attn_Q_A_Norm (W) | ( ~2K) 1536 | 1536 x 1 x 1 x 1 | F32 |
| 658 | blk.39.attn_q_b.weight | Block 39 Attn_Q_B (W) | ( ~38M) 37748736 | 1536 x 24576 x 1 x 1 | IQ2_XS |
| 659 | blk.39.exp_probs_b.bias | Block 39 Exp_Probs_B (B) | ( 256) 256 | 256 x 1 x 1 x 1 | F32 |
| 660 | blk.39.ffn_down_exps.weight | Block 39 Ffn_Down_Exps (W) | ( ~4B) 3758096384 | 2048 x 7168 x 256 x 1 | IQ2_XS |
| 661 | blk.39.ffn_down_shexp.weight | Block 39 Ffn_Down_Shexp (W) | ( ~15M) 14680064 | 2048 x 7168 x 1 x 1 | IQ2_XS |
| 662 | blk.39.ffn_gate_exps.weight | Block 39 Ffn_Gate_Exps (W) | ( ~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 663 | blk.39.ffn_gate_inp.weight | Block 39 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) | ( ~2M) 1835008 | 7168 x 256 x 1 x 1 | F32 |
| 664 | blk.39.ffn_gate_shexp.weight | Block 39 Ffn_Gate_Shexp (W) | ( ~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
| 665 | blk.39.ffn_norm.weight | Block 39 Feed-Forward Network Normalization (W) | ( ~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 666 | blk.39.ffn_up_exps.weight | Block 39 Ffn_Up_Exps (W) | ( ~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 667 | blk.39.ffn_up_shexp.weight | Block 39 Ffn_Up_Shexp (W) | ( ~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |

- Total elements in blk.39: (~12B) 11507286272
- Percentage of total elements: 1.71%

Block 40 Tensor Group : ~12B Elements

| T_ID | Tensor Layer Name | Human Friendly Tensor Layer Name | Elements | Shape | Type |
|------|-------------------|----------------------------------|----------|-------|------|
| 668 | blk.40.attn_kv_a_mqa.weight | Block 40 Attn_Kv_A_Mqa (W) | ( ~4M) 4128768 | 7168 x 576 x 1 x 1 | IQ2_XS |
| 669 | blk.40.attn_kv_a_norm.weight | Block 40 Attn_Kv_A_Norm (W) | ( 512) 512 | 512 x 1 x 1 x 1 | F32 |
| 670 | blk.40.attn_kv_b.weight | Block 40 Attn_Kv_B (W) | ( ~17M) 16777216 | 512 x 32768 x 1 x 1 | IQ2_XS |
| 671 | blk.40.attn_norm.weight | Block 40 Attention Normalization (W) | ( ~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 672 | blk.40.attn_output.weight | Block 40 Attention Output (W) | (~117M) 117440512 | 16384 x 7168 x 1 x 1 | IQ3_S |
| 673 | blk.40.attn_q_a.weight | Block 40 Attn_Q_A (W) | ( ~11M) 11010048 | 7168 x 1536 x 1 x 1 | IQ2_XS |
| 674 | blk.40.attn_q_a_norm.weight | Block 40 Attn_Q_A_Norm (W) | ( ~2K) 1536 | 1536 x 1 x 1 x 1 | F32 |
| 675 | blk.40.attn_q_b.weight | Block 40 Attn_Q_B (W) | ( ~38M) 37748736 | 1536 x 24576 x 1 x 1 | IQ2_XS |
| 676 | blk.40.exp_probs_b.bias | Block 40 Exp_Probs_B (B) | ( 256) 256 | 256 x 1 x 1 x 1 | F32 |
| 677 | blk.40.ffn_down_exps.weight | Block 40 Ffn_Down_Exps (W) | ( ~4B) 3758096384 | 2048 x 7168 x 256 x 1 | IQ2_XS |
| 678 | blk.40.ffn_down_shexp.weight | Block 40 Ffn_Down_Shexp (W) | ( ~15M) 14680064 | 2048 x 7168 x 1 x 1 | IQ2_XS |
| 679 | blk.40.ffn_gate_exps.weight | Block 40 Ffn_Gate_Exps (W) | ( ~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 680 | blk.40.ffn_gate_inp.weight | Block 40 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) | ( ~2M) 1835008 | 7168 x 256 x 1 x 1 | F32 |
| 681 | blk.40.ffn_gate_shexp.weight | Block 40 Ffn_Gate_Shexp (W) | ( ~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
| 682 | blk.40.ffn_norm.weight | Block 40 Feed-Forward Network Normalization (W) | ( ~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 683 | blk.40.ffn_up_exps.weight | Block 40 Ffn_Up_Exps (W) | ( ~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 684 | blk.40.ffn_up_shexp.weight | Block 40 Ffn_Up_Shexp (W) | ( ~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |

- Total elements in blk.40: (~12B) 11507286272
- Percentage of total elements: 1.71%

Block 41 Tensor Group : ~12B Elements

| T_ID | Tensor Layer Name | Human Friendly Tensor Layer Name | Elements | Shape | Type |
|------|-------------------|----------------------------------|----------|-------|------|
| 685 | blk.41.attn_kv_a_mqa.weight | Block 41 Attn_Kv_A_Mqa (W) | ( ~4M) 4128768 | 7168 x 576 x 1 x 1 | IQ2_XS |
| 686 | blk.41.attn_kv_a_norm.weight | Block 41 Attn_Kv_A_Norm (W) | ( 512) 512 | 512 x 1 x 1 x 1 | F32 |
| 687 | blk.41.attn_kv_b.weight | Block 41 Attn_Kv_B (W) | ( ~17M) 16777216 | 512 x 32768 x 1 x 1 | IQ2_XS |
| 688 | blk.41.attn_norm.weight | Block 41 Attention Normalization (W) | ( ~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 689 | blk.41.attn_output.weight | Block 41 Attention Output (W) | (~117M) 117440512 | 16384 x 7168 x 1 x 1 | IQ3_S |
| 690 | blk.41.attn_q_a.weight | Block 41 Attn_Q_A (W) | ( ~11M) 11010048 | 7168 x 1536 x 1 x 1 | IQ2_XS |
| 691 | blk.41.attn_q_a_norm.weight | Block 41 Attn_Q_A_Norm (W) | ( ~2K) 1536 | 1536 x 1 x 1 x 1 | F32 |
| 692 | blk.41.attn_q_b.weight | Block 41 Attn_Q_B (W) | ( ~38M) 37748736 | 1536 x 24576 x 1 x 1 | IQ2_XS |
| 693 | blk.41.exp_probs_b.bias | Block 41 Exp_Probs_B (B) | ( 256) 256 | 256 x 1 x 1 x 1 | F32 |
| 694 | blk.41.ffn_down_exps.weight | Block 41 Ffn_Down_Exps (W) | ( ~4B) 3758096384 | 2048 x 7168 x 256 x 1 | IQ2_XS |
| 695 | blk.41.ffn_down_shexp.weight | Block 41 Ffn_Down_Shexp (W) | ( ~15M) 14680064 | 2048 x 7168 x 1 x 1 | IQ2_XS |
| 696 | blk.41.ffn_gate_exps.weight | Block 41 Ffn_Gate_Exps (W) | ( ~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 697 | blk.41.ffn_gate_inp.weight | Block 41 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) | ( ~2M) 1835008 | 7168 x 256 x 1 x 1 | F32 |
| 698 | blk.41.ffn_gate_shexp.weight | Block 41 Ffn_Gate_Shexp (W) | ( ~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
| 699 | blk.41.ffn_norm.weight | Block 41 Feed-Forward Network Normalization (W) | ( ~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 700 | blk.41.ffn_up_exps.weight | Block 41 Ffn_Up_Exps (W) | ( ~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 701 | blk.41.ffn_up_shexp.weight | Block 41 Ffn_Up_Shexp (W) | ( ~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |

- Total elements in blk.41: (~12B) 11507286272
- Percentage of total elements: 1.71%

Block 42 Tensor Group : ~12B Elements

| T_ID | Tensor Layer Name | Human Friendly Tensor Layer Name | Elements | Shape | Type |
|------|-------------------|----------------------------------|----------|-------|------|
| 702 | blk.42.attn_kv_a_mqa.weight | Block 42 Attn_Kv_A_Mqa (W) | ( ~4M) 4128768 | 7168 x 576 x 1 x 1 | IQ2_XS |
| 703 | blk.42.attn_kv_a_norm.weight | Block 42 Attn_Kv_A_Norm (W) | ( 512) 512 | 512 x 1 x 1 x 1 | F32 |
| 704 | blk.42.attn_kv_b.weight | Block 42 Attn_Kv_B (W) | ( ~17M) 16777216 | 512 x 32768 x 1 x 1 | IQ2_XS |
| 705 | blk.42.attn_norm.weight | Block 42 Attention Normalization (W) | ( ~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 706 | blk.42.attn_output.weight | Block 42 Attention Output (W) | (~117M) 117440512 | 16384 x 7168 x 1 x 1 | IQ3_S |
| 707 | blk.42.attn_q_a.weight | Block 42 Attn_Q_A (W) | ( ~11M) 11010048 | 7168 x 1536 x 1 x 1 | IQ2_XS |
| 708 | blk.42.attn_q_a_norm.weight | Block 42 Attn_Q_A_Norm (W) | ( ~2K) 1536 | 1536 x 1 x 1 x 1 | F32 |
| 709 | blk.42.attn_q_b.weight | Block 42 Attn_Q_B (W) | ( ~38M) 37748736 | 1536 x 24576 x 1 x 1 | IQ2_XS |
| 710 | blk.42.exp_probs_b.bias | Block 42 Exp_Probs_B (B) | ( 256) 256 | 256 x 1 x 1 x 1 | F32 |
| 711 | blk.42.ffn_down_exps.weight | Block 42 Ffn_Down_Exps (W) | ( ~4B) 3758096384 | 2048 x 7168 x 256 x 1 | IQ2_XS |
| 712 | blk.42.ffn_down_shexp.weight | Block 42 Ffn_Down_Shexp (W) | ( ~15M) 14680064 | 2048 x 7168 x 1 x 1 | IQ2_XS |
| 713 | blk.42.ffn_gate_exps.weight | Block 42 Ffn_Gate_Exps (W) | ( ~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 714 | blk.42.ffn_gate_inp.weight | Block 42 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) | ( ~2M) 1835008 | 7168 x 256 x 1 x 1 | F32 |
| 715 | blk.42.ffn_gate_shexp.weight | Block 42 Ffn_Gate_Shexp (W) | ( ~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
| 716 | blk.42.ffn_norm.weight | Block 42 Feed-Forward Network Normalization (W) | ( ~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 717 | blk.42.ffn_up_exps.weight | Block 42 Ffn_Up_Exps (W) | ( ~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 718 | blk.42.ffn_up_shexp.weight | Block 42 Ffn_Up_Shexp (W) | ( ~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |

- Total elements in blk.42: (~12B) 11507286272
- Percentage of total elements: 1.71%

Block 43 Tensor Group : ~12B Elements

| T_ID | Tensor Layer Name | Human Friendly Tensor Layer Name | Elements | Shape | Type |
|------|-------------------|----------------------------------|----------|-------|------|
| 719 | blk.43.attn_kv_a_mqa.weight | Block 43 Attn_Kv_A_Mqa (W) | ( ~4M) 4128768 | 7168 x 576 x 1 x 1 | IQ2_XS |
| 720 | blk.43.attn_kv_a_norm.weight | Block 43 Attn_Kv_A_Norm (W) | ( 512) 512 | 512 x 1 x 1 x 1 | F32 |
| 721 | blk.43.attn_kv_b.weight | Block 43 Attn_Kv_B (W) | ( ~17M) 16777216 | 512 x 32768 x 1 x 1 | IQ2_XS |
| 722 | blk.43.attn_norm.weight | Block 43 Attention Normalization (W) | ( ~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 723 | blk.43.attn_output.weight | Block 43 Attention Output (W) | (~117M) 117440512 | 16384 x 7168 x 1 x 1 | IQ3_S |
| 724 | blk.43.attn_q_a.weight | Block 43 Attn_Q_A (W) | ( ~11M) 11010048 | 7168 x 1536 x 1 x 1 | IQ2_XS |
| 725 | blk.43.attn_q_a_norm.weight | Block 43 Attn_Q_A_Norm (W) | ( ~2K) 1536 | 1536 x 1 x 1 x 1 | F32 |
| 726 | blk.43.attn_q_b.weight | Block 43 Attn_Q_B (W) | ( ~38M) 37748736 | 1536 x 24576 x 1 x 1 | IQ2_XS |
| 727 | blk.43.exp_probs_b.bias | Block 43 Exp_Probs_B (B) | ( 256) 256 | 256 x 1 x 1 x 1 | F32 |
| 728 | blk.43.ffn_down_exps.weight | Block 43 Ffn_Down_Exps (W) | ( ~4B) 3758096384 | 2048 x 7168 x 256 x 1 | IQ2_XS |
| 729 | blk.43.ffn_down_shexp.weight | Block 43 Ffn_Down_Shexp (W) | ( ~15M) 14680064 | 2048 x 7168 x 1 x 1 | IQ2_XS |
| 730 | blk.43.ffn_gate_exps.weight | Block 43 Ffn_Gate_Exps (W) | ( ~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 731 | blk.43.ffn_gate_inp.weight | Block 43 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) | ( ~2M) 1835008 | 7168 x 256 x 1 x 1 | F32 |
| 732 | blk.43.ffn_gate_shexp.weight | Block 43 Ffn_Gate_Shexp (W) | ( ~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
| 733 | blk.43.ffn_norm.weight | Block 43 Feed-Forward Network Normalization (W) | ( ~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 734 | blk.43.ffn_up_exps.weight | Block 43 Ffn_Up_Exps (W) | ( ~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 735 | blk.43.ffn_up_shexp.weight | Block 43 Ffn_Up_Shexp (W) | ( ~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |

- Total elements in blk.43: (~12B) 11507286272
- Percentage of total elements: 1.71%

Block 44 Tensor Group : ~12B Elements

T_ID Tensor Layer Name Human Friendly Tensor Layer Name Elements Shape Type
736 blk.44.attn_kv_a_mqa.weight Block 44 Attn_Kv_A_Mqa (W) ( ~4M) 4128768 7168 x 576 x 1 x 1 IQ2_XS
737 blk.44.attn_kv_a_norm.weight Block 44 Attn_Kv_A_Norm (W) ( 512) 512 512 x 1 x 1 x 1 F32
738 blk.44.attn_kv_b.weight Block 44 Attn_Kv_B (W) ( ~17M) 16777216 512 x 32768 x 1 x 1 IQ2_XS
739 blk.44.attn_norm.weight Block 44 Attention Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
740 blk.44.attn_output.weight Block 44 Attention Output (W) (~117M) 117440512 16384 x 7168 x 1 x 1 IQ3_S
741 blk.44.attn_q_a.weight Block 44 Attn_Q_A (W) ( ~11M) 11010048 7168 x 1536 x 1 x 1 IQ2_XS
742 blk.44.attn_q_a_norm.weight Block 44 Attn_Q_A_Norm (W) ( ~2K) 1536 1536 x 1 x 1 x 1 F32
743 blk.44.attn_q_b.weight Block 44 Attn_Q_B (W) ( ~38M) 37748736 1536 x 24576 x 1 x 1 IQ2_XS
744 blk.44.exp_probs_b.bias Block 44 Exp_Probs_B (B) ( 256) 256 256 x 1 x 1 x 1 F32
745 blk.44.ffn_down_exps.weight Block 44 Ffn_Down_Exps (W) ( ~4B) 3758096384 2048 x 7168 x 256 x 1 IQ2_XS
746 blk.44.ffn_down_shexp.weight Block 44 Ffn_Down_Shexp (W) ( ~15M) 14680064 2048 x 7168 x 1 x 1 IQ2_XS
747 blk.44.ffn_gate_exps.weight Block 44 Ffn_Gate_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
748 blk.44.ffn_gate_inp.weight Block 44 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) ( ~2M) 1835008 7168 x 256 x 1 x 1 F32
749 blk.44.ffn_gate_shexp.weight Block 44 Ffn_Gate_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
750 blk.44.ffn_norm.weight Block 44 Feed-Forward Network Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
751 blk.44.ffn_up_exps.weight Block 44 Ffn_Up_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
752 blk.44.ffn_up_shexp.weight Block 44 Ffn_Up_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
  • Total elements in blk.44: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 45 Tensor Group : ~12B Elements

T_ID Tensor Layer Name Human Friendly Tensor Layer Name Elements Shape Type
753 blk.45.attn_kv_a_mqa.weight Block 45 Attn_Kv_A_Mqa (W) ( ~4M) 4128768 7168 x 576 x 1 x 1 IQ2_XS
754 blk.45.attn_kv_a_norm.weight Block 45 Attn_Kv_A_Norm (W) ( 512) 512 512 x 1 x 1 x 1 F32
755 blk.45.attn_kv_b.weight Block 45 Attn_Kv_B (W) ( ~17M) 16777216 512 x 32768 x 1 x 1 IQ2_XS
756 blk.45.attn_norm.weight Block 45 Attention Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
757 blk.45.attn_output.weight Block 45 Attention Output (W) (~117M) 117440512 16384 x 7168 x 1 x 1 IQ3_S
758 blk.45.attn_q_a.weight Block 45 Attn_Q_A (W) ( ~11M) 11010048 7168 x 1536 x 1 x 1 IQ2_XS
759 blk.45.attn_q_a_norm.weight Block 45 Attn_Q_A_Norm (W) ( ~2K) 1536 1536 x 1 x 1 x 1 F32
760 blk.45.attn_q_b.weight Block 45 Attn_Q_B (W) ( ~38M) 37748736 1536 x 24576 x 1 x 1 IQ2_XS
761 blk.45.exp_probs_b.bias Block 45 Exp_Probs_B (B) ( 256) 256 256 x 1 x 1 x 1 F32
762 blk.45.ffn_down_exps.weight Block 45 Ffn_Down_Exps (W) ( ~4B) 3758096384 2048 x 7168 x 256 x 1 IQ2_XS
763 blk.45.ffn_down_shexp.weight Block 45 Ffn_Down_Shexp (W) ( ~15M) 14680064 2048 x 7168 x 1 x 1 IQ2_XS
764 blk.45.ffn_gate_exps.weight Block 45 Ffn_Gate_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
765 blk.45.ffn_gate_inp.weight Block 45 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) ( ~2M) 1835008 7168 x 256 x 1 x 1 F32
766 blk.45.ffn_gate_shexp.weight Block 45 Ffn_Gate_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
767 blk.45.ffn_norm.weight Block 45 Feed-Forward Network Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
768 blk.45.ffn_up_exps.weight Block 45 Ffn_Up_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
769 blk.45.ffn_up_shexp.weight Block 45 Ffn_Up_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
  • Total elements in blk.45: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 46 Tensor Group : ~12B Elements

T_ID Tensor Layer Name Human Friendly Tensor Layer Name Elements Shape Type
770 blk.46.attn_kv_a_mqa.weight Block 46 Attn_Kv_A_Mqa (W) ( ~4M) 4128768 7168 x 576 x 1 x 1 IQ2_XS
771 blk.46.attn_kv_a_norm.weight Block 46 Attn_Kv_A_Norm (W) ( 512) 512 512 x 1 x 1 x 1 F32
772 blk.46.attn_kv_b.weight Block 46 Attn_Kv_B (W) ( ~17M) 16777216 512 x 32768 x 1 x 1 IQ2_XS
773 blk.46.attn_norm.weight Block 46 Attention Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
774 blk.46.attn_output.weight Block 46 Attention Output (W) (~117M) 117440512 16384 x 7168 x 1 x 1 IQ3_S
775 blk.46.attn_q_a.weight Block 46 Attn_Q_A (W) ( ~11M) 11010048 7168 x 1536 x 1 x 1 IQ2_XS
776 blk.46.attn_q_a_norm.weight Block 46 Attn_Q_A_Norm (W) ( ~2K) 1536 1536 x 1 x 1 x 1 F32
777 blk.46.attn_q_b.weight Block 46 Attn_Q_B (W) ( ~38M) 37748736 1536 x 24576 x 1 x 1 IQ2_XS
778 blk.46.exp_probs_b.bias Block 46 Exp_Probs_B (B) ( 256) 256 256 x 1 x 1 x 1 F32
779 blk.46.ffn_down_exps.weight Block 46 Ffn_Down_Exps (W) ( ~4B) 3758096384 2048 x 7168 x 256 x 1 IQ2_XS
780 blk.46.ffn_down_shexp.weight Block 46 Ffn_Down_Shexp (W) ( ~15M) 14680064 2048 x 7168 x 1 x 1 IQ2_XS
781 blk.46.ffn_gate_exps.weight Block 46 Ffn_Gate_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
782 blk.46.ffn_gate_inp.weight Block 46 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) ( ~2M) 1835008 7168 x 256 x 1 x 1 F32
783 blk.46.ffn_gate_shexp.weight Block 46 Ffn_Gate_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
784 blk.46.ffn_norm.weight Block 46 Feed-Forward Network Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
785 blk.46.ffn_up_exps.weight Block 46 Ffn_Up_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
786 blk.46.ffn_up_shexp.weight Block 46 Ffn_Up_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
  • Total elements in blk.46: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 47 Tensor Group : ~12B Elements

T_ID Tensor Layer Name Human Friendly Tensor Layer Name Elements Shape Type
787 blk.47.attn_kv_a_mqa.weight Block 47 Attn_Kv_A_Mqa (W) ( ~4M) 4128768 7168 x 576 x 1 x 1 IQ2_XS
788 blk.47.attn_kv_a_norm.weight Block 47 Attn_Kv_A_Norm (W) ( 512) 512 512 x 1 x 1 x 1 F32
789 blk.47.attn_kv_b.weight Block 47 Attn_Kv_B (W) ( ~17M) 16777216 512 x 32768 x 1 x 1 IQ2_XS
790 blk.47.attn_norm.weight Block 47 Attention Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
791 blk.47.attn_output.weight Block 47 Attention Output (W) (~117M) 117440512 16384 x 7168 x 1 x 1 IQ3_S
792 blk.47.attn_q_a.weight Block 47 Attn_Q_A (W) ( ~11M) 11010048 7168 x 1536 x 1 x 1 IQ2_XS
793 blk.47.attn_q_a_norm.weight Block 47 Attn_Q_A_Norm (W) ( ~2K) 1536 1536 x 1 x 1 x 1 F32
794 blk.47.attn_q_b.weight Block 47 Attn_Q_B (W) ( ~38M) 37748736 1536 x 24576 x 1 x 1 IQ2_XS
795 blk.47.exp_probs_b.bias Block 47 Exp_Probs_B (B) ( 256) 256 256 x 1 x 1 x 1 F32
796 blk.47.ffn_down_exps.weight Block 47 Ffn_Down_Exps (W) ( ~4B) 3758096384 2048 x 7168 x 256 x 1 IQ2_XS
797 blk.47.ffn_down_shexp.weight Block 47 Ffn_Down_Shexp (W) ( ~15M) 14680064 2048 x 7168 x 1 x 1 IQ2_XS
798 blk.47.ffn_gate_exps.weight Block 47 Ffn_Gate_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
799 blk.47.ffn_gate_inp.weight Block 47 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) ( ~2M) 1835008 7168 x 256 x 1 x 1 F32
800 blk.47.ffn_gate_shexp.weight Block 47 Ffn_Gate_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
801 blk.47.ffn_norm.weight Block 47 Feed-Forward Network Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
802 blk.47.ffn_up_exps.weight Block 47 Ffn_Up_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
803 blk.47.ffn_up_shexp.weight Block 47 Ffn_Up_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
  • Total elements in blk.47: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 48 Tensor Group : ~12B Elements

T_ID Tensor Layer Name Human Friendly Tensor Layer Name Elements Shape Type
804 blk.48.attn_kv_a_mqa.weight Block 48 Attn_Kv_A_Mqa (W) ( ~4M) 4128768 7168 x 576 x 1 x 1 IQ2_XS
805 blk.48.attn_kv_a_norm.weight Block 48 Attn_Kv_A_Norm (W) ( 512) 512 512 x 1 x 1 x 1 F32
806 blk.48.attn_kv_b.weight Block 48 Attn_Kv_B (W) ( ~17M) 16777216 512 x 32768 x 1 x 1 IQ2_XS
807 blk.48.attn_norm.weight Block 48 Attention Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
808 blk.48.attn_output.weight Block 48 Attention Output (W) (~117M) 117440512 16384 x 7168 x 1 x 1 IQ3_S
809 blk.48.attn_q_a.weight Block 48 Attn_Q_A (W) ( ~11M) 11010048 7168 x 1536 x 1 x 1 IQ2_XS
810 blk.48.attn_q_a_norm.weight Block 48 Attn_Q_A_Norm (W) ( ~2K) 1536 1536 x 1 x 1 x 1 F32
811 blk.48.attn_q_b.weight Block 48 Attn_Q_B (W) ( ~38M) 37748736 1536 x 24576 x 1 x 1 IQ2_XS
812 blk.48.exp_probs_b.bias Block 48 Exp_Probs_B (B) ( 256) 256 256 x 1 x 1 x 1 F32
813 blk.48.ffn_down_exps.weight Block 48 Ffn_Down_Exps (W) ( ~4B) 3758096384 2048 x 7168 x 256 x 1 IQ2_XS
814 blk.48.ffn_down_shexp.weight Block 48 Ffn_Down_Shexp (W) ( ~15M) 14680064 2048 x 7168 x 1 x 1 IQ2_XS
815 blk.48.ffn_gate_exps.weight Block 48 Ffn_Gate_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
816 blk.48.ffn_gate_inp.weight Block 48 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) ( ~2M) 1835008 7168 x 256 x 1 x 1 F32
817 blk.48.ffn_gate_shexp.weight Block 48 Ffn_Gate_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
818 blk.48.ffn_norm.weight Block 48 Feed-Forward Network Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
819 blk.48.ffn_up_exps.weight Block 48 Ffn_Up_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
820 blk.48.ffn_up_shexp.weight Block 48 Ffn_Up_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
  • Total elements in blk.48: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 49 Tensor Group : ~12B Elements

T_ID Tensor Layer Name Human Friendly Tensor Layer Name Elements Shape Type
821 blk.49.attn_kv_a_mqa.weight Block 49 Attn_Kv_A_Mqa (W) ( ~4M) 4128768 7168 x 576 x 1 x 1 IQ2_XS
822 blk.49.attn_kv_a_norm.weight Block 49 Attn_Kv_A_Norm (W) ( 512) 512 512 x 1 x 1 x 1 F32
823 blk.49.attn_kv_b.weight Block 49 Attn_Kv_B (W) ( ~17M) 16777216 512 x 32768 x 1 x 1 IQ2_XS
824 blk.49.attn_norm.weight Block 49 Attention Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
825 blk.49.attn_output.weight Block 49 Attention Output (W) (~117M) 117440512 16384 x 7168 x 1 x 1 IQ3_S
826 blk.49.attn_q_a.weight Block 49 Attn_Q_A (W) ( ~11M) 11010048 7168 x 1536 x 1 x 1 IQ2_XS
827 blk.49.attn_q_a_norm.weight Block 49 Attn_Q_A_Norm (W) ( ~2K) 1536 1536 x 1 x 1 x 1 F32
828 blk.49.attn_q_b.weight Block 49 Attn_Q_B (W) ( ~38M) 37748736 1536 x 24576 x 1 x 1 IQ2_XS
829 blk.49.exp_probs_b.bias Block 49 Exp_Probs_B (B) ( 256) 256 256 x 1 x 1 x 1 F32
830 blk.49.ffn_down_exps.weight Block 49 Ffn_Down_Exps (W) ( ~4B) 3758096384 2048 x 7168 x 256 x 1 IQ2_XS
831 blk.49.ffn_down_shexp.weight Block 49 Ffn_Down_Shexp (W) ( ~15M) 14680064 2048 x 7168 x 1 x 1 IQ2_XS
832 blk.49.ffn_gate_exps.weight Block 49 Ffn_Gate_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
833 blk.49.ffn_gate_inp.weight Block 49 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) ( ~2M) 1835008 7168 x 256 x 1 x 1 F32
834 blk.49.ffn_gate_shexp.weight Block 49 Ffn_Gate_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
835 blk.49.ffn_norm.weight Block 49 Feed-Forward Network Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
836 blk.49.ffn_up_exps.weight Block 49 Ffn_Up_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
837 blk.49.ffn_up_shexp.weight Block 49 Ffn_Up_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
  • Total elements in blk.49: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 50 Tensor Group : ~12B Elements

T_ID Tensor Layer Name Human Friendly Tensor Layer Name Elements Shape Type
838 blk.50.attn_kv_a_mqa.weight Block 50 Attn_Kv_A_Mqa (W) ( ~4M) 4128768 7168 x 576 x 1 x 1 IQ2_XS
839 blk.50.attn_kv_a_norm.weight Block 50 Attn_Kv_A_Norm (W) ( 512) 512 512 x 1 x 1 x 1 F32
840 blk.50.attn_kv_b.weight Block 50 Attn_Kv_B (W) ( ~17M) 16777216 512 x 32768 x 1 x 1 IQ2_XS
841 blk.50.attn_norm.weight Block 50 Attention Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
842 blk.50.attn_output.weight Block 50 Attention Output (W) (~117M) 117440512 16384 x 7168 x 1 x 1 IQ3_S
843 blk.50.attn_q_a.weight Block 50 Attn_Q_A (W) ( ~11M) 11010048 7168 x 1536 x 1 x 1 IQ2_XS
844 blk.50.attn_q_a_norm.weight Block 50 Attn_Q_A_Norm (W) ( ~2K) 1536 1536 x 1 x 1 x 1 F32
845 blk.50.attn_q_b.weight Block 50 Attn_Q_B (W) ( ~38M) 37748736 1536 x 24576 x 1 x 1 IQ2_XS
846 blk.50.exp_probs_b.bias Block 50 Exp_Probs_B (B) ( 256) 256 256 x 1 x 1 x 1 F32
847 blk.50.ffn_down_exps.weight Block 50 Ffn_Down_Exps (W) ( ~4B) 3758096384 2048 x 7168 x 256 x 1 IQ2_XS
848 blk.50.ffn_down_shexp.weight Block 50 Ffn_Down_Shexp (W) ( ~15M) 14680064 2048 x 7168 x 1 x 1 IQ2_XS
849 blk.50.ffn_gate_exps.weight Block 50 Ffn_Gate_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
850 blk.50.ffn_gate_inp.weight Block 50 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) ( ~2M) 1835008 7168 x 256 x 1 x 1 F32
851 blk.50.ffn_gate_shexp.weight Block 50 Ffn_Gate_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
852 blk.50.ffn_norm.weight Block 50 Feed-Forward Network Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
853 blk.50.ffn_up_exps.weight Block 50 Ffn_Up_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
854 blk.50.ffn_up_shexp.weight Block 50 Ffn_Up_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
  • Total elements in blk.50: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 51 Tensor Group : ~12B Elements

T_ID Tensor Layer Name Human Friendly Tensor Layer Name Elements Shape Type
855 blk.51.attn_kv_a_mqa.weight Block 51 Attn_Kv_A_Mqa (W) ( ~4M) 4128768 7168 x 576 x 1 x 1 IQ2_XS
856 blk.51.attn_kv_a_norm.weight Block 51 Attn_Kv_A_Norm (W) ( 512) 512 512 x 1 x 1 x 1 F32
857 blk.51.attn_kv_b.weight Block 51 Attn_Kv_B (W) ( ~17M) 16777216 512 x 32768 x 1 x 1 IQ2_XS
858 blk.51.attn_norm.weight Block 51 Attention Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
859 blk.51.attn_output.weight Block 51 Attention Output (W) (~117M) 117440512 16384 x 7168 x 1 x 1 IQ3_S
860 blk.51.attn_q_a.weight Block 51 Attn_Q_A (W) ( ~11M) 11010048 7168 x 1536 x 1 x 1 IQ2_XS
861 blk.51.attn_q_a_norm.weight Block 51 Attn_Q_A_Norm (W) ( ~2K) 1536 1536 x 1 x 1 x 1 F32
862 blk.51.attn_q_b.weight Block 51 Attn_Q_B (W) ( ~38M) 37748736 1536 x 24576 x 1 x 1 IQ2_XS
863 blk.51.exp_probs_b.bias Block 51 Exp_Probs_B (B) ( 256) 256 256 x 1 x 1 x 1 F32
864 blk.51.ffn_down_exps.weight Block 51 Ffn_Down_Exps (W) ( ~4B) 3758096384 2048 x 7168 x 256 x 1 IQ2_XS
865 blk.51.ffn_down_shexp.weight Block 51 Ffn_Down_Shexp (W) ( ~15M) 14680064 2048 x 7168 x 1 x 1 IQ2_XS
866 blk.51.ffn_gate_exps.weight Block 51 Ffn_Gate_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
867 blk.51.ffn_gate_inp.weight Block 51 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) ( ~2M) 1835008 7168 x 256 x 1 x 1 F32
868 blk.51.ffn_gate_shexp.weight Block 51 Ffn_Gate_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
869 blk.51.ffn_norm.weight Block 51 Feed-Forward Network Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
870 blk.51.ffn_up_exps.weight Block 51 Ffn_Up_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
871 blk.51.ffn_up_shexp.weight Block 51 Ffn_Up_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
  • Total elements in blk.51: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 52 Tensor Group : ~12B Elements

T_ID Tensor Layer Name Human Friendly Tensor Layer Name Elements Shape Type
872 blk.52.attn_kv_a_mqa.weight Block 52 Attn_Kv_A_Mqa (W) ( ~4M) 4128768 7168 x 576 x 1 x 1 IQ2_XS
873 blk.52.attn_kv_a_norm.weight Block 52 Attn_Kv_A_Norm (W) ( 512) 512 512 x 1 x 1 x 1 F32
874 blk.52.attn_kv_b.weight Block 52 Attn_Kv_B (W) ( ~17M) 16777216 512 x 32768 x 1 x 1 IQ2_XS
875 blk.52.attn_norm.weight Block 52 Attention Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
876 blk.52.attn_output.weight Block 52 Attention Output (W) (~117M) 117440512 16384 x 7168 x 1 x 1 IQ3_S
877 blk.52.attn_q_a.weight Block 52 Attn_Q_A (W) ( ~11M) 11010048 7168 x 1536 x 1 x 1 IQ2_XS
878 blk.52.attn_q_a_norm.weight Block 52 Attn_Q_A_Norm (W) ( ~2K) 1536 1536 x 1 x 1 x 1 F32
879 blk.52.attn_q_b.weight Block 52 Attn_Q_B (W) ( ~38M) 37748736 1536 x 24576 x 1 x 1 IQ2_XS
880 blk.52.exp_probs_b.bias Block 52 Exp_Probs_B (B) ( 256) 256 256 x 1 x 1 x 1 F32
881 blk.52.ffn_down_exps.weight Block 52 Ffn_Down_Exps (W) ( ~4B) 3758096384 2048 x 7168 x 256 x 1 IQ2_XS
882 blk.52.ffn_down_shexp.weight Block 52 Ffn_Down_Shexp (W) ( ~15M) 14680064 2048 x 7168 x 1 x 1 IQ2_XS
883 blk.52.ffn_gate_exps.weight Block 52 Ffn_Gate_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
884 blk.52.ffn_gate_inp.weight Block 52 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) ( ~2M) 1835008 7168 x 256 x 1 x 1 F32
885 blk.52.ffn_gate_shexp.weight Block 52 Ffn_Gate_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
886 blk.52.ffn_norm.weight Block 52 Feed-Forward Network Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
887 blk.52.ffn_up_exps.weight Block 52 Ffn_Up_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
888 blk.52.ffn_up_shexp.weight Block 52 Ffn_Up_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
  • Total elements in blk.52: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 53 Tensor Group : ~12B Elements

T_ID Tensor Layer Name Human Friendly Tensor Layer Name Elements Shape Type
889 blk.53.attn_kv_a_mqa.weight Block 53 Attn_Kv_A_Mqa (W) ( ~4M) 4128768 7168 x 576 x 1 x 1 IQ2_XS
890 blk.53.attn_kv_a_norm.weight Block 53 Attn_Kv_A_Norm (W) ( 512) 512 512 x 1 x 1 x 1 F32
891 blk.53.attn_kv_b.weight Block 53 Attn_Kv_B (W) ( ~17M) 16777216 512 x 32768 x 1 x 1 IQ2_XS
892 blk.53.attn_norm.weight Block 53 Attention Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
893 blk.53.attn_output.weight Block 53 Attention Output (W) (~117M) 117440512 16384 x 7168 x 1 x 1 IQ3_S
894 blk.53.attn_q_a.weight Block 53 Attn_Q_A (W) ( ~11M) 11010048 7168 x 1536 x 1 x 1 IQ2_XS
895 blk.53.attn_q_a_norm.weight Block 53 Attn_Q_A_Norm (W) ( ~2K) 1536 1536 x 1 x 1 x 1 F32
896 blk.53.attn_q_b.weight Block 53 Attn_Q_B (W) ( ~38M) 37748736 1536 x 24576 x 1 x 1 IQ2_XS
897 blk.53.exp_probs_b.bias Block 53 Exp_Probs_B (B) ( 256) 256 256 x 1 x 1 x 1 F32
898 blk.53.ffn_down_exps.weight Block 53 Ffn_Down_Exps (W) ( ~4B) 3758096384 2048 x 7168 x 256 x 1 IQ2_XS
899 blk.53.ffn_down_shexp.weight Block 53 Ffn_Down_Shexp (W) ( ~15M) 14680064 2048 x 7168 x 1 x 1 IQ2_XS
900 blk.53.ffn_gate_exps.weight Block 53 Ffn_Gate_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
901 blk.53.ffn_gate_inp.weight Block 53 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) ( ~2M) 1835008 7168 x 256 x 1 x 1 F32
902 blk.53.ffn_gate_shexp.weight Block 53 Ffn_Gate_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
903 blk.53.ffn_norm.weight Block 53 Feed-Forward Network Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
904 blk.53.ffn_up_exps.weight Block 53 Ffn_Up_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
905 blk.53.ffn_up_shexp.weight Block 53 Ffn_Up_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
  • Total elements in blk.53: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 54 Tensor Group : ~12B Elements

T_ID Tensor Layer Name Human Friendly Tensor Layer Name Elements Shape Type
906 blk.54.attn_kv_a_mqa.weight Block 54 Attn_Kv_A_Mqa (W) ( ~4M) 4128768 7168 x 576 x 1 x 1 IQ2_XS
907 blk.54.attn_kv_a_norm.weight Block 54 Attn_Kv_A_Norm (W) ( 512) 512 512 x 1 x 1 x 1 F32
908 blk.54.attn_kv_b.weight Block 54 Attn_Kv_B (W) ( ~17M) 16777216 512 x 32768 x 1 x 1 IQ2_XS
909 blk.54.attn_norm.weight Block 54 Attention Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
910 blk.54.attn_output.weight Block 54 Attention Output (W) (~117M) 117440512 16384 x 7168 x 1 x 1 IQ3_S
911 blk.54.attn_q_a.weight Block 54 Attn_Q_A (W) ( ~11M) 11010048 7168 x 1536 x 1 x 1 IQ2_XS
912 blk.54.attn_q_a_norm.weight Block 54 Attn_Q_A_Norm (W) ( ~2K) 1536 1536 x 1 x 1 x 1 F32
913 blk.54.attn_q_b.weight Block 54 Attn_Q_B (W) ( ~38M) 37748736 1536 x 24576 x 1 x 1 IQ2_XS
914 blk.54.exp_probs_b.bias Block 54 Exp_Probs_B (B) ( 256) 256 256 x 1 x 1 x 1 F32
915 blk.54.ffn_down_exps.weight Block 54 Ffn_Down_Exps (W) ( ~4B) 3758096384 2048 x 7168 x 256 x 1 IQ2_XS
916 blk.54.ffn_down_shexp.weight Block 54 Ffn_Down_Shexp (W) ( ~15M) 14680064 2048 x 7168 x 1 x 1 IQ2_XS
917 blk.54.ffn_gate_exps.weight Block 54 Ffn_Gate_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
918 blk.54.ffn_gate_inp.weight Block 54 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) ( ~2M) 1835008 7168 x 256 x 1 x 1 F32
919 blk.54.ffn_gate_shexp.weight Block 54 Ffn_Gate_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
920 blk.54.ffn_norm.weight Block 54 Feed-Forward Network Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
921 blk.54.ffn_up_exps.weight Block 54 Ffn_Up_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
922 blk.54.ffn_up_shexp.weight Block 54 Ffn_Up_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
  • Total elements in blk.54: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 55 Tensor Group : ~12B Elements

T_ID Tensor Layer Name Human Friendly Tensor Layer Name Elements Shape Type
923 blk.55.attn_kv_a_mqa.weight Block 55 Attn_Kv_A_Mqa (W) ( ~4M) 4128768 7168 x 576 x 1 x 1 IQ2_XS
924 blk.55.attn_kv_a_norm.weight Block 55 Attn_Kv_A_Norm (W) ( 512) 512 512 x 1 x 1 x 1 F32
925 blk.55.attn_kv_b.weight Block 55 Attn_Kv_B (W) ( ~17M) 16777216 512 x 32768 x 1 x 1 IQ2_XS
926 blk.55.attn_norm.weight Block 55 Attention Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
927 blk.55.attn_output.weight Block 55 Attention Output (W) (~117M) 117440512 16384 x 7168 x 1 x 1 IQ3_S
928 blk.55.attn_q_a.weight Block 55 Attn_Q_A (W) ( ~11M) 11010048 7168 x 1536 x 1 x 1 IQ2_XS
929 blk.55.attn_q_a_norm.weight Block 55 Attn_Q_A_Norm (W) ( ~2K) 1536 1536 x 1 x 1 x 1 F32
930 blk.55.attn_q_b.weight Block 55 Attn_Q_B (W) ( ~38M) 37748736 1536 x 24576 x 1 x 1 IQ2_XS
931 blk.55.exp_probs_b.bias Block 55 Exp_Probs_B (B) ( 256) 256 256 x 1 x 1 x 1 F32
932 blk.55.ffn_down_exps.weight Block 55 Ffn_Down_Exps (W) ( ~4B) 3758096384 2048 x 7168 x 256 x 1 IQ2_XS
933 blk.55.ffn_down_shexp.weight Block 55 Ffn_Down_Shexp (W) ( ~15M) 14680064 2048 x 7168 x 1 x 1 IQ2_XS
934 blk.55.ffn_gate_exps.weight Block 55 Ffn_Gate_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
935 blk.55.ffn_gate_inp.weight Block 55 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) ( ~2M) 1835008 7168 x 256 x 1 x 1 F32
936 blk.55.ffn_gate_shexp.weight Block 55 Ffn_Gate_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
937 blk.55.ffn_norm.weight Block 55 Feed-Forward Network Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
938 blk.55.ffn_up_exps.weight Block 55 Ffn_Up_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
939 blk.55.ffn_up_shexp.weight Block 55 Ffn_Up_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
  • Total elements in blk.55: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 56 Tensor Group : ~12B Elements

T_ID Tensor Layer Name Human Friendly Tensor Layer Name Elements Shape Type
940 blk.56.attn_kv_a_mqa.weight Block 56 Attn_Kv_A_Mqa (W) ( ~4M) 4128768 7168 x 576 x 1 x 1 IQ2_XS
941 blk.56.attn_kv_a_norm.weight Block 56 Attn_Kv_A_Norm (W) ( 512) 512 512 x 1 x 1 x 1 F32
942 blk.56.attn_kv_b.weight Block 56 Attn_Kv_B (W) ( ~17M) 16777216 512 x 32768 x 1 x 1 IQ2_XS
943 blk.56.attn_norm.weight Block 56 Attention Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
944 blk.56.attn_output.weight Block 56 Attention Output (W) (~117M) 117440512 16384 x 7168 x 1 x 1 IQ3_S
945 blk.56.attn_q_a.weight Block 56 Attn_Q_A (W) ( ~11M) 11010048 7168 x 1536 x 1 x 1 IQ2_XS
946 blk.56.attn_q_a_norm.weight Block 56 Attn_Q_A_Norm (W) ( ~2K) 1536 1536 x 1 x 1 x 1 F32
947 blk.56.attn_q_b.weight Block 56 Attn_Q_B (W) ( ~38M) 37748736 1536 x 24576 x 1 x 1 IQ2_XS
948 blk.56.exp_probs_b.bias Block 56 Exp_Probs_B (B) ( 256) 256 256 x 1 x 1 x 1 F32
949 blk.56.ffn_down_exps.weight Block 56 Ffn_Down_Exps (W) ( ~4B) 3758096384 2048 x 7168 x 256 x 1 IQ2_XS
950 blk.56.ffn_down_shexp.weight Block 56 Ffn_Down_Shexp (W) ( ~15M) 14680064 2048 x 7168 x 1 x 1 IQ2_XS
951 blk.56.ffn_gate_exps.weight Block 56 Ffn_Gate_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
952 blk.56.ffn_gate_inp.weight Block 56 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) ( ~2M) 1835008 7168 x 256 x 1 x 1 F32
953 blk.56.ffn_gate_shexp.weight Block 56 Ffn_Gate_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
954 blk.56.ffn_norm.weight Block 56 Feed-Forward Network Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
955 blk.56.ffn_up_exps.weight Block 56 Ffn_Up_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
956 blk.56.ffn_up_shexp.weight Block 56 Ffn_Up_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
  • Total elements in blk.56: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 57 Tensor Group : ~12B Elements

T_ID Tensor Layer Name Human Friendly Tensor Layer Name Elements Shape Type
957 blk.57.attn_kv_a_mqa.weight Block 57 Attn_Kv_A_Mqa (W) ( ~4M) 4128768 7168 x 576 x 1 x 1 IQ2_XS
958 blk.57.attn_kv_a_norm.weight Block 57 Attn_Kv_A_Norm (W) ( 512) 512 512 x 1 x 1 x 1 F32
959 blk.57.attn_kv_b.weight Block 57 Attn_Kv_B (W) ( ~17M) 16777216 512 x 32768 x 1 x 1 IQ2_XS
960 blk.57.attn_norm.weight Block 57 Attention Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
961 blk.57.attn_output.weight Block 57 Attention Output (W) (~117M) 117440512 16384 x 7168 x 1 x 1 IQ3_S
962 blk.57.attn_q_a.weight Block 57 Attn_Q_A (W) ( ~11M) 11010048 7168 x 1536 x 1 x 1 IQ2_XS
963 blk.57.attn_q_a_norm.weight Block 57 Attn_Q_A_Norm (W) ( ~2K) 1536 1536 x 1 x 1 x 1 F32
964 blk.57.attn_q_b.weight Block 57 Attn_Q_B (W) ( ~38M) 37748736 1536 x 24576 x 1 x 1 IQ2_XS
965 blk.57.exp_probs_b.bias Block 57 Exp_Probs_B (B) ( 256) 256 256 x 1 x 1 x 1 F32
966 blk.57.ffn_down_exps.weight Block 57 Ffn_Down_Exps (W) ( ~4B) 3758096384 2048 x 7168 x 256 x 1 IQ2_XS
967 blk.57.ffn_down_shexp.weight Block 57 Ffn_Down_Shexp (W) ( ~15M) 14680064 2048 x 7168 x 1 x 1 IQ2_XS
968 blk.57.ffn_gate_exps.weight Block 57 Ffn_Gate_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
969 blk.57.ffn_gate_inp.weight Block 57 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) ( ~2M) 1835008 7168 x 256 x 1 x 1 F32
970 blk.57.ffn_gate_shexp.weight Block 57 Ffn_Gate_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
971 blk.57.ffn_norm.weight Block 57 Feed-Forward Network Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
972 blk.57.ffn_up_exps.weight Block 57 Ffn_Up_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
973 blk.57.ffn_up_shexp.weight Block 57 Ffn_Up_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
  • Total elements in blk.57: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 58 Tensor Group : ~12B Elements

T_ID Tensor Layer Name Human Friendly Tensor Layer Name Elements Shape Type
974 blk.58.attn_kv_a_mqa.weight Block 58 Attn_Kv_A_Mqa (W) ( ~4M) 4128768 7168 x 576 x 1 x 1 IQ2_XS
975 blk.58.attn_kv_a_norm.weight Block 58 Attn_Kv_A_Norm (W) ( 512) 512 512 x 1 x 1 x 1 F32
976 blk.58.attn_kv_b.weight Block 58 Attn_Kv_B (W) ( ~17M) 16777216 512 x 32768 x 1 x 1 IQ2_XS
977 blk.58.attn_norm.weight Block 58 Attention Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
978 blk.58.attn_output.weight Block 58 Attention Output (W) (~117M) 117440512 16384 x 7168 x 1 x 1 IQ3_S
979 blk.58.attn_q_a.weight Block 58 Attn_Q_A (W) ( ~11M) 11010048 7168 x 1536 x 1 x 1 IQ2_XS
980 blk.58.attn_q_a_norm.weight Block 58 Attn_Q_A_Norm (W) ( ~2K) 1536 1536 x 1 x 1 x 1 F32
981 blk.58.attn_q_b.weight Block 58 Attn_Q_B (W) ( ~38M) 37748736 1536 x 24576 x 1 x 1 IQ2_XS
982 blk.58.exp_probs_b.bias Block 58 Exp_Probs_B (B) ( 256) 256 256 x 1 x 1 x 1 F32
983 blk.58.ffn_down_exps.weight Block 58 Ffn_Down_Exps (W) ( ~4B) 3758096384 2048 x 7168 x 256 x 1 IQ2_XS
984 blk.58.ffn_down_shexp.weight Block 58 Ffn_Down_Shexp (W) ( ~15M) 14680064 2048 x 7168 x 1 x 1 IQ2_XS
985 blk.58.ffn_gate_exps.weight Block 58 Ffn_Gate_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
986 blk.58.ffn_gate_inp.weight Block 58 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) ( ~2M) 1835008 7168 x 256 x 1 x 1 F32
987 blk.58.ffn_gate_shexp.weight Block 58 Ffn_Gate_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
988 blk.58.ffn_norm.weight Block 58 Feed-Forward Network Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
989 blk.58.ffn_up_exps.weight Block 58 Ffn_Up_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
990 blk.58.ffn_up_shexp.weight Block 58 Ffn_Up_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
  • Total elements in blk.58: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 59 Tensor Group : ~12B Elements

T_ID Tensor Layer Name Human Friendly Tensor Layer Name Elements Shape Type
991 blk.59.attn_kv_a_mqa.weight Block 59 Attn_Kv_A_Mqa (W) ( ~4M) 4128768 7168 x 576 x 1 x 1 IQ2_XS
992 blk.59.attn_kv_a_norm.weight Block 59 Attn_Kv_A_Norm (W) ( 512) 512 512 x 1 x 1 x 1 F32
993 blk.59.attn_kv_b.weight Block 59 Attn_Kv_B (W) ( ~17M) 16777216 512 x 32768 x 1 x 1 IQ2_XS
994 blk.59.attn_norm.weight Block 59 Attention Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
995 blk.59.attn_output.weight Block 59 Attention Output (W) (~117M) 117440512 16384 x 7168 x 1 x 1 IQ3_S
996 blk.59.attn_q_a.weight Block 59 Attn_Q_A (W) ( ~11M) 11010048 7168 x 1536 x 1 x 1 IQ2_XS
997 blk.59.attn_q_a_norm.weight Block 59 Attn_Q_A_Norm (W) ( ~2K) 1536 1536 x 1 x 1 x 1 F32
998 blk.59.attn_q_b.weight Block 59 Attn_Q_B (W) ( ~38M) 37748736 1536 x 24576 x 1 x 1 IQ2_XS
999 blk.59.exp_probs_b.bias Block 59 Exp_Probs_B (B) ( 256) 256 256 x 1 x 1 x 1 F32
1000 blk.59.ffn_down_exps.weight Block 59 Ffn_Down_Exps (W) ( ~4B) 3758096384 2048 x 7168 x 256 x 1 IQ2_XS
1001 blk.59.ffn_down_shexp.weight Block 59 Ffn_Down_Shexp (W) ( ~15M) 14680064 2048 x 7168 x 1 x 1 IQ2_XS
1002 blk.59.ffn_gate_exps.weight Block 59 Ffn_Gate_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
1003 blk.59.ffn_gate_inp.weight Block 59 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) ( ~2M) 1835008 7168 x 256 x 1 x 1 F32
1004 blk.59.ffn_gate_shexp.weight Block 59 Ffn_Gate_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
1005 blk.59.ffn_norm.weight Block 59 Feed-Forward Network Normalization (W) ( ~7K) 7168 7168 x 1 x 1 x 1 F32
1006 blk.59.ffn_up_exps.weight Block 59 Ffn_Up_Exps (W) ( ~4B) 3758096384 7168 x 2048 x 256 x 1 IQ2_XS
1007 blk.59.ffn_up_shexp.weight Block 59 Ffn_Up_Shexp (W) ( ~15M) 14680064 7168 x 2048 x 1 x 1 IQ2_XS
  • Total elements in blk.59: (~12B) 11507286272
  • Percentage of total elements: 1.71%

Block 60 Tensor Group : ~12B Elements

| T_ID | Tensor Layer Name | Human Friendly Tensor Layer Name | Elements | Shape | Type |
|-----:|-------------------|----------------------------------|---------:|-------|------|
| 1008 | blk.60.attn_kv_a_mqa.weight | Block 60 Attn_Kv_A_Mqa (W) | ( ~4M) 4128768 | 7168 x 576 x 1 x 1 | IQ2_XS |
| 1009 | blk.60.attn_kv_a_norm.weight | Block 60 Attn_Kv_A_Norm (W) | ( 512) 512 | 512 x 1 x 1 x 1 | F32 |
| 1010 | blk.60.attn_kv_b.weight | Block 60 Attn_Kv_B (W) | ( ~17M) 16777216 | 512 x 32768 x 1 x 1 | IQ2_XS |
| 1011 | blk.60.attn_norm.weight | Block 60 Attention Normalization (W) | ( ~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 1012 | blk.60.attn_output.weight | Block 60 Attention Output (W) | (~117M) 117440512 | 16384 x 7168 x 1 x 1 | IQ3_S |
| 1013 | blk.60.attn_q_a.weight | Block 60 Attn_Q_A (W) | ( ~11M) 11010048 | 7168 x 1536 x 1 x 1 | IQ2_XS |
| 1014 | blk.60.attn_q_a_norm.weight | Block 60 Attn_Q_A_Norm (W) | ( ~2K) 1536 | 1536 x 1 x 1 x 1 | F32 |
| 1015 | blk.60.attn_q_b.weight | Block 60 Attn_Q_B (W) | ( ~38M) 37748736 | 1536 x 24576 x 1 x 1 | IQ2_XS |
| 1016 | blk.60.exp_probs_b.bias | Block 60 Exp_Probs_B (B) | ( 256) 256 | 256 x 1 x 1 x 1 | F32 |
| 1017 | blk.60.ffn_down_exps.weight | Block 60 Ffn_Down_Exps (W) | ( ~4B) 3758096384 | 2048 x 7168 x 256 x 1 | IQ2_XS |
| 1018 | blk.60.ffn_down_shexp.weight | Block 60 Ffn_Down_Shexp (W) | ( ~15M) 14680064 | 2048 x 7168 x 1 x 1 | IQ2_XS |
| 1019 | blk.60.ffn_gate_exps.weight | Block 60 Ffn_Gate_Exps (W) | ( ~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 1020 | blk.60.ffn_gate_inp.weight | Block 60 Expert-Routing Layer For The Feed-Forward Network In Mixture Of Expert Models (W) | ( ~2M) 1835008 | 7168 x 256 x 1 x 1 | F32 |
| 1021 | blk.60.ffn_gate_shexp.weight | Block 60 Ffn_Gate_Shexp (W) | ( ~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
| 1022 | blk.60.ffn_norm.weight | Block 60 Feed-Forward Network Normalization (W) | ( ~7K) 7168 | 7168 x 1 x 1 x 1 | F32 |
| 1023 | blk.60.ffn_up_exps.weight | Block 60 Ffn_Up_Exps (W) | ( ~4B) 3758096384 | 7168 x 2048 x 256 x 1 | IQ2_XS |
| 1024 | blk.60.ffn_up_shexp.weight | Block 60 Ffn_Up_Shexp (W) | ( ~15M) 14680064 | 7168 x 2048 x 1 x 1 | IQ2_XS |
  • Total elements in blk.60: (~12B) 11507286272
  • Percentage of total elements: 1.71%
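As a sanity check, the per-tensor element counts in the Block 60 table really do add up to the stated total (plain shell arithmetic, numbers copied from the dump above):

```shell
# Sum Block 60's per-tensor element counts from the dump above.
total=$((4128768 + 512 + 16777216 + 7168 + 117440512 + 11010048 + 1536 \
       + 37748736 + 256 + 3758096384 + 14680064 + 3758096384 + 1835008 \
       + 14680064 + 7168 + 3758096384 + 14680064))
echo "$total"  # 11507286272 elements, i.e. ~1.71% of the model's ~671B parameters
```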

Perhaps this precedent was set by

No, it's forced by the 50GB file size limit on hf. I don't know how TheBloke split his files, but we split ours at the byte level, so you can simply concatenate the parts and load/mmap the result directly. I think it's the most common format on hf as well. We tried hard to provide the newer format (even trying to patch gguf-split, but its use of C++ iostreams makes that pretty much impossible). We simply don't have the resources for that format on most servers.
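Because the parts are a plain byte split, reassembly is just concatenation. A tiny demo with dummy files, plus the real-world one-liner (filenames illustrative):

```shell
# Dummy parts standing in for MODEL.gguf.part1ofN ... partNofN:
printf 'GGUFheader' > demo.gguf.part1of2
printf 'tensordata' > demo.gguf.part2of2
# Byte-split parts reassemble with plain cat (lexicographic glob order is
# fine for up to 9 parts; list the parts explicitly if there are 10+):
cat demo.gguf.part*of2 > demo.gguf
# The real quants work the same way, e.g.:
#   cat DeepSeek-V3-0324.i1-Q2_K.gguf.part*of5 > DeepSeek-V3-0324.i1-Q2_K.gguf
```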

It's not an issue for most models and users, fortunately, but your use case of course is such an example.

As of now I believe bartowski is using the same smaller quantization across all the layers except keeping token embedding at Q8_0.

How could they even differ? If bartowski's quants are in a nicer format for you, there should not be a reason not to use those. The mix should be the same (unless I am mistaken).

It would print something out like this

Good that nico apparently did it. I could have provided the info from the quants (as json), but not the extra info the tool prints.

Heya, really appreciate it, both of you! Okay, yes, now I can see exactly which quant was used for each layer! That helps me compare across the available quants in this size class. Maybe some time I can try to compare perplexity across each model to get a rough estimate of "perplexity per GiB" or something haha...
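For what it's worth, the joke metric is trivial to compute once both numbers exist (the values below are purely illustrative, not real measurements):

```shell
# Toy "perplexity per GiB": final PPL divided by on-disk size.
# ppl=3.52 and size_gib=227 are made-up placeholder values.
awk 'BEGIN { ppl = 3.52; size_gib = 227; printf "%.5f\n", ppl / size_gib }'
```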

How could they even differ? If bartowski's quants are in a nicer format for you, there should not be a reason not to use those. The mix should be the same (unless I am mistaken).

So I made a comparison chart here and yes your mix has the same ratios as bartowski. You also seem to use a high quality imatrix mix. unsloth has a custom fork they use which changes a few layers to be higher quality. I'm using ik_llama.cpp fork and a convenient bash script to map each layer to a desired quantization level.

I'm still experimenting on how the various mixes perform in terms of quality (perplexity) and speed (llama-bench) for prompt processing and token generation. Amazing how fast this stuff is moving! Thanks for all your help and all the quants!

So I made a comparison chart here

Awesome. Thanks a lot for collecting and visualizing all this data!

your mix has the same ratios as bartowski.

That was expected as we use the standardized llama.cpp mix and so does bartowski.

You also seem to use a high quality imatrix mix.

Yes we do. Awesome that you figured this out. Our imatrix training is superior to bartowski's. Our imatrix dataset is around double the size: the first half is bartowski's imatrix dataset, while the other half consists of proprietary high-quality data covering common LLM use cases that are missing from bartowski's dataset, like story writing and roleplay. mradermacher put a lot of effort into creating the best imatrix dataset possible last spring, before we scaled our quantization throughput up to its current, almost industrial, scale. We also compute our imatrix in F16 for all models other than R1, for which we use Q8, while many other quantizers use less precision for imatrix computation. We are perfectionists and value quality above almost everything. @ubergarm Did you actually measure any real-world difference between our and bartowski's imatrix quants? I don't think you would see one unless you test all kinds of different real-world use cases, or unless having a larger imatrix dataset and doing more imatrix training has a measurable effect on its own.

I'm still experimenting on how the various mixes perform in terms of quality (perplexity) and speed (llama-bench) for prompt processing and token generation. Amazing how fast this stuff is moving!

I highly recommend you measure KL-divergence, top-token probability, and same-token probability instead of perplexity; they give much better data.
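For reference, llama.cpp's llama-perplexity has a built-in workflow for exactly this: save the logits of a high-precision reference quant once, then score each candidate quant against them. The sketch below follows my reading of the perplexity tool's README; the model and file names are placeholders, so double-check the flags against your build:

```shell
# Guard so the sketch is a no-op where llama.cpp isn't installed:
if command -v llama-perplexity >/dev/null; then
    # 1) Save reference logits from a (near-)lossless quant:
    llama-perplexity -m DeepSeek-V3-0324.Q8_0.gguf -f wiki.test.raw \
        --kl-divergence-base v3-0324-logits.dat
    # 2) Score the small quant against them; this reports mean KLD plus
    #    top-token / same-token probability statistics:
    llama-perplexity -m DeepSeek-V3-0324.i1-Q2_K.gguf -f wiki.test.raw \
        --kl-divergence-base v3-0324-logits.dat --kl-divergence
else
    echo "llama-perplexity not found; skipping"
fi
```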

Thanks for all your help and all the quants!

No problem. Glad I was able to help. If you need anything else please just let me know.
