Spestly committed on
Commit
75c9202
·
verified ·
0 Parent(s):

Duplicate from open-neo/Kyro-n1-7B

.gitattributes ADDED
@@ -0,0 +1,36 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,145 @@
+ ---
+ license: mit
+ base_model:
+ - Qwen/Qwen2.5-7B-Instruct
+ library_name: transformers
+ language:
+ - en
+ - zh
+ - fr
+ - es
+ - pt
+ - de
+ - it
+ - ru
+ - ja
+ - ko
+ - vi
+ - th
+ - ar
+ - fa
+ - he
+ - tr
+ - cs
+ - pl
+ - hi
+ - bn
+ - ur
+ - id
+ - ms
+ - lo
+ - my
+ - ceb
+ - km
+ - tl
+ - nl
+ tags:
+ - trl
+ - Reasoning
+ - open-llm
+ - synthetic-data
+ - Deepseek-R1
+ - Qwen2.5
+ - fine-tune
+ - unsloth
+ - Conversational
+ - Agentic
+ ---
+ # **Kyro-n1: A powerful family of models made for reasoning**
+ > [!IMPORTANT]
+ > This model is a fine-tune of Qwen2.5-7B and uses some features from the **AIDC-AI/Marco-o1** tokenizer.
+
+ Kyro-n1 is a lightweight and fast reasoning model based on **Qwen/Qwen2.5-7B-Instruct**. We have improved reasoning quality in areas such as maths and science; in this version, our main focus was maths and reasoning in general conversations. We intend to expand on this in future models. The purpose of Kyro is to let almost any device run a reasoning model, whatever its compute budget. This is why we are releasing 3B, 7B and 14B variants.
+
+ ## **Model Details**
+ - Developed by: [Spestly (Open-Neo)](https://x.com/Spestly) & [Kazex (Open-Neo)](https://x.com/32GIGABYTES_YT)
+ - Type: Causal Language Model
+ - Training Stage: Pretraining & Post-training
+ - Architecture: Transformer with RoPE, SwiGLU, RMSNorm, and attention QKV bias
+ - Number of Parameters: 7.61B
+ - Number of Parameters (Non-Embedding): 6.53B
+ - Number of Layers: 28
+ - Number of Attention Heads (GQA): 28 for Q and 4 for KV
+ - Context Length: Full 131,072 tokens and generation 8,192 tokens
+
+ ## **Model Downloads**
+
+ ### Kyro-n1 Models
+
+ <div align="center">
+
+ | | **Training Data** | **Params** | **Input modalities** | **Output modalities** | **Context length** | **Download Link** |
+ |--------------|------------------------------------|---------|------------------|----------------------|----------------|----------------|
+ | **Kyro (text only)** | A new mix of publicly available online data. | **3B** | Multilingual Text | Multilingual Text and code | 128k | [🤗 HuggingFace](https://huggingface.co/open-neo/Kyro-n1-3B) |
+ | | | **7B** | Multilingual Text | Multilingual Text and code | 128k | [🤗 HuggingFace](https://huggingface.co/open-neo/Kyro-n1-7B) |
+ | | | **14B** | Multilingual Text | Multilingual Text and code | 128k | [🤗 HuggingFace](https://huggingface.co/open-neo/Kyro-n1-14B) |
+
+ </div>
+
+ ### Kyro-1 Models
+
+ ## **Usage**
+
+ Support for the Qwen2.5 architecture used by Kyro-n1 is included in recent releases of Hugging Face `transformers`, and we advise you to use the latest version.
+
+ With `transformers<4.37.0`, you will encounter the following error:
+ ```
+ KeyError: 'qwen2'
+ ```
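
The version requirement above can be checked before loading the model. The following is a minimal sketch, assuming a plain `major.minor.patch` version string; the names `version_tuple`, `supports_qwen2`, and `MIN_QWEN2_VERSION` are illustrative helpers and not part of `transformers`.

```python
from importlib.metadata import PackageNotFoundError, version

# Threshold taken from the note above: older releases raise KeyError: 'qwen2'.
MIN_QWEN2_VERSION = (4, 37, 0)


def version_tuple(text):
    """Parse the leading numeric parts of a version string, e.g. '4.48.3' -> (4, 48, 3)."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_qwen2(installed):
    """Return True if the given transformers version is at least 4.37.0."""
    return version_tuple(installed) >= MIN_QWEN2_VERSION


try:
    installed = version("transformers")
    if not supports_qwen2(installed):
        print(f"transformers {installed} is too old; upgrade to 4.37.0 or newer")
except PackageNotFoundError:
    print("transformers is not installed")
```

Running the check once at start-up gives a clearer message than the `KeyError` raised deep inside model loading.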
+
+ ### **Quickstart**
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_name = "open-neo/Kyro-n1-7B"
+
+ # Load the model and tokenizer; dtype and device placement are chosen automatically.
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype="auto",
+     device_map="auto"
+ )
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ # Build a single-turn conversation and render it with the model's chat template.
+ prompt = "What do you think about CRISPR and its effect on the future of humanity?"
+ messages = [
+     {"role": "user", "content": prompt}
+ ]
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+ generated_ids = model.generate(
+     **model_inputs,
+     max_new_tokens=2048
+ )
+ # Drop the prompt tokens so that only the newly generated tokens are decoded.
+ generated_ids = [
+     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+ ]
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+ print(response)
+ ```
+
+ ## Citation
+
+ If you find our work helpful, feel free to cite us.
+
+ ```
+ @misc{qwen2.5,
+   title = {Qwen2.5: A Party of Foundation Models},
+   url = {https://qwenlm.github.io/blog/qwen2.5/},
+   author = {Qwen Team},
+   month = {September},
+   year = {2024}
+ }
+ @article{qwen2,
+   title={Qwen2 Technical Report},
+   author={An Yang and Baosong Yang and Binyuan Hui and Bo Zheng and Bowen Yu and Chang Zhou and Chengpeng Li and Chengyuan Li and Dayiheng Liu and Fei Huang and Guanting Dong and Haoran Wei and Huan Lin and Jialong Tang and Jialin Wang and Jian Yang and Jianhong Tu and Jianwei Zhang and Jianxin Ma and Jin Xu and Jingren Zhou and Jinze Bai and Jinzheng He and Junyang Lin and Kai Dang and Keming Lu and Keqin Chen and Kexin Yang and Mei Li and Mingfeng Xue and Na Ni and Pei Zhang and Peng Wang and Ru Peng and Rui Men and Ruize Gao and Runji Lin and Shijie Wang and Shuai Bai and Sinan Tan and Tianhang Zhu and Tianhao Li and Tianyu Liu and Wenbin Ge and Xiaodong Deng and Xiaohuan Zhou and Xingzhang Ren and Xinyu Zhang and Xipin Wei and Xuancheng Ren and Yang Fan and Yang Yao and Yichang Zhang and Yu Wan and Yunfei Chu and Yuqiong Liu and Zeyu Cui and Zhenru Zhang and Zhihao Fan},
+   journal={arXiv preprint arXiv:2407.10671},
+   year={2024}
+ }
+ @misc{kyro-n1,
+   title={Kyro-n1: A powerful family of models made for reasoning},
+   author={Aayan Mishra and Krish Thumar},
+   howpublished={https://huggingface.co/collections/open-neo/kyro-n1-67ab2e7bbc76a9aab3030c21},
+   year={2025}
+ }
+ ```
added_tokens.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "</tool_call>": 151658,
+   "<tool_call>": 151657,
+   "<|box_end|>": 151649,
+   "<|box_start|>": 151648,
+   "<|endoftext|>": 151643,
+   "<|file_sep|>": 151664,
+   "<|fim_middle|>": 151660,
+   "<|fim_pad|>": 151662,
+   "<|fim_prefix|>": 151659,
+   "<|fim_suffix|>": 151661,
+   "<|im_end|>": 151645,
+   "<|im_start|>": 151644,
+   "<|image_pad|>": 151655,
+   "<|object_ref_end|>": 151647,
+   "<|object_ref_start|>": 151646,
+   "<|quad_end|>": 151651,
+   "<|quad_start|>": 151650,
+   "<|repo_name|>": 151663,
+   "<|video_pad|>": 151656,
+   "<|vision_end|>": 151653,
+   "<|vision_pad|>": 151654,
+   "<|vision_start|>": 151652
+ }
config.json ADDED
@@ -0,0 +1,31 @@
+ {
+   "_name_or_path": "unsloth/qwen2.5-7b-instruct-unsloth-bnb-4bit",
+   "architectures": [
+     "Qwen2ForCausalLM"
+   ],
+   "attention_dropout": 0.0,
+   "eos_token_id": 151645,
+   "hidden_act": "silu",
+   "hidden_size": 3584,
+   "initializer_range": 0.02,
+   "intermediate_size": 18944,
+   "max_position_embeddings": 32768,
+   "max_window_layers": 28,
+   "model_type": "qwen2",
+   "num_attention_heads": 28,
+   "num_hidden_layers": 28,
+   "num_key_value_heads": 4,
+   "pad_token_id": 151654,
+   "rms_norm_eps": 1e-06,
+   "rope_scaling": null,
+   "rope_theta": 1000000.0,
+   "sliding_window": null,
+   "tie_word_embeddings": false,
+   "torch_dtype": "float16",
+   "transformers_version": "4.48.3",
+   "unsloth_fixed": true,
+   "unsloth_version": "2025.2.9",
+   "use_cache": true,
+   "use_sliding_window": false,
+   "vocab_size": 152064
+ }
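
As a sanity check, the parameter counts quoted in the README can be re-derived from the values in this `config.json`. The sketch below assumes the usual Qwen2 layout: biases on the Q, K and V projections only, two RMSNorm weights per layer, one final norm (not visible in this excerpt, so an assumption), and untied input and output embeddings as `tie_word_embeddings: false` indicates. The totals come out close to the README's 7.61B and 6.53B, and the float16 byte count equals the `total_size` of 15231233024 recorded in `pytorch_model.bin.index.json`.

```python
# Values copied from config.json above.
hidden_size = 3584
intermediate_size = 18944
num_hidden_layers = 28
num_attention_heads = 28
num_key_value_heads = 4
vocab_size = 152064

head_dim = hidden_size // num_attention_heads   # 128
kv_dim = num_key_value_heads * head_dim         # 512 (grouped-query attention)

attention = (
    hidden_size * hidden_size + hidden_size     # q_proj weight + bias
    + 2 * (hidden_size * kv_dim + kv_dim)       # k_proj and v_proj, weight + bias each
    + hidden_size * hidden_size                 # o_proj weight (no bias)
)
mlp = 3 * hidden_size * intermediate_size       # gate_proj, up_proj, down_proj
norms = 2 * hidden_size                         # input and post-attention RMSNorm
per_layer = attention + mlp + norms

# Assumption: one final RMSNorm weight after the last layer.
non_embedding = num_hidden_layers * per_layer + hidden_size
embeddings = 2 * vocab_size * hidden_size       # embed_tokens and lm_head (untied)
total = non_embedding + embeddings

print(f"non-embedding parameters: {non_embedding:,}")
print(f"total parameters:         {total:,}")
print(f"float16 size in bytes:    {total * 2:,}")
```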
generation_config.json ADDED
@@ -0,0 +1,15 @@
+ {
+   "bos_token_id": 151643,
+   "do_sample": true,
+   "eos_token_id": [
+     151645,
+     151643
+   ],
+   "max_length": 32768,
+   "pad_token_id": 151654,
+   "repetition_penalty": 1.05,
+   "temperature": 0.7,
+   "top_k": 20,
+   "top_p": 0.8,
+   "transformers_version": "4.48.3"
+ }
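
To make the sampling settings above concrete, here is a simplified pure-Python sketch of how `temperature`, `top_k` and `top_p` narrow a next-token distribution. It is an illustration only, not the `transformers` implementation: the function name `filter_probs` is hypothetical, `repetition_penalty` is left out, and the order of operations (temperature, then top-k, then top-p) is an assumption.

```python
import math


def filter_probs(logits, temperature=0.7, top_k=20, top_p=0.8):
    """Return {token_index: probability} after temperature, top-k and top-p filtering."""
    # Temperature below 1 sharpens the distribution, above 1 flattens it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]   # subtract the max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep only the k most likely tokens, then renormalise.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    k_mass = sum(probs[i] for i in order)

    # Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i] / k_mass
        if cumulative >= top_p:
            break

    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


candidates = filter_probs([2.0, 1.0, 0.0, -1.0], temperature=1.0, top_k=3, top_p=0.8)
print(sorted(candidates))
```

With `do_sample: true`, the next token is then drawn at random from the surviving candidates according to their renormalised probabilities.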
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
pytorch_model-00001-of-00004.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:54820097a442a8f04bfc58c966214e9ce3c6573d1ffc64c4aaa5f01874d3001e
+ size 4877685654
pytorch_model-00002-of-00004.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0749547ade7a6e0c88801e7747b48d445e3ff3446ba1fa8775386c88bbc44c39
+ size 4932779624
pytorch_model-00003-of-00004.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:864ddf5652d571bf2578810584bdcff38b6ad634ba9956214a2e5ad7e7fecb65
+ size 4330891098
pytorch_model-00004-of-00004.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3d0aafa7a2169dd45ef9a10fe2cff298d57c97e347cf27d037fdbcc6b08d9f92
+ size 1089996165
pytorch_model.bin.index.json ADDED
@@ -0,0 +1,346 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 15231233024
4
+ },
5
+ "weight_map": {
6
+ "lm_head.weight": "pytorch_model-00004-of-00004.bin",
7
+ "model.embed_tokens.weight": "pytorch_model-00001-of-00004.bin",
8
+ "model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
9
+ "model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
10
+ "model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
11
+ "model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
12
+ "model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
13
+ "model.layers.0.self_attn.k_proj.bias": "pytorch_model-00001-of-00004.bin",
14
+ "model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
15
+ "model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
16
+ "model.layers.0.self_attn.q_proj.bias": "pytorch_model-00001-of-00004.bin",
17
+ "model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
18
+ "model.layers.0.self_attn.v_proj.bias": "pytorch_model-00001-of-00004.bin",
19
+ "model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
20
+ "model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
21
+ "model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
22
+ "model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
23
+ "model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
24
+ "model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
25
+ "model.layers.1.self_attn.k_proj.bias": "pytorch_model-00001-of-00004.bin",
26
+ "model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
27
+ "model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
28
+ "model.layers.1.self_attn.q_proj.bias": "pytorch_model-00001-of-00004.bin",
29
+ "model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
30
+ "model.layers.1.self_attn.v_proj.bias": "pytorch_model-00001-of-00004.bin",
31
+ "model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
32
+ "model.layers.10.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
33
+ "model.layers.10.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
34
+ "model.layers.10.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
35
+ "model.layers.10.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
36
+ "model.layers.10.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
37
+ "model.layers.10.self_attn.k_proj.bias": "pytorch_model-00002-of-00004.bin",
38
+ "model.layers.10.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
39
+ "model.layers.10.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
40
+ "model.layers.10.self_attn.q_proj.bias": "pytorch_model-00002-of-00004.bin",
41
+ "model.layers.10.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
42
+ "model.layers.10.self_attn.v_proj.bias": "pytorch_model-00002-of-00004.bin",
43
+ "model.layers.10.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
44
+ "model.layers.11.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
45
+ "model.layers.11.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
46
+ "model.layers.11.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
47
+ "model.layers.11.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
48
+ "model.layers.11.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
49
+ "model.layers.11.self_attn.k_proj.bias": "pytorch_model-00002-of-00004.bin",
50
+ "model.layers.11.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
51
+ "model.layers.11.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
52
+ "model.layers.11.self_attn.q_proj.bias": "pytorch_model-00002-of-00004.bin",
53
+ "model.layers.11.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
54
+ "model.layers.11.self_attn.v_proj.bias": "pytorch_model-00002-of-00004.bin",
55
+ "model.layers.11.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
56
+ "model.layers.12.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
57
+ "model.layers.12.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
58
+ "model.layers.12.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
59
+ "model.layers.12.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
60
+ "model.layers.12.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
61
+ "model.layers.12.self_attn.k_proj.bias": "pytorch_model-00002-of-00004.bin",
62
+ "model.layers.12.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
63
+ "model.layers.12.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
64
+ "model.layers.12.self_attn.q_proj.bias": "pytorch_model-00002-of-00004.bin",
65
+ "model.layers.12.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
66
+ "model.layers.12.self_attn.v_proj.bias": "pytorch_model-00002-of-00004.bin",
67
+ "model.layers.12.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
68
+ "model.layers.13.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
69
+ "model.layers.13.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
70
+ "model.layers.13.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
71
+ "model.layers.13.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
72
+ "model.layers.13.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
73
+ "model.layers.13.self_attn.k_proj.bias": "pytorch_model-00002-of-00004.bin",
74
+ "model.layers.13.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
75
+ "model.layers.13.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
76
+ "model.layers.13.self_attn.q_proj.bias": "pytorch_model-00002-of-00004.bin",
77
+ "model.layers.13.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
78
+ "model.layers.13.self_attn.v_proj.bias": "pytorch_model-00002-of-00004.bin",
79
+ "model.layers.13.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
80
+ "model.layers.14.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
81
+ "model.layers.14.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
82
+ "model.layers.14.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
83
+ "model.layers.14.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
84
+ "model.layers.14.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
85
+ "model.layers.14.self_attn.k_proj.bias": "pytorch_model-00002-of-00004.bin",
86
+ "model.layers.14.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
87
+ "model.layers.14.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
88
+ "model.layers.14.self_attn.q_proj.bias": "pytorch_model-00002-of-00004.bin",
89
+ "model.layers.14.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
90
+ "model.layers.14.self_attn.v_proj.bias": "pytorch_model-00002-of-00004.bin",
91
+ "model.layers.14.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
92
+ "model.layers.15.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
93
+ "model.layers.15.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
94
+ "model.layers.15.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
95
+ "model.layers.15.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
96
+ "model.layers.15.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
97
+ "model.layers.15.self_attn.k_proj.bias": "pytorch_model-00002-of-00004.bin",
98
+ "model.layers.15.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
99
+ "model.layers.15.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
100
+ "model.layers.15.self_attn.q_proj.bias": "pytorch_model-00002-of-00004.bin",
101
+ "model.layers.15.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
102
+ "model.layers.15.self_attn.v_proj.bias": "pytorch_model-00002-of-00004.bin",
103
+ "model.layers.15.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
104
+ "model.layers.16.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
105
+ "model.layers.16.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
106
+ "model.layers.16.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
107
+ "model.layers.16.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
108
+ "model.layers.16.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
109
+ "model.layers.16.self_attn.k_proj.bias": "pytorch_model-00002-of-00004.bin",
110
+ "model.layers.16.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
111
+ "model.layers.16.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
112
+ "model.layers.16.self_attn.q_proj.bias": "pytorch_model-00002-of-00004.bin",
113
+ "model.layers.16.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
114
+ "model.layers.16.self_attn.v_proj.bias": "pytorch_model-00002-of-00004.bin",
115
+ "model.layers.16.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
116
+ "model.layers.17.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
117
+ "model.layers.17.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
118
+ "model.layers.17.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
119
+ "model.layers.17.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
120
+ "model.layers.17.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
121
+ "model.layers.17.self_attn.k_proj.bias": "pytorch_model-00002-of-00004.bin",
122
+ "model.layers.17.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
123
+ "model.layers.17.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
124
+ "model.layers.17.self_attn.q_proj.bias": "pytorch_model-00002-of-00004.bin",
125
+ "model.layers.17.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
126
+ "model.layers.17.self_attn.v_proj.bias": "pytorch_model-00002-of-00004.bin",
127
+ "model.layers.17.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
128
+ "model.layers.18.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
129
+ "model.layers.18.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
130
+ "model.layers.18.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
131
+ "model.layers.18.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
132
+ "model.layers.18.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
133
+ "model.layers.18.self_attn.k_proj.bias": "pytorch_model-00002-of-00004.bin",
134
+ "model.layers.18.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
135
+ "model.layers.18.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
136
+ "model.layers.18.self_attn.q_proj.bias": "pytorch_model-00002-of-00004.bin",
137
+ "model.layers.18.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
138
+ "model.layers.18.self_attn.v_proj.bias": "pytorch_model-00002-of-00004.bin",
139
+ "model.layers.18.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
140
+ "model.layers.19.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
141
+ "model.layers.19.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
142
+ "model.layers.19.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
143
+ "model.layers.19.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
144
+ "model.layers.19.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
145
+ "model.layers.19.self_attn.k_proj.bias": "pytorch_model-00003-of-00004.bin",
146
+ "model.layers.19.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
147
+ "model.layers.19.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
148
+ "model.layers.19.self_attn.q_proj.bias": "pytorch_model-00003-of-00004.bin",
149
+ "model.layers.19.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
150
+ "model.layers.19.self_attn.v_proj.bias": "pytorch_model-00003-of-00004.bin",
151
+ "model.layers.19.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
152
+ "model.layers.2.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
153
+ "model.layers.2.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
154
+ "model.layers.2.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
155
+ "model.layers.2.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
156
+ "model.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
157
+ "model.layers.2.self_attn.k_proj.bias": "pytorch_model-00001-of-00004.bin",
158
+ "model.layers.2.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
159
+ "model.layers.2.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
160
+ "model.layers.2.self_attn.q_proj.bias": "pytorch_model-00001-of-00004.bin",
161
+ "model.layers.2.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
162
+ "model.layers.2.self_attn.v_proj.bias": "pytorch_model-00001-of-00004.bin",
163
+ "model.layers.2.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
164
+ "model.layers.20.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
165
+ "model.layers.20.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
166
+ "model.layers.20.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
167
+ "model.layers.20.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
168
+ "model.layers.20.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
169
+ "model.layers.20.self_attn.k_proj.bias": "pytorch_model-00003-of-00004.bin",
170
+ "model.layers.20.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
171
+ "model.layers.20.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
172
+ "model.layers.20.self_attn.q_proj.bias": "pytorch_model-00003-of-00004.bin",
173
+ "model.layers.20.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
174
+ "model.layers.20.self_attn.v_proj.bias": "pytorch_model-00003-of-00004.bin",
175
+ "model.layers.20.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
176
+ "model.layers.21.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
177
+ "model.layers.21.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
178
+ "model.layers.21.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
179
+ "model.layers.21.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
180
+ "model.layers.21.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
181
+ "model.layers.21.self_attn.k_proj.bias": "pytorch_model-00003-of-00004.bin",
182
+ "model.layers.21.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
183
+ "model.layers.21.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
184
+ "model.layers.21.self_attn.q_proj.bias": "pytorch_model-00003-of-00004.bin",
185
+ "model.layers.21.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
186
+ "model.layers.21.self_attn.v_proj.bias": "pytorch_model-00003-of-00004.bin",
187
+ "model.layers.21.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
188
+ "model.layers.22.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
189
+ "model.layers.22.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
190
+ "model.layers.22.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
191
+ "model.layers.22.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
192
+ "model.layers.22.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
193
+ "model.layers.22.self_attn.k_proj.bias": "pytorch_model-00003-of-00004.bin",
194
+ "model.layers.22.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
195
+ "model.layers.22.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
196
+ "model.layers.22.self_attn.q_proj.bias": "pytorch_model-00003-of-00004.bin",
197
+ "model.layers.22.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
198
+ "model.layers.22.self_attn.v_proj.bias": "pytorch_model-00003-of-00004.bin",
199
+ "model.layers.22.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
200
+ "model.layers.23.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
201
+ "model.layers.23.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
202
+ "model.layers.23.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
203
+ "model.layers.23.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
204
+ "model.layers.23.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
205
+ "model.layers.23.self_attn.k_proj.bias": "pytorch_model-00003-of-00004.bin",
206
+ "model.layers.23.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
207
+ "model.layers.23.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
208
+ "model.layers.23.self_attn.q_proj.bias": "pytorch_model-00003-of-00004.bin",
209
+ "model.layers.23.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
210
+ "model.layers.23.self_attn.v_proj.bias": "pytorch_model-00003-of-00004.bin",
211
+ "model.layers.23.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
212
+ "model.layers.24.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
213
+ "model.layers.24.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
214
+ "model.layers.24.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
215
+ "model.layers.24.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
216
+ "model.layers.24.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
217
+ "model.layers.24.self_attn.k_proj.bias": "pytorch_model-00003-of-00004.bin",
218
+ "model.layers.24.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
219
+ "model.layers.24.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
220
+ "model.layers.24.self_attn.q_proj.bias": "pytorch_model-00003-of-00004.bin",
221
+ "model.layers.24.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
222
+ "model.layers.24.self_attn.v_proj.bias": "pytorch_model-00003-of-00004.bin",
223
+     "model.layers.24.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
+     "model.layers.25.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
+     "model.layers.25.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
+     "model.layers.25.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
+     "model.layers.25.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
+     "model.layers.25.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
+     "model.layers.25.self_attn.k_proj.bias": "pytorch_model-00003-of-00004.bin",
+     "model.layers.25.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
+     "model.layers.25.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
+     "model.layers.25.self_attn.q_proj.bias": "pytorch_model-00003-of-00004.bin",
+     "model.layers.25.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
+     "model.layers.25.self_attn.v_proj.bias": "pytorch_model-00003-of-00004.bin",
+     "model.layers.25.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
+     "model.layers.26.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
+     "model.layers.26.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
+     "model.layers.26.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
+     "model.layers.26.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
+     "model.layers.26.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
+     "model.layers.26.self_attn.k_proj.bias": "pytorch_model-00003-of-00004.bin",
+     "model.layers.26.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
+     "model.layers.26.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
+     "model.layers.26.self_attn.q_proj.bias": "pytorch_model-00003-of-00004.bin",
+     "model.layers.26.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
+     "model.layers.26.self_attn.v_proj.bias": "pytorch_model-00003-of-00004.bin",
+     "model.layers.26.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
+     "model.layers.27.input_layernorm.weight": "pytorch_model-00003-of-00004.bin",
+     "model.layers.27.mlp.down_proj.weight": "pytorch_model-00003-of-00004.bin",
+     "model.layers.27.mlp.gate_proj.weight": "pytorch_model-00003-of-00004.bin",
+     "model.layers.27.mlp.up_proj.weight": "pytorch_model-00003-of-00004.bin",
+     "model.layers.27.post_attention_layernorm.weight": "pytorch_model-00003-of-00004.bin",
+     "model.layers.27.self_attn.k_proj.bias": "pytorch_model-00003-of-00004.bin",
+     "model.layers.27.self_attn.k_proj.weight": "pytorch_model-00003-of-00004.bin",
+     "model.layers.27.self_attn.o_proj.weight": "pytorch_model-00003-of-00004.bin",
+     "model.layers.27.self_attn.q_proj.bias": "pytorch_model-00003-of-00004.bin",
+     "model.layers.27.self_attn.q_proj.weight": "pytorch_model-00003-of-00004.bin",
+     "model.layers.27.self_attn.v_proj.bias": "pytorch_model-00003-of-00004.bin",
+     "model.layers.27.self_attn.v_proj.weight": "pytorch_model-00003-of-00004.bin",
+     "model.layers.3.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.3.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.3.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.3.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.3.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.3.self_attn.k_proj.bias": "pytorch_model-00001-of-00004.bin",
+     "model.layers.3.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.3.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.3.self_attn.q_proj.bias": "pytorch_model-00001-of-00004.bin",
+     "model.layers.3.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.3.self_attn.v_proj.bias": "pytorch_model-00001-of-00004.bin",
+     "model.layers.3.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.4.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.4.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.4.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.4.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.4.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.4.self_attn.k_proj.bias": "pytorch_model-00001-of-00004.bin",
+     "model.layers.4.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.4.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.4.self_attn.q_proj.bias": "pytorch_model-00001-of-00004.bin",
+     "model.layers.4.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.4.self_attn.v_proj.bias": "pytorch_model-00001-of-00004.bin",
+     "model.layers.4.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.5.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.5.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.5.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.5.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.5.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.5.self_attn.k_proj.bias": "pytorch_model-00001-of-00004.bin",
+     "model.layers.5.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.5.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.5.self_attn.q_proj.bias": "pytorch_model-00001-of-00004.bin",
+     "model.layers.5.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.5.self_attn.v_proj.bias": "pytorch_model-00001-of-00004.bin",
+     "model.layers.5.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.6.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.6.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.6.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.6.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.6.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.6.self_attn.k_proj.bias": "pytorch_model-00001-of-00004.bin",
+     "model.layers.6.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.6.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.6.self_attn.q_proj.bias": "pytorch_model-00001-of-00004.bin",
+     "model.layers.6.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.6.self_attn.v_proj.bias": "pytorch_model-00001-of-00004.bin",
+     "model.layers.6.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.7.input_layernorm.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.7.mlp.down_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.7.mlp.gate_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.7.mlp.up_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.7.post_attention_layernorm.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.7.self_attn.k_proj.bias": "pytorch_model-00001-of-00004.bin",
+     "model.layers.7.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.7.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.7.self_attn.q_proj.bias": "pytorch_model-00001-of-00004.bin",
+     "model.layers.7.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.7.self_attn.v_proj.bias": "pytorch_model-00001-of-00004.bin",
+     "model.layers.7.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.8.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
+     "model.layers.8.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
+     "model.layers.8.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
+     "model.layers.8.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
+     "model.layers.8.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
+     "model.layers.8.self_attn.k_proj.bias": "pytorch_model-00001-of-00004.bin",
+     "model.layers.8.self_attn.k_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.8.self_attn.o_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.8.self_attn.q_proj.bias": "pytorch_model-00001-of-00004.bin",
+     "model.layers.8.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.8.self_attn.v_proj.bias": "pytorch_model-00001-of-00004.bin",
+     "model.layers.8.self_attn.v_proj.weight": "pytorch_model-00001-of-00004.bin",
+     "model.layers.9.input_layernorm.weight": "pytorch_model-00002-of-00004.bin",
+     "model.layers.9.mlp.down_proj.weight": "pytorch_model-00002-of-00004.bin",
+     "model.layers.9.mlp.gate_proj.weight": "pytorch_model-00002-of-00004.bin",
+     "model.layers.9.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
+     "model.layers.9.post_attention_layernorm.weight": "pytorch_model-00002-of-00004.bin",
+     "model.layers.9.self_attn.k_proj.bias": "pytorch_model-00002-of-00004.bin",
+     "model.layers.9.self_attn.k_proj.weight": "pytorch_model-00002-of-00004.bin",
+     "model.layers.9.self_attn.o_proj.weight": "pytorch_model-00002-of-00004.bin",
+     "model.layers.9.self_attn.q_proj.bias": "pytorch_model-00002-of-00004.bin",
+     "model.layers.9.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
+     "model.layers.9.self_attn.v_proj.bias": "pytorch_model-00002-of-00004.bin",
+     "model.layers.9.self_attn.v_proj.weight": "pytorch_model-00002-of-00004.bin",
+     "model.norm.weight": "pytorch_model-00003-of-00004.bin"
+   }
+ }
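The weight map above assigns each tensor name to one of four shard files, and a single layer can straddle a shard boundary (layer 8's attention tensors sit in shard 1 while its MLP and layer-norm tensors sit in shard 2). A minimal sketch of how one might list such split layers follows; the `shards_by_layer` helper is hypothetical and the dictionary is only a small excerpt of the map, where a real script would load the full index file with `json.load`.

```python
from collections import defaultdict

# Small excerpt of the "weight_map" shown in the diff (illustrative subset only).
weight_map = {
    "model.layers.8.mlp.up_proj.weight": "pytorch_model-00002-of-00004.bin",
    "model.layers.8.self_attn.q_proj.weight": "pytorch_model-00001-of-00004.bin",
    "model.layers.9.self_attn.q_proj.weight": "pytorch_model-00002-of-00004.bin",
    "model.norm.weight": "pytorch_model-00003-of-00004.bin",
}


def shards_by_layer(weight_map):
    """Map each module prefix (e.g. 'model.layers.8') to the set of shards holding its tensors."""
    layers = defaultdict(set)
    for name, shard in weight_map.items():
        parts = name.split(".")
        # 'model.layers.<n>.<...>' groups by layer; other tensors group by their first two components.
        prefix = ".".join(parts[:3]) if parts[1] == "layers" else ".".join(parts[:2])
        layers[prefix].add(shard)
    return dict(layers)


# Layers whose tensors are spread over more than one shard file.
split_layers = sorted(k for k, v in shards_by_layer(weight_map).items() if len(v) > 1)
print(split_layers)
```

A loader that reads shards lazily would need both files open to materialize such a layer, which is why the index is keyed by tensor rather than by layer.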
special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
+ {
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>",
+     "<|object_ref_start|>",
+     "<|object_ref_end|>",
+     "<|box_start|>",
+     "<|box_end|>",
+     "<|quad_start|>",
+     "<|quad_end|>",
+     "<|vision_start|>",
+     "<|vision_end|>",
+     "<|vision_pad|>",
+     "<|image_pad|>",
+     "<|video_pad|>"
+   ],
+   "eos_token": {
+     "content": "<|im_end|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<|vision_pad|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
+ size 11421896
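The three lines added for tokenizer.json are a Git LFS pointer, not the tokenizer itself: the repository stores only the spec version, the SHA-256 object id, and the byte size of the real file. As a rough illustration, the pointer can be parsed as space-separated key/value lines; `parse_lfs_pointer` is a hypothetical helper written for this sketch, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is '<key> <value>'.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form '<algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa\n"
    "size 11421896\n"
)
print(pointer["algo"], pointer["size"])
```

After downloading the real file, comparing its size and `hashlib.sha256` digest against these two values is a cheap integrity check.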
tokenizer_config.json ADDED
@@ -0,0 +1,209 @@
+ {
+   "add_bos_token": false,
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "151643": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151644": {
+       "content": "<|im_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151645": {
+       "content": "<|im_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151646": {
+       "content": "<|object_ref_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151647": {
+       "content": "<|object_ref_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151648": {
+       "content": "<|box_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151649": {
+       "content": "<|box_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151650": {
+       "content": "<|quad_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151651": {
+       "content": "<|quad_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151652": {
+       "content": "<|vision_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151653": {
+       "content": "<|vision_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151654": {
+       "content": "<|vision_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151655": {
+       "content": "<|image_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151656": {
+       "content": "<|video_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151657": {
+       "content": "<tool_call>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151658": {
+       "content": "</tool_call>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151659": {
+       "content": "<|fim_prefix|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151660": {
+       "content": "<|fim_middle|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151661": {
+       "content": "<|fim_suffix|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151662": {
+       "content": "<|fim_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151663": {
+       "content": "<|repo_name|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151664": {
+       "content": "<|file_sep|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     }
+   },
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>",
+     "<|object_ref_start|>",
+     "<|object_ref_end|>",
+     "<|box_start|>",
+     "<|box_end|>",
+     "<|quad_start|>",
+     "<|quad_end|>",
+     "<|vision_start|>",
+     "<|vision_end|>",
+     "<|vision_pad|>",
+     "<|image_pad|>",
+     "<|video_pad|>"
+   ],
+   "bos_token": null,
+   "chat_template": "{% for message in messages %}{% if loop.first and messages[0]['role'] != 'system' %}{{ \"<|im_start|>system\\n\\nYou are a well-trained AI assistant. Your name is Kyro-n1, created by Open-Neo.\\n\\n## IMPORTANT!!!!!!!\\nWhen answering questions, your reasoning should be enclosed within <Thought>, and your output should be inside <Output>.\\n<Thought> should be in English whenever possible, but there are two exceptions: one is when quoting from the original text, and the other is when writing mathematical expressions, which should use markdown format. The output inside <Output> should follow the language of the user's input.\\n\\n<|im_end|>\\n\" }}{% endif %}{{ \"<|im_start|>\" + message['role'] + \"\\n\" + message['content'] + \"<|im_end|>\\n\" }}{% endfor %}{% if add_generation_prompt %}{{ \"<|im_start|>assistant\\n\" }}{% endif %}",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<|im_end|>",
+   "errors": "replace",
+   "extra_special_tokens": {},
+   "model_max_length": 32768,
+   "pad_token": "<|vision_pad|>",
+   "padding_side": "left",
+   "split_special_tokens": false,
+   "tokenizer_class": "Qwen2Tokenizer",
+   "unk_token": null
+ }
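The `chat_template` entry above is a Jinja template: it prepends a default system turn when the first message is not a system message, wraps every message in `<|im_start|>` / `<|im_end|>` markers, and optionally opens an assistant turn. The plain-Python sketch below approximates that logic so the resulting prompt shape is easy to inspect; `render_chat` and `DEFAULT_SYSTEM` are names made up for this illustration, and in practice the tokenizer's own `apply_chat_template` method would render the template.

```python
# Default system turn, transcribed from the template with its escapes decoded.
DEFAULT_SYSTEM = (
    "<|im_start|>system\n\n"
    "You are a well-trained AI assistant. Your name is Kyro-n1, created by Open-Neo.\n\n"
    "## IMPORTANT!!!!!!!\n"
    "When answering questions, your reasoning should be enclosed within <Thought>, "
    "and your output should be inside <Output>.\n"
    "<Thought> should be in English whenever possible, but there are two exceptions: "
    "one is when quoting from the original text, and the other is when writing "
    "mathematical expressions, which should use markdown format. "
    "The output inside <Output> should follow the language of the user's input.\n\n"
    "<|im_end|>\n"
)


def render_chat(messages, add_generation_prompt=False):
    """Approximate the Jinja chat template with plain string concatenation."""
    out = []
    # The template injects the default system turn only when the first message is not 'system'.
    if messages and messages[0]["role"] != "system":
        out.append(DEFAULT_SYSTEM)
    for message in messages:
        out.append("<|im_start|>" + message["role"] + "\n" + message["content"] + "<|im_end|>\n")
    # Opening an assistant turn cues the model to generate its reply next.
    if add_generation_prompt:
        out.append("<|im_start|>assistant\n")
    return "".join(out)


prompt = render_chat([{"role": "user", "content": "Hi"}], add_generation_prompt=True)
print(prompt)
```

Passing an explicit system message as the first entry suppresses the default turn, so the `<Thought>` / `<Output>` instructions only apply when the caller does not supply its own system prompt.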
vocab.json ADDED
The diff for this file is too large to render. See raw diff