Holy-fox committed · Commit 59474cd · verified · 1 parent: afd3e27

Update README.md

Files changed (1): README.md (+131 −16)

README.md CHANGED
---
base_model:
- cyberagent/DeepSeek-R1-Distill-Qwen-32B-Japanese
- karakuri-ai/karakuri-lm-32b-thinking-2501-exp
- Saxo/Linkbricks-Horizon-AI-Japanese-Base-32B
- FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview
- TeamDelta/ABEJA-Qwen2.5-32B-base-jp-v0.1
- deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
- NovaSky-AI/Sky-T1-32B-Flash
library_name: transformers
tags:
- mergekit
- merge
license: apache-2.0
language:
- en
- ja
---
 

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/65f01b5235c5424c262c8be8/CxkLHJy9597WodmOOlWwc.jpeg)

## Overview
This model was inspired by [nitky/RoguePlanet-DeepSeek-R1-Qwen-32B](https://huggingface.co/nitky/RoguePlanet-DeepSeek-R1-Qwen-32B).
We have confirmed that it emits `<think></think>` tags in its output, and that it also performs well as a Japanese-language model.

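Because the model wraps its reasoning in `<think>...</think>` tags, downstream code will usually want to separate that reasoning from the final answer. A minimal sketch of such post-processing (the `split_think` helper is our illustration, not part of this repository):

```python
import re

def split_think(response: str) -> tuple[str, str]:
    """Split a model response into (reasoning, answer) on <think>...</think> tags."""
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if match is None:
        # No reasoning block: treat the whole response as the answer
        return "", response.strip()
    reasoning = match.group(1).strip()
    answer = response[match.end():].strip()
    return reasoning, answer
```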
## How To Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "DataPilot/SKYCAVE-R1-32B-v0.1"

# Set tokenizer_name to use a different tokenizer; empty falls back to the model's own
tokenizer_name = ""
if tokenizer_name == "":
    tokenizer_name = model_name

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)

# Prompt (Japanese): "Describe daily life in a future realized by nurture intelligence,
# a self-evolving AI that analyzes metadata."
prompt = "メタデータを解析し、自己進化をするAIであるnurture intelligenceが実現した未来の日常生活の姿を教えてください。"
# System prompt (Japanese): "You are an excellent Japanese assistant and a long-thinking
# model. Think through the problem before answering."
messages = [
    {"role": "system", "content": "あなたは優秀な日本語アシスタントであり長考モデルです。問題解決をするための思考をした上で回答を行ってください。"},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=4096,
)
# Strip the prompt tokens so only the newly generated text is decoded
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

## Acknowledgements
We thank the creators of the models used here, and VOLTMIND for lending us the compute resources.
We also thank nitky for their advice on building this model.

## mergekit config
```yaml
merge_method: slerp
base_model: karakuri-ai/karakuri-lm-32b-thinking-2501-exp
models:
  - model: karakuri-ai/karakuri-lm-32b-thinking-2501-exp
  - model: Saxo/Linkbricks-Horizon-AI-Japanese-Base-32B
parameters:
  t: 0.35
dtype: bfloat16
name: SKYCAVE_element_QwQ_jp

---

merge_method: slerp
base_model: cyberagent/DeepSeek-R1-Distill-Qwen-32B-Japanese
models:
  - model: cyberagent/DeepSeek-R1-Distill-Qwen-32B-Japanese
  - model: SKYCAVE_element_QwQ_jp
parameters:
  t: 0.4
dtype: bfloat16
name: SKYCAVE_element_QR_jp

---

merge_method: slerp
base_model: cyberagent/DeepSeek-R1-Distill-Qwen-32B-Japanese
models:
  - model: cyberagent/DeepSeek-R1-Distill-Qwen-32B-Japanese
  - model: FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview
parameters:
  t: 0.5
dtype: bfloat16
name: SKYCAVE_element_R1_jp_01

---

merge_method: slerp
base_model: cyberagent/DeepSeek-R1-Distill-Qwen-32B-Japanese
models:
  - model: cyberagent/DeepSeek-R1-Distill-Qwen-32B-Japanese
  - model: TeamDelta/ABEJA-Qwen2.5-32B-base-jp-v0.1
parameters:
  t: 0.5
dtype: bfloat16
name: SKYCAVE_element_R1_jp_02

---

merge_method: slerp
base_model: cyberagent/DeepSeek-R1-Distill-Qwen-32B-Japanese
models:
  - model: cyberagent/DeepSeek-R1-Distill-Qwen-32B-Japanese
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
parameters:
  t: 0.6
dtype: bfloat16
name: SKYCAVE_element_R1_jp_03

---

merge_method: slerp
base_model: karakuri-ai/karakuri-lm-32b-thinking-2501-exp
models:
  - model: karakuri-ai/karakuri-lm-32b-thinking-2501-exp
  - model: NovaSky-AI/Sky-T1-32B-Flash
parameters:
  t: 0.4
dtype: bfloat16
name: SKYCAVE_element_Sky_jp

---

merge_method: model_stock
base_model: cyberagent/DeepSeek-R1-Distill-Qwen-32B-Japanese
models:
  - model: SKYCAVE_element_QR_jp
  - model: SKYCAVE_element_R1_jp_01
  - model: SKYCAVE_element_R1_jp_02
  - model: SKYCAVE_element_R1_jp_03
  - model: SKYCAVE_element_Sky_jp
dtype: bfloat16
name: SKYCAVE-R1-32B-v0.1
```
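
Each intermediate merge above uses slerp with a single parameter `t`: `t: 0` keeps the base model's weights, `t: 1` takes the other model's, and values in between follow the arc between the two weight vectors rather than a straight line. A minimal illustration of the interpolation itself (pure Python on small vectors, not mergekit's actual implementation, which applies this per weight tensor):

```python
import math

def slerp(t, v0, v1):
    """Spherical linear interpolation between two weight vectors."""
    dot = sum(a * b for a, b in zip(v0, v1))
    norm0 = math.sqrt(sum(a * a for a in v0))
    norm1 = math.sqrt(sum(b * b for b in v1))
    cos_omega = max(-1.0, min(1.0, dot / (norm0 * norm1)))
    omega = math.acos(cos_omega)  # angle between the two vectors
    if omega < 1e-8:
        # Nearly parallel vectors: fall back to linear interpolation
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * omega) / math.sin(omega)
    s1 = math.sin(t * omega) / math.sin(omega)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]
```

Note that the final step uses `model_stock`, a different averaging scheme, rather than slerp.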