roylin1003 committed on
Commit
16a62f9
·
verified ·
1 Parent(s): 8508770

Upload LoRA adapter files

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,226 +1,202 @@
1
  ---
2
- license: apache-2.0
3
  base_model: Qwen/Qwen2.5-7B-Instruct
4
- library_name: transformers
5
- tags:
6
- - translation
7
- - chinese
8
- - indonesian
9
- - qwen
10
- - lora
11
- - fine-tuned
12
- - traditional-chinese
13
- - news
14
 
15
- model-index:
16
- - name: Royal_ZhTW-ID_finetuned_101
17
- results: []
18
 
19
- language:
20
- - zh
21
- - id
22
 
23
 
24
 
25
- pipeline_tag: text2text-generation
26
- ---
27
 
28
- # Qwen2.5-7B Traditional Chinese Indonesian Translation Model
29
 
30
- This model is a fine-tuned version of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) specifically optimized for Traditional Chinese ↔ Indonesian translation tasks.
31
 
32
- ## Model Description
33
 
34
- This model specializes in translating between Traditional Chinese and Indonesian, trained on Taiwan news corpus. It's particularly effective for news, formal documents, and general text translation between these language pairs.
35
 
36
- ### Key Features
37
- - 🌏 **Bidirectional Translation**: Traditional Chinese ↔ Indonesian
38
- - 📰 **News Domain Optimized**: Trained on Taiwan news corpus
39
- - ⚡ **Efficient Fine-tuning**: Uses LoRA (Low-Rank Adaptation) for faster training
40
- - 🎯 **Specialized Vocabulary**: Enhanced for Taiwan-specific terms and Indonesian equivalents
 
 
41
 
42
  ## Training Details
43
 
44
- ### Base Model
45
- - **Base Model**: Qwen/Qwen2.5-7B-Instruct
46
- - **Model Type**: Causal Language Model with Translation Capabilities
47
-
48
- ### Fine-tuning Configuration
49
- - **Method**: LoRA (Low-Rank Adaptation)
50
- - **LoRA Rank**: 8
51
- - **LoRA Alpha**: 32
52
- - **Learning Rate**: 2e-4
53
- - **Training Epochs**: 3
54
- - **Max Samples**: 1,000 (initial validation)
55
- - **Template**: Qwen conversation format
56
-
57
- ### Dataset
58
- - **Source**: Taiwan NEWS in Traditional Chinese with Indonesian translations
59
- - **Editor**: Chang, Yo Han
60
- - **Domain**: News articles and formal text
61
- - **Language Pair**: Traditional Chinese (zh-TW) ↔ Indonesian (id)
62
- - **Note**: Dataset is proprietary and not publicly available on HuggingFace
63
-
64
- ## Usage
65
-
66
- ### Installation
67
- ```bash
68
- pip install transformers torch
69
- ```
70
-
71
- ### Basic Usage
72
- ```python
73
- from transformers import AutoModelForCausalLM, AutoTokenizer
74
- import torch
75
-
76
- # Load model and tokenizer
77
- model_name = "roylin1003/Royal_ZhTW-ID_finetuned_101"
78
- model = AutoModelForCausalLM.from_pretrained(
79
- model_name,
80
- torch_dtype=torch.float16,
81
- device_map="auto"
82
- )
83
- tokenizer = AutoTokenizer.from_pretrained(model_name)
84
-
85
- # Translation function
86
- def translate_text(text, source_lang="zh", target_lang="id"):
87
- if source_lang == "zh" and target_lang == "id":
88
- prompt = f"請將以下中文翻譯成印尼文:{text}"
89
- elif source_lang == "id" and target_lang == "zh":
90
- prompt = f"Terjemahkan teks bahasa Indonesia berikut ke bahasa Tionghoa: {text}"
91
-
92
- messages = [
93
- {"role": "user", "content": prompt}
94
- ]
95
-
96
- text = tokenizer.apply_chat_template(
97
- messages,
98
- tokenize=False,
99
- add_generation_prompt=True
100
- )
101
-
102
- model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
103
-
104
- generated_ids = model.generate(
105
- **model_inputs,
106
- max_new_tokens=512,
107
- do_sample=True,
108
- temperature=0.7,
109
- pad_token_id=tokenizer.eos_token_id
110
- )
111
-
112
- generated_ids = [
113
- output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
114
- ]
115
-
116
- response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
117
- return response
118
-
119
- # Example usage
120
- chinese_text = "台灣的科技產業發展迅速,特別是在半導體領域。"
121
- indonesian_translation = translate_text(chinese_text, "zh", "id")
122
- print(f"Chinese: {chinese_text}")
123
- print(f"Indonesian: {indonesian_translation}")
124
-
125
- indonesian_text = "Indonesia adalah negara kepulauan terbesar di dunia."
126
- chinese_translation = translate_text(indonesian_text, "id", "zh")
127
- print(f"Indonesian: {indonesian_text}")
128
- print(f"Chinese: {chinese_translation}")
129
- ```
130
-
131
- ### Advanced Usage with Custom Parameters
132
- ```python
133
- def translate_with_options(text, source_lang="zh", target_lang="id", temperature=0.7, max_tokens=512):
134
- # ... (same setup as above)
135
-
136
- generated_ids = model.generate(
137
- **model_inputs,
138
- max_new_tokens=max_tokens,
139
- do_sample=True,
140
- temperature=temperature,
141
- top_p=0.9,
142
- repetition_penalty=1.1,
143
- pad_token_id=tokenizer.eos_token_id
144
- )
145
-
146
- # ... (same decoding as above)
147
- return response
148
- ```
149
-
150
- ## Model Performance
151
-
152
- ### Training Metrics
153
- - **Training Loss**: Converged after 3 epochs
154
- - **Learning Rate**: 2e-4 with linear decay
155
- - **Batch Size**: Optimized for available GPU memory
156
-
157
- ### Evaluation
158
- This model has been trained on a curated dataset of Taiwan news articles with Indonesian translations. Performance evaluation is ongoing.
159
-
160
- ## Limitations and Considerations
161
-
162
- ### Known Limitations
163
- - **Domain Specificity**: Optimized for news and formal text; may not perform as well on casual conversation
164
- - **Training Data Size**: Initial training used 1,000 samples for quick validation
165
- - **Cultural Context**: May require additional fine-tuning for region-specific terminology
166
-
167
- ### Recommended Use Cases
168
- - 📰 News article translation
169
- - 📄 Formal document translation
170
- - 🏢 Business communication between Taiwan and Indonesia
171
- - 📚 Educational content translation
172
-
173
- ### Not Recommended For
174
- - Real-time conversation (use specialized conversational models)
175
- - Medical or legal documents (requires domain-specific models)
176
- - Creative writing (may lack stylistic nuance)
177
-
178
- ## Training Infrastructure
179
-
180
- ### Hardware Requirements
181
- - **Minimum**: GPU with 16GB VRAM
182
- - **Recommended**: GPU with 24GB+ VRAM for optimal performance
183
- - **Training Time**: Approximately 2-3 hours on modern GPUs
184
-
185
- ### Software Dependencies
186
- ```
187
- transformers>=4.36.0
188
- torch>=2.0.0
189
- peft>=0.7.0
190
- datasets>=2.15.0
191
- ```
192
-
193
- ## Citation
194
-
195
- If you use this model in your research or applications, please cite:
196
-
197
- ```bibtex
198
- @misc{Royal_ZhTW-ID_finetuned_101,
199
- title={Qwen2.5-7B Traditional Chinese-Indonesian Translation Model},
200
- author={Roy Lin},
201
- year={2024},
202
- howpublished={\url{https://huggingface.co/roylin1003/Royal_ZhTW-ID_finetuned_101}},
203
- note={Fine-tuned on Taiwan news corpus edited by Chang, Yo Han}
204
- }
205
- ```
206
-
207
- ## Acknowledgments
208
-
209
- - **Base Model**: Thanks to the Qwen team for the excellent Qwen2.5-7B-Instruct model
210
- - **Dataset**: Taiwan news corpus with Indonesian translations edited by Chang, Yo Han
211
- - **Framework**: Built using Hugging Face Transformers and PEFT libraries
212
-
213
- ## License
214
-
215
- This model is released under the Apache 2.0 License, consistent with the base Qwen2.5-7B-Instruct model.
216
-
217
- ## Contact
218
-
219
- For questions, issues, or collaborations, please open an issue in this repository or contact [your contact information].
220
 
221
- ---
222
 
223
- **Model Version**: 1.0
224
- **Last Updated**: [Current Date]
225
- **Status**: Initial Release - Validation Phase
226
- ---
 
1
  ---
 
2
  base_model: Qwen/Qwen2.5-7B-Instruct
3
+ library_name: peft
4
+ ---
5
 
6
+ # Model Card for Model ID
 
 
7
 
8
+ <!-- Provide a quick summary of what the model is/does. -->
 
 
9
 
10
 
11
 
12
+ ## Model Details
13
+
14
+ ### Model Description
15
+
16
+ <!-- Provide a longer summary of what this model is. -->
17
+
18
+
19
+
20
+ - **Developed by:** [More Information Needed]
21
+ - **Funded by [optional]:** [More Information Needed]
22
+ - **Shared by [optional]:** [More Information Needed]
23
+ - **Model type:** [More Information Needed]
24
+ - **Language(s) (NLP):** [More Information Needed]
25
+ - **License:** [More Information Needed]
26
+ - **Finetuned from model [optional]:** [More Information Needed]
27
+
28
+ ### Model Sources [optional]
29
+
30
+ <!-- Provide the basic links for the model. -->
31
+
32
+ - **Repository:** [More Information Needed]
33
+ - **Paper [optional]:** [More Information Needed]
34
+ - **Demo [optional]:** [More Information Needed]
35
+
36
+ ## Uses
37
+
38
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
39
+
40
+ ### Direct Use
41
+
42
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
43
+
44
+ [More Information Needed]
45
+
46
+ ### Downstream Use [optional]
47
+
48
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
49
+
50
+ [More Information Needed]
51
+
52
+ ### Out-of-Scope Use
53
+
54
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
55
+
56
+ [More Information Needed]
57
+
58
+ ## Bias, Risks, and Limitations
59
 
60
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
61
 
62
+ [More Information Needed]
63
 
64
+ ### Recommendations
65
 
66
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
67
 
68
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
69
+
70
+ ## How to Get Started with the Model
71
+
72
+ Use the code below to get started with the model.
73
+
74
+ [More Information Needed]
75
 
76
  ## Training Details
77
 
78
+ ### Training Data
79
 
80
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
81
+
82
+ [More Information Needed]
83
+
84
+ ### Training Procedure
85
+
86
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
87
+
88
+ #### Preprocessing [optional]
89
+
90
+ [More Information Needed]
91
+
92
+
93
+ #### Training Hyperparameters
94
+
95
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
96
+
97
+ #### Speeds, Sizes, Times [optional]
98
+
99
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
100
+
101
+ [More Information Needed]
102
+
103
+ ## Evaluation
104
+
105
+ <!-- This section describes the evaluation protocols and provides the results. -->
106
+
107
+ ### Testing Data, Factors & Metrics
108
+
109
+ #### Testing Data
110
+
111
+ <!-- This should link to a Dataset Card if possible. -->
112
+
113
+ [More Information Needed]
114
+
115
+ #### Factors
116
+
117
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
118
+
119
+ [More Information Needed]
120
+
121
+ #### Metrics
122
+
123
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
124
+
125
+ [More Information Needed]
126
+
127
+ ### Results
128
+
129
+ [More Information Needed]
130
+
131
+ #### Summary
132
+
133
+
134
+
135
+ ## Model Examination [optional]
136
+
137
+ <!-- Relevant interpretability work for the model goes here -->
138
+
139
+ [More Information Needed]
140
+
141
+ ## Environmental Impact
142
+
143
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
144
+
145
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
146
+
147
+ - **Hardware Type:** [More Information Needed]
148
+ - **Hours used:** [More Information Needed]
149
+ - **Cloud Provider:** [More Information Needed]
150
+ - **Compute Region:** [More Information Needed]
151
+ - **Carbon Emitted:** [More Information Needed]
152
+
153
+ ## Technical Specifications [optional]
154
+
155
+ ### Model Architecture and Objective
156
+
157
+ [More Information Needed]
158
+
159
+ ### Compute Infrastructure
160
+
161
+ [More Information Needed]
162
+
163
+ #### Hardware
164
+
165
+ [More Information Needed]
166
+
167
+ #### Software
168
+
169
+ [More Information Needed]
170
+
171
+ ## Citation [optional]
172
+
173
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
174
+
175
+ **BibTeX:**
176
+
177
+ [More Information Needed]
178
+
179
+ **APA:**
180
+
181
+ [More Information Needed]
182
+
183
+ ## Glossary [optional]
184
+
185
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
186
+
187
+ [More Information Needed]
188
+
189
+ ## More Information [optional]
190
+
191
+ [More Information Needed]
192
+
193
+ ## Model Card Authors [optional]
194
+
195
+ [More Information Needed]
196
+
197
+ ## Model Card Contact
198
+
199
+ [More Information Needed]
200
+ ### Framework versions
201
 
202
+ - PEFT 0.15.2
 
 
 
adapter_config.json ADDED
@@ -0,0 +1,39 @@
1
+ {
2
+ "alpha_pattern": {},
3
+ "auto_mapping": null,
4
+ "base_model_name_or_path": "Qwen/Qwen2.5-7B-Instruct",
5
+ "bias": "none",
6
+ "corda_config": null,
7
+ "eva_config": null,
8
+ "exclude_modules": null,
9
+ "fan_in_fan_out": false,
10
+ "inference_mode": true,
11
+ "init_lora_weights": true,
12
+ "layer_replication": null,
13
+ "layers_pattern": null,
14
+ "layers_to_transform": null,
15
+ "loftq_config": {},
16
+ "lora_alpha": 32,
17
+ "lora_bias": false,
18
+ "lora_dropout": 0,
19
+ "megatron_config": null,
20
+ "megatron_core": "megatron.core",
21
+ "modules_to_save": null,
22
+ "peft_type": "LORA",
23
+ "r": 8,
24
+ "rank_pattern": {},
25
+ "revision": null,
26
+ "target_modules": [
27
+ "v_proj",
28
+ "k_proj",
29
+ "gate_proj",
30
+ "q_proj",
31
+ "down_proj",
32
+ "o_proj",
33
+ "up_proj"
34
+ ],
35
+ "task_type": "CAUSAL_LM",
36
+ "trainable_token_indices": null,
37
+ "use_dora": false,
38
+ "use_rslora": false
39
+ }
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:51dcb43b9960d8a9c682afd0bd188035a057ba3fa09bcdfea674958151b84dd8
3
+ size 80792096
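
As a rough cross-check of the adapter files above, the sketch below estimates how many trainable parameters the LoRA settings in `adapter_config.json` (`r`, `lora_alpha`, `target_modules`) imply, and compares the result with the 80,792,096-byte `adapter_model.safetensors` pointer. The layer dimensions are an assumption: they are the commonly published Qwen2.5-7B shapes and do not appear anywhere in this commit. The storage dtype (fp32) is likewise only inferred from the size match.

```python
# Sketch: estimate LoRA adapter size from the values in adapter_config.json.
# ASSUMPTION: the shapes below are the commonly published Qwen2.5-7B
# dimensions; they are not stated in this commit.
HIDDEN = 3584          # assumed hidden size
INTERMEDIATE = 18944   # assumed MLP intermediate size
KV_DIM = 512           # assumed key/value projection width (4 KV heads x 128)
NUM_LAYERS = 28        # assumed number of decoder layers

R = 8                  # "r" in adapter_config.json
LORA_ALPHA = 32        # "lora_alpha" in adapter_config.json

# (in_features, out_features) for every entry in "target_modules".
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params_per_layer(shapes, r):
    """Each adapted linear layer adds A (r x in) and B (out x r)."""
    return sum(r * (fan_in + fan_out) for fan_in, fan_out in shapes.values())


def lora_total_params(shapes, r, num_layers):
    """Total LoRA parameters when every decoder layer is adapted."""
    return num_layers * lora_params_per_layer(shapes, r)


total_params = lora_total_params(TARGET_SHAPES, R, NUM_LAYERS)
scaling = LORA_ALPHA / R          # plain LoRA scaling ("use_rslora": false)
fp32_bytes = total_params * 4     # size if the weights are stored as float32
reported_bytes = 80792096         # "size" from the LFS pointer above

print(f"LoRA parameters: {total_params:,}")
print(f"Scaling factor:  {scaling}")
print(f"fp32 payload:    {fp32_bytes:,} bytes")
print(f"Unaccounted:     {reported_bytes - fp32_bytes:,} bytes")
```

Under these assumptions the float32 payload lands within a few tens of kilobytes of the reported file size; the remainder is plausibly the safetensors header, though that is not confirmed by anything in the commit.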
added_tokens.json ADDED
@@ -0,0 +1,24 @@
1
+ {
2
+ "</tool_call>": 151658,
3
+ "<tool_call>": 151657,
4
+ "<|box_end|>": 151649,
5
+ "<|box_start|>": 151648,
6
+ "<|endoftext|>": 151643,
7
+ "<|file_sep|>": 151664,
8
+ "<|fim_middle|>": 151660,
9
+ "<|fim_pad|>": 151662,
10
+ "<|fim_prefix|>": 151659,
11
+ "<|fim_suffix|>": 151661,
12
+ "<|im_end|>": 151645,
13
+ "<|im_start|>": 151644,
14
+ "<|image_pad|>": 151655,
15
+ "<|object_ref_end|>": 151647,
16
+ "<|object_ref_start|>": 151646,
17
+ "<|quad_end|>": 151651,
18
+ "<|quad_start|>": 151650,
19
+ "<|repo_name|>": 151663,
20
+ "<|video_pad|>": 151656,
21
+ "<|vision_end|>": 151653,
22
+ "<|vision_pad|>": 151654,
23
+ "<|vision_start|>": 151652
24
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a08eb0e13acb36a2b14ded8699fa03e315d5e3c62fe06ac9737f13f5d709eac2
3
+ size 161810747
rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e06eb521f4bc09fafc1ed3c99e0294e570a0582bcbe0b52111436c2f9220ecff
3
+ size 14645
scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c116af82c2f3174e078bbcd016eaac335b84bb334076fdde7b60a3390b2cb757
3
+ size 1465
special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<|im_start|>",
4
+ "<|im_end|>",
5
+ "<|object_ref_start|>",
6
+ "<|object_ref_end|>",
7
+ "<|box_start|>",
8
+ "<|box_end|>",
9
+ "<|quad_start|>",
10
+ "<|quad_end|>",
11
+ "<|vision_start|>",
12
+ "<|vision_end|>",
13
+ "<|vision_pad|>",
14
+ "<|image_pad|>",
15
+ "<|video_pad|>"
16
+ ],
17
+ "eos_token": {
18
+ "content": "<|im_end|>",
19
+ "lstrip": false,
20
+ "normalized": false,
21
+ "rstrip": false,
22
+ "single_word": false
23
+ },
24
+ "pad_token": {
25
+ "content": "<|endoftext|>",
26
+ "lstrip": false,
27
+ "normalized": false,
28
+ "rstrip": false,
29
+ "single_word": false
30
+ }
31
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
3
+ size 11421896
tokenizer_config.json ADDED
@@ -0,0 +1,209 @@
1
+ {
2
+ "add_bos_token": false,
3
+ "add_prefix_space": false,
4
+ "added_tokens_decoder": {
5
+ "151643": {
6
+ "content": "<|endoftext|>",
7
+ "lstrip": false,
8
+ "normalized": false,
9
+ "rstrip": false,
10
+ "single_word": false,
11
+ "special": true
12
+ },
13
+ "151644": {
14
+ "content": "<|im_start|>",
15
+ "lstrip": false,
16
+ "normalized": false,
17
+ "rstrip": false,
18
+ "single_word": false,
19
+ "special": true
20
+ },
21
+ "151645": {
22
+ "content": "<|im_end|>",
23
+ "lstrip": false,
24
+ "normalized": false,
25
+ "rstrip": false,
26
+ "single_word": false,
27
+ "special": true
28
+ },
29
+ "151646": {
30
+ "content": "<|object_ref_start|>",
31
+ "lstrip": false,
32
+ "normalized": false,
33
+ "rstrip": false,
34
+ "single_word": false,
35
+ "special": true
36
+ },
37
+ "151647": {
38
+ "content": "<|object_ref_end|>",
39
+ "lstrip": false,
40
+ "normalized": false,
41
+ "rstrip": false,
42
+ "single_word": false,
43
+ "special": true
44
+ },
45
+ "151648": {
46
+ "content": "<|box_start|>",
47
+ "lstrip": false,
48
+ "normalized": false,
49
+ "rstrip": false,
50
+ "single_word": false,
51
+ "special": true
52
+ },
53
+ "151649": {
54
+ "content": "<|box_end|>",
55
+ "lstrip": false,
56
+ "normalized": false,
57
+ "rstrip": false,
58
+ "single_word": false,
59
+ "special": true
60
+ },
61
+ "151650": {
62
+ "content": "<|quad_start|>",
63
+ "lstrip": false,
64
+ "normalized": false,
65
+ "rstrip": false,
66
+ "single_word": false,
67
+ "special": true
68
+ },
69
+ "151651": {
70
+ "content": "<|quad_end|>",
71
+ "lstrip": false,
72
+ "normalized": false,
73
+ "rstrip": false,
74
+ "single_word": false,
75
+ "special": true
76
+ },
77
+ "151652": {
78
+ "content": "<|vision_start|>",
79
+ "lstrip": false,
80
+ "normalized": false,
81
+ "rstrip": false,
82
+ "single_word": false,
83
+ "special": true
84
+ },
85
+ "151653": {
86
+ "content": "<|vision_end|>",
87
+ "lstrip": false,
88
+ "normalized": false,
89
+ "rstrip": false,
90
+ "single_word": false,
91
+ "special": true
92
+ },
93
+ "151654": {
94
+ "content": "<|vision_pad|>",
95
+ "lstrip": false,
96
+ "normalized": false,
97
+ "rstrip": false,
98
+ "single_word": false,
99
+ "special": true
100
+ },
101
+ "151655": {
102
+ "content": "<|image_pad|>",
103
+ "lstrip": false,
104
+ "normalized": false,
105
+ "rstrip": false,
106
+ "single_word": false,
107
+ "special": true
108
+ },
109
+ "151656": {
110
+ "content": "<|video_pad|>",
111
+ "lstrip": false,
112
+ "normalized": false,
113
+ "rstrip": false,
114
+ "single_word": false,
115
+ "special": true
116
+ },
117
+ "151657": {
118
+ "content": "<tool_call>",
119
+ "lstrip": false,
120
+ "normalized": false,
121
+ "rstrip": false,
122
+ "single_word": false,
123
+ "special": false
124
+ },
125
+ "151658": {
126
+ "content": "</tool_call>",
127
+ "lstrip": false,
128
+ "normalized": false,
129
+ "rstrip": false,
130
+ "single_word": false,
131
+ "special": false
132
+ },
133
+ "151659": {
134
+ "content": "<|fim_prefix|>",
135
+ "lstrip": false,
136
+ "normalized": false,
137
+ "rstrip": false,
138
+ "single_word": false,
139
+ "special": false
140
+ },
141
+ "151660": {
142
+ "content": "<|fim_middle|>",
143
+ "lstrip": false,
144
+ "normalized": false,
145
+ "rstrip": false,
146
+ "single_word": false,
147
+ "special": false
148
+ },
149
+ "151661": {
150
+ "content": "<|fim_suffix|>",
151
+ "lstrip": false,
152
+ "normalized": false,
153
+ "rstrip": false,
154
+ "single_word": false,
155
+ "special": false
156
+ },
157
+ "151662": {
158
+ "content": "<|fim_pad|>",
159
+ "lstrip": false,
160
+ "normalized": false,
161
+ "rstrip": false,
162
+ "single_word": false,
163
+ "special": false
164
+ },
165
+ "151663": {
166
+ "content": "<|repo_name|>",
167
+ "lstrip": false,
168
+ "normalized": false,
169
+ "rstrip": false,
170
+ "single_word": false,
171
+ "special": false
172
+ },
173
+ "151664": {
174
+ "content": "<|file_sep|>",
175
+ "lstrip": false,
176
+ "normalized": false,
177
+ "rstrip": false,
178
+ "single_word": false,
179
+ "special": false
180
+ }
181
+ },
182
+ "additional_special_tokens": [
183
+ "<|im_start|>",
184
+ "<|im_end|>",
185
+ "<|object_ref_start|>",
186
+ "<|object_ref_end|>",
187
+ "<|box_start|>",
188
+ "<|box_end|>",
189
+ "<|quad_start|>",
190
+ "<|quad_end|>",
191
+ "<|vision_start|>",
192
+ "<|vision_end|>",
193
+ "<|vision_pad|>",
194
+ "<|image_pad|>",
195
+ "<|video_pad|>"
196
+ ],
197
+ "bos_token": null,
198
+ "chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0]['role'] == 'system' %}\n {{- messages[0]['content'] }}\n {%- else %}\n {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}\n {%- endif %}\n {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0]['role'] == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n {%- else %}\n {{- '<|im_start|>system\\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {{- '<|im_start|>' + message.role }}\n {%- if message.content %}\n {{- '\\n' + message.content }}\n {%- endif %}\n {%- for tool_call in message.tool_calls %}\n {%- if tool_call.function is defined %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '\\n<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {{- tool_call.arguments | tojson }}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
199
+ "clean_up_tokenization_spaces": false,
200
+ "eos_token": "<|im_end|>",
201
+ "errors": "replace",
202
+ "extra_special_tokens": {},
203
+ "model_max_length": 131072,
204
+ "pad_token": "<|endoftext|>",
205
+ "padding_side": "right",
206
+ "split_special_tokens": false,
207
+ "tokenizer_class": "Qwen2Tokenizer",
208
+ "unk_token": null
209
+ }
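
To make the `chat_template` entry in `tokenizer_config.json` easier to read, the sketch below re-implements by hand only its simplest branch: no tools, and plain `system`, `user`, or `assistant` messages without tool calls. It is an illustration of the prompt layout, not a replacement for the tokenizer's own template rendering; the function name `render_simple_chat` is hypothetical.

```python
# Sketch: hand-written approximation of the no-tools branch of the
# chat_template above. render_simple_chat is a hypothetical helper and
# ignores tool definitions, tool calls, and "tool" role messages.
DEFAULT_SYSTEM = (
    "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."
)


def render_simple_chat(messages, add_generation_prompt=True):
    """Render messages in the <|im_start|>role ... <|im_end|> layout."""
    parts = []
    if messages and messages[0]["role"] == "system":
        # A leading system message replaces the default system prompt.
        system_text = messages[0]["content"]
        rest = messages[1:]
    else:
        system_text = DEFAULT_SYSTEM
        rest = messages
    parts.append("<|im_start|>system\n" + system_text + "<|im_end|>\n")

    for message in rest:
        parts.append(
            "<|im_start|>" + message["role"] + "\n"
            + message["content"] + "<|im_end|>\n"
        )

    if add_generation_prompt:
        # Leaves the prompt open so the model writes the assistant turn.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    demo = [{"role": "user", "content": "Halo"}]
    print(render_simple_chat(demo))
```

Generation stops at `<|im_end|>`, which the same file declares as `eos_token`, while `<|endoftext|>` is used only for padding.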
trainer_state.json ADDED
@@ -0,0 +1,161 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 3.0,
5
+ "eval_steps": 500,
6
+ "global_step": 84,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.17857142857142858,
13
+ "grad_norm": 1.3622558116912842,
14
+ "learning_rate": 0.00019825664732332884,
15
+ "loss": 1.4496,
16
+ "num_input_tokens_seen": 15664,
17
+ "step": 5
18
+ },
19
+ {
20
+ "epoch": 0.35714285714285715,
21
+ "grad_norm": 0.8986290097236633,
22
+ "learning_rate": 0.00019308737486442045,
23
+ "loss": 1.051,
24
+ "num_input_tokens_seen": 30528,
25
+ "step": 10
26
+ },
27
+ {
28
+ "epoch": 0.5357142857142857,
29
+ "grad_norm": 1.14683997631073,
30
+ "learning_rate": 0.00018467241992282843,
31
+ "loss": 0.9827,
32
+ "num_input_tokens_seen": 43392,
33
+ "step": 15
34
+ },
35
+ {
36
+ "epoch": 0.7142857142857143,
37
+ "grad_norm": 0.8037276864051819,
38
+ "learning_rate": 0.00017330518718298264,
39
+ "loss": 1.0249,
40
+ "num_input_tokens_seen": 60784,
41
+ "step": 20
42
+ },
43
+ {
44
+ "epoch": 0.8928571428571429,
45
+ "grad_norm": 0.8258629441261292,
46
+ "learning_rate": 0.00015938201855735014,
47
+ "loss": 0.9651,
48
+ "num_input_tokens_seen": 76080,
49
+ "step": 25
50
+ },
51
+ {
52
+ "epoch": 1.0714285714285714,
53
+ "grad_norm": 0.9029279947280884,
54
+ "learning_rate": 0.00014338837391175582,
55
+ "loss": 0.9298,
56
+ "num_input_tokens_seen": 90224,
57
+ "step": 30
58
+ },
59
+ {
60
+ "epoch": 1.25,
61
+ "grad_norm": 0.8744432926177979,
62
+ "learning_rate": 0.00012588190451025207,
63
+ "loss": 0.7694,
64
+ "num_input_tokens_seen": 107072,
65
+ "step": 35
66
+ },
67
+ {
68
+ "epoch": 1.4285714285714286,
69
+ "grad_norm": 1.3319528102874756,
70
+ "learning_rate": 0.00010747300935864243,
71
+ "loss": 0.6823,
72
+ "num_input_tokens_seen": 123216,
73
+ "step": 40
74
+ },
75
+ {
76
+ "epoch": 1.6071428571428572,
77
+ "grad_norm": 1.1126036643981934,
78
+ "learning_rate": 8.880355238966923e-05,
79
+ "loss": 0.5681,
80
+ "num_input_tokens_seen": 138416,
81
+ "step": 45
82
+ },
83
+ {
84
+ "epoch": 1.7857142857142856,
85
+ "grad_norm": 1.8243435621261597,
86
+ "learning_rate": 7.052448255890957e-05,
87
+ "loss": 0.6239,
88
+ "num_input_tokens_seen": 152784,
89
+ "step": 50
90
+ },
91
+ {
92
+ "epoch": 1.9642857142857144,
93
+ "grad_norm": 1.5611317157745361,
94
+ "learning_rate": 5.32731371726938e-05,
95
+ "loss": 0.6441,
96
+ "num_input_tokens_seen": 167264,
97
+ "step": 55
98
+ },
99
+ {
100
+ "epoch": 2.142857142857143,
101
+ "grad_norm": 1.1920043230056763,
102
+ "learning_rate": 3.7651019814126654e-05,
103
+ "loss": 0.5202,
104
+ "num_input_tokens_seen": 181488,
105
+ "step": 60
106
+ },
107
+ {
108
+ "epoch": 2.3214285714285716,
109
+ "grad_norm": 1.1858474016189575,
110
+ "learning_rate": 2.420282768545469e-05,
111
+ "loss": 0.443,
112
+ "num_input_tokens_seen": 198512,
113
+ "step": 65
114
+ },
115
+ {
116
+ "epoch": 2.5,
117
+ "grad_norm": 1.2865468263626099,
118
+ "learning_rate": 1.339745962155613e-05,
119
+ "loss": 0.4209,
120
+ "num_input_tokens_seen": 213040,
121
+ "step": 70
122
+ },
123
+ {
124
+ "epoch": 2.678571428571429,
125
+ "grad_norm": 1.2891031503677368,
126
+ "learning_rate": 5.611666969163243e-06,
127
+ "loss": 0.4818,
128
+ "num_input_tokens_seen": 229312,
129
+ "step": 75
130
+ },
131
+ {
132
+ "epoch": 2.857142857142857,
133
+ "grad_norm": 1.2530392408370972,
134
+ "learning_rate": 1.1169173774871478e-06,
135
+ "loss": 0.5292,
136
+ "num_input_tokens_seen": 244944,
137
+ "step": 80
138
+ }
139
+ ],
140
+ "logging_steps": 5,
141
+ "max_steps": 84,
142
+ "num_input_tokens_seen": 257136,
143
+ "num_train_epochs": 3,
144
+ "save_steps": 100,
145
+ "stateful_callbacks": {
146
+ "TrainerControl": {
147
+ "args": {
148
+ "should_epoch_stop": false,
149
+ "should_evaluate": false,
150
+ "should_log": false,
151
+ "should_save": true,
152
+ "should_training_stop": true
153
+ },
154
+ "attributes": {}
155
+ }
156
+ },
157
+ "total_flos": 1.0939806209654784e+16,
158
+ "train_batch_size": 2,
159
+ "trial_name": null,
160
+ "trial_params": null
161
+ }
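
The `log_history` entries in `trainer_state.json` can be sanity-checked with a few lines of arithmetic. The sketch below assumes a peak learning rate of 2e-4 (the value given in the previous README text) and tests whether the logged values fit a cosine decay over `max_steps` with no warmup. The scheduler type itself is not recorded in this file, so the cosine shape is an inference from the numbers, not a documented setting; note that the earlier README described the decay as linear.

```python
import math

# Sketch: compare logged learning rates with a cosine schedule.
# ASSUMPTIONS: peak LR of 2e-4 (from the earlier README text) and
# no warmup; neither is recorded in trainer_state.json itself.
BASE_LR = 2e-4
MAX_STEPS = 84         # "max_steps" in trainer_state.json
NUM_EPOCHS = 3         # "num_train_epochs" in trainer_state.json

# A few (step -> learning_rate) pairs copied from "log_history".
LOGGED_LR = {
    5: 0.00019825664732332884,
    35: 0.00012588190451025207,
    70: 1.339745962155613e-05,
    80: 1.1169173774871478e-06,
}


def cosine_lr(step, base_lr=BASE_LR, max_steps=MAX_STEPS):
    """Cosine decay from base_lr at step 0 to zero at max_steps."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / max_steps))


def epoch_at(step, max_steps=MAX_STEPS, num_epochs=NUM_EPOCHS):
    """Fractional epoch for an optimizer step, as logged by the trainer."""
    steps_per_epoch = max_steps / num_epochs
    return step / steps_per_epoch


for step, logged in LOGGED_LR.items():
    predicted = cosine_lr(step)
    print(f"step {step:2d}  epoch {epoch_at(step):.4f}  "
          f"logged {logged:.6e}  cosine {predicted:.6e}")
```

With 84 optimizer steps over 3 epochs, each epoch spans 28 steps, which is why step 5 is logged at epoch 0.1786.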
training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6c68e266ff6102efbfeb979fc3ee99077812f4582306ac4bb0790f31cc7d63db
3
+ size 6161
vocab.json ADDED
The diff for this file is too large to render. See raw diff