ldp72 committed · Commit 6b138d5 · verified · 1 Parent(s): f5d116a

docs: add README.md

Files changed (1): README.md (+252 −12)
README.md CHANGED
@@ -1,13 +1,21 @@
  ---
- library_name: transformers
- tags: []
  ---

- # Model Card for Model ID

  <!-- Provide a quick summary of what the model is/does. -->

  ## Model Details

@@ -15,15 +23,16 @@ tags: []

  <!-- Provide a longer summary of what this model is. -->

- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

  - **Developed by:** [More Information Needed]
  - **Funded by [optional]:** [More Information Needed]
  - **Shared by [optional]:** [More Information Needed]
  - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
  - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]

  ### Model Sources [optional]

@@ -41,7 +50,30 @@ This is the model card of a 🤗 transformers model that has been pushed on the

  <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

- [More Information Needed]

  ### Downstream Use [optional]

@@ -75,11 +107,182 @@ Use the code below to get started with the model.

  ## Training Details

  ### Training Data

  <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

- [More Information Needed]

  ### Training Procedure

@@ -89,10 +292,47 @@ Use the code below to get started with the model.

  [More Information Needed]

  #### Training Hyperparameters

- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

  #### Speeds, Sizes, Times [optional]

@@ -196,4 +436,4 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]

  ## Model Card Contact

- [More Information Needed]
  ---
+ # For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
+ # Doc / guide: https://huggingface.co/docs/hub/model-cards
+ base_model:
+ - Qwen/Qwen2.5-0.5B
+ datasets: []
+ languages:
+ - en
+ metrics: []
+ pipeline_tag: text-generation
  ---

+ # Model Card for ldp72/Test-Qwen-Marcel.5-0.5B-it

  <!-- Provide a quick summary of what the model is/does. -->

+ This model was fine-tuned by instruction tuning on Telco-domain datasets.

  ## Model Details

  <!-- Provide a longer summary of what this model is. -->

  - **Developed by:** [More Information Needed]
  - **Funded by [optional]:** [More Information Needed]
  - **Shared by [optional]:** [More Information Needed]
  - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** English
  - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** ['Qwen/Qwen2.5-0.5B']
+ - **Date [optional]:** 2025-07-16 14:40:15

  ### Model Sources [optional]


  <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

+ This model can be used with the `transformers` library through the `pipeline` abstraction as follows:
+
+ ```python
+ import torch
+ from transformers import pipeline
+
+ model_id = "ldp72/Test-Qwen-Marcel.5-0.5B-it"
+ pipe = pipeline(
+     "text-generation",
+     model=model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+ )
+ messages = [
+     {"role": "system", "content": "You are a chatbot specialized in the Telco domain."},
+     {"role": "user", "content": "Can you give a sample of your specialized knowledge?"},
+ ]
+ outputs = pipe(
+     messages,
+     max_new_tokens=256,
+ )
+ print(outputs[0]["generated_text"][-1])
+ ```
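
If you prefer not to go through `pipeline`, the checkpoint can also be loaded directly with `AutoModelForCausalLM`/`AutoTokenizer`. The snippet below is only an illustrative sketch (the example question is made up) and assumes the tokenizer ships the usual Qwen2.5 chat template:

```python
# Sketch: loading the model without the pipeline wrapper.
# Assumes a standard Qwen2.5-style checkpoint with a chat template; adjust dtype/device to your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ldp72/Test-Qwen-Marcel.5-0.5B-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [
    {"role": "system", "content": "You are a chatbot specialized in the Telco domain."},
    {"role": "user", "content": "What does 3GPP stand for?"},  # hypothetical example question
]
# Render the conversation with the tokenizer's chat template and append the generation prompt.
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```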

  ### Downstream Use [optional]

  ## Training Details

+ This model was fine-tuned with the [Orange internal fine-tuning tools](https://gitlab.tech.orange/NEPAL/knowledge/orangelm/lm-adaptation/), using the Docker image tagged `0.1.1` from the [registry](https://gitlab.tech.orange/NEPAL/knowledge/orangelm/lm-adaptation/container_registry/84664) and the following configuration file:
+
+ ```yaml
+ data:
+   dataset_name:
+     train:
+     - path: telco-lm/arxiv-abstract-generation-telco-instructions
+       revision: legacy
+     - path: telco-lm/synthetic-dsp.stackexchange.com-multi-task-telco-instructions
+       revision: legacy
+     - path: telco-lm/synthetic-networkengineering.stackexchange.com-multi-task-telco-instructions
+       revision: legacy
+     - path: telco-lm/synthetic-security.stackexchange.com-multi-task-telco-instructions
+       revision: legacy
+     - path: telco-lm/synthetic-technical-3gpp-multi-task-telco-instructions
+       revision: legacy
+     - path: telco-lm/synthetic-technical-5gamericas-multi-task-telco-instructions
+       revision: legacy
+     - path: telco-lm/synthetic-technical-huawei-multi-task-telco-instructions
+       revision: legacy
+     - path: telco-lm/synthetic-technical-itu-multi-task-telco-instructions
+       revision: legacy
+     - path: telco-lm/synthetic-technical-mef-multi-task-telco-instructions
+       revision: legacy
+     - path: telco-lm/synthetic-technical-ngmn-multi-task-telco-instructions
+       revision: legacy
+     - path: telco-lm/synthetic-technical-rfc-multi-task-telco-instructions
+       revision: legacy
+     - path: telco-lm/teleqna-mcqa-cot-telco-instructions
+       revision: legacy
+     - path: telco-lm/tii-huawei-qa-open-qa-telco-instructions
+       revision: legacy
+     validation_abstract_generation:
+     - path: telco-lm/arxiv-abstract-generation-telco-instructions
+       revision: legacy
+       split: validation
+     validation_general:
+     - path: telco-lm/slim-orca-multi-task-general-instructions
+       revision: legacy
+       split: validation
+     validation_synthetic:
+     - path: telco-lm/synthetic-dsp.stackexchange.com-multi-task-telco-instructions
+       revision: legacy
+       split: validation
+     - path: telco-lm/synthetic-security.stackexchange.com-multi-task-telco-instructions
+       revision: legacy
+       split: validation
+     - path: telco-lm/synthetic-networkengineering.stackexchange.com-multi-task-telco-instructions
+       revision: legacy
+       split: validation
+     - path: telco-lm/synthetic-technical-rfc-multi-task-telco-instructions
+       revision: legacy
+       split: validation
+     - path: telco-lm/synthetic-technical-3gpp-multi-task-telco-instructions
+       revision: legacy
+       split: validation
+     - path: telco-lm/synthetic-technical-5gamericas-multi-task-telco-instructions
+       revision: legacy
+       split: validation
+     - path: telco-lm/synthetic-technical-itu-multi-task-telco-instructions
+       revision: legacy
+       split: validation
+     - path: telco-lm/synthetic-technical-mef-multi-task-telco-instructions
+       revision: legacy
+       split: validation
+     - path: telco-lm/synthetic-technical-huawei-multi-task-telco-instructions
+       revision: legacy
+       split: validation
+     - path: telco-lm/synthetic-technical-ngmn-multi-task-telco-instructions
+       revision: legacy
+       split: validation
+     validation_telco_qa:
+     - path: telco-lm/tii-huawei-qa-open-qa-telco-instructions
+       revision: legacy
+       split: validation
+     validation_telco_qcm:
+     - path: telco-lm/teleqna-mcqa-cot-telco-instructions
+       revision: legacy
+       split: validation
+   debug: true
+   implementation_name: instructions
+ description:
+   contributors:
+   - email: [email protected]
+     first_name: Loïc
+     last_name: Fosse
+   - email: [email protected]
+     first_name: Lionel
+     last_name: Delphin-Poulat
+   - email: [email protected]
+     first_name: Ismaël
+     last_name: Rousseau
+   domain: Telco
+   languages:
+   - en
+   model_name: ldp72/Test-Qwen-Marcel.5-0.5B-it
+ image:
+   version: 0.1.1
+ model:
+   attn_implementation: flash_attention_2
+   chat_template_tokenizer: Qwen/Qwen2.5-0.5B-Instruct
+   model_name_or_path: Qwen/Qwen2.5-0.5B
+   trust_remote_code: true
+ training:
+   bf16: true
+   dataloader_num_workers: 4
+   dataloader_persistent_workers: true
+   dataloader_pin_memory: true
+   dataloader_prefetch_factor: 2
+   disable_tqdm: true
+   eval_accumulation_steps: 1
+   eval_steps: 10
+   eval_strategy: steps
+   fp16: false
+   gradient_accumulation_steps: 2
+   gradient_checkpointing: true
+   group_by_length: false
+   learning_rate: 2.0e-05
+   log_level: debug
+   logging_dir: /outputs/Telco-Qwen2.5-0.5B-it-profiling-nodeepspeed-1gpu-2/logs
+   logging_steps: 10
+   lr_scheduler_type: cosine
+   max_grad_norm: 1.0
+   max_steps: -1
+   num_train_epochs: 2
+   optim: paged_adamw_32bit
+   output_dir: /outputs/Telco-Qwen2.5-0.5B-it-profiling-nodeepspeed-1gpu-2
+   per_device_eval_batch_size: 2
+   per_device_train_batch_size: 2
+   push_to_hub: false
+   report_to: tensorboard
+   save_steps: 0
+   save_strategy: epoch
+   save_total_limit: 1
+   seed: 42
+   torch_compile: false
+   training_type: instruct-tuning
+   use_liger_kernel: false
+   warmup_ratio: 0.05
+   weight_decay: 0.1
+ ```
+
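
For context, the `chat_template_tokenizer: Qwen/Qwen2.5-0.5B-Instruct` entry above presumably means that training examples were rendered with the Qwen2.5-Instruct chat template before tokenization (the base `Qwen/Qwen2.5-0.5B` checkpoint does not ship an instruct template). A minimal sketch of that formatting step, using a made-up instruction record since the `telco-lm` dataset schema is not shown here:

```python
# Sketch only: render a conversational example with the Qwen2.5-0.5B-Instruct chat template,
# which is what the `chat_template_tokenizer` setting above appears to select.
from transformers import AutoTokenizer

template_tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

# Hypothetical record; the real telco-lm datasets are internal and their exact schema is not documented here.
messages = [
    {"role": "user", "content": "Summarize the role of the AMF in a 5G core network."},
    {"role": "assistant", "content": "The AMF terminates NAS signalling and handles registration and mobility management."},
]

# tokenize=False returns the rendered ChatML-style string (<|im_start|> ... <|im_end|>) used as training text.
print(template_tokenizer.apply_chat_template(messages, tokenize=False))
```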
  ### Training Data

  <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

+ This model was trained on the following datasets:
+
+ ```yaml
+ - path: telco-lm/arxiv-abstract-generation-telco-instructions
+   revision: legacy
+ - path: telco-lm/synthetic-dsp.stackexchange.com-multi-task-telco-instructions
+   revision: legacy
+ - path: telco-lm/synthetic-networkengineering.stackexchange.com-multi-task-telco-instructions
+   revision: legacy
+ - path: telco-lm/synthetic-security.stackexchange.com-multi-task-telco-instructions
+   revision: legacy
+ - path: telco-lm/synthetic-technical-3gpp-multi-task-telco-instructions
+   revision: legacy
+ - path: telco-lm/synthetic-technical-5gamericas-multi-task-telco-instructions
+   revision: legacy
+ - path: telco-lm/synthetic-technical-huawei-multi-task-telco-instructions
+   revision: legacy
+ - path: telco-lm/synthetic-technical-itu-multi-task-telco-instructions
+   revision: legacy
+ - path: telco-lm/synthetic-technical-mef-multi-task-telco-instructions
+   revision: legacy
+ - path: telco-lm/synthetic-technical-ngmn-multi-task-telco-instructions
+   revision: legacy
+ - path: telco-lm/synthetic-technical-rfc-multi-task-telco-instructions
+   revision: legacy
+ - path: telco-lm/teleqna-mcqa-cot-telco-instructions
+   revision: legacy
+ - path: telco-lm/tii-huawei-qa-open-qa-telco-instructions
+   revision: legacy
+ ```
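
These datasets live under the `telco-lm` organization and are not publicly listed, so the snippet below is only a sketch of how the `path`/`revision` pairs above map onto `datasets.load_dataset`, assuming you have access to that organization:

```python
# Sketch: load and concatenate the listed training sets with the `datasets` library.
# Assumes access to the telco-lm organization and that the splits share the same schema.
from datasets import concatenate_datasets, load_dataset

train_paths = [
    "telco-lm/arxiv-abstract-generation-telco-instructions",
    "telco-lm/teleqna-mcqa-cot-telco-instructions",
    "telco-lm/tii-huawei-qa-open-qa-telco-instructions",
    # ... plus the remaining telco-lm instruction datasets listed above
]

# revision="legacy" mirrors the `revision: legacy` entries in the YAML.
parts = [load_dataset(path, revision="legacy", split="train") for path in train_paths]
train_dataset = concatenate_datasets(parts)
print(train_dataset)
```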

  ### Training Procedure

  [More Information Needed]

  #### Training Hyperparameters

+ - **Training regime:** bf16 mixed precision. The model was trained with the following hyperparameters passed to `SFTTrainer`; other parameters were left at their defaults:
+
+ ```yaml
+ bf16: true
+ dataloader_num_workers: 4
+ dataloader_persistent_workers: true
+ dataloader_pin_memory: true
+ dataloader_prefetch_factor: 2
+ disable_tqdm: true
+ eval_accumulation_steps: 1
+ eval_steps: 10
+ eval_strategy: steps
+ fp16: false
+ gradient_accumulation_steps: 2
+ gradient_checkpointing: true
+ group_by_length: false
+ learning_rate: 2.0e-05
+ log_level: debug
+ logging_dir: /outputs/Telco-Qwen2.5-0.5B-it-profiling-nodeepspeed-1gpu-2/logs
+ logging_steps: 10
+ lr_scheduler_type: cosine
+ max_grad_norm: 1.0
+ max_steps: -1
+ num_train_epochs: 2
+ optim: paged_adamw_32bit
+ output_dir: /outputs/Telco-Qwen2.5-0.5B-it-profiling-nodeepspeed-1gpu-2
+ per_device_eval_batch_size: 2
+ per_device_train_batch_size: 2
+ push_to_hub: false
+ report_to: tensorboard
+ save_steps: 0
+ save_strategy: epoch
+ save_total_limit: 1
+ seed: 42
+ torch_compile: false
+ use_liger_kernel: false
+ warmup_ratio: 0.05
+ weight_decay: 0.1
+ ```
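
These settings are standard `transformers.TrainingArguments` fields consumed through `trl`. The sketch below shows how a subset of them would map onto a plain `SFTConfig`/`SFTTrainer` run; it is not the Orange internal tooling, and it assumes a recent `trl` release (where `SFTConfig` exists and `SFTTrainer` accepts conversational `messages`-style datasets):

```python
# Sketch (not the internal Orange tooling): mapping a subset of the hyperparameters above
# onto trl's SFTConfig / SFTTrainer.
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# Tiny placeholder dataset in conversational format; the real training data is listed above.
train_dataset = Dataset.from_list([
    {"messages": [
        {"role": "user", "content": "What does PLMN stand for?"},
        {"role": "assistant", "content": "Public Land Mobile Network."},
    ]}
])

args = SFTConfig(
    output_dir="outputs/telco-qwen2.5-0.5b-it",  # illustrative path
    bf16=True,
    gradient_accumulation_steps=2,
    gradient_checkpointing=True,
    learning_rate=2.0e-05,
    lr_scheduler_type="cosine",
    max_grad_norm=1.0,
    num_train_epochs=2,
    optim="paged_adamw_32bit",
    per_device_train_batch_size=2,
    warmup_ratio=0.05,
    weight_decay=0.1,
    seed=42,
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # SFTTrainer also accepts an already-instantiated model
    args=args,
    train_dataset=train_dataset,
)
trainer.train()
```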

  #### Speeds, Sizes, Times [optional]

  ## Model Card Contact

+ Thanks to [Loïc Fosse](mailto:[email protected]), [Lionel Delphin-Poulat](mailto:[email protected]), and [Ismaël Rousseau](mailto:[email protected]) for adding this model.