nicolay-r committed
Commit d790ea4 · verified · 1 Parent(s): 09037f4

Update README.md

Files changed (1):
  1. README.md +45 -109

README.md CHANGED
@@ -21,36 +21,23 @@ pipeline_tag: text-generation
 
 # Model Card for Model ID
 
-<!-- Provide a quick summary of what the model is/does. -->
-
-
 ## Model Details
 
 ### Model Description
 
-<!-- Provide a longer summary of what this model is. -->
-
-This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
 
 ### Model Sources [optional]
 
-<!-- Provide the basic links for the model. -->
-
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-
-## Uses
 
-### Direct Use
 
 We use [bulk-chain](https://github.com/nicolay-r/bulk-chain) for inference with the Qwen2 provider based on `transformers` **pipelines API**.
 
@@ -88,135 +75,84 @@ for record in content_it:
     print(record["summary"])
 ```
 
-
-### Out-of-Scope Use
-
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
-[More Information Needed]
-
-## Bias, Risks, and Limitations
-
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
-[More Information Needed]
-
-### Recommendations
-
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
-## How to Get Started with the Model
-
-Use the code below to get started with the model.
-
-[More Information Needed]
-
 ## Training Details
 
 ### Training Data
 
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
-[More Information Needed]
 
 ### Training Procedure
 
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
 
-#### Preprocessing [optional]
 
-[More Information Needed]
 
 #### Training Hyperparameters
 
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
 
-#### Speeds, Sizes, Times [optional]
 
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
 
-[More Information Needed]
 
 ## Evaluation
 
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/64e62d11d27a8292c3637f86/LRJU_B0rKfk3celyQ9zXq.png)
-
-
-### Testing Data, Factors & Metrics
 
 #### Testing Data
 
-<!-- This should link to a Dataset Card if possible. -->
-
-[More Information Needed]
-
-#### Factors
-
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
-[More Information Needed]
 
 #### Metrics
 
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
-[More Information Needed]
 
 ### Results
 
-[More Information Needed]
-
-#### Summary
-
-
-
-## Technical Specifications [optional]
 
-### Model Architecture and Objective
 
-[More Information Needed]
 
-### Compute Infrastructure
-
-[More Information Needed]
 
 #### Hardware
 
-[More Information Needed]
 
 #### Software
 
-[More Information Needed]
 
 ## Citation [optional]
 
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
 **BibTeX:**
 
-[More Information Needed]
-
-**APA:**
-
-[More Information Needed]
-
-## Glossary [optional]
-
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
-[More Information Needed]
-
-## More Information [optional]
-
-[More Information Needed]
-
-## Model Card Authors [optional]
-
-[More Information Needed]
-
-## Model Card Contact
-
-[More Information Needed]
 
 
 # Model Card for Model ID
 
 ## Model Details
 
 ### Model Description
 
+ - **Model type:** Decoder-based Model
+ - **Language(s) (NLP):** Languages supported by Qwen2.5; fine-tuned on summaries written in `en`, `fr`, `pt`, `es`
+ - **License:** MIT
+ - **Finetuned from model [optional]:** https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct
 
 
 ### Model Sources [optional]
+ [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1TXGaz39o73nBucEQw12gbad7Tw11j2Ol?usp=sharing)
 
+ - **Repository:** https://github.com/nicolay-r/distil-tuning-llm
+ - **Paper [optional]:** **TBA**
+ - **Demo [optional]:** https://colab.research.google.com/drive/1TXGaz39o73nBucEQw12gbad7Tw11j2Ol?usp=sharing
 
+ ## Usage
 
 We use [bulk-chain](https://github.com/nicolay-r/bulk-chain) for inference with the Qwen2 provider based on `transformers` **pipelines API**.
 
 
     print(record["summary"])
 ```
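Below is a minimal sketch of calling the fine-tuned checkpoint directly through the `transformers` pipelines API, without bulk-chain; the model identifier and the prompt wording are assumptions and should be replaced with this repository's model ID and your own instruction.

```python
# Minimal sketch (assumption): direct use of the transformers pipelines API,
# without bulk-chain. MODEL_ID is a placeholder -- substitute the ID of this
# model repository on the Hugging Face Hub.
from transformers import pipeline

MODEL_ID = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder: replace with this card's checkpoint

summarizer = pipeline("text-generation", model=MODEL_ID)

report = "Patient admitted with chest pain and elevated troponin levels. ..."  # toy clinical note
prompt = f"Summarize the following clinical case report:\n\n{report}\n\nSummary:"

# Greedy decoding; max_new_tokens bounds the length of the generated summary.
output = summarizer(prompt, max_new_tokens=128, do_sample=False)
print(output[0]["generated_text"])
```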
 
 ## Training Details
 
 ### Training Data
 
+ * **MultiClinSum**
+ * We use the [following script](https://github.com/nicolay-r/distill-tuning-llm/blob/main/resources/download_dataset.sh) for downloading datasets.
+ * **Web**: https://temu.bsc.es/multiclinsum
+ * **Data**: https://zenodo.org/records/15463353
+ * **BioASQ**: http://bioasq.org/
 
 ### Training Procedure
 
+ The training procedure involves:
+ 1. Preparation of the `rationale` for summaries distillation.
+ 2. Launch of the **fine-tuning** process.
 
+ **Fine-tuning:** Please follow this script to fine-tune on the `MultiClinSum` dataset using a Google Colab A100 (40GB VRAM) + 80GB RAM:
+ * https://github.com/nicolay-r/distil-tuning-llm/blob/master/distil_ft_qwen25_05b_A100-40GB_80GB_std.sh
 
+ #### Preprocessing [optional]
 
+ Refer to the following script for the `fine-tuning` pre-processing:
+ * https://github.com/nicolay-r/distil-tuning-llm/blob/master/resources/make_dataset_mult.py
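To give a feel for what the pre-processing stage conceptually produces, the sketch below assembles a single distillation training record from a source report, a teacher rationale, and the reference summary. The field names and message layout are illustrative assumptions only; refer to `make_dataset_mult.py` above for the actual format.

```python
# Illustrative sketch only: a hypothetical structure for one distillation record.
# Field names and the "Reasoning:/Summary:" layout are assumptions, not the project's schema.
import json

def make_record(report: str, rationale: str, summary: str) -> dict:
    """Pack a clinical report, a teacher rationale, and the target summary
    into a chat-style record for supervised fine-tuning."""
    return {
        "messages": [
            {"role": "user", "content": f"Summarize the clinical case report:\n{report}"},
            {"role": "assistant", "content": f"Reasoning: {rationale}\nSummary: {summary}"},
        ]
    }

record = make_record(
    report="Patient admitted with chest pain and elevated troponin levels...",
    rationale="Key findings point to a suspected acute coronary syndrome...",
    summary="Admission for suspected acute coronary syndrome with elevated troponin.",
)

# One JSON object per line, as commonly used for fine-tuning corpora.
with open("train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```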
 
 #### Training Hyperparameters
 
+ We refer to the original fine-tuning parameters here:
+ * https://github.com/QwenLM/Qwen2.5-VL/tree/main/qwen-vl-finetune
+ and use the following script:
+ * https://github.com/nicolay-r/distil-tuning-llm/blob/master/distil_ft_qwen25_05b_A100-40GB_80GB_std.sh
 
 
+ #### Speeds, Sizes, Times [optional]
 
+ The fine-tuning procedure for `3` epochs takes around `1 hour` on the Google Colab A100.
 
 
 ## Evaluation
 
 #### Testing Data
 
+ We use an evaluation split of 20 documents, held out from a small portion of the available training data, covering all the languages: `en`, `fr`, `pt`, `es`.
 
 
 #### Metrics
 
+ In this evaluation we use only the `rouge` score.
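A minimal sketch of computing ROUGE with the Hugging Face `evaluate` package is shown below; the package choice and the toy strings are assumptions, as the card only states that a `rouge` score is reported.

```python
# Sketch (assumption): ROUGE via the `evaluate` package; the card states only
# that a rouge score is used, not which implementation.
import evaluate

rouge = evaluate.load("rouge")

predictions = ["admission for suspected acute coronary syndrome"]
references = ["the patient was admitted with suspected acute coronary syndrome"]

# Returns rouge1 / rouge2 / rougeL / rougeLsum F-measures in [0, 1].
scores = rouge.compute(predictions=predictions, references=references)
print(scores)
```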
 
 
 
 ### Results
 
+ We launch 3 individual fine-tuning runs for the `distil` and `standard` versions to showcase the variation in results across runs.
 
+ > **Figure**: the results obtained for this model correspond to the `standard` version 🟠
 
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64e62d11d27a8292c3637f86/6wZ_klTgm-SmvZCGJOaC5.png)
 
+ #### Summary
 
 
 #### Hardware
 
+ We run fine-tuning and inference using the Google Colab Notebook service with the following resources:
+ * Fine-tuning: A100 (40GB)
+ * Inference: T4 (16GB)
+
+ Follow the Google Colab Notebook in the repository:
+ * https://github.com/nicolay-r/distil-tuning-llm
 
 
 #### Software
 
+ The official repository for this card:
+ * https://github.com/nicolay-r/distil-tuning-llm
 
 ## Citation [optional]
 
 **BibTeX:**
 
+ > **TO BE ADDED**
 
+ ## Model Card Authors
 
+ Nicolay Rusnachenko