Update README.md
README.md
CHANGED
@@ -4,19 +4,17 @@ language:
|
|
4 |
- en
|
5 |
license: apache-2.0
|
6 |
base_model: pszemraj/tFINE-850m-24x24-v0.4-flan_aug
|
7 |
-
tags:
|
8 |
-
- generated_from_trainer
|
9 |
metrics:
|
10 |
- rouge
|
11 |
-
|
12 |
-
-
|
13 |
-
|
14 |
---
|
15 |
|
16 |
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
|
17 |
should probably proofread and complete it, then remove this comment. -->
|
18 |
|
19 |
-
# tFINE-850m-24x24-v0.
|
20 |
|
21 |
This model is a fine-tuned version of [pszemraj/tFINE-850m-24x24-v0.4-flan_aug](https://huggingface.co/pszemraj/tFINE-850m-24x24-v0.4-flan_aug) on the pszemraj/infinity-instruct-7m-T2T_en dataset.
|
22 |
It achieves the following results on the evaluation set:
|
@@ -28,6 +26,26 @@ It achieves the following results on the evaluation set:
|
|
28 |
- Gen Len: 441.475
|
29 |
- Num Input Tokens Seen: 435513684
|
30 |
|
31 |
## Quick eval
|
32 |
|
33 |
Quick eval for: `pszemraj/tFINE-850m-24x24-v0.5-instruct-L1`
|
@@ -49,36 +67,3 @@ hf (pretrained=pszemraj/tFINE-850m-24x24-v0.5-instruct-L1,trust_remote_code=True
|
|
49 |
|tinyMMLU | 0|none | 0|acc_norm |↑ |0.3021|± | N/A|
|
50 |
|winogrande | 1|none | 0|acc |↑ |0.4925|± |0.0141|
|
51 |
|
52 |
-
## Training procedure
|
53 |
-
|
54 |
-
### Training hyperparameters
|
55 |
-
|
56 |
-
The following hyperparameters were used during training:
|
57 |
-
- learning_rate: 3e-05
|
58 |
-
- train_batch_size: 4
|
59 |
-
- eval_batch_size: 4
|
60 |
-
- seed: 776444
|
61 |
-
- gradient_accumulation_steps: 32
|
62 |
-
- total_train_batch_size: 128
|
63 |
-
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
|
64 |
-
- lr_scheduler_type: constant_with_warmup
|
65 |
-
- lr_scheduler_warmup_ratio: 0.05
|
66 |
-
- num_epochs: 1.0
|
67 |
-
|
68 |
-
### Training results
|
69 |
-
|
70 |
-
| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len | Input Tokens Seen |
|
71 |
-
|:-------------:|:------:|:-----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|:-----------------:|
|
72 |
-
| 1.8808 | 0.0807 | 1000 | 1.7883 | 24.1946 | 12.2099 | 20.4185 | 22.251 | 636.465 | 35147692 |
|
73 |
-
| 1.6545 | 0.1613 | 2000 | 1.5985 | 28.9492 | 15.3233 | 23.871 | 26.9919 | 577.04 | 70510224 |
|
74 |
-
| 1.5522 | 0.2420 | 3000 | 1.4907 | 30.4033 | 16.1354 | 24.7244 | 28.5037 | 537.77 | 105707144 |
|
75 |
-
| 1.5059 | 0.3227 | 4000 | 1.4204 | 34.0294 | 19.2608 | 27.9322 | 32.3166 | 522.495 | 140722844 |
|
76 |
-
| 1.4346 | 0.4034 | 5000 | 1.3636 | 34.4104 | 19.4149 | 28.1022 | 32.7299 | 494.68 | 175639924 |
|
77 |
-
| 1.3912 | 0.4840 | 6000 | 1.3159 | 36.5059 | 21.2447 | 30.116 | 34.7303 | 469.885 | 210409328 |
|
78 |
-
| 1.3148 | 0.5647 | 7000 | 1.2807 | 37.0123 | 21.3666 | 30.11 | 35.0891 | 458.28 | 245601908 |
|
79 |
-
| 1.2859 | 0.6454 | 8000 | 1.2492 | 37.05 | 21.0468 | 29.7988 | 35.1882 | 452.495 | 280866724 |
|
80 |
-
| 1.298 | 0.7260 | 9000 | 1.2211 | 36.6966 | 20.8189 | 29.7115 | 34.7528 | 464.37 | 316042068 |
|
81 |
-
| 1.2834 | 0.8067 | 10000 | 1.1979 | 37.7181 | 20.9926 | 30.3857 | 35.8681 | 446.26 | 351056548 |
|
82 |
-
| 1.2577 | 0.8874 | 11000 | 1.1752 | 39.3539 | 23.0123 | 31.9005 | 37.4941 | 424.445 | 386471860 |
|
83 |
-
| 1.193 | 0.9680 | 12000 | 1.1526 | 40.1804 | 23.1008 | 32.3484 | 38.2103 | 422.225 | 421585440 |
|
84 |
-
|
|
|
4 |
- en
|
5 |
license: apache-2.0
|
6 |
base_model: pszemraj/tFINE-850m-24x24-v0.4-flan_aug
|
7 |
metrics:
|
8 |
- rouge
|
9 |
+
datasets:
|
10 |
+
- pszemraj/infinity-instruct-7m-T2T_en
|
11 |
+
pipeline_tag: text2text-generation
|
12 |
---
|
13 |
|
14 |
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
|
15 |
should probably proofread and complete it, then remove this comment. -->
|
16 |
|
17 |
+
# tFINE-850m-24x24-v0.5-instruct-L1
|
18 |
|
19 |
This model is a fine-tuned version of [pszemraj/tFINE-850m-24x24-v0.4-flan_aug](https://huggingface.co/pszemraj/tFINE-850m-24x24-v0.4-flan_aug) on the pszemraj/infinity-instruct-7m-T2T_en dataset.
|
20 |
It achieves the following results on the evaluation set:
|
|
|
26 |
- Gen Len: 441.475
|
27 |
- Num Input Tokens Seen: 435513684
|
28 |
|
29 |
+
## usage
|
30 |
+
|
31 |
+
```py
|
32 |
+
from transformers import pipeline
|
33 |
+
|
34 |
+
pipe = pipeline(
|
35 |
+
"text2text-generation", model="pszemraj/tFINE-850m-24x24-v0.5-instruct-L1"
|
36 |
+
)
|
37 |
+
prompt = "write a python script to download a file from a url and save as a local file using requests. explain how it works"
|
38 |
+
res = pipe(
|
39 |
+
prompt,
|
40 |
+
max_new_tokens=192,
|
41 |
+
top_k=4,
|
42 |
+
penalty_alpha=0.6,
|
43 |
+
renormalize_logits=True,
|
44 |
+
no_repeat_ngram_size=5,
|
45 |
+
)
|
46 |
+
print(res[0]["generated_text"])
|
47 |
+
```
|
48 |
+
|
49 |
## Quick eval
|
50 |
|
51 |
Quick eval for: `pszemraj/tFINE-850m-24x24-v0.5-instruct-L1`
|
|
|
67 |
|tinyMMLU | 0|none | 0|acc_norm |↑ |0.3021|± | N/A|
|
68 |
|winogrande | 1|none | 0|acc |↑ |0.4925|± |0.0141|
|
69 |
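Note on the training hyperparameters this commit removes from the card: the reported `total_train_batch_size` of 128 equals `train_batch_size` (4) times `gradient_accumulation_steps` (32). The sketch below reproduces that arithmetic and estimates the steps in one epoch from the last row of the removed results table (step 12000 at epoch 0.9680). The helper names are hypothetical, and the single-device default is an assumption: the card does not state a device count, although 4 × 32 already accounts for the full 128.

```python
def effective_batch_size(per_device_batch_size, grad_accum_steps, num_devices=1):
    """Examples consumed per optimizer step.

    num_devices=1 is an assumption; the model card does not state it.
    """
    return per_device_batch_size * grad_accum_steps * num_devices


def estimate_steps_per_epoch(step, epoch_fraction):
    """Extrapolate the optimizer steps in one full epoch from a logged row."""
    return round(step / epoch_fraction)


# Values taken from the removed "Training hyperparameters" list.
total_batch = effective_batch_size(per_device_batch_size=4, grad_accum_steps=32)

# Last logged row of the removed "Training results" table.
steps_in_epoch = estimate_steps_per_epoch(step=12000, epoch_fraction=0.9680)

print(f"effective batch size: {total_batch}")
print(f"estimated optimizer steps per epoch: {steps_in_epoch}")
print(f"estimated training examples per epoch: {total_batch * steps_in_epoch}")
```

The step and example counts are rough, because the logged epoch fraction is rounded to four decimals.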