thiomajid committed on
Commit 8772d56 · verified · 1 Parent(s): d5b724d

Model save

README.md ADDED
@@ -0,0 +1,74 @@
+ ---
+ library_name: transformers
+ license: apache-2.0
+ base_model: HuggingFaceTB/SmolLM2-135M-Instruct
+ tags:
+ - generated_from_trainer
+ model-index:
+ - name: SmolHausaLM-135M
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # SmolHausaLM-135M
+
+ This model is a fine-tuned version of [HuggingFaceTB/SmolLM2-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct) on an unspecified dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 9.3567
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.0002
+ - train_batch_size: 10
+ - eval_batch_size: 10
+ - seed: 42
+ - gradient_accumulation_steps: 5
+ - total_train_batch_size: 50
+ - optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 5
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:------:|:----:|:---------------:|
+ | 44.2818 | 0.3526 | 100 | 9.5091 |
+ | 37.501 | 0.7052 | 200 | 8.9090 |
+ | 36.3613 | 1.0599 | 300 | 8.8260 |
+ | 34.3793 | 1.4126 | 400 | 8.9936 |
+ | 34.1721 | 1.7652 | 500 | 8.9672 |
+ | 33.3041 | 2.1199 | 600 | 9.0472 |
+ | 31.0766 | 2.4725 | 700 | 9.0407 |
+ | 30.7626 | 2.8251 | 800 | 9.1113 |
+ | 28.3702 | 3.1798 | 900 | 9.2313 |
+ | 25.6234 | 3.5324 | 1000 | 9.2606 |
+ | 25.4011 | 3.8850 | 1100 | 9.2470 |
+ | 22.6147 | 4.2398 | 1200 | 9.3353 |
+ | 21.252 | 4.5924 | 1300 | 9.3529 |
+ | 21.2066 | 4.9450 | 1400 | 9.3567 |
+
+ ### Framework versions
+
+ - Transformers 4.47.0
+ - Pytorch 2.5.1+cu121
+ - Datasets 3.3.1
+ - Tokenizers 0.21.0
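The hyperparameters above imply an effective batch size of 50 (train_batch_size 10 × gradient_accumulation_steps 5, matching total_train_batch_size) and a cosine learning-rate schedule with 10% linear warmup. A minimal sketch of that schedule in plain Python — an approximation of the `cosine` scheduler's shape, not the exact transformers implementation; the function name and the example step counts are illustrative assumptions:

```python
import math

def lr_at_step(step: int, total_steps: int,
               base_lr: float = 2e-4, warmup_ratio: float = 0.1) -> float:
    """Approximate cosine-with-warmup schedule as configured above
    (learning_rate=0.0002, lr_scheduler_warmup_ratio=0.1)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Effective batch size: per-device batch * gradient accumulation steps.
effective_batch = 10 * 5  # = 50, matching total_train_batch_size
```

The peak learning rate is reached at the end of warmup (10% of total steps) and decays to zero by the final step.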
events.out.tfevents.1741221792.9b95f613af10.2322.0 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:218004d0c0d5b4499b6b84a5659d0028737fbd6aa04d80573d1aebec4e2aba2f
- size 12370
+ oid sha256:1fd852391794fa2575c576a801e898e4333c0de094b3c1ae1cf6f266a5fe83ae
+ size 12724
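The tfevents file above is stored as a Git LFS pointer rather than as raw bytes: the repo holds only the spec version, a SHA-256 object id, and the byte size. A small illustrative sketch (not part of the repo) parsing that pointer format, using the updated values from this commit:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file: one 'key value' pair per line."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# Pointer content as committed above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:1fd852391794fa2575c576a801e898e4333c0de094b3c1ae1cf6f266a5fe83ae
size 12724
"""
info = parse_lfs_pointer(pointer)
```

`info["oid"]` identifies the actual blob on the LFS server; `info["size"]` is the real file size in bytes, so the diff above reflects the log growing from 12370 to 12724 bytes.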
generation_config.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "pad_token_id": 2,
+   "transformers_version": "4.47.0"
+ }
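The new `generation_config.json` pins the special-token ids used at generation time; note that `pad_token_id` reuses the EOS id (2), a common choice when the tokenizer has no dedicated padding token. A minimal sketch of reading these fields with the standard library — in an actual checkout this would be `json.load(open("generation_config.json"))`, a path assumed here:

```python
import json

# Config content as committed above.
config_text = """{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 2,
  "transformers_version": "4.47.0"
}"""

gen_config = json.loads(config_text)
# Padding reuses the end-of-sequence token id.
pad_is_eos = gen_config["pad_token_id"] == gen_config["eos_token_id"]
```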