Model save
README.md
ADDED
@@ -0,0 +1,161 @@
---
library_name: transformers
base_model: PowerInfer/SmallThinker-3B-Preview
tags:
- generated_from_trainer
model-index:
- name: smartmind-cyberone-20250410_x2
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# smartmind-cyberone-20250410_x2

This model is a fine-tuned version of [PowerInfer/SmallThinker-3B-Preview](https://huggingface.co/PowerInfer/SmallThinker-3B-Preview) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0159

## Model description

More information needed

## Intended uses & limitations

More information needed
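
Until the card is completed, the snippet below is a minimal inference sketch using `transformers`. The checkpoint path and the use of the base model's chat template are assumptions, not details confirmed by this card.

```python
# Minimal, unofficial inference sketch for smartmind-cyberone-20250410_x2.
# The model path below is a placeholder (local directory or full Hub repo id);
# the namespace is not documented in this card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "smartmind-cyberone-20250410_x2"  # placeholder path / repo id

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype="auto",   # load in the checkpoint's native dtype
    device_map="auto",    # requires `accelerate`; remove to load on CPU
)

# Assumes the fine-tune kept the base model's chat template.
messages = [{"role": "user", "content": "Summarize what a fine-tuned 3B model can do."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

If the fine-tuned tokenizer did not inherit a chat template, replace `apply_chat_template` with a plain `tokenizer(prompt, return_tensors="pt")` call.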

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the `TrainingArguments` sketch after this list):
- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08, no additional optimizer arguments)
- lr_scheduler_type: cosine_with_restarts
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5
- mixed_precision_training: Native AMP
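
For reference, the list above maps onto `transformers.TrainingArguments` roughly as in the sketch below. The output directory, the precision flag, and anything not listed above are assumptions; this is not the script that produced the checkpoint.

```python
# Sketch of the reported hyperparameters as TrainingArguments (assumed mapping).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="smartmind-cyberone-20250410_x2",  # placeholder
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,            # 8 x 8 -> total train batch size 64
    seed=42,
    num_train_epochs=5,
    lr_scheduler_type="cosine_with_restarts",
    warmup_ratio=0.1,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    fp16=True,  # "Native AMP"; bf16 is equally plausible and not specified here
)
# The multi-GPU distributed setup comes from the launcher (e.g. torchrun or
# accelerate), not from these arguments.
```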

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 0.6761 | 0.0499 | 276 | 0.2245 |
| 0.2072 | 0.0998 | 552 | 0.1757 |
| 0.1812 | 0.1498 | 828 | 0.1140 |
| 0.1469 | 0.1997 | 1104 | 0.1493 |
| 0.1224 | 0.2496 | 1380 | 0.0789 |
| 0.1142 | 0.2995 | 1656 | 0.1227 |
| 0.1194 | 0.3494 | 1932 | 0.0812 |
| 0.1048 | 0.3994 | 2208 | 0.0452 |
| 0.1145 | 0.4493 | 2484 | 0.0593 |
| 0.0943 | 0.4992 | 2760 | 0.0880 |
| 0.1149 | 0.5491 | 3036 | 0.2158 |
| 0.2192 | 0.5990 | 3312 | 0.1650 |
| 0.123 | 0.6490 | 3588 | 0.1046 |
| 0.1071 | 0.6989 | 3864 | 0.0775 |
| 0.0936 | 0.7488 | 4140 | 0.1638 |
| 0.0867 | 0.7987 | 4416 | 0.0447 |
| 0.0832 | 0.8486 | 4692 | 0.0624 |
| 0.1466 | 0.8986 | 4968 | 0.3147 |
| 0.0932 | 0.9485 | 5244 | 0.0552 |
| 0.0897 | 0.9984 | 5520 | 0.0408 |
| 0.0694 | 1.0485 | 5796 | 0.0458 |
| 0.0714 | 1.0984 | 6072 | 0.0582 |
| 0.0737 | 1.1483 | 6348 | 0.0550 |
| 0.0796 | 1.1982 | 6624 | 0.0386 |
| 0.0621 | 1.2482 | 6900 | 0.0586 |
| 0.0578 | 1.2981 | 7176 | 0.0283 |
| 0.0539 | 1.3480 | 7452 | 0.0320 |
| 0.0491 | 1.3979 | 7728 | 0.0518 |
| 0.0448 | 1.4478 | 8004 | 0.0360 |
| 0.0475 | 1.4978 | 8280 | 0.0403 |
| 0.0411 | 1.5477 | 8556 | 0.0217 |
| 0.0382 | 1.5976 | 8832 | 0.0255 |
| 0.0453 | 1.6475 | 9108 | 0.0215 |
| 0.0424 | 1.6974 | 9384 | 0.0250 |
| 0.039 | 1.7473 | 9660 | 0.0247 |
| 0.0393 | 1.7973 | 9936 | 0.0230 |
| 0.0384 | 1.8472 | 10212 | 0.0200 |
| 0.032 | 1.8971 | 10488 | 0.0210 |
| 0.0352 | 1.9470 | 10764 | 0.0234 |
| 0.0346 | 1.9969 | 11040 | 0.0228 |
| 0.0331 | 2.0470 | 11316 | 0.0276 |
| 0.0314 | 2.0969 | 11592 | 0.0219 |
| 0.0355 | 2.1469 | 11868 | 0.0208 |
| 0.0271 | 2.1968 | 12144 | 0.0235 |
| 0.0258 | 2.2467 | 12420 | 0.0197 |
| 0.0286 | 2.2966 | 12696 | 0.0191 |
| 0.0284 | 2.3465 | 12972 | 0.0203 |
| 0.0251 | 2.3965 | 13248 | 0.0177 |
| 0.0273 | 2.4464 | 13524 | 0.0171 |
| 0.0244 | 2.4963 | 13800 | 0.0157 |
| 0.0247 | 2.5462 | 14076 | 0.0150 |
| 0.0256 | 2.5961 | 14352 | 0.0149 |
| 0.0227 | 2.6461 | 14628 | 0.0156 |
| 0.0257 | 2.6960 | 14904 | 0.0155 |
| 0.0217 | 2.7459 | 15180 | 0.0156 |
| 0.0243 | 2.7958 | 15456 | 0.0688 |
| 0.047 | 2.8457 | 15732 | 0.0269 |
| 0.0511 | 2.8957 | 16008 | 0.0220 |
| 0.0526 | 2.9456 | 16284 | 0.0311 |
| 0.0441 | 2.9955 | 16560 | 0.0264 |
| 0.0383 | 3.0456 | 16836 | 0.0263 |
| 0.0333 | 3.0955 | 17112 | 0.0239 |
| 0.0484 | 3.1454 | 17388 | 0.0328 |
| 0.0431 | 3.1953 | 17664 | 0.0268 |
| 0.0394 | 3.2453 | 17940 | 0.0409 |
| 0.0406 | 3.2952 | 18216 | 0.0388 |
| 0.038 | 3.3451 | 18492 | 0.0312 |
| 0.0391 | 3.3950 | 18768 | 0.0261 |
| 0.0361 | 3.4449 | 19044 | 0.0259 |
| 0.0485 | 3.4949 | 19320 | 0.0393 |
| 0.0394 | 3.5448 | 19596 | 0.0564 |
| 0.0391 | 3.5947 | 19872 | 0.0466 |
| 0.0388 | 3.6446 | 20148 | 0.0571 |
| 0.0326 | 3.6945 | 20424 | 0.0354 |
| 0.0428 | 3.7445 | 20700 | 0.0282 |
| 0.0342 | 3.7944 | 20976 | 0.0212 |
| 0.0389 | 3.8443 | 21252 | 0.0304 |
| 0.0369 | 3.8942 | 21528 | 0.0273 |
| 0.0298 | 3.9441 | 21804 | 0.0215 |
| 0.027 | 3.9941 | 22080 | 0.0234 |
| 0.0334 | 4.0441 | 22356 | 0.0218 |
| 0.0316 | 4.0941 | 22632 | 0.0241 |
| 0.0296 | 4.1440 | 22908 | 0.0228 |
| 0.0324 | 4.1939 | 23184 | 0.0183 |
| 0.0286 | 4.2438 | 23460 | 0.0196 |
| 0.0213 | 4.2937 | 23736 | 0.0219 |
| 0.0299 | 4.3437 | 24012 | 0.0226 |
| 0.0253 | 4.3936 | 24288 | 0.0223 |
| 0.0222 | 4.4435 | 24564 | 0.0186 |
| 0.0228 | 4.4934 | 24840 | 0.0209 |
| 0.0265 | 4.5433 | 25116 | 0.0166 |
| 0.0224 | 4.5932 | 25392 | 0.0196 |
| 0.0257 | 4.6432 | 25668 | 0.0198 |
| 0.0278 | 4.6931 | 25944 | 0.0178 |
| 0.0236 | 4.7430 | 26220 | 0.0174 |
| 0.0225 | 4.7929 | 26496 | 0.0165 |
| 0.024 | 4.8428 | 26772 | 0.0163 |
| 0.0244 | 4.8928 | 27048 | 0.0159 |
| 0.0233 | 4.9427 | 27324 | 0.0159 |
| 0.0252 | 4.9926 | 27600 | 0.0159 |

### Framework versions

- Transformers 4.50.3
- Pytorch 2.5.1+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1
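
A quick way to check that a local environment matches these versions (a convenience snippet, not part of the original card):

```python
# Print installed versions to compare against the list above.
import datasets
import tokenizers
import torch
import transformers

print("Transformers:", transformers.__version__)  # card reports 4.50.3
print("PyTorch:", torch.__version__)              # card reports 2.5.1+cu124
print("Datasets:", datasets.__version__)          # card reports 3.5.0
print("Tokenizers:", tokenizers.__version__)      # card reports 0.21.1
```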
model-00001-of-00002.safetensors
CHANGED
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:f6437f81ccce9231a80b1d8375be0699678e913d7b12f299a1397d5dcbdb2f83
size 4957559960
model-00002-of-00002.safetensors
CHANGED
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:900c5548d86c7f54d2a3868dfdff1516b2aeb2fd326dd32c44ebb43a9e43912f
size 1214374880
runs/Apr11_00-05-25_d13628689784/events.out.tfevents.1744329964.d13628689784.187347.0
CHANGED
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:f99311a509831022b6689214825a30e94a04a2ef1b9cf65110ff4dce640f32dd
+size 55223