zephyr-7b-sft-full-100ep

This model is a fine-tuned version of mistralai/Mistral-7B-v0.1 on the vipinkatara/SFT_data223 dataset. It achieves the following results on the evaluation set:

Loss: 0.0012 (validation loss after the final epoch)

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 16
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 64
total_eval_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 100
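The derived values in this list follow from simple arithmetic. The minimal Python sketch below reproduces them and evaluates a linear-warmup-plus-cosine-decay learning-rate schedule of the kind implemented by Transformers' `get_cosine_schedule_with_warmup`. It assumes no gradient accumulation, which is consistent with total_train_batch_size = 16 × 4 = 64. It also takes the 23 optimizer steps per epoch from the Step column of the results table. Two further choices are assumptions rather than details confirmed by this card: the ceiling rounding of warmup steps and the helper name `cosine_lr`.

```python
import math

# Values copied from the hyperparameter list.
LEARNING_RATE = 2e-5
TRAIN_BATCH_SIZE = 16   # per device
EVAL_BATCH_SIZE = 8     # per device
NUM_DEVICES = 4
NUM_EPOCHS = 100
WARMUP_RATIO = 0.1
STEPS_PER_EPOCH = 23    # read off the "Step" column of the results table

# Assuming no gradient accumulation: global batch = per-device batch x devices.
total_train_batch_size = TRAIN_BATCH_SIZE * NUM_DEVICES
total_eval_batch_size = EVAL_BATCH_SIZE * NUM_DEVICES

total_steps = STEPS_PER_EPOCH * NUM_EPOCHS
# Assumption: warmup length is the ratio times total steps, rounded up.
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def cosine_lr(step: int) -> float:
    """Hypothetical helper: learning rate at optimizer step `step` under
    linear warmup from 0 followed by a half-cosine decay to 0."""
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, 115, 230, 1265, 2300):
    print(f"step {step:>4}: lr = {cosine_lr(step):.3e}")
```

Under these assumptions, warmup lasts 230 steps. The learning rate therefore peaks at 2e-05 at step 230, which is the end of epoch 10 in the results table, and then decays to zero by step 2300.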

Training results

Training Loss	Epoch	Step	Validation Loss
0.474	1.0	23	0.2754
0.0863	2.0	46	0.0708
0.0644	3.0	69	0.0597
0.0582	4.0	92	0.0562
0.0557	5.0	115	0.0542
0.0572	6.0	138	0.0571
0.0569	7.0	161	0.0550
0.0551	8.0	184	0.0540
0.055	9.0	207	0.0530
0.0921	10.0	230	0.4721
0.1954	11.0	253	0.2910
0.0926	12.0	276	0.0766
0.0558	13.0	299	0.0519
0.0538	14.0	322	0.0493
0.0517	15.0	345	0.0479
0.0489	16.0	368	0.0466
0.0466	17.0	391	0.0416
0.0418	18.0	414	0.0349
0.0347	19.0	437	0.0298
0.0304	20.0	460	0.0256
0.0252	21.0	483	0.0192
0.0201	22.0	506	0.0128
0.0128	23.0	529	0.0107
0.0095	24.0	552	0.0054
0.0062	25.0	575	0.0038
0.005	26.0	598	0.0029
0.0038	27.0	621	0.0024
0.0032	28.0	644	0.0022
0.0028	29.0	667	0.0019
0.0026	30.0	690	0.0018
0.0022	31.0	713	0.0016
0.002	32.0	736	0.0015
0.0019	33.0	759	0.0015
0.0018	34.0	782	0.0015
0.0018	35.0	805	0.0014
0.0018	36.0	828	0.0014
0.0017	37.0	851	0.0014
0.0017	38.0	874	0.0014
0.0021	39.0	897	0.0020
0.0023	40.0	920	0.0018
0.0019	41.0	943	0.0017
0.0019	42.0	966	0.0016
0.0018	43.0	989	0.0015
0.0017	44.0	1012	0.0014
0.0017	45.0	1035	0.0014
0.0016	46.0	1058	0.0015
0.0019	47.0	1081	0.0014
0.0017	48.0	1104	0.0015
0.0017	49.0	1127	0.0015
0.0036	50.0	1150	0.0039
0.0029	51.0	1173	0.0031
0.0021	52.0	1196	0.0018
0.0017	53.0	1219	0.0015
0.0017	54.0	1242	0.0014
0.0016	55.0	1265	0.0014
0.0015	56.0	1288	0.0013
0.0015	57.0	1311	0.0013
0.0014	58.0	1334	0.0013
0.0014	59.0	1357	0.0013
0.0014	60.0	1380	0.0013
0.0013	61.0	1403	0.0013
0.0014	62.0	1426	0.0012
0.0013	63.0	1449	0.0012
0.0013	64.0	1472	0.0012
0.0014	65.0	1495	0.0012
0.0013	66.0	1518	0.0012
0.0013	67.0	1541	0.0012
0.0013	68.0	1564	0.0012
0.0014	69.0	1587	0.0012
0.0013	70.0	1610	0.0012
0.0014	71.0	1633	0.0012
0.0014	72.0	1656	0.0012
0.0013	73.0	1679	0.0012
0.0013	74.0	1702	0.0012
0.0013	75.0	1725	0.0012
0.0013	76.0	1748	0.0012
0.0013	77.0	1771	0.0012
0.0012	78.0	1794	0.0012
0.0013	79.0	1817	0.0012
0.0012	80.0	1840	0.0012
0.0013	81.0	1863	0.0012
0.0013	82.0	1886	0.0012
0.0013	83.0	1909	0.0012
0.0012	84.0	1932	0.0012
0.0012	85.0	1955	0.0012
0.0013	86.0	1978	0.0012
0.0012	87.0	2001	0.0012
0.0013	88.0	2024	0.0012
0.0012	89.0	2047	0.0012
0.0013	90.0	2070	0.0012
0.0011	91.0	2093	0.0012
0.0012	92.0	2116	0.0012
0.0012	93.0	2139	0.0012
0.0013	94.0	2162	0.0012
0.0012	95.0	2185	0.0012
0.0013	96.0	2208	0.0012
0.0012	97.0	2231	0.0012
0.0011	98.0	2254	0.0012
0.0012	99.0	2277	0.0012
0.0012	100.0	2300	0.0012
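The tab-separated table can also be scanned programmatically. The Python sketch below parses rows for epochs 8 through 13, copied verbatim from the table, and flags any epoch whose validation loss more than doubles relative to the previous epoch. The 2× threshold and the function names `parse_results` and `find_spikes` are illustrative choices and are not part of the original training setup. To scan every epoch, paste the full table into `RESULTS_TSV`.

```python
import csv
import io

# Rows for epochs 8-13, copied from the results table (tab-separated).
RESULTS_TSV = (
    "Training Loss\tEpoch\tStep\tValidation Loss\n"
    "0.0551\t8.0\t184\t0.0540\n"
    "0.055\t9.0\t207\t0.0530\n"
    "0.0921\t10.0\t230\t0.4721\n"
    "0.1954\t11.0\t253\t0.2910\n"
    "0.0926\t12.0\t276\t0.0766\n"
    "0.0558\t13.0\t299\t0.0519\n"
)


def parse_results(text: str) -> list:
    """Parse the tab-separated table into dicts of column name -> float."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [{key: float(value) for key, value in row.items()} for row in reader]


def find_spikes(rows: list, factor: float = 2.0) -> list:
    """Return epochs whose validation loss exceeds `factor` times the
    validation loss of the row immediately before it."""
    spikes = []
    for prev, cur in zip(rows, rows[1:]):
        if cur["Validation Loss"] > factor * prev["Validation Loss"]:
            spikes.append(int(cur["Epoch"]))
    return spikes


rows = parse_results(RESULTS_TSV)
print(find_spikes(rows))  # [10]
```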

Framework versions

Transformers 4.40.2
Pytorch 2.3.0+cu121
Datasets 2.19.1
Tokenizers 0.19.1
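A small standard-library check with `importlib.metadata` can compare a local environment against these versions. This sketch assumes the PyPI distribution names `transformers`, `torch`, `datasets`, and `tokenizers`, and the helper name `version_mismatches` is hypothetical. A PyTorch wheel built without CUDA 12.1 reports a version without the `+cu121` local suffix, so it will show up as a mismatch.

```python
from importlib import metadata
from typing import Callable, Dict, Optional

# Versions listed under "Framework versions", keyed by PyPI distribution name.
EXPECTED = {
    "transformers": "4.40.2",
    "torch": "2.3.0+cu121",
    "datasets": "2.19.1",
    "tokenizers": "0.19.1",
}


def version_mismatches(
    expected: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = metadata.version,
) -> Dict[str, Optional[str]]:
    """Map each package whose installed version differs from `expected`
    to the version actually installed (None if it is not installed)."""
    mismatches: Dict[str, Optional[str]] = {}
    for package, wanted in expected.items():
        try:
            installed = lookup(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = installed
    return mismatches


if __name__ == "__main__":
    print(version_mismatches(EXPECTED) or "all framework versions match")
```

The `lookup` parameter defaults to `importlib.metadata.version`. It exists so that the comparison logic can be exercised with any name-to-version mapping without installing the packages.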

Repository: vipinkatara/zephyr-7b-sft-full-100ep