Llama-3.1-Tulu-3-8B-SFT-MATH-RM (reward model)

This model is a fine-tuned version of allenai/Llama-3.1-Tulu-3-8B-SFT on the persona-math-filtered-64-llama-factory_tulu-3-sft-personas-math-filtered_llama-3.1-tulu-3-8b-sft_64_1_train dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4257
  • Accuracy: 0.7768

Model description

More information needed

Intended uses & limitations

More information needed
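
A natural use for a reward model like this one is to score candidate responses to a prompt and prefer the highest-scoring one. The sketch below is illustrative only: it assumes the checkpoint exposes a single-logit sequence-classification head loadable with AutoModelForSequenceClassification (the usual layout for transformers-style reward models) and uses a made-up example conversation.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumption: the repository id matches this card and the checkpoint was saved
# with a single-logit (num_labels=1) classification head.
model_id = "graf/Llama-3.1-Tulu-3-8B-SFT-MATH-RM"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, num_labels=1, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

messages = [
    {"role": "user", "content": "Solve 2x + 3 = 11 for x."},
    {"role": "assistant", "content": "Subtract 3 from both sides (2x = 8), then divide by 2: x = 4."},
]
# Format with the chat template inherited from the base SFT model.
text = tokenizer.apply_chat_template(messages, tokenize=False)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    reward = model(**inputs).logits[0, 0].item()  # scalar reward for this response
print(f"reward: {reward:.4f}")
```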

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an equivalent Trainer-style configuration is sketched after the list):

  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 256
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 1.0
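
Note that the effective train batch size of 256 follows from 1 sample per device × 8 GPUs × 32 gradient-accumulation steps. The sketch below expresses the same settings as transformers TrainingArguments; the run appears to have been launched through LLaMA-Factory, so these are the closest Trainer-style equivalents rather than the original config file.

```python
from transformers import TrainingArguments

# Sketch of the listed hyperparameters as Trainer-style arguments.
# Settings not listed in the card (e.g. precision) are marked as assumptions.
args = TrainingArguments(
    output_dir="reward",
    learning_rate=1e-5,
    per_device_train_batch_size=1,   # x 8 GPUs x 32 accumulation steps = 256 effective
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=32,
    num_train_epochs=1.0,
    lr_scheduler_type="linear",
    warmup_ratio=0.03,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumption: typical for Llama-3.1 fine-tunes, not stated above
)
```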

Training results

Training Loss Epoch Step Validation Loss Accuracy
0.9289 0.0168 5 0.8277 0.5503
0.7032 0.0337 10 0.6998 0.6352
0.735 0.0505 15 0.6130 0.644
0.6398 0.0674 20 0.5657 0.6743
0.5123 0.0842 25 0.5579 0.695
0.5098 0.1011 30 0.5404 0.7025
0.5597 0.1179 35 0.5175 0.7133
0.4819 0.1347 40 0.5116 0.7248
0.4874 0.1516 45 0.5042 0.7285
0.5318 0.1684 50 0.5086 0.7292
0.4955 0.1853 55 0.5065 0.7282
0.4956 0.2021 60 0.4871 0.7405
0.5021 0.2189 65 0.4891 0.741
0.5192 0.2358 70 0.5081 0.728
0.4748 0.2526 75 0.4904 0.7352
0.4881 0.2695 80 0.4838 0.7395
0.5092 0.2863 85 0.4938 0.7345
0.4971 0.3032 90 0.4835 0.7372
0.4878 0.32 95 0.4705 0.7472
0.4762 0.3368 100 0.4720 0.7365
0.4511 0.3537 105 0.4958 0.733
0.5213 0.3705 110 0.4826 0.7412
0.4569 0.3874 115 0.4830 0.7455
0.4919 0.4042 120 0.4627 0.7498
0.4853 0.4211 125 0.4565 0.7508
0.4638 0.4379 130 0.4577 0.748
0.4941 0.4547 135 0.4549 0.75
0.4661 0.4716 140 0.4552 0.7578
0.4886 0.4884 145 0.4508 0.755
0.4433 0.5053 150 0.4468 0.7655
0.4819 0.5221 155 0.4552 0.7555
0.4794 0.5389 160 0.4604 0.7565
0.4272 0.5558 165 0.4549 0.757
0.4615 0.5726 170 0.4579 0.7612
0.4417 0.5895 175 0.4460 0.758
0.4275 0.6063 180 0.4453 0.7652
0.4303 0.6232 185 0.4468 0.7628
0.4286 0.64 190 0.4397 0.7715
0.4655 0.6568 195 0.4369 0.7675
0.386 0.6737 200 0.4416 0.7618
0.4129 0.6905 205 0.4336 0.767
0.3851 0.7074 210 0.4335 0.77
0.4516 0.7242 215 0.4339 0.7742
0.3995 0.7411 220 0.4313 0.7715
0.3488 0.7579 225 0.4322 0.7698
0.4874 0.7747 230 0.4299 0.7732
0.4217 0.7916 235 0.4288 0.7708
0.4295 0.8084 240 0.4299 0.771
0.4777 0.8253 245 0.4318 0.7678
0.4612 0.8421 250 0.4271 0.772
0.4576 0.8589 255 0.4309 0.771
0.3921 0.8758 260 0.4333 0.7722
0.4372 0.8926 265 0.4302 0.7722
0.5449 0.9095 270 0.4335 0.7695
0.4428 0.9263 275 0.4311 0.7728
0.4395 0.9432 280 0.4287 0.7745
0.4674 0.96 285 0.4262 0.776
0.4225 0.9768 290 0.4257 0.7765
0.4262 0.9937 295 0.4258 0.7762
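
The accuracy column is most naturally read as pairwise preference accuracy: the fraction of evaluation pairs where the chosen response receives a higher reward than the rejected one. That interpretation follows standard reward-model training rather than anything stated explicitly in this card; a minimal sketch of the metric:

```python
import torch

def pairwise_accuracy(chosen_rewards: torch.Tensor, rejected_rewards: torch.Tensor) -> float:
    """Fraction of (chosen, rejected) pairs where the chosen response outscores the rejected one."""
    return (chosen_rewards > rejected_rewards).float().mean().item()

# Toy example: two of three pairs ranked correctly -> accuracy ~0.667
chosen = torch.tensor([1.2, 0.4, -0.1])
rejected = torch.tensor([0.3, 0.9, -0.8])
print(pairwise_accuracy(chosen, rejected))  # 0.666...
```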

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1