Llama-3.1-Tulu-3-8B-SFT-MATH-RM (reward model)

This model is a fine-tuned version of allenai/Llama-3.1-Tulu-3-8B-SFT on the persona-math-filtered-64-llama-factory_tulu-3-sft-personas-math-filtered_llama-3.1-tulu-3-8b-sft_64_1_train dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4257
  • Accuracy: 0.7768

Model description

More information needed

Intended uses & limitations

More information needed
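
A natural use for a reward model like this one is to score candidate responses to a prompt and prefer the highest-scoring one. The sketch below is illustrative only: it assumes the checkpoint exposes a single-logit sequence-classification head loadable with AutoModelForSequenceClassification (the usual layout for transformers-style reward models) and uses a made-up example conversation.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumption: the repository id matches this card and the checkpoint was saved
# with a single-logit (num_labels=1) classification head.
model_id = "graf/Llama-3.1-Tulu-3-8B-SFT-MATH-RM"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, num_labels=1, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

messages = [
    {"role": "user", "content": "Solve 2x + 3 = 11 for x."},
    {"role": "assistant", "content": "Subtract 3 from both sides (2x = 8), then divide by 2: x = 4."},
]
# Format with the chat template inherited from the base SFT model.
text = tokenizer.apply_chat_template(messages, tokenize=False)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    reward = model(**inputs).logits[0, 0].item()  # scalar reward for this response
print(f"reward: {reward:.4f}")
```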

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an equivalent Trainer-style configuration is sketched after the list):

  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 256
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 1.0
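
Note that the effective train batch size of 256 follows from 1 sample per device × 8 GPUs × 32 gradient-accumulation steps. The sketch below expresses the same settings as transformers TrainingArguments; the run appears to have been launched through LLaMA-Factory, so these are the closest Trainer-style equivalents rather than the original config file.

```python
from transformers import TrainingArguments

# Sketch of the listed hyperparameters as Trainer-style arguments.
# Settings not listed in the card (e.g. precision) are marked as assumptions.
args = TrainingArguments(
    output_dir="reward",
    learning_rate=1e-5,
    per_device_train_batch_size=1,   # x 8 GPUs x 32 accumulation steps = 256 effective
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=32,
    num_train_epochs=1.0,
    lr_scheduler_type="linear",
    warmup_ratio=0.03,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumption: typical for Llama-3.1 fine-tunes, not stated above
)
```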

Training results

Training Loss Epoch Step Validation Loss Accuracy
0.9289 0.0168 5 0.8277 0.5503
0.7032 0.0337 10 0.6998 0.6352
0.735 0.0505 15 0.6130 0.644
0.6398 0.0674 20 0.5657 0.6743
0.5123 0.0842 25 0.5579 0.695
0.5098 0.1011 30 0.5404 0.7025
0.5597 0.1179 35 0.5175 0.7133
0.4819 0.1347 40 0.5116 0.7248
0.4874 0.1516 45 0.5042 0.7285
0.5318 0.1684 50 0.5086 0.7292
0.4955 0.1853 55 0.5065 0.7282
0.4956 0.2021 60 0.4871 0.7405
0.5021 0.2189 65 0.4891 0.741
0.5192 0.2358 70 0.5081 0.728
0.4748 0.2526 75 0.4904 0.7352
0.4881 0.2695 80 0.4838 0.7395
0.5092 0.2863 85 0.4938 0.7345
0.4971 0.3032 90 0.4835 0.7372
0.4878 0.32 95 0.4705 0.7472
0.4762 0.3368 100 0.4720 0.7365
0.4511 0.3537 105 0.4958 0.733
0.5213 0.3705 110 0.4826 0.7412
0.4569 0.3874 115 0.4830 0.7455
0.4919 0.4042 120 0.4627 0.7498
0.4853 0.4211 125 0.4565 0.7508
0.4638 0.4379 130 0.4577 0.748
0.4941 0.4547 135 0.4549 0.75
0.4661 0.4716 140 0.4552 0.7578
0.4886 0.4884 145 0.4508 0.755
0.4433 0.5053 150 0.4468 0.7655
0.4819 0.5221 155 0.4552 0.7555
0.4794 0.5389 160 0.4604 0.7565
0.4272 0.5558 165 0.4549 0.757
0.4615 0.5726 170 0.4579 0.7612
0.4417 0.5895 175 0.4460 0.758
0.4275 0.6063 180 0.4453 0.7652
0.4303 0.6232 185 0.4468 0.7628
0.4286 0.64 190 0.4397 0.7715
0.4655 0.6568 195 0.4369 0.7675
0.386 0.6737 200 0.4416 0.7618
0.4129 0.6905 205 0.4336 0.767
0.3851 0.7074 210 0.4335 0.77
0.4516 0.7242 215 0.4339 0.7742
0.3995 0.7411 220 0.4313 0.7715
0.3488 0.7579 225 0.4322 0.7698
0.4874 0.7747 230 0.4299 0.7732
0.4217 0.7916 235 0.4288 0.7708
0.4295 0.8084 240 0.4299 0.771
0.4777 0.8253 245 0.4318 0.7678
0.4612 0.8421 250 0.4271 0.772
0.4576 0.8589 255 0.4309 0.771
0.3921 0.8758 260 0.4333 0.7722
0.4372 0.8926 265 0.4302 0.7722
0.5449 0.9095 270 0.4335 0.7695
0.4428 0.9263 275 0.4311 0.7728
0.4395 0.9432 280 0.4287 0.7745
0.4674 0.96 285 0.4262 0.776
0.4225 0.9768 290 0.4257 0.7765
0.4262 0.9937 295 0.4258 0.7762
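
The accuracy column is most naturally read as pairwise preference accuracy: the fraction of evaluation pairs where the chosen response receives a higher reward than the rejected one. That interpretation follows standard reward-model training rather than anything stated explicitly in this card; a minimal sketch of the metric:

```python
import torch

def pairwise_accuracy(chosen_rewards: torch.Tensor, rejected_rewards: torch.Tensor) -> float:
    """Fraction of (chosen, rejected) pairs where the chosen response outscores the rejected one."""
    return (chosen_rewards > rejected_rewards).float().mean().item()

# Toy example: two of three pairs ranked correctly -> accuracy ~0.667
chosen = torch.tensor([1.2, 0.4, -0.1])
rejected = torch.tensor([0.3, 0.9, -0.8])
print(pairwise_accuracy(chosen, rejected))  # 0.666...
```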

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1