de_childes_42

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.6072
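
Assuming the reported loss is the mean per-token cross-entropy (the usual convention for Trainer-generated cards), it corresponds to a perplexity of exp(2.6072) ≈ 13.6. A quick check in Python:

```python
import math

# Final evaluation loss reported above; assumed to be
# mean per-token cross-entropy.
eval_loss = 2.6072

# Perplexity is the exponential of the cross-entropy loss.
perplexity = math.exp(eval_loss)
print(f"perplexity = {perplexity:.2f}")  # ≈ 13.56
```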

Model description

The checkpoint contains roughly 14.9M parameters stored as float32 safetensors. No further description of the architecture or base model is provided.

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 40000
  • training_steps: 100000
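
The list above matches the Hugging Face Trainer's auto-generated card format; a minimal sketch of expressing these values as a `TrainingArguments` object is shown below. The output directory is a hypothetical placeholder, and single-device training is assumed so that the per-device batch size of 16 with 2 accumulation steps yields the stated total train batch size of 32:

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
# output_dir is a hypothetical placeholder; the card names no paths.
training_args = TrainingArguments(
    output_dir="de_childes_42",
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    gradient_accumulation_steps=2,  # effective train batch size: 16 * 2 = 32
    lr_scheduler_type="linear",
    warmup_steps=40_000,
    max_steps=100_000,
    adam_beta1=0.9,   # Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```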

Training results

| Training Loss | Epoch   | Step   | Validation Loss |
|:-------------:|:-------:|:------:|:---------------:|
| No log        | 1.5021  | 2000   | 7.5052          |
| 7.4063        | 3.0041  | 4000   | 6.4338          |
| 7.4063        | 4.5062  | 6000   | 6.3051          |
| 6.0672        | 6.0083  | 8000   | 6.1663          |
| 6.0672        | 7.5103  | 10000  | 6.0952          |
| 5.8691        | 9.0124  | 12000  | 6.0122          |
| 5.8691        | 10.5145 | 14000  | 5.9512          |
| 5.7209        | 12.0165 | 16000  | 5.8691          |
| 5.7209        | 13.5186 | 18000  | 5.8529          |
| 5.6105        | 15.0207 | 20000  | 5.7965          |
| 5.6105        | 16.5227 | 22000  | 5.7404          |
| 5.5302        | 18.0248 | 24000  | 5.7424          |
| 5.5302        | 19.5268 | 26000  | 5.7256          |
| 5.4587        | 21.0289 | 28000  | 5.6831          |
| 5.4587        | 22.5310 | 30000  | 5.3966          |
| 5.0899        | 24.0330 | 32000  | 4.7042          |
| 5.0899        | 25.5351 | 34000  | 4.2317          |
| 4.0988        | 27.0372 | 36000  | 3.9093          |
| 4.0988        | 28.5494 | 38000  | 3.7496          |
| 3.555         | 30.0514 | 40000  | 3.5961          |
| 3.555         | 31.5535 | 42000  | 3.4542          |
| 3.2522        | 33.0556 | 44000  | 3.3300          |
| 3.2522        | 34.5576 | 46000  | 3.2830          |
| 3.0484        | 36.0597 | 48000  | 3.1864          |
| 3.0484        | 37.5618 | 50000  | 3.1189          |
| 2.9026        | 39.0638 | 52000  | 3.0475          |
| 2.9026        | 40.5659 | 54000  | 2.9933          |
| 2.7874        | 42.0680 | 56000  | 2.9411          |
| 2.7874        | 43.5700 | 58000  | 2.9355          |
| 2.7001        | 45.0721 | 60000  | 2.8913          |
| 2.7001        | 46.5742 | 62000  | 2.8601          |
| 2.6298        | 48.0762 | 64000  | 2.8227          |
| 2.6298        | 49.5783 | 66000  | 2.8202          |
| 2.5722        | 51.0804 | 68000  | 2.7874          |
| 2.5722        | 52.5824 | 70000  | 2.7716          |
| 2.523         | 54.0845 | 72000  | 2.7363          |
| 2.523         | 55.5866 | 74000  | 2.7212          |
| 2.4788        | 57.0886 | 76000  | 2.6944          |
| 2.4788        | 58.5907 | 78000  | 2.6761          |
| 2.4466        | 60.0928 | 80000  | 2.6705          |
| 2.4466        | 61.5948 | 82000  | 2.6551          |
| 2.4122        | 63.0969 | 84000  | 2.6368          |
| 2.4122        | 64.5989 | 86000  | 2.6424          |
| 2.3832        | 66.1010 | 88000  | 2.6345          |
| 2.3832        | 67.6031 | 90000  | 2.6295          |
| 2.3592        | 69.1051 | 92000  | 2.6247          |
| 2.3592        | 70.6072 | 94000  | 2.6303          |
| 2.347         | 72.1093 | 96000  | 2.5866          |
| 2.347         | 73.6113 | 98000  | 2.6067          |
| 2.3336        | 75.1134 | 100000 | 2.6072          |

Framework versions

  • Transformers 4.45.2
  • PyTorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1
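
For completeness, a minimal loading sketch under two assumptions the card does not confirm: that the Hub repository id matches the model name above, and that the checkpoint is a causal language model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repository id; replace with the actual Hub path.
repo_id = "de_childes_42"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)  # assumes a causal-LM head

print(f"{model.num_parameters() / 1e6:.1f}M parameters")  # card reports 14.9M
```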