# berel_finetuned_on_shuffled_HB_5_epochs_general
This model is a fine-tuned version of dicta-il/BEREL on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 2.1112
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 5
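Since the training data is unspecified, its approximate size can still be recovered from these hyperparameters together with the logged step/epoch pairs in the results table: each optimizer step consumes one batch of `train_batch_size` examples. A minimal sketch in plain Python (the figures are taken from the last row of the table):

```python
# Estimate the size of the (unspecified) training set from the logged
# schedule: step 44000 corresponds to epoch 4.9966, and each optimizer
# step consumes one batch of train_batch_size examples.
train_batch_size = 8
last_step, last_epoch = 44000, 4.9966

steps_per_epoch = last_step / last_epoch           # optimizer steps per epoch
approx_train_examples = steps_per_epoch * train_batch_size

print(round(steps_per_epoch))         # -> 8806
print(round(approx_train_examples))   # -> 70448
```

This suggests a training set of roughly 70k examples; the estimate is only as precise as the rounded epoch values logged in the table.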
### Training results

| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
3.2835 | 0.0568 | 500 | 2.9421 |
3.3772 | 0.1136 | 1000 | 2.9866 |
3.3727 | 0.1703 | 1500 | 2.9256 |
3.2209 | 0.2271 | 2000 | 2.9915 |
3.2895 | 0.2839 | 2500 | 2.9442 |
3.3112 | 0.3407 | 3000 | nan |
3.2918 | 0.3975 | 3500 | nan |
3.3729 | 0.4542 | 4000 | 2.9266 |
3.1557 | 0.5110 | 4500 | 2.9193 |
3.1619 | 0.5678 | 5000 | 2.9046 |
3.1945 | 0.6246 | 5500 | 2.8833 |
3.1312 | 0.6814 | 6000 | 2.9394 |
3.1166 | 0.7381 | 6500 | 2.8623 |
3.165 | 0.7949 | 7000 | 2.8393 |
3.0928 | 0.8517 | 7500 | 2.8569 |
3.2128 | 0.9085 | 8000 | 2.8377 |
3.1341 | 0.9653 | 8500 | 2.8303 |
3.0263 | 1.0220 | 9000 | 2.8088 |
2.933 | 1.0788 | 9500 | 2.8215 |
2.9292 | 1.1356 | 10000 | 2.8168 |
2.8925 | 1.1924 | 10500 | 2.7949 |
2.9547 | 1.2491 | 11000 | 2.7194 |
2.9336 | 1.3059 | 11500 | 2.7711 |
2.9214 | 1.3627 | 12000 | 2.7376 |
2.8601 | 1.4195 | 12500 | 2.7111 |
2.9139 | 1.4763 | 13000 | 2.6846 |
2.8249 | 1.5330 | 13500 | 2.6879 |
2.9928 | 1.5898 | 14000 | 2.6686 |
2.8314 | 1.6466 | 14500 | 2.6756 |
2.8097 | 1.7034 | 15000 | 2.6882 |
2.823 | 1.7602 | 15500 | 2.6336 |
2.7235 | 1.8169 | 16000 | 2.6166 |
2.7604 | 1.8737 | 16500 | 2.5789 |
2.7409 | 1.9305 | 17000 | 2.6046 |
2.7862 | 1.9873 | 17500 | 2.5653 |
2.6522 | 2.0441 | 18000 | 2.6171 |
2.6612 | 2.1008 | 18500 | 2.5746 |
2.6138 | 2.1576 | 19000 | 2.5711 |
2.6186 | 2.2144 | 19500 | 2.5215 |
2.5693 | 2.2712 | 20000 | 2.5314 |
2.5585 | 2.3280 | 20500 | 2.5588 |
2.6205 | 2.3847 | 21000 | 2.4909 |
2.6314 | 2.4415 | 21500 | 2.5318 |
2.5944 | 2.4983 | 22000 | nan |
2.5221 | 2.5551 | 22500 | 2.4318 |
2.5108 | 2.6119 | 23000 | 2.4725 |
2.574 | 2.6686 | 23500 | 2.4445 |
2.5431 | 2.7254 | 24000 | 2.4214 |
2.4977 | 2.7822 | 24500 | 2.4160 |
2.4943 | 2.8390 | 25000 | 2.3928 |
2.5035 | 2.8958 | 25500 | 2.3815 |
2.5333 | 2.9525 | 26000 | 2.3599 |
2.3777 | 3.0093 | 26500 | 2.4151 |
2.3205 | 3.0661 | 27000 | 2.3583 |
2.3466 | 3.1229 | 27500 | 2.3626 |
2.2953 | 3.1797 | 28000 | 2.3573 |
2.3163 | 3.2364 | 28500 | 2.3592 |
2.3055 | 3.2932 | 29000 | 2.3623 |
2.3273 | 3.3500 | 29500 | 2.2868 |
2.3209 | 3.4068 | 30000 | 2.2934 |
2.3117 | 3.4635 | 30500 | 2.2482 |
2.2615 | 3.5203 | 31000 | 2.2738 |
2.3249 | 3.5771 | 31500 | 2.2386 |
2.3591 | 3.6339 | 32000 | 2.2758 |
2.3237 | 3.6907 | 32500 | 2.2356 |
2.2351 | 3.7474 | 33000 | 2.2332 |
2.257 | 3.8042 | 33500 | 2.2135 |
2.1823 | 3.8610 | 34000 | 2.1985 |
2.2236 | 3.9178 | 34500 | 2.2021 |
2.198 | 3.9746 | 35000 | 2.2069 |
2.128 | 4.0313 | 35500 | 2.2150 |
2.1297 | 4.0881 | 36000 | 2.1890 |
2.102 | 4.1449 | 36500 | 2.1949 |
2.085 | 4.2017 | 37000 | 2.1873 |
2.0684 | 4.2585 | 37500 | 2.1404 |
2.0664 | 4.3152 | 38000 | 2.1746 |
2.1125 | 4.3720 | 38500 | 2.1720 |
1.9952 | 4.4288 | 39000 | 2.1245 |
2.0541 | 4.4856 | 39500 | 2.1524 |
2.0807 | 4.5424 | 40000 | 2.1696 |
2.1062 | 4.5991 | 40500 | 2.1358 |
2.0677 | 4.6559 | 41000 | 2.1094 |
2.0119 | 4.7127 | 41500 | 2.1044 |
2.0637 | 4.7695 | 42000 | 2.1001 |
2.0567 | 4.8263 | 42500 | 2.0984 |
2.0102 | 4.8830 | 43000 | 2.1232 |
2.0036 | 4.9398 | 43500 | 2.1187 |
2.0 | 4.9966 | 44000 | 2.1112 |
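Note that three evaluations (steps 3000, 3500, and 22000) logged `nan`; when summarizing the run, it is reasonable to skip those entries. A minimal sketch using a handful of values from the table above:

```python
import math

# Selected validation losses from the table (step -> loss);
# float("nan") marks evaluations that logged nan.
val_loss = {500: 2.9421, 3000: float("nan"),
            22000: float("nan"), 44000: 2.1112}

# Drop the nan evaluations before summarizing the run.
valid = {step: loss for step, loss in val_loss.items()
         if not math.isnan(loss)}

first, last = valid[min(valid)], valid[max(valid)]
improvement = (first - last) / first
print(f"{improvement:.1%}")  # -> 28.2%
```

Validation loss thus fell by about 28% between the first evaluation and the end of training, with no sign of overfitting within the five epochs.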
### Framework versions
- Transformers 4.47.1
- PyTorch 2.5.1+cu118
- Datasets 3.2.0
- Tokenizers 0.21.0