opt-babylm2-clean-spacy-earlystop_no-multi-adj-strict-bpe_seed-1024_1e-3
This model was trained from scratch on the kanishka/babylm2-clean-spacy_no-multi-adj-strict dataset. It achieves the following results on the evaluation set:
- Loss: 2.6983
- Accuracy: 0.4771
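As a rough aid for reading the loss figure: assuming the reported loss is the mean per-token cross-entropy in nats (the usual convention for causal language modeling with the Transformers Trainer, though this card does not state it), the corresponding perplexity is its exponential, roughly 14.85. A minimal Python sketch; the variable names are illustrative:

```python
import math

# Final evaluation loss reported above.
# Assumption: mean per-token cross-entropy measured in nats.
eval_loss = 2.6983

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.2f}")
```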
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 64
- seed: 1024
- gradient_accumulation_steps: 8
- total_train_batch_size: 256
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 32000
- num_epochs: 20.0
- mixed_precision_training: Native AMP
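The relationship between these values can be sketched in a few lines of Python. The total batch size of 256 is consistent with 32 × 8 on a single device (the device count is an assumption; the card does not report it). The schedule function below is a hypothetical helper, `linear_lr`, that assumes the common "linear warmup, then linear decay to zero" shape and takes the total step count (43020) from the final row of the training results. It is an illustration, not the training code itself:

```python
# Values taken from the hyperparameter list above.
train_batch_size = 32
gradient_accumulation_steps = 8
num_devices = 1  # assumption: not stated in the card

# Sequences consumed per optimizer step.
effective_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

def linear_lr(step, base_lr=1e-3, warmup_steps=32000, total_steps=43020):
    """Hypothetical helper: linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup period.
        return base_lr * step / max(1, warmup_steps)
    # Decay from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

print(effective_batch_size)
for step in (0, 16000, 32000, 37510, 43020):
    print(step, linear_lr(step))
```

Under these assumptions the warmup covers roughly three quarters of the run (32000 of 43020 steps), so the peak learning rate of 0.001 is reached only near the end of epoch 15.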
Training results
Training Loss | Epoch | Step | Validation Loss | Accuracy |
---|---|---|---|---|
4.0297 | 1.0 | 2152 | 3.8585 | 0.3570 |
3.3906 | 2.0 | 4304 | 3.3435 | 0.4046 |
3.0817 | 3.0 | 6456 | 3.1232 | 0.4269 |
2.9257 | 4.0 | 8608 | 3.0129 | 0.4385 |
2.8295 | 5.0 | 10760 | 2.9414 | 0.4454 |
2.7596 | 6.0 | 12912 | 2.9054 | 0.4488 |
2.7142 | 7.0 | 15064 | 2.8777 | 0.4523 |
2.6808 | 8.0 | 17216 | 2.8535 | 0.4549 |
2.6562 | 9.0 | 19368 | 2.8391 | 0.4567 |
2.6344 | 10.0 | 21520 | 2.8294 | 0.4577 |
2.6175 | 11.0 | 23672 | 2.8214 | 0.4591 |
2.601 | 12.0 | 25824 | 2.8116 | 0.4598 |
2.5881 | 13.0 | 27976 | 2.8084 | 0.4603 |
2.5985 | 14.0 | 30128 | 2.8016 | 0.4608 |
2.5886 | 15.0 | 32280 | 2.7969 | 0.4613 |
2.5566 | 16.0 | 34432 | 2.7678 | 0.4649 |
2.5056 | 17.0 | 36584 | 2.7411 | 0.4693 |
2.4459 | 18.0 | 38736 | 2.7160 | 0.4727 |
2.3774 | 19.0 | 40888 | 2.7009 | 0.4754 |
2.2986 | 19.9911 | 43020 | 2.6983 | 0.4771 |
Framework versions
- Transformers 4.48.0
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.21.1
Dataset used to train kanishka/opt-babylm2-clean-spacy-earlystop_no-multi-adj-strict-bpe_seed-1024_1e-3: kanishka/babylm2-clean-spacy_no-multi-adj-strict
Evaluation results
- Accuracy on kanishka/babylm2-clean-spacy_no-multi-adj-strict (self-reported): 0.477