# bert-small-amharic-32k-bs256-512

This model is a fine-tuned version of [yosefw/bert-small-amharic-32k-bs256](https://huggingface.co/yosefw/bert-small-amharic-32k-bs256) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 2.7523
- Model Preparation Time: 0.0015 s
## Model description
More information needed
## Intended uses & limitations
More information needed
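In the absence of a documented usage example, here is a minimal fill-mask inference sketch. The repository id `yosefw/bert-small-amharic-32k-bs256-512` and the masked-language-modeling task are assumptions inferred from the base model's name and the training setup; neither is stated in this card.

```python
from transformers import pipeline

# Assumption: this checkpoint is a masked-LM and lives under the same
# namespace as its base model; the card does not state either explicitly.
fill_mask = pipeline("fill-mask", model="yosefw/bert-small-amharic-32k-bs256-512")

# Amharic: "Addis Ababa is the [MASK] of Ethiopia."
for pred in fill_mask("አዲስ አበባ የኢትዮጵያ [MASK] ናት።"):
    print(pred["token_str"], pred["score"])
```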
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged `TrainingArguments` sketch follows the list):
- learning_rate: 2e-05
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- lr_scheduler_warmup_steps: 1000
- num_epochs: 8
- mixed_precision_training: Native AMP
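These settings map straightforwardly onto `transformers.TrainingArguments`. The sketch below is illustrative only: the output directory is an assumption, and the model, dataset, and data collator are omitted because this card does not document them. Note that when both are set, `warmup_steps` takes precedence over `warmup_ratio` in the `Trainer`.

```python
from transformers import TrainingArguments

# Illustrative mapping of the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="bert-small-amharic-32k-bs256-512",  # assumption, not stated in the card
    learning_rate=2e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    warmup_steps=1000,  # takes precedence over warmup_ratio when nonzero
    num_train_epochs=8,
    fp16=True,  # "Native AMP" mixed-precision training
)
```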
### Training results
| Training Loss | Epoch | Step | Validation Loss | Model Preparation Time (s) |
|:-------------:|:-----:|:----:|:---------------:|:--------------------------:|
6.5635 | 0.1249 | 1038 | 6.3157 | 0.0015 |
4.5695 | 0.2498 | 2076 | 3.0682 | 0.0015 |
3.3142 | 0.3746 | 3114 | 3.0046 | 0.0015 |
3.2145 | 0.4995 | 4152 | 2.9563 | 0.0015 |
3.1627 | 0.6244 | 5190 | 2.9269 | 0.0015 |
3.1338 | 0.7493 | 6228 | 2.9018 | 0.0015 |
3.1077 | 0.8742 | 7266 | 2.8866 | 0.0015 |
3.0927 | 0.9990 | 8304 | 2.8645 | 0.0015 |
3.0652 | 1.1239 | 9342 | 2.8488 | 0.0015 |
3.0561 | 1.2488 | 10380 | 2.8409 | 0.0015 |
3.0406 | 1.3737 | 11418 | 2.8180 | 0.0015 |
3.0321 | 1.4986 | 12456 | 2.8307 | 0.0015 |
3.0236 | 1.6234 | 13494 | 2.8197 | 0.0015 |
3.0214 | 1.7483 | 14532 | 2.8094 | 0.0015 |
3.0175 | 1.8732 | 15570 | 2.8162 | 0.0015 |
3.0137 | 1.9981 | 16608 | 2.8030 | 0.0015 |
3.0069 | 2.1230 | 17646 | 2.7887 | 0.0015 |
2.9971 | 2.2478 | 18684 | 2.8004 | 0.0015 |
2.9984 | 2.3727 | 19722 | 2.7993 | 0.0015 |
2.9914 | 2.4976 | 20760 | 2.7964 | 0.0015 |
2.9955 | 2.6225 | 21798 | 2.7952 | 0.0015 |
2.9901 | 2.7474 | 22836 | 2.7855 | 0.0015 |
2.9909 | 2.8722 | 23874 | 2.7835 | 0.0015 |
2.9877 | 2.9971 | 24912 | 2.7880 | 0.0015 |
2.9854 | 3.1220 | 25950 | 2.7848 | 0.0015 |
2.9805 | 3.2469 | 26988 | 2.7963 | 0.0015 |
2.982 | 3.3718 | 28026 | 2.7766 | 0.0015 |
2.9791 | 3.4966 | 29064 | 2.7786 | 0.0015 |
2.9728 | 3.6215 | 30102 | 2.7843 | 0.0015 |
2.9785 | 3.7464 | 31140 | 2.7845 | 0.0015 |
2.9771 | 3.8713 | 32178 | 2.7848 | 0.0015 |
2.972 | 3.9962 | 33216 | 2.7849 | 0.0015 |
2.9689 | 4.1210 | 34254 | 2.7828 | 0.0015 |
2.9693 | 4.2459 | 35292 | 2.7717 | 0.0015 |
2.9703 | 4.3708 | 36330 | 2.7692 | 0.0015 |
2.9657 | 4.4957 | 37368 | 2.7813 | 0.0015 |
2.9685 | 4.6205 | 38406 | 2.7689 | 0.0015 |
2.9639 | 4.7454 | 39444 | 2.7629 | 0.0015 |
2.9645 | 4.8703 | 40482 | 2.7701 | 0.0015 |
2.9641 | 4.9952 | 41520 | 2.7744 | 0.0015 |
2.9624 | 5.1201 | 42558 | 2.7638 | 0.0015 |
2.962 | 5.2449 | 43596 | 2.7696 | 0.0015 |
2.9583 | 5.3698 | 44634 | 2.7597 | 0.0015 |
2.9571 | 5.4947 | 45672 | 2.7595 | 0.0015 |
2.9576 | 5.6196 | 46710 | 2.7667 | 0.0015 |
2.9607 | 5.7445 | 47748 | 2.7659 | 0.0015 |
2.9557 | 5.8693 | 48786 | 2.7637 | 0.0015 |
2.9583 | 5.9942 | 49824 | 2.7651 | 0.0015 |
2.9568 | 6.1191 | 50862 | 2.7644 | 0.0015 |
2.9521 | 6.2440 | 51900 | 2.7519 | 0.0015 |
2.9518 | 6.3689 | 52938 | 2.7613 | 0.0015 |
2.9543 | 6.4937 | 53976 | 2.7574 | 0.0015 |
2.9574 | 6.6186 | 55014 | 2.7585 | 0.0015 |
2.957 | 6.7435 | 56052 | 2.7580 | 0.0015 |
2.9503 | 6.8684 | 57090 | 2.7650 | 0.0015 |
2.9537 | 6.9933 | 58128 | 2.7642 | 0.0015 |
2.9463 | 7.1181 | 59166 | 2.7654 | 0.0015 |
2.9519 | 7.2430 | 60204 | 2.7536 | 0.0015 |
2.9503 | 7.3679 | 61242 | 2.7640 | 0.0015 |
2.9483 | 7.4928 | 62280 | 2.7520 | 0.0015 |
2.9478 | 7.6177 | 63318 | 2.7520 | 0.0015 |
2.9478 | 7.7425 | 64356 | 2.7560 | 0.0015 |
2.9472 | 7.8674 | 65394 | 2.7561 | 0.0015 |
2.9476 | 7.9923 | 66432 | 2.7576 | 0.0015 |
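If the validation loss is a mean token-level cross-entropy (typical for masked-LM training, though the card does not say), the final loss of 2.7523 corresponds to a perplexity of about 15.7:

```python
import math

# Perplexity = exp(mean cross-entropy loss); assumes the reported
# validation loss is a mean token-level cross-entropy.
print(math.exp(2.7523))  # ≈ 15.68
```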
### Framework versions

- Transformers 4.49.0
- PyTorch 2.6.0+cu124
- Datasets 3.4.1
- Tokenizers 0.21.1
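A quick way to check that a local environment matches these versions:

```python
import datasets
import tokenizers
import torch
import transformers

# Print installed versions to compare against the list above.
for mod in (transformers, torch, datasets, tokenizers):
    print(mod.__name__, mod.__version__)
```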