bert-small-amharic-32k-bs256-512

This model is a fine-tuned version of yosefw/bert-small-amharic-32k-bs256 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.7523
  • Model Preparation Time: 0.0015
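
Assuming the evaluation loss is the mean token-level cross-entropy from masked language modeling (the card does not say), it corresponds to a pseudo-perplexity of roughly exp(2.7523) ≈ 15.7. A minimal check:

```python
import math

# Hypothetical sanity check: if the eval loss is a mean cross-entropy,
# perplexity is simply its exponential.
eval_loss = 2.7523
print(math.exp(eval_loss))  # ≈ 15.68
```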

Model description

More information needed. (The Hub listing reports about 29.6M parameters, stored as F32 safetensors.)

Intended uses & limitations

More information needed
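
Pending documentation from the author, here is a minimal inference sketch. It assumes the checkpoint is a masked language model (the usual objective for a BERT model of this kind); the Amharic prompt is an illustrative placeholder, not taken from the card.

```python
from transformers import pipeline

# Hypothetical usage sketch: assumes the checkpoint is a masked language
# model. The Amharic prompt below is a placeholder example.
fill_mask = pipeline("fill-mask", model="yosefw/bert-small-amharic-32k-bs256-512")

# "Addis Ababa is the [MASK] city of Ethiopia."
prompt = f"አዲስ አበባ የኢትዮጵያ {fill_mask.tokenizer.mask_token} ከተማ ናት።"
for prediction in fill_mask(prompt):
    print(prediction["token_str"], round(prediction["score"], 3))
```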

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 2e-05
  • train_batch_size: 64
  • eval_batch_size: 64
  • seed: 42
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • lr_scheduler_warmup_steps: 1000
  • num_epochs: 8
  • mixed_precision_training: Native AMP
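
A minimal sketch of how these values map onto Hugging Face TrainingArguments, assuming the standard Trainer API from the Transformers version listed below; the output directory is a placeholder, and the model and (undocumented) datasets are omitted:

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above. Model, datasets,
# and Trainer wiring are omitted because the card does not document them.
args = TrainingArguments(
    output_dir="bert-small-amharic-32k-bs256-512",  # placeholder
    learning_rate=2e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    warmup_steps=1000,  # when > 0, Trainer uses this and ignores warmup_ratio
    num_train_epochs=8,
    fp16=True,          # "Native AMP" mixed precision
)
```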

Training results

| Training Loss | Epoch | Step | Validation Loss | Model Preparation Time |
|:-------------:|:-----:|:----:|:---------------:|:----------------------:|
| 6.5635 | 0.1249 | 1038 | 6.3157 | 0.0015 |
| 4.5695 | 0.2498 | 2076 | 3.0682 | 0.0015 |
| 3.3142 | 0.3746 | 3114 | 3.0046 | 0.0015 |
| 3.2145 | 0.4995 | 4152 | 2.9563 | 0.0015 |
| 3.1627 | 0.6244 | 5190 | 2.9269 | 0.0015 |
| 3.1338 | 0.7493 | 6228 | 2.9018 | 0.0015 |
| 3.1077 | 0.8742 | 7266 | 2.8866 | 0.0015 |
| 3.0927 | 0.9990 | 8304 | 2.8645 | 0.0015 |
| 3.0652 | 1.1239 | 9342 | 2.8488 | 0.0015 |
| 3.0561 | 1.2488 | 10380 | 2.8409 | 0.0015 |
| 3.0406 | 1.3737 | 11418 | 2.8180 | 0.0015 |
| 3.0321 | 1.4986 | 12456 | 2.8307 | 0.0015 |
| 3.0236 | 1.6234 | 13494 | 2.8197 | 0.0015 |
| 3.0214 | 1.7483 | 14532 | 2.8094 | 0.0015 |
| 3.0175 | 1.8732 | 15570 | 2.8162 | 0.0015 |
| 3.0137 | 1.9981 | 16608 | 2.8030 | 0.0015 |
| 3.0069 | 2.1230 | 17646 | 2.7887 | 0.0015 |
| 2.9971 | 2.2478 | 18684 | 2.8004 | 0.0015 |
| 2.9984 | 2.3727 | 19722 | 2.7993 | 0.0015 |
| 2.9914 | 2.4976 | 20760 | 2.7964 | 0.0015 |
| 2.9955 | 2.6225 | 21798 | 2.7952 | 0.0015 |
| 2.9901 | 2.7474 | 22836 | 2.7855 | 0.0015 |
| 2.9909 | 2.8722 | 23874 | 2.7835 | 0.0015 |
| 2.9877 | 2.9971 | 24912 | 2.7880 | 0.0015 |
| 2.9854 | 3.1220 | 25950 | 2.7848 | 0.0015 |
| 2.9805 | 3.2469 | 26988 | 2.7963 | 0.0015 |
| 2.982 | 3.3718 | 28026 | 2.7766 | 0.0015 |
| 2.9791 | 3.4966 | 29064 | 2.7786 | 0.0015 |
| 2.9728 | 3.6215 | 30102 | 2.7843 | 0.0015 |
| 2.9785 | 3.7464 | 31140 | 2.7845 | 0.0015 |
| 2.9771 | 3.8713 | 32178 | 2.7848 | 0.0015 |
| 2.972 | 3.9962 | 33216 | 2.7849 | 0.0015 |
| 2.9689 | 4.1210 | 34254 | 2.7828 | 0.0015 |
| 2.9693 | 4.2459 | 35292 | 2.7717 | 0.0015 |
| 2.9703 | 4.3708 | 36330 | 2.7692 | 0.0015 |
| 2.9657 | 4.4957 | 37368 | 2.7813 | 0.0015 |
| 2.9685 | 4.6205 | 38406 | 2.7689 | 0.0015 |
| 2.9639 | 4.7454 | 39444 | 2.7629 | 0.0015 |
| 2.9645 | 4.8703 | 40482 | 2.7701 | 0.0015 |
| 2.9641 | 4.9952 | 41520 | 2.7744 | 0.0015 |
| 2.9624 | 5.1201 | 42558 | 2.7638 | 0.0015 |
| 2.962 | 5.2449 | 43596 | 2.7696 | 0.0015 |
| 2.9583 | 5.3698 | 44634 | 2.7597 | 0.0015 |
| 2.9571 | 5.4947 | 45672 | 2.7595 | 0.0015 |
| 2.9576 | 5.6196 | 46710 | 2.7667 | 0.0015 |
| 2.9607 | 5.7445 | 47748 | 2.7659 | 0.0015 |
| 2.9557 | 5.8693 | 48786 | 2.7637 | 0.0015 |
| 2.9583 | 5.9942 | 49824 | 2.7651 | 0.0015 |
| 2.9568 | 6.1191 | 50862 | 2.7644 | 0.0015 |
| 2.9521 | 6.2440 | 51900 | 2.7519 | 0.0015 |
| 2.9518 | 6.3689 | 52938 | 2.7613 | 0.0015 |
| 2.9543 | 6.4937 | 53976 | 2.7574 | 0.0015 |
| 2.9574 | 6.6186 | 55014 | 2.7585 | 0.0015 |
| 2.957 | 6.7435 | 56052 | 2.7580 | 0.0015 |
| 2.9503 | 6.8684 | 57090 | 2.7650 | 0.0015 |
| 2.9537 | 6.9933 | 58128 | 2.7642 | 0.0015 |
| 2.9463 | 7.1181 | 59166 | 2.7654 | 0.0015 |
| 2.9519 | 7.2430 | 60204 | 2.7536 | 0.0015 |
| 2.9503 | 7.3679 | 61242 | 2.7640 | 0.0015 |
| 2.9483 | 7.4928 | 62280 | 2.7520 | 0.0015 |
| 2.9478 | 7.6177 | 63318 | 2.7520 | 0.0015 |
| 2.9478 | 7.7425 | 64356 | 2.7560 | 0.0015 |
| 2.9472 | 7.8674 | 65394 | 2.7561 | 0.0015 |
| 2.9476 | 7.9923 | 66432 | 2.7576 | 0.0015 |

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.6.0+cu124
  • Datasets 3.4.1
  • Tokenizers 0.21.1