bert-small-amharic-32k-bs256-512

This model is a fine-tuned version of yosefw/bert-small-amharic-32k-bs256 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.7523
  • Model Preparation Time: 0.0015
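
Assuming the evaluation loss is the mean token-level cross-entropy from masked language modeling (the card does not say), it corresponds to a pseudo-perplexity of roughly exp(2.7523) ≈ 15.7. A minimal check:

```python
import math

# Hypothetical sanity check: if the eval loss is a mean cross-entropy,
# perplexity is simply its exponential.
eval_loss = 2.7523
print(math.exp(eval_loss))  # ≈ 15.68
```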

Model description

More information needed. (The Hub listing reports about 29.6M parameters, stored as F32 safetensors.)

Intended uses & limitations

More information needed
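
Pending documentation from the author, here is a minimal inference sketch. It assumes the checkpoint is a masked language model (the usual objective for a BERT model of this kind); the Amharic prompt is an illustrative placeholder, not taken from the card.

```python
from transformers import pipeline

# Hypothetical usage sketch: assumes the checkpoint is a masked language
# model. The Amharic prompt below is a placeholder example.
fill_mask = pipeline("fill-mask", model="yosefw/bert-small-amharic-32k-bs256-512")

# "Addis Ababa is the [MASK] city of Ethiopia."
prompt = f"አዲስ አበባ የኢትዮጵያ {fill_mask.tokenizer.mask_token} ከተማ ናት።"
for prediction in fill_mask(prompt):
    print(prediction["token_str"], round(prediction["score"], 3))
```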

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 2e-05
  • train_batch_size: 64
  • eval_batch_size: 64
  • seed: 42
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • lr_scheduler_warmup_steps: 1000
  • num_epochs: 8
  • mixed_precision_training: Native AMP
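
A minimal sketch of how these values map onto Hugging Face TrainingArguments, assuming the standard Trainer API from the Transformers version listed below; the output directory is a placeholder, and the model and (undocumented) datasets are omitted:

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above. Model, datasets,
# and Trainer wiring are omitted because the card does not document them.
args = TrainingArguments(
    output_dir="bert-small-amharic-32k-bs256-512",  # placeholder
    learning_rate=2e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    warmup_steps=1000,  # when > 0, Trainer uses this and ignores warmup_ratio
    num_train_epochs=8,
    fp16=True,          # "Native AMP" mixed precision
)
```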

Training results

| Training Loss | Epoch | Step | Validation Loss | Model Preparation Time |
|:-------------:|:-----:|:----:|:---------------:|:----------------------:|
| 6.5635 | 0.1249 | 1038 | 6.3157 | 0.0015 |
| 4.5695 | 0.2498 | 2076 | 3.0682 | 0.0015 |
| 3.3142 | 0.3746 | 3114 | 3.0046 | 0.0015 |
| 3.2145 | 0.4995 | 4152 | 2.9563 | 0.0015 |
| 3.1627 | 0.6244 | 5190 | 2.9269 | 0.0015 |
| 3.1338 | 0.7493 | 6228 | 2.9018 | 0.0015 |
| 3.1077 | 0.8742 | 7266 | 2.8866 | 0.0015 |
| 3.0927 | 0.9990 | 8304 | 2.8645 | 0.0015 |
| 3.0652 | 1.1239 | 9342 | 2.8488 | 0.0015 |
| 3.0561 | 1.2488 | 10380 | 2.8409 | 0.0015 |
| 3.0406 | 1.3737 | 11418 | 2.8180 | 0.0015 |
| 3.0321 | 1.4986 | 12456 | 2.8307 | 0.0015 |
| 3.0236 | 1.6234 | 13494 | 2.8197 | 0.0015 |
| 3.0214 | 1.7483 | 14532 | 2.8094 | 0.0015 |
| 3.0175 | 1.8732 | 15570 | 2.8162 | 0.0015 |
| 3.0137 | 1.9981 | 16608 | 2.8030 | 0.0015 |
| 3.0069 | 2.1230 | 17646 | 2.7887 | 0.0015 |
| 2.9971 | 2.2478 | 18684 | 2.8004 | 0.0015 |
| 2.9984 | 2.3727 | 19722 | 2.7993 | 0.0015 |
| 2.9914 | 2.4976 | 20760 | 2.7964 | 0.0015 |
| 2.9955 | 2.6225 | 21798 | 2.7952 | 0.0015 |
| 2.9901 | 2.7474 | 22836 | 2.7855 | 0.0015 |
| 2.9909 | 2.8722 | 23874 | 2.7835 | 0.0015 |
| 2.9877 | 2.9971 | 24912 | 2.7880 | 0.0015 |
| 2.9854 | 3.1220 | 25950 | 2.7848 | 0.0015 |
| 2.9805 | 3.2469 | 26988 | 2.7963 | 0.0015 |
| 2.982 | 3.3718 | 28026 | 2.7766 | 0.0015 |
| 2.9791 | 3.4966 | 29064 | 2.7786 | 0.0015 |
| 2.9728 | 3.6215 | 30102 | 2.7843 | 0.0015 |
| 2.9785 | 3.7464 | 31140 | 2.7845 | 0.0015 |
| 2.9771 | 3.8713 | 32178 | 2.7848 | 0.0015 |
| 2.972 | 3.9962 | 33216 | 2.7849 | 0.0015 |
| 2.9689 | 4.1210 | 34254 | 2.7828 | 0.0015 |
| 2.9693 | 4.2459 | 35292 | 2.7717 | 0.0015 |
| 2.9703 | 4.3708 | 36330 | 2.7692 | 0.0015 |
| 2.9657 | 4.4957 | 37368 | 2.7813 | 0.0015 |
| 2.9685 | 4.6205 | 38406 | 2.7689 | 0.0015 |
| 2.9639 | 4.7454 | 39444 | 2.7629 | 0.0015 |
| 2.9645 | 4.8703 | 40482 | 2.7701 | 0.0015 |
| 2.9641 | 4.9952 | 41520 | 2.7744 | 0.0015 |
| 2.9624 | 5.1201 | 42558 | 2.7638 | 0.0015 |
| 2.962 | 5.2449 | 43596 | 2.7696 | 0.0015 |
| 2.9583 | 5.3698 | 44634 | 2.7597 | 0.0015 |
| 2.9571 | 5.4947 | 45672 | 2.7595 | 0.0015 |
| 2.9576 | 5.6196 | 46710 | 2.7667 | 0.0015 |
| 2.9607 | 5.7445 | 47748 | 2.7659 | 0.0015 |
| 2.9557 | 5.8693 | 48786 | 2.7637 | 0.0015 |
| 2.9583 | 5.9942 | 49824 | 2.7651 | 0.0015 |
| 2.9568 | 6.1191 | 50862 | 2.7644 | 0.0015 |
| 2.9521 | 6.2440 | 51900 | 2.7519 | 0.0015 |
| 2.9518 | 6.3689 | 52938 | 2.7613 | 0.0015 |
| 2.9543 | 6.4937 | 53976 | 2.7574 | 0.0015 |
| 2.9574 | 6.6186 | 55014 | 2.7585 | 0.0015 |
| 2.957 | 6.7435 | 56052 | 2.7580 | 0.0015 |
| 2.9503 | 6.8684 | 57090 | 2.7650 | 0.0015 |
| 2.9537 | 6.9933 | 58128 | 2.7642 | 0.0015 |
| 2.9463 | 7.1181 | 59166 | 2.7654 | 0.0015 |
| 2.9519 | 7.2430 | 60204 | 2.7536 | 0.0015 |
| 2.9503 | 7.3679 | 61242 | 2.7640 | 0.0015 |
| 2.9483 | 7.4928 | 62280 | 2.7520 | 0.0015 |
| 2.9478 | 7.6177 | 63318 | 2.7520 | 0.0015 |
| 2.9478 | 7.7425 | 64356 | 2.7560 | 0.0015 |
| 2.9472 | 7.8674 | 65394 | 2.7561 | 0.0015 |
| 2.9476 | 7.9923 | 66432 | 2.7576 | 0.0015 |

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.6.0+cu124
  • Datasets 3.4.1
  • Tokenizers 0.21.1