distilbert_sharetask-batchsize_l32
This model is a fine-tuned version of distilbert/distilbert-base-multilingual-cased. The training dataset is not specified. It achieves the following results on the evaluation set:
- Loss: 1.3263
- F1: 0.7861
- Precision: 0.7939
- Recall: 0.7811
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08, no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 10
- mixed_precision_training: Native AMP
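To make the learning-rate settings above concrete, here is a minimal sketch of a linear schedule with warmup in plain Python. It assumes the schedule rises linearly from 0 to the peak rate over the warmup steps and then decays linearly to 0 at the final step, and it takes the total of 3450 steps from the last row of the training results table. The function name `linear_lr` is hypothetical and is not taken from the training script.

```python
# Values taken from the hyperparameter list and the results table.
PEAK_LR = 3e-05        # learning_rate
WARMUP_STEPS = 500     # lr_scheduler_warmup_steps
TOTAL_STEPS = 3450     # 10 epochs x 345 steps per epoch


def linear_lr(step, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS,
              total_steps=TOTAL_STEPS):
    """Learning rate at a given optimizer step (hypothetical helper).

    Assumes linear warmup from 0 to peak_lr, then linear decay to 0.
    """
    if step < warmup_steps:
        # Warmup phase: scale up proportionally to the step count.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: scale down proportionally to the steps remaining.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for step in (0, 250, 500, 1725, 3450):
        print(step, linear_lr(step))
```

Under these assumptions, warmup ends partway through the second epoch (step 500 of 690), so the peak rate of 3e-05 applies only briefly before the decay begins.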
Training results
| Training Loss | Epoch | Step | Validation Loss | F1 | Precision | Recall |
|---|---|---|---|---|---|---|
| 0.6579 | 1.0 | 345 | 0.6633 | 0.5997 | 0.5990 | 0.6020 |
| 0.6027 | 2.0 | 690 | 0.6343 | 0.6008 | 0.6641 | 0.6517 |
| 0.4273 | 3.0 | 1035 | 0.6290 | 0.6954 | 0.7029 | 0.6916 |
| 0.3212 | 4.0 | 1380 | 0.6907 | 0.7251 | 0.7275 | 0.7404 |
| 0.1766 | 5.0 | 1725 | 1.0390 | 0.7671 | 0.7717 | 0.7637 |
| 0.2259 | 6.0 | 2070 | 1.2219 | 0.7749 | 0.7742 | 0.7756 |
| 0.0923 | 7.0 | 2415 | 1.3263 | 0.7861 | 0.7939 | 0.7811 |
| 0.019 | 8.0 | 2760 | 1.4365 | 0.7758 | 0.7820 | 0.7716 |
| 0.0056 | 9.0 | 3105 | 1.5074 | 0.7798 | 0.7838 | 0.7768 |
| 0.1015 | 10.0 | 3450 | 1.5158 | 0.7806 | 0.7812 | 0.7801 |
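The headline metrics at the top of this card match the epoch 7 row, which has the highest F1 but not the lowest validation loss. The short sketch below picks the best epoch under each criterion to show how they differ. The numbers are copied from the table. The variable names are hypothetical, and the card does not state which criterion the training run actually used for checkpoint selection.

```python
# (epoch, validation_loss, f1) copied from the training results table.
results = [
    (1, 0.6633, 0.5997),
    (2, 0.6343, 0.6008),
    (3, 0.6290, 0.6954),
    (4, 0.6907, 0.7251),
    (5, 1.0390, 0.7671),
    (6, 1.2219, 0.7749),
    (7, 1.3263, 0.7861),
    (8, 1.4365, 0.7758),
    (9, 1.5074, 0.7798),
    (10, 1.5158, 0.7806),
]

# Best epoch when selecting on the highest F1 score.
best_by_f1 = max(results, key=lambda row: row[2])

# Best epoch when selecting on the lowest validation loss.
best_by_loss = min(results, key=lambda row: row[1])

if __name__ == "__main__":
    print("best by F1:  ", best_by_f1)
    print("best by loss:", best_by_loss)
```

Validation loss is lowest at epoch 3 and rises afterwards while F1 keeps improving until epoch 7. This pattern often indicates that the model is becoming overconfident on the evaluation set, so the two criteria select different checkpoints.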
Framework versions
- Transformers 4.47.0
- Pytorch 2.5.1+cu121
- Datasets 3.3.1
- Tokenizers 0.21.0