Hyperparameters:
- learning rate: 2e-5
- weight decay: 0.01
- per_device_train_batch_size: 8
- per_device_eval_batch_size: 8
- gradient_accumulation_steps: 1
- eval steps: 6000
- max_length: 512
- num_epochs: 2
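
The settings above can be collected into a plain dictionary, as in the rough sketch below. The key names follow the parameter names of `transformers.TrainingArguments`; that the model was trained with the Hugging Face `Trainer` is an assumption, as is the mapping of "eval steps" to `eval_steps` and "num_epochs" to `num_train_epochs`. The number of devices is not stated, so only per-device quantities are derived.

```python
# Values copied from the hyperparameter list; key names are assumed to
# correspond to transformers.TrainingArguments parameters.
hyperparameters = {
    "learning_rate": 2e-5,
    "weight_decay": 0.01,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "gradient_accumulation_steps": 1,
    "eval_steps": 6000,
    "num_train_epochs": 2,
}

# max_length is a tokenizer truncation setting, not a TrainingArguments field.
max_length = 512

# Examples consumed per optimizer step on a single device.
examples_per_step = (
    hyperparameters["per_device_train_batch_size"]
    * hyperparameters["gradient_accumulation_steps"]
)

# Number of evaluations run up to the 48000-step checkpoint.
checkpoint_step = 48000
num_evals = checkpoint_step // hyperparameters["eval_steps"]

print(examples_per_step, num_evals)
```

If the `Trainer` assumption holds, the dictionary could be passed as `TrainingArguments(output_dir=..., **hyperparameters)`, with `output_dir` chosen by the user.
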
Dataset version:
- "craffel/tasky_or_not", "10xp3_10xc4", "15f88c8"
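
One possible way to pin this dataset version with the `datasets` library is sketched below. It assumes the three strings are, in order, the Hub repository, the configuration name, and a commit revision; the card does not label them. It also assumes the Hub resolves the abbreviated hash; if it does not, the full commit hash would be needed. The helper name `load_tasky_dataset` is hypothetical.

```python
# Identifiers copied from the "Dataset version" entry; their roles
# (repository, configuration, revision) are an assumption.
DATASET_REPO = "craffel/tasky_or_not"
DATASET_CONFIG = "10xp3_10xc4"
DATASET_REVISION = "15f88c8"


def load_tasky_dataset():
    """Download the pinned dataset; requires `datasets` and network access."""
    # Imported lazily so the constants above can be used without the library.
    from datasets import load_dataset

    return load_dataset(DATASET_REPO, DATASET_CONFIG, revision=DATASET_REVISION)
```
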
Checkpoint:
- 48000 steps
Results on validation set:
Step | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
---|---|---|---|---|---|---|
6000 | 0.031900 | 0.163412 | 0.982194 | 0.999211 | 0.980462 | 0.989748 |
12000 | 0.014700 | 0.106132 | 0.976666 | 0.999639 | 0.973733 | 0.986516 |
18000 | 0.010700 | 0.043012 | 0.995743 | 0.999223 | 0.995918 | 0.997568 |
24000 | 0.007400 | 0.095047 | 0.984724 | 0.999857 | 0.982714 | 0.991211 |
30000 | 0.004100 | 0.087274 | 0.990400 | 0.999829 | 0.989217 | 0.994495 |
36000 | 0.003100 | 0.162909 | 0.981972 | 1.000000 | 0.979434 | 0.989610 |
42000 | 0.002200 | 0.148721 | 0.980454 | 0.999986 | 0.977717 | 0.988726 |
48000 | 0.001000 | 0.094455 | 0.990437 | 0.999943 | 0.989147 | 0.994516 |
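
As a sanity check on the table, the sketch below recomputes F1 as the harmonic mean of precision and recall, F1 = 2PR / (P + R), and finds the evaluation step with the lowest validation loss. The rows are copied from the table; the variable and function names are illustrative only.

```python
# (step, validation_loss, precision, recall, reported_f1), copied from the table.
rows = [
    (6000, 0.163412, 0.999211, 0.980462, 0.989748),
    (12000, 0.106132, 0.999639, 0.973733, 0.986516),
    (18000, 0.043012, 0.999223, 0.995918, 0.997568),
    (24000, 0.095047, 0.999857, 0.982714, 0.991211),
    (30000, 0.087274, 0.999829, 0.989217, 0.994495),
    (36000, 0.162909, 1.000000, 0.979434, 0.989610),
    (42000, 0.148721, 0.999986, 0.977717, 0.988726),
    (48000, 0.094455, 0.999943, 0.989147, 0.994516),
]


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Largest gap between the recomputed and the reported F1 across all rows.
max_f1_gap = max(abs(f1_score(p, r) - f1) for _, _, p, r, f1 in rows)

# Evaluation step with the lowest validation loss.
best_step = min(rows, key=lambda row: row[1])[0]

print(f"max F1 gap: {max_f1_gap:.2e}, lowest validation loss at step {best_step}")
```

The recomputed values agree with the reported F1 column to within rounding, and the lowest validation loss in the table occurs at step 18000.
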