---
license: mit
datasets:
- taskydata/tasky_or_not
language:
- en
metrics:
- f1
- accuracy
- recall
- precision
pipeline_tag: text-classification
---

**Hyperparameters:**

- learning rate: 2e-5
- weight decay: 0.01
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- gradient_accumulation_steps: 1
- eval steps: 50000
- max_length: 512
- num_epochs: 1
- hidden_dropout_prob: 0.3
- attention_probs_dropout_prob: 0.25
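
The hyperparameters above can be written down as plain configuration dictionaries. This is a minimal sketch, not the original training script: the grouping into trainer, model, and tokenizer settings is an assumption, the keys in `TRAINING_ARGS` borrow Hugging Face `TrainingArguments` parameter names, and `effective_batch_size` is a hypothetical helper. The card does not state how many devices were used, so the device count is left as an input.

```python
# Hyperparameters copied from the list above.
# The grouping and the helper below are illustrative assumptions.

# Keys follow Hugging Face `TrainingArguments` parameter names.
TRAINING_ARGS = {
    "learning_rate": 2e-5,
    "weight_decay": 0.01,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "gradient_accumulation_steps": 1,
    "eval_steps": 50_000,
    "num_train_epochs": 1,
}

# Dropout values are typically set on the model config, not on the trainer.
MODEL_CONFIG_OVERRIDES = {
    "hidden_dropout_prob": 0.3,
    "attention_probs_dropout_prob": 0.25,
}

# `max_length` is applied at tokenization time; `truncation` is an assumption.
TOKENIZER_KWARGS = {"max_length": 512, "truncation": True}


def effective_batch_size(num_devices: int) -> int:
    """Examples consumed per optimizer step across all devices."""
    return (
        TRAINING_ARGS["per_device_train_batch_size"]
        * TRAINING_ARGS["gradient_accumulation_steps"]
        * num_devices
    )


if __name__ == "__main__":
    # A single device is assumed here purely for illustration.
    print(effective_batch_size(num_devices=1))
```

With gradient accumulation set to 1, the effective batch size is simply 16 times the number of devices.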

**Dataset version:**

- taskydata/tasky_or_not/v_1

**Checkpoint:**

- 455000 steps.

**Results on Validation set:**

| **Step** | **Training Loss** | **Validation Loss** | **Accuracy** | **Precision** | **Recall** | **F1** |
|:--------:|:-----------------:|:-------------------:|:------------:|:-------------:|:----------:|:--------:|
| 50000 | 0.0148 | 0.10890 | 0.9798 | 0.9755 | 0.9843 | 0.9799 |
| 100000 | 0.0121 | 0.09090 | 0.9863 | 0.9958 | 0.9767 | 0.9862 |
| 150000 | 0.0080 | 0.11800 | 0.9863 | 0.9779 | 0.9950 | 0.9864 |
| 200000 | 0.0116 | 0.08965 | 0.9877 | 0.9905 | 0.9848 | 0.9876 |
| 250000 | 0.0073 | 3.50100 | 0.6507 | 0.5905 | 0.9830 | 0.7378 |
| 300000 | 0.0072 | 0.09807 | 0.9850 | 0.9863 | 0.9870 | 0.9849 |
| 350000 | 0.0053 | 0.09830 | 0.9854 | 0.9939 | 0.9870 | 0.9852 |
| 400000 | 0.0046 | 0.08130 | 0.9893 | 0.9957 | 0.9828 | 0.9892 |
| 450000 | 0.0054 | 0.61280 | 0.9095 | 0.5835 | 0.9888 | 0.9162 |
| 455000 | 0.0055 | 0.15790 | 0.9710 | 0.9561 | 0.9874 | 0.9715 |
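
The card does not say how the uploaded checkpoint was chosen. As a rough sketch, the snippet below ranks the evaluated steps by two plausible criteria, lowest validation loss and highest F1, using values copied from the table. The helper names `best_by_validation_loss` and `best_by_f1` are hypothetical, and either criterion is an assumption about the selection process, not a documented one.

```python
# (step, validation_loss, f1) copied from the validation results table.
RESULTS = [
    (50_000, 0.10890, 0.9799),
    (100_000, 0.09090, 0.9862),
    (150_000, 0.11800, 0.9864),
    (200_000, 0.08965, 0.9876),
    (250_000, 3.50100, 0.7378),
    (300_000, 0.09807, 0.9849),
    (350_000, 0.09830, 0.9852),
    (400_000, 0.08130, 0.9892),
    (450_000, 0.61280, 0.9162),
    (455_000, 0.15790, 0.9715),
]


def best_by_validation_loss(rows):
    """Return the step with the lowest validation loss."""
    return min(rows, key=lambda row: row[1])[0]


def best_by_f1(rows):
    """Return the step with the highest F1 score."""
    return max(rows, key=lambda row: row[2])[0]


if __name__ == "__main__":
    print("lowest validation loss:", best_by_validation_loss(RESULTS))
    print("highest F1:", best_by_f1(RESULTS))
```

Both criteria point to step 400000, which has the lowest validation loss (0.08130) and the highest F1 (0.9892) in the table.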

**Uploaded Checkpoint:**

- 400000