swe_30k_v2_tag5
This model is a fine-tuned version of Qwen/Qwen2.5-Coder-7B-Instruct on the swe_30k_v2_tag5 dataset. It achieves the following results on the evaluation set:
- Loss: 0.4522
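For a rough sense of scale, the evaluation loss can be converted to a perplexity. This is a minimal sketch that assumes the reported loss is a mean per-token cross-entropy measured in nats; the card does not state this explicitly.

```python
import math

# Evaluation loss as reported on the card.
eval_loss = 0.4522

# Assumption: the loss is a mean per-token cross-entropy in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"eval loss {eval_loss:.4f} -> perplexity ~ {perplexity:.3f}")
```

If the loss was computed only over a subset of tokens (for example, response tokens only), the perplexity applies to that subset rather than to the full sequences.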
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 4
- total_eval_batch_size: 4
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- num_epochs: 1
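The batch-size and scheduler settings above can be illustrated with a small sketch. It assumes no gradient accumulation and no warmup, since neither is listed on the card, and it uses 1900 (the last logged step in the results table) as a stand-in for the total step count. The helper names `total_batch_size` and `cosine_lr` are hypothetical and exist only for this illustration.

```python
import math

# Values copied from the hyperparameter list.
LEARNING_RATE = 1e-05
TRAIN_BATCH_SIZE = 1
NUM_DEVICES = 4

# Assumptions: neither value is listed on the card.
GRAD_ACCUM_STEPS = 1
WARMUP_STEPS = 0

# Assumption: last logged step, used as an approximate total step count.
TOTAL_STEPS = 1900


def total_batch_size(per_device: int, devices: int, accum: int) -> int:
    """Examples consumed per optimizer step across all devices."""
    return per_device * devices * accum


def cosine_lr(step: int, total_steps: int,
              peak_lr: float = LEARNING_RATE,
              warmup_steps: int = WARMUP_STEPS) -> float:
    """Half-cosine decay from peak_lr to 0, with optional linear warmup."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print("total train batch size:",
      total_batch_size(TRAIN_BATCH_SIZE, NUM_DEVICES, GRAD_ACCUM_STEPS))
for step in (0, 475, 950, 1425, 1900):
    print(step, f"{cosine_lr(step, TOTAL_STEPS):.3e}")
```

Under these assumptions the product of per-device batch size and device count reproduces the listed total_train_batch_size of 4, and the learning rate falls to half its peak at the midpoint of training.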
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
0.6812 | 0.0524 | 100 | 0.5303 |
0.664 | 0.1049 | 200 | 0.5222 |
0.6851 | 0.1573 | 300 | 0.5144 |
0.6637 | 0.2098 | 400 | 0.5085 |
0.5823 | 0.2622 | 500 | 0.4992 |
0.6342 | 0.3146 | 600 | 0.4874 |
0.5819 | 0.3671 | 700 | 0.4845 |
0.5393 | 0.4195 | 800 | 0.4796 |
0.7043 | 0.4719 | 900 | 0.4728 |
0.4485 | 0.5244 | 1000 | 0.4708 |
0.606 | 0.5768 | 1100 | 0.4642 |
0.521 | 0.6293 | 1200 | 0.4612 |
0.542 | 0.6817 | 1300 | 0.4597 |
0.5452 | 0.7341 | 1400 | 0.4562 |
0.5425 | 0.7866 | 1500 | 0.4558 |
0.5805 | 0.8390 | 1600 | 0.4525 |
0.5275 | 0.8915 | 1700 | 0.4524 |
0.5267 | 0.9439 | 1800 | 0.4526 |
0.5343 | 0.9963 | 1900 | 0.4521 |
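A few derived quantities can be read off the table. The sketch below estimates the number of optimizer steps per epoch and the approximate number of training examples, and measures how far the validation loss fell between the first and last logged rows. The example-count estimate assumes one example per sequence (no packing) and a total train batch size of 4 with no gradient accumulation; treat it as an approximation rather than a stated dataset size.

```python
# (step, epoch, validation_loss) for the first and last logged rows.
first_row = (100, 0.0524, 0.5303)
last_row = (1900, 0.9963, 0.4521)

# From the hyperparameter list.
TOTAL_TRAIN_BATCH_SIZE = 4

# Steps per epoch, extrapolated from the last row's epoch fraction.
steps_per_epoch = last_row[0] / last_row[1]

# Assumption: one training example per sequence, no gradient accumulation.
approx_examples = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE

# Validation-loss improvement over the logged range.
abs_drop = first_row[2] - last_row[2]
rel_drop = abs_drop / first_row[2]

print(f"steps per epoch   ~ {steps_per_epoch:.0f}")
print(f"training examples ~ {approx_examples:.0f}")
print(f"validation loss drop: {abs_drop:.4f} ({rel_drop:.1%})")
```

Most of the improvement occurs before step 1600; from there to step 1900 the validation loss stays within 0.4521 to 0.4526.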
Framework versions
- Transformers 4.46.1
- Pytorch 2.6.0+cu124
- Datasets 3.1.0
- Tokenizers 0.20.3