Qwen2.5-7B-Instruct-preference is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct, trained on an original dataset. Fine-tuning was carried out at a context length of 1024 tokens.
The benchmark scores below were obtained using arena-hard-auto-multilingual.
Qwen2.5-7B-Instruct | Ours |
---|---|
50.0 | 56.6 |
Step | Training Loss | Validation Loss |
---|---|---|
10 | 0.678400 | 0.665870 |
20 | 0.608500 | 0.638361 |
30 | 0.577300 | 0.607468 |
40 | 0.526700 | 0.559432 |
50 | 0.489200 | 0.523419 |
60 | 0.502800 | 0.511645 |
70 | 0.462300 | 0.506989 |
80 | 0.419600 | 0.509142 |
90 | 0.445200 | 0.510396 |
100 | 0.424400 | 0.511653 |
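
As a rough way to read the loss log above, the following sketch finds the step with the lowest validation loss and lists the logged steps that come after it. The numbers are copied from the table; the names `LOSS_LOG`, `best_step`, and `steps_after_best` are hypothetical helpers introduced here for illustration and are not part of the original training code.

```python
# Rows copied from the loss table: (step, training_loss, validation_loss).
LOSS_LOG = [
    (10, 0.678400, 0.665870),
    (20, 0.608500, 0.638361),
    (30, 0.577300, 0.607468),
    (40, 0.526700, 0.559432),
    (50, 0.489200, 0.523419),
    (60, 0.502800, 0.511645),
    (70, 0.462300, 0.506989),
    (80, 0.419600, 0.509142),
    (90, 0.445200, 0.510396),
    (100, 0.424400, 0.511653),
]


def best_step(log):
    """Return (step, validation_loss) for the row with the lowest validation loss."""
    step, _train, val = min(log, key=lambda row: row[2])
    return step, val


def steps_after_best(log):
    """Return the logged steps that come after the best-validation step."""
    best, _ = best_step(log)
    return [step for step, _train, _val in log if step > best]


if __name__ == "__main__":
    step, val = best_step(LOSS_LOG)
    print(f"lowest validation loss {val:.6f} at step {step}")
    print("later logged steps:", steps_after_best(LOSS_LOG))
```

In this log, validation loss reaches its minimum at step 70 (0.506989) and rises slightly through step 100, while training loss continues to trend downward. That pattern may indicate the onset of mild overfitting, though the card does not state which checkpoint was released.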