tags:
- transformers
- custom
- chat
---

# Qwen2.5-1.5B-ultrachat200k

## Model Details

- **Model type:** SFT (supervised fine-tuned) chat model
- **License:** Apache License 2.0
- **Finetuned from model:** [Qwen/Qwen2.5-1.5B](https://huggingface.co/Qwen/Qwen2.5-1.5B)
- **Training data:** [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k)
- **Training framework:** [trl](https://github.com/huggingface/trl)
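The pieces above (base model, dataset, training framework) fit together roughly as follows. This is an illustrative sketch assuming trl's `SFTConfig`/`SFTTrainer` API, not the authors' actual custom training script; argument names (e.g. `tokenizer` vs. `processing_class`, `max_seq_length`) vary across trl versions.

```python
# Illustrative reconstruction of an SFT run with trl — not the actual script.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-1.5B",
    torch_dtype=torch.bfloat16,               # torch_dtype: bfloat16
    attn_implementation="flash_attention_2",  # attn_implementation
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B")
dataset = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")

args = SFTConfig(
    output_dir="Qwen2.5-1.5B-ultrachat200k",  # hypothetical output path
    bf16=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    num_train_epochs=1,
    max_seq_length=2048,
    warmup_ratio=0.1,
)
trainer = SFTTrainer(model=model, args=args, train_dataset=dataset, tokenizer=tokenizer)
trainer.train()
```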
27 |
+
|
28 |
+
## Training Details
|
29 |
+
|
30 |
+
**Cutome training codes**
|
31 |
+
|
32 |
+
### Training Hyperparameters
`attn_implementation`: flash_attention_2 \
`bf16`: True \
`learning_rate`: 5e-5 \
`lr_scheduler_type`: cosine \
`per_device_train_batch_size`: 2 \
`gradient_accumulation_steps`: 16 \
`torch_dtype`: bfloat16 \
`num_train_epochs`: 1 \
`max_seq_length`: 2048 \
`warmup_ratio`: 0.1
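A small sketch of what these settings imply, assuming a single GPU and the standard linear-warmup-then-cosine-decay shape used by transformers' `cosine` scheduler (the illustrative step count is an assumption; the real value depends on dataset size):

```python
import math

# Values taken from the hyperparameter list above.
per_device_train_batch_size = 2
gradient_accumulation_steps = 16
learning_rate = 5e-5
warmup_ratio = 0.1

# Effective batch size per optimizer step (single-GPU assumption;
# the global batch size also scales with the number of GPUs).
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps  # 32

def cosine_lr_with_warmup(step, total_steps):
    """Linear warmup to peak LR, then cosine decay to ~0."""
    warmup_steps = int(warmup_ratio * total_steps)
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))

total = 1000  # illustrative optimizer-step count, not the actual run length
assert cosine_lr_with_warmup(100, total) == learning_rate  # peak LR after warmup
assert cosine_lr_with_warmup(total, total) < 1e-9          # decayed to ~0
```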
43 |
+
|
44 |
+
### Results
|
45 |
+
|
46 |
+
`init_train_loss`: 1.421 \
|
47 |
+
`final_train_loss`: 1.192 \
|
48 |
+
`eval_loss`: 1.2003
|
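Assuming these losses are the usual mean token-level cross-entropy (in nats) reported by the trainer, the eval loss converts directly to a perplexity:

```python
import math

eval_loss = 1.2003
# Perplexity is the exponential of the mean token cross-entropy.
eval_ppl = math.exp(eval_loss)
print(round(eval_ppl, 2))  # ≈ 3.32
```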