tags:
- transformers
- custom
- chat
---

# Qwen2.5-1.5B-ultrachat200k

## Model Details

- **Model type:** SFT (supervised fine-tuned) chat model
- **License:** Apache License 2.0
- **Finetuned from model:** [Qwen/Qwen2.5-1.5B](https://huggingface.co/Qwen/Qwen2.5-1.5B)
- **Training data:** [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k)
- **Training framework:** [trl](https://github.com/huggingface/trl)
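The pieces above (base model, dataset, training framework) fit together roughly as follows. This is an illustrative sketch assuming trl's `SFTConfig`/`SFTTrainer` API, not the authors' actual custom training script; argument names (e.g. `tokenizer` vs. `processing_class`, `max_seq_length`) vary across trl versions.

```python
# Illustrative reconstruction of an SFT run with trl — not the actual script.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-1.5B",
    torch_dtype=torch.bfloat16,               # torch_dtype: bfloat16
    attn_implementation="flash_attention_2",  # attn_implementation
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B")
dataset = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")

args = SFTConfig(
    output_dir="Qwen2.5-1.5B-ultrachat200k",  # hypothetical output path
    bf16=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    num_train_epochs=1,
    max_seq_length=2048,
    warmup_ratio=0.1,
)
trainer = SFTTrainer(model=model, args=args, train_dataset=dataset, tokenizer=tokenizer)
trainer.train()
```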
27 |
+
|
28 |
+
## Training Details
|
29 |
+
|
30 |
+
**Cutome training codes**
|
31 |
+
|
32 |
+
### Training Hyperparameters
`attn_implementation`: flash_attention_2 \
`bf16`: True \
`learning_rate`: 5e-5 \
`lr_scheduler_type`: cosine \
`per_device_train_batch_size`: 2 \
`gradient_accumulation_steps`: 16 \
`torch_dtype`: bfloat16 \
`num_train_epochs`: 1 \
`max_seq_length`: 2048 \
`warmup_ratio`: 0.1
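A small sketch of what these settings imply, assuming a single GPU and the standard linear-warmup-then-cosine-decay shape used by transformers' `cosine` scheduler (the illustrative step count is an assumption; the real value depends on dataset size):

```python
import math

# Values taken from the hyperparameter list above.
per_device_train_batch_size = 2
gradient_accumulation_steps = 16
learning_rate = 5e-5
warmup_ratio = 0.1

# Effective batch size per optimizer step (single-GPU assumption;
# the global batch size also scales with the number of GPUs).
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps  # 32

def cosine_lr_with_warmup(step, total_steps):
    """Linear warmup to peak LR, then cosine decay to ~0."""
    warmup_steps = int(warmup_ratio * total_steps)
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))

total = 1000  # illustrative optimizer-step count, not the actual run length
assert cosine_lr_with_warmup(100, total) == learning_rate  # peak LR after warmup
assert cosine_lr_with_warmup(total, total) < 1e-9          # decayed to ~0
```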
43 |
+
|
44 |
+
### Results
|
45 |
+
|
46 |
+
`init_train_loss`: 1.421 \
|
47 |
+
`final_train_loss`: 1.192 \
|
48 |
+
`eval_loss`: 1.2003
|
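Assuming these losses are the usual mean token-level cross-entropy (in nats) reported by the trainer, the eval loss converts directly to a perplexity:

```python
import math

eval_loss = 1.2003
# Perplexity is the exponential of the mean token cross-entropy.
eval_ppl = math.exp(eval_loss)
print(round(eval_ppl, 2))  # ≈ 3.32
```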