---
library_name: transformers
license: bigcode-openrail-m
base_model: bigcode/starcoder2-15b
tags:
- alignment-handbook
- trl
- sft
- generated_from_trainer
datasets:
- HuggingFaceH4/airoboros-3.2
- HuggingFaceH4/Code-Feedback
- HuggingFaceH4/orca-math-word-problems-200k
- HuggingFaceH4/SystemChat
- HuggingFaceH4/capybara
model-index:
- name: starchat2-15b-v0.1
  results: []
---

# starchat2-15b-v0.1

This model is a fine-tuned version of [bigcode/starcoder2-15b](https://huggingface.co/bigcode/starcoder2-15b) on the HuggingFaceH4/airoboros-3.2, HuggingFaceH4/Code-Feedback, HuggingFaceH4/orca-math-word-problems-200k, HuggingFaceH4/SystemChat, and HuggingFaceH4/capybara datasets.
It achieves the following results on the evaluation set:
- Loss: 0.6601

## Model description

starchat2-15b-v0.1 is a chat model obtained by supervised fine-tuning (SFT) of StarCoder2-15B on the mix of conversational, code-feedback, and math word-problem datasets listed above. A usage sketch and a reconstruction of the training configuration are given at the end of this card.

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 4 (per device)
- eval_batch_size: 8 (per device)
- seed: 42
- distributed_type: multi-GPU
- num_devices: 32
- total_train_batch_size: 128 (4 per device × 32 devices)
- total_eval_batch_size: 256 (8 per device × 32 devices)
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.8402        | 0.1099 | 100  | 0.8307          |
| 0.7611        | 0.2198 | 200  | 0.7793          |
| 0.7361        | 0.3297 | 300  | 0.7525          |
| 0.6854        | 0.4396 | 400  | 0.7337          |
| 0.6926        | 0.5495 | 500  | 0.7197          |
| 0.7125        | 0.6593 | 600  | 0.7097          |
| 0.6662        | 0.7692 | 700  | 0.7015          |
| 0.6517        | 0.8791 | 800  | 0.6937          |
| 0.6234        | 0.9890 | 900  | 0.6869          |
| 0.5925        | 1.0989 | 1000 | 0.6866          |
| 0.585         | 1.2088 | 1100 | 0.6832          |
| 0.5857        | 1.3187 | 1200 | 0.6798          |
| 0.5736        | 1.4286 | 1300 | 0.6746          |
| 0.5906        | 1.5385 | 1400 | 0.6723          |
| 0.569         | 1.6484 | 1500 | 0.6686          |
| 0.5756        | 1.7582 | 1600 | 0.6655          |
| 0.545         | 1.8681 | 1700 | 0.6622          |
| 0.5505        | 1.9780 | 1800 | 0.6606          |
| 0.5149        | 2.0879 | 1900 | 0.6648          |
| 0.5234        | 2.1978 | 2000 | 0.6638          |
| 0.5239        | 2.3077 | 2100 | 0.6632          |
| 0.5142        | 2.4176 | 2200 | 0.6623          |
| 0.5086        | 2.5275 | 2300 | 0.6616          |
| 0.4998        | 2.6374 | 2400 | 0.6604          |
| 0.5029        | 2.7473 | 2500 | 0.6602          |
| 0.5146        | 2.8571 | 2600 | 0.6599          |
| 0.5293        | 2.9670 | 2700 | 0.6601          |

### Framework versions

- Transformers 4.45.2
- Pytorch 2.5.1+rocm6.2
- Datasets 3.5.0
- Tokenizers 0.20.3
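
## Example usage

A minimal inference sketch, assuming the chat template shipped with the model's tokenizer and a `transformers` version whose `pipeline` accepts chat messages (the 4.45.2 listed above does). The repository id `HuggingFaceH4/starchat2-15b-v0.1`, the system prompt, and the sampling parameters are illustrative assumptions, not taken from this card.

```python
import torch
from transformers import pipeline

# Assumed repository id; adjust if the model is hosted elsewhere.
pipe = pipeline(
    "text-generation",
    model="HuggingFaceH4/starchat2-15b-v0.1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},  # illustrative prompt
    {"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."},
]

# The pipeline applies the tokenizer's chat template and appends the
# assistant turn to the message list it returns.
outputs = pipe(messages, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
print(outputs[0]["generated_text"][-1]["content"])
```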
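
## Training configuration sketch

A hypothetical reconstruction of the `TrainingArguments` implied by the hyperparameters above. Only the numeric values come from this card; `output_dir`, bf16 precision, the logging/eval cadence (read off the 100-step spacing of the results table), and the use of plain `TrainingArguments` rather than, e.g., TRL's `SFTConfig` from the alignment-handbook recipes are assumptions.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="starchat2-15b-v0.1",     # assumed
    learning_rate=2e-5,
    per_device_train_batch_size=4,       # x 32 devices -> total train batch 128
    per_device_eval_batch_size=8,        # x 32 devices -> total eval batch 256
    seed=42,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,                      # Adam betas/epsilon as listed above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                           # assumption; precision is not stated on this card
    logging_steps=100,                   # inferred from the results table
    eval_strategy="steps",
    eval_steps=100,
)
```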
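
As a quick cross-check, the derived batch sizes and the warmup length follow arithmetically from the per-device figures and the results table; a sanity sketch:

```python
# Pure arithmetic check of the derived figures; no training code involved.
per_device_train, per_device_eval, num_devices = 4, 8, 32
assert per_device_train * num_devices == 128   # total_train_batch_size
assert per_device_eval * num_devices == 256    # total_eval_batch_size

# The results table ends at step 2700 with epoch ~2.9670, so one epoch is
# roughly 2700 / 2.9670 ~ 910 optimizer steps, and warmup_ratio=0.1 over
# 3 epochs corresponds to roughly 0.1 * 3 * 910 ~ 273 warmup steps.
steps_per_epoch = 2700 / 2.9670
warmup_steps = round(0.1 * 3 * steps_per_epoch)
print(round(steps_per_epoch), warmup_steps)    # ~910, ~273
```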