---
datasets:
- jondurbin/gutenberg-dpo-v0.1
- Qwen/Qwen2.5-14B-Instruct
- HuggingFaceH4/ultrafeedback_binarized
base_model:
- Qwen/Qwen2.5-14B-Instruct
- v000000/Qwen2.5-14B-Gutenberg-1e-Delta
- tanliboy/lambda-qwen2.5-14b-dpo-test
library_name: transformers
tags:
- qwen
- qwen2.5
- finetune
- dpo
- qwen2
- chat
- conversational
- instruct
- storywriting
- roleplay
license: apache-2.0
language:
- en
pipeline_tag: text-generation
---
# Qwen2.5-Lumen-14B

* *Direct preference optimization finetuned for 3 epochs*

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64f74b6e6389380c77562762/ccriYlPOxZLDUI-o2XZ0K.png)

A Qwen2.5 preference finetune, targeting prompt adherence, storywriting and roleplay.

-------------------------------------------------------------------------------

## Training Notes

Trained [Qwen2.5-14B-Instruct] for 2 epochs on [jondurbin/gutenberg-dpo-v0.1], saving checkpoints at different points along the way.

[Tanliboy](https://huggingface.co/tanliboy) trained [Qwen2.5-14B-Instruct] for 1 epoch on [HuggingFaceH4/ultrafeedback_binarized].

*Mass checkpoint merge, based on Qwen2.5-14B-Instruct.*

## Merge

* Merged *Ultrafeedback-Binarized DPO* and *Gutenberg DPO* with a sophosympatheia SLERP
* Merged *Qwen2.5-14B-Instruct* and *Gutenberg DPO* with a sophosympatheia SLERP
* Merged all DPO checkpoints and SLERP variations with MODEL_STOCK to analyze geometric properties and get the best of all runs/merges.
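For readers unfamiliar with SLERP, the following is a minimal sketch of spherical linear interpolation between two flattened weight vectors. It is illustrative only: the function name `slerp`, the `eps` threshold, and the single scalar `t` are assumptions of this sketch, and the actual interpolation schedule used for the merges above is not stated in this card.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between flat weight vectors v0 (t=0) and v1 (t=1)."""
    # Plain linear interpolation, used as a fallback in degenerate cases.
    lerp = [(1.0 - t) * a + t * b for a, b in zip(v0, v1)]

    norm0 = math.sqrt(sum(a * a for a in v0))
    norm1 = math.sqrt(sum(b * b for b in v1))
    if norm0 < eps or norm1 < eps:
        return lerp  # the angle is undefined for a zero vector

    # Angle between the two vectors, with the cosine clamped for numerical safety.
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if sin_theta < eps:
        return lerp  # vectors are (anti)parallel, so the SLERP weights are ill-defined

    w0 = math.sin((1.0 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


if __name__ == "__main__":
    # Toy two-dimensional "weights"; real merges apply this per tensor.
    print(slerp(0.5, [1.0, 0.0], [0.0, 1.0]))
```

Unlike a plain weighted average, SLERP follows the arc between the two vectors, so the midpoint of two orthogonal unit vectors keeps unit length instead of shrinking.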
## Recipe

```yaml
models:
  - model: v000000/Qwen2.5-14B-Gutenberg-1e-Delta
  - model: v000000/Qwen2.5-14B-Gutenberg-0.6e-Sequential
  - model: v000000/Qwen2.5-14B-Gutenberg-0.25e-Early
  - model: v000000/Qwen2.5-14B-Gutenberg-2e-Sequential
  - model: v000000/Qwen2.5-14B-Gutenberg-0.37e-Early
  - model: v000000/Qwen2.5-14B-Gutenberg-2e-Zeta
  - model: v000000/Qwen2.5-14B-Gutenberg-1e-Theta
  - model: tanliboy/lambda-qwen2.5-14b-dpo-test
  - model: v000000/Qwen2.5-14B-Gutenberg-1e-Delta
  - model: tanliboy/lambda-qwen2.5-14b-dpo-test
  - model: v000000/Qwen2.5-14B-Gutenberg-UltraLambda-Slerpeno
  - model: v000000/Qwen2.5-14B-Gutenberg-Instruct-Slerpeno
base_model: v000000/Qwen2.5-14B-Gutenberg-1e-Delta
merge_method: model_stock
dtype: bfloat16
```

### Finetune and merge

This is a merge and finetune of pre-trained language models.

### Models Merged

[Arxiv 2403.19522](https://arxiv.org/abs/2403.19522)

The following models were included in the merge:
* v000000/Qwen2.5-14B-Gutenberg-1e-Delta
* v000000/Qwen2.5-14B-Gutenberg-0.6e-Sequential
* v000000/Qwen2.5-14B-Gutenberg-0.25e-Early
* v000000/Qwen2.5-14B-Gutenberg-2e-Sequential
* v000000/Qwen2.5-14B-Gutenberg-0.37e-Early
* v000000/Qwen2.5-14B-Gutenberg-2e-Zeta
* v000000/Qwen2.5-14B-Gutenberg-1e-Theta
* v000000/Qwen2.5-14B-Gutenberg-UltraLambda-Slerpeno
* v000000/Qwen2.5-14B-Gutenberg-Instruct-Slerpeno
* tanliboy/lambda-qwen2.5-14b-dpo-test
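To give a rough idea of what the `model_stock` merge method computes, here is a simplified sketch of the interpolation rule from the linked paper, applied to a single flattened tensor. It is not the code that produced this model: the function name `model_stock`, the use of the mean pairwise cosine, and the handling of the degenerate denominator are assumptions of this sketch.

```python
import math


def model_stock(base, finetuned):
    """Return (merged, t) for one flat tensor.

    base:      list of floats, the base model weights
    finetuned: list of k >= 2 weight lists with the same length as base
    """
    k = len(finetuned)
    # Task vectors: how far each fine-tuned checkpoint moved from the base.
    deltas = [[w - b for w, b in zip(ft, base)] for ft in finetuned]

    # Mean pairwise cosine similarity between task vectors (assumption of this sketch).
    cosines = []
    for i in range(k):
        for j in range(i + 1, k):
            ni = math.sqrt(sum(a * a for a in deltas[i]))
            nj = math.sqrt(sum(a * a for a in deltas[j]))
            if ni > 0.0 and nj > 0.0:
                dot = sum(a * b for a, b in zip(deltas[i], deltas[j]))
                cosines.append(dot / (ni * nj))
    cos_theta = sum(cosines) / len(cosines) if cosines else 0.0

    # Interpolation ratio: t = k*cos / (1 + (k - 1)*cos).
    denom = 1.0 + (k - 1) * cos_theta
    t = k * cos_theta / denom if abs(denom) > 1e-12 else 0.0  # guard is a sketch choice

    # Blend the plain average of the checkpoints with the base weights.
    avg = [sum(col) / k for col in zip(*finetuned)]
    merged = [t * a + (1.0 - t) * b for a, b in zip(avg, base)]
    return merged, t


if __name__ == "__main__":
    merged, t = model_stock([0.0, 0.0], [[1.0, 0.0], [1.0, 1.0]])
    print(t, merged)
```

When the checkpoints agree on the direction of change (cosine near 1), `t` approaches 1 and the result is close to their average; when they disagree (cosine near 0), the result stays close to the base weights.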