---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- orpo
- trl
datasets:
- alvarobartt/dpo-mix-7k-simplified
base_model: mistralai/Mistral-7B-v0.1
pipeline_tag: text-generation
inference: false
---

## ORPO fine-tune of Mistral 7B v0.1 with DPO Mix 7K

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/60f0608166e5701b80ed3f02/hRyhnTySt-KQ0gnnoclSm.jpeg)

> Stable Diffusion XL: "A capybara, a killer whale, and a robot named Ultra being friends"

This is an ORPO fine-tune of [`mistralai/Mistral-7B-v0.1`](https://huggingface.co/mistralai/Mistral-7B-v0.1) on [`alvarobartt/dpo-mix-7k-simplified`](https://huggingface.co/datasets/alvarobartt/dpo-mix-7k-simplified).

⚠️ Note that the code is still experimental, since the `ORPOTrainer` PR has not been merged yet; follow its progress at [🤗 `trl` - `ORPOTrainer` PR](https://github.com/huggingface/trl/pull/1435).

## Reference

[`ORPO: Monolithic Preference Optimization without Reference Model`](https://huggingface.co/papers/2403.07691)
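
## Fine-tuning sketch

Below is a minimal, illustrative sketch of how an ORPO fine-tune like this one can be launched with the `ORPOTrainer` API from the PR linked above, as it later landed in 🤗 `trl`. It is not the exact script used for this model: the hyperparameters are placeholders, and it assumes the dataset exposes plain-text `prompt`/`chosen`/`rejected` columns.

```python
# Illustrative ORPO fine-tuning sketch with trl's ORPOTrainer; hyperparameters
# and preprocessing assumptions are placeholders, not the setup used for this model.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_id = "mistralai/Mistral-7B-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Mistral defines no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Assumes "prompt", "chosen", and "rejected" columns; adjust the preprocessing
# if the dataset stores conversations instead of plain strings.
dataset = load_dataset("alvarobartt/dpo-mix-7k-simplified", split="train")

args = ORPOConfig(
    output_dir="mistral-7b-orpo",     # placeholder values, not the ones used here
    beta=0.1,                         # λ weighting of the odds-ratio loss in the ORPO paper
    max_length=1024,
    max_prompt_length=512,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    num_train_epochs=1,
    bf16=True,
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    tokenizer=tokenizer,  # recent trl releases call this argument `processing_class`
)
trainer.train()
```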