--- license: llama2 ---

Xwin-LM: Powerful, Stable, and Reproducible LLM Alignment

**Step up your LLM alignment with Xwin-LM!** Xwin-LM aims to develop and open-source alignment technologies for large language models, including supervised fine-tuning (SFT), reward models, reject sampling, reinforcement learning, etc. Our first release, built-upon on the Llama2 base models, ranked **TOP-1** on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/). Notably, it's **the first to surpass GPT-4** on this benchmark. The project will be continuously updated. ## News - :boom: [Sep, 2023] We released [Xwin-LM-70B-V0.1](https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1), which has achieved a win-rate against Davinci-003 of **95.57%** on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/) benchmark, ranking as **TOP-1** on AlpacaEval. **It was the FIRST model surpassing GPT-4** on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/). Also note its winrate v.s. GPT-4 is **60.61**. - :boom: [Sep, 2023] We released [Xwin-LM-13B-V0.1](https://huggingface.co/Xwin-LM/Xwin-LM-13B-V0.1), which has achieved **91.76%** win-rate on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/), ranking as **top-1** among all 13B models. - :boom: [Sep, 2023] We released [Xwin-LM-7B-V0.1](https://huggingface.co/Xwin-LM/Xwin-LM-7B-V0.1), which has achieved **87.82%** win-rate on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/), ranking as **top-1** among all 7B models. ## Model Card | Model | Checkpoint | Report | License | |------------|------------|-------------|------------------| |Xwin-LM-7B-V0.1| 🤗 HF Link | 📃**Coming soon (Stay tuned)** | Llama 2 License| |Xwin-LM-13B-V0.1| 🤗 HF Link | | Llama 2 License| |Xwin-LM-70B-V0.1| 🤗 HF Link | | Llama 2 License| ## Benchmarks ### Xwin-LM performance on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/). The table below displays the performance of Xwin-LM on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/), where evaluates its win-rate against Text-Davinci-003 across 805 questions. To provide a comprehensive evaluation, we present, for the first time, the win-rate against ChatGPT and GPT-4 as well. Our Xwin-LM model family establish a new state-of-the-art performance across all metrics. Notably, Xwin-LM-70B-V0.1 has eclipsed GPT-4 for the first time, achieving an impressive win-rate of **95.57%** to Text-Davinci-003 and **60.61%** to GPT-4. | **Model** | **AlpacaEval (winrate %)** | **AlpacaEval (winrate %)** |**AlpacaEval (winrate %)** | |----------------------------------|------------|----------|-------------| | | **v.s. Text-Davinci-003** | **v.s. ChatGPT** | **v.s. GPT4**| | **Xwin-LM-70B-V0.1** | **95.57** | **87.50** | **60.61** | | GPT-4 | 95.28 | 84.66 | 50.00 | | WizardLM-70B-V1.0 | 92.91 | 80.19 | 46.70 | | Llama-2-70B-Chat | 92.66 | 81.97 | 51.19 | | **Xwin-LM-13B-V0.1** | **91.76** | **81.79** | **55.30** | | ChatGPT | 89.37 | 50.00 | 16.60 | | WizardLM-13B-V1.2 | 89.17 | 75.75 | 41.91 | | **Xwin-LM-7B-V0.1** | **87.35** | **76.40** | **47.57** | | Llama-2-13B-Chat | 81.09 | 64.22 | 30.92 | ##