---
license: llama2
---
Xwin-LM: Powerful, Stable, and Reproducible LLM Alignment
**Step up your LLM alignment with Xwin-LM!**
Xwin-LM aims to develop and open-source alignment technologies for large language models, including supervised fine-tuning (SFT), reward models, reject sampling, reinforcement learning, etc. Our first release, built-upon on the Llama2 base models, ranked **TOP-1** on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/). Notably, it's **the first to surpass GPT-4** on this benchmark. The project will be continuously updated.
## News
- :boom: [Sep, 2023] We released [Xwin-LM-70B-V0.1](https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1), which has achieved a win-rate against Davinci-003 of **95.57%** on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/) benchmark, ranking as **TOP-1** on AlpacaEval. **It was the FIRST model surpassing GPT-4** on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/). Also note its winrate v.s. GPT-4 is **60.61**.
- :boom: [Sep, 2023] We released [Xwin-LM-13B-V0.1](https://huggingface.co/Xwin-LM/Xwin-LM-13B-V0.1), which has achieved **91.76%** win-rate on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/), ranking as **top-1** among all 13B models.
- :boom: [Sep, 2023] We released [Xwin-LM-7B-V0.1](https://huggingface.co/Xwin-LM/Xwin-LM-7B-V0.1), which has achieved **87.82%** win-rate on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/), ranking as **top-1** among all 7B models.
## Model Card
| Model | Checkpoint | Report | License |
|------------|------------|-------------|------------------|
|Xwin-LM-7B-V0.1| 🤗 HF Link | 📃**Coming soon (Stay tuned)** | Llama 2 License|
|Xwin-LM-13B-V0.1| 🤗 HF Link | | Llama 2 License|
|Xwin-LM-70B-V0.1| 🤗 HF Link | | Llama 2 License|
## Benchmarks
### Xwin-LM performance on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/).
The table below displays the performance of Xwin-LM on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/), where evaluates its win-rate against Text-Davinci-003 across 805 questions. To provide a comprehensive evaluation, we present, for the first time, the win-rate against ChatGPT and GPT-4 as well. Our Xwin-LM model family establish a new state-of-the-art performance across all metrics. Notably, Xwin-LM-70B-V0.1 has eclipsed GPT-4 for the first time, achieving an impressive win-rate of **95.57%** to Text-Davinci-003 and **60.61%** to GPT-4.
| **Model** | **AlpacaEval (winrate %)** | **AlpacaEval (winrate %)** |**AlpacaEval (winrate %)** |
|----------------------------------|------------|----------|-------------|
| | **v.s. Text-Davinci-003** | **v.s. ChatGPT** | **v.s. GPT4**|
| **Xwin-LM-70B-V0.1** | **95.57** | **87.50** | **60.61** |
| GPT-4 | 95.28 | 84.66 | 50.00 |
| WizardLM-70B-V1.0 | 92.91 | 80.19 | 46.70 |
| Llama-2-70B-Chat | 92.66 | 81.97 | 51.19 |
| **Xwin-LM-13B-V0.1** | **91.76** | **81.79** | **55.30** |
| ChatGPT | 89.37 | 50.00 | 16.60 |
| WizardLM-13B-V1.2 | 89.17 | 75.75 | 41.91 |
| **Xwin-LM-7B-V0.1** | **87.35** | **76.40** | **47.57** |
| Llama-2-13B-Chat | 81.09 | 64.22 | 30.92 |
##