---
license: llama2
---

<h3 align="center">
Xwin-LM: Powerful, Stable, and Reproducible LLM Alignment
</h3>

<p align="center">
  <a href="https://huggingface.co/Xwin-LM">
    <img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-blue">
  </a>
</p>

**Step up your LLM alignment with Xwin-LM!**

Xwin-LM aims to develop and open-source alignment technologies for large language models, including supervised fine-tuning (SFT), reward modeling, rejection sampling, reinforcement learning, etc. Our first release, built upon the Llama2 base models, ranked **TOP-1** on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/). Notably, it's **the first to surpass GPT-4** on this benchmark. The project will be continuously updated.
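
As a point of reference for the techniques listed above, the sketch below illustrates the general idea of reward-model-guided rejection sampling (best-of-n). It is a minimal illustration, not Xwin-LM's actual pipeline; `generate_fn` and `reward_fn` are hypothetical placeholders for a policy model and a reward model.

```python
from typing import Callable, List

def rejection_sample(prompt: str,
                     generate_fn: Callable[[str], str],
                     reward_fn: Callable[[str, str], float],
                     n: int = 8) -> str:
    """Best-of-n rejection sampling: draw n candidate responses from the
    policy and keep the one the reward model scores highest."""
    candidates: List[str] = [generate_fn(prompt) for _ in range(n)]
    scores = [reward_fn(prompt, c) for c in candidates]
    return candidates[scores.index(max(scores))]
```

The selected responses can then be used, for example, as training data for a further round of fine-tuning.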

## News

- :boom: [Sep, 2023] We released [Xwin-LM-70B-V0.1](https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1), which achieved a **95.57%** win-rate against Text-Davinci-003 on the [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/) benchmark, ranking **TOP-1** overall. **It was the FIRST model to surpass GPT-4** on AlpacaEval, with a win-rate of **60.61%** against GPT-4.
- :boom: [Sep, 2023] We released [Xwin-LM-13B-V0.1](https://huggingface.co/Xwin-LM/Xwin-LM-13B-V0.1), which achieved a **91.76%** win-rate on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/), ranking **top-1** among all 13B models.
- :boom: [Sep, 2023] We released [Xwin-LM-7B-V0.1](https://huggingface.co/Xwin-LM/Xwin-LM-7B-V0.1), which achieved an **87.82%** win-rate on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/), ranking **top-1** among all 7B models.

## Model Card

| Model | Checkpoint | Report | License |
|------------------|------------|-------------|------------------|
| Xwin-LM-7B-V0.1  | 🤗 <a href="https://huggingface.co/Xwin-LM/Xwin-LM-7B-V0.1" target="_blank">HF Link</a> | 📃 **Coming soon (Stay tuned)** | <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2 License</a> |
| Xwin-LM-13B-V0.1 | 🤗 <a href="https://huggingface.co/Xwin-LM/Xwin-LM-13B-V0.1" target="_blank">HF Link</a> | | <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2 License</a> |
| Xwin-LM-70B-V0.1 | 🤗 <a href="https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1" target="_blank">HF Link</a> | | <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2 License</a> |
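
The checkpoints above can be loaded with the Hugging Face `transformers` library. Below is a minimal inference sketch, assuming a recent `transformers` (plus `accelerate`) and sufficient GPU memory; the plain-text prompt is illustrative only, so check the model cards for the exact conversation template the checkpoints expect.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Xwin-LM/Xwin-LM-7B-V0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # halves memory; use float32 on CPU
    device_map="auto",          # requires the `accelerate` package
)

# Illustrative prompt; not necessarily the template used in training.
prompt = "Hello, can you introduce yourself?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
)
# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)
print(response)
```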

## Benchmarks

### Xwin-LM performance on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/)

The table below shows the performance of Xwin-LM on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/), which evaluates its win-rate against Text-Davinci-003 across 805 questions. To provide a comprehensive evaluation, we also present, for the first time, the win-rate against ChatGPT and GPT-4. Our Xwin-LM model family establishes a new state-of-the-art performance across all metrics. Notably, Xwin-LM-70B-V0.1 has eclipsed GPT-4 for the first time, achieving an impressive win-rate of **95.57%** against Text-Davinci-003 and **60.61%** against GPT-4.

All numbers below are AlpacaEval win-rates (%).

| **Model**            | **v.s. Text-Davinci-003** | **v.s. ChatGPT** | **v.s. GPT-4** |
|----------------------|---------------------------|------------------|----------------|
| **Xwin-LM-70B-V0.1** | **95.57**                 | **87.50**        | **60.61**      |
| GPT-4                | 95.28                     | 84.66            | 50.00          |
| WizardLM-70B-V1.0    | 92.91                     | 80.19            | 46.70          |
| Llama-2-70B-Chat     | 92.66                     | 81.97            | 51.19          |
| **Xwin-LM-13B-V0.1** | **91.76**                 | **81.79**        | **55.30**      |
| ChatGPT              | 89.37                     | 50.00            | 16.60          |
| WizardLM-13B-V1.2    | 89.17                     | 75.75            | 41.91          |
| **Xwin-LM-7B-V0.1**  | **87.35**                 | **76.40**        | **47.57**      |
| Llama-2-13B-Chat     | 81.09                     | 64.22            | 30.92          |
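
For clarity on how a pairwise win-rate of this kind is read: a judge compares the model's response with the baseline's response on each prompt, and the win-rate is the fraction of comparisons the model wins (note the 50.00 entries where a model is compared against itself). The sketch below is a hypothetical illustration of that computation, counting ties as half a win, which is a common convention; the benchmark's exact scoring may differ.

```python
def win_rate(preferences: list) -> float:
    """preferences: one of "model", "baseline", or "tie" per prompt.

    Returns the win-rate in percent, counting ties as half a win.
    """
    score = sum(
        1.0 if p == "model" else 0.5 if p == "tie" else 0.0
        for p in preferences
    )
    return 100.0 * score / len(preferences)

# Example: 2 wins, 1 tie, 1 loss over 4 prompts -> 62.5
print(win_rate(["model", "tie", "baseline", "model"]))
```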