---
license: llama2
---

<h3 align="center">
Xwin-LM: Powerful, Stable, and Reproducible LLM Alignment
</h3>

<p align="center">
  <a href="https://huggingface.co/Xwin-LM">
    <img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-blue">
  </a>
</p>

**Step up your LLM alignment with Xwin-LM!**

Xwin-LM aims to develop and open-source alignment technologies for large language models, including supervised fine-tuning (SFT), reward modeling, rejection sampling, reinforcement learning, etc. Our first release, built upon the Llama2 base models, ranked **TOP-1** on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/). Notably, it's **the first to surpass GPT-4** on this benchmark. The project will be continuously updated.
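
As a point of reference for the techniques listed above, the sketch below illustrates the general idea of reward-model-guided rejection sampling (best-of-n). It is a minimal illustration, not Xwin-LM's actual pipeline; `generate_fn` and `reward_fn` are hypothetical placeholders for a policy model and a reward model.

```python
from typing import Callable, List

def rejection_sample(prompt: str,
                     generate_fn: Callable[[str], str],
                     reward_fn: Callable[[str, str], float],
                     n: int = 8) -> str:
    """Best-of-n rejection sampling: draw n candidate responses from the
    policy and keep the one the reward model scores highest."""
    candidates: List[str] = [generate_fn(prompt) for _ in range(n)]
    scores = [reward_fn(prompt, c) for c in candidates]
    return candidates[scores.index(max(scores))]
```

The selected responses can then be used, for example, as training data for a further round of fine-tuning.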

## News

- :boom: [Sep, 2023] We released [Xwin-LM-70B-V0.1](https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1), which achieved a **95.57%** win-rate against Text-Davinci-003 on the [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/) benchmark, ranking **TOP-1** overall. **It was the FIRST model to surpass GPT-4** on AlpacaEval, with a win-rate of **60.61%** against GPT-4.
- :boom: [Sep, 2023] We released [Xwin-LM-13B-V0.1](https://huggingface.co/Xwin-LM/Xwin-LM-13B-V0.1), which achieved a **91.76%** win-rate on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/), ranking **top-1** among all 13B models.
- :boom: [Sep, 2023] We released [Xwin-LM-7B-V0.1](https://huggingface.co/Xwin-LM/Xwin-LM-7B-V0.1), which achieved an **87.82%** win-rate on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/), ranking **top-1** among all 7B models.

## Model Card

| Model | Checkpoint | Report | License |
|------------------|------------|-------------|------------------|
| Xwin-LM-7B-V0.1  | 🤗 <a href="https://huggingface.co/Xwin-LM/Xwin-LM-7B-V0.1" target="_blank">HF Link</a> | 📃 **Coming soon (Stay tuned)** | <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2 License</a> |
| Xwin-LM-13B-V0.1 | 🤗 <a href="https://huggingface.co/Xwin-LM/Xwin-LM-13B-V0.1" target="_blank">HF Link</a> | | <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2 License</a> |
| Xwin-LM-70B-V0.1 | 🤗 <a href="https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1" target="_blank">HF Link</a> | | <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2 License</a> |
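
The checkpoints above can be loaded with the Hugging Face `transformers` library. Below is a minimal inference sketch, assuming a recent `transformers` (plus `accelerate`) and sufficient GPU memory; the plain-text prompt is illustrative only, so check the model cards for the exact conversation template the checkpoints expect.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Xwin-LM/Xwin-LM-7B-V0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # halves memory; use float32 on CPU
    device_map="auto",          # requires the `accelerate` package
)

# Illustrative prompt; not necessarily the template used in training.
prompt = "Hello, can you introduce yourself?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
)
# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)
print(response)
```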

## Benchmarks

### Xwin-LM performance on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/)

The table below shows the performance of Xwin-LM on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/), which evaluates its win-rate against Text-Davinci-003 across 805 questions. To provide a comprehensive evaluation, we also present, for the first time, the win-rate against ChatGPT and GPT-4. Our Xwin-LM model family establishes a new state-of-the-art performance across all metrics. Notably, Xwin-LM-70B-V0.1 has eclipsed GPT-4 for the first time, achieving an impressive win-rate of **95.57%** against Text-Davinci-003 and **60.61%** against GPT-4.

All numbers below are AlpacaEval win-rates (%).

| **Model**            | **v.s. Text-Davinci-003** | **v.s. ChatGPT** | **v.s. GPT-4** |
|----------------------|---------------------------|------------------|----------------|
| **Xwin-LM-70B-V0.1** | **95.57**                 | **87.50**        | **60.61**      |
| GPT-4                | 95.28                     | 84.66            | 50.00          |
| WizardLM-70B-V1.0    | 92.91                     | 80.19            | 46.70          |
| Llama-2-70B-Chat     | 92.66                     | 81.97            | 51.19          |
| **Xwin-LM-13B-V0.1** | **91.76**                 | **81.79**        | **55.30**      |
| ChatGPT              | 89.37                     | 50.00            | 16.60          |
| WizardLM-13B-V1.2    | 89.17                     | 75.75            | 41.91          |
| **Xwin-LM-7B-V0.1**  | **87.35**                 | **76.40**        | **47.57**      |
| Llama-2-13B-Chat     | 81.09                     | 64.22            | 30.92          |
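
For clarity on how a pairwise win-rate of this kind is read: a judge compares the model's response with the baseline's response on each prompt, and the win-rate is the fraction of comparisons the model wins (note the 50.00 entries where a model is compared against itself). The sketch below is a hypothetical illustration of that computation, counting ties as half a win, which is a common convention; the benchmark's exact scoring may differ.

```python
def win_rate(preferences: list) -> float:
    """preferences: one of "model", "baseline", or "tie" per prompt.

    Returns the win-rate in percent, counting ties as half a win.
    """
    score = sum(
        1.0 if p == "model" else 0.5 if p == "tie" else 0.0
        for p in preferences
    )
    return 100.0 * score / len(preferences)

# Example: 2 wins, 1 tie, 1 loss over 4 prompts -> 62.5
print(win_rate(["model", "tie", "baseline", "model"]))
```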