Adding Evaluation Results

be350a3 verified 9 months ago

6.03 kB

	---
	language:
	- zh
	- en
	license: apache-2.0
	model-index:
	- name: chinese-alpaca-2-13b
	results:
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: AI2 Reasoning Challenge (25-Shot)
	type: ai2_arc
	config: ARC-Challenge
	split: test
	args:
	num_few_shot: 25
	metrics:
	- type: acc_norm
	value: 58.7
	name: normalized accuracy
	source:
	url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ziqingyang/chinese-alpaca-2-13b
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: HellaSwag (10-Shot)
	type: hellaswag
	split: validation
	args:
	num_few_shot: 10
	metrics:
	- type: acc_norm
	value: 79.74
	name: normalized accuracy
	source:
	url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ziqingyang/chinese-alpaca-2-13b
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: MMLU (5-Shot)
	type: cais/mmlu
	config: all
	split: test
	args:
	num_few_shot: 5
	metrics:
	- type: acc
	value: 55.1
	name: accuracy
	source:
	url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ziqingyang/chinese-alpaca-2-13b
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: TruthfulQA (0-shot)
	type: truthful_qa
	config: multiple_choice
	split: validation
	args:
	num_few_shot: 0
	metrics:
	- type: mc2
	value: 50.22
	source:
	url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ziqingyang/chinese-alpaca-2-13b
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: Winogrande (5-shot)
	type: winogrande
	config: winogrande_xl
	split: validation
	args:
	num_few_shot: 5
	metrics:
	- type: acc
	value: 75.69
	name: accuracy
	source:
	url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ziqingyang/chinese-alpaca-2-13b
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: GSM8k (5-shot)
	type: gsm8k
	config: main
	split: test
	args:
	num_few_shot: 5
	metrics:
	- type: acc
	value: 10.46
	name: accuracy
	source:
	url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ziqingyang/chinese-alpaca-2-13b
	name: Open LLM Leaderboard
	---

	# Chinese-Alpaca-2-13B

	This is the full Chinese-Alpaca-2-13B model，which can be loaded directly for inference and full-parameter training.

	Related models👇
	* Long context base models
	* [Chinese-LLaMA-2-7B-16K (full model)](https://huggingface.co/hfl/chinese-llama-2-7b-16k)
	* [Chinese-LLaMA-2-LoRA-7B-16K (LoRA model)](https://huggingface.co/hfl/chinese-llama-2-lora-7b-16k)
	* [Chinese-LLaMA-2-13B-16K (full model)](https://huggingface.co/hfl/chinese-llama-2-13b-16k)
	* [Chinese-LLaMA-2-LoRA-13B-16K (LoRA model)](https://huggingface.co/hfl/chinese-llama-2-lora-13b-16k)
	* Base models
	* [Chinese-LLaMA-2-7B (full model)](https://huggingface.co/hfl/chinese-llama-2-7b)
	* [Chinese-LLaMA-2-LoRA-7B (LoRA model)](https://huggingface.co/hfl/chinese-llama-2-lora-7b)
	* [Chinese-LLaMA-2-13B (full model)](https://huggingface.co/hfl/chinese-llama-2-13b)
	* [Chinese-LLaMA-2-LoRA-13B (LoRA model)](https://huggingface.co/hfl/chinese-llama-2-lora-13b)
	* Instruction/Chat models
	* [Chinese-Alpaca-2-7B (full model)](https://huggingface.co/hfl/chinese-alpaca-2-7b)
	* [Chinese-Alpaca-2-LoRA-7B (LoRA model)](https://huggingface.co/hfl/chinese-alpaca-2-lora-7b)
	* [Chinese-Alpaca-2-13B (full model)](https://huggingface.co/hfl/chinese-alpaca-2-13b)
	* [Chinese-Alpaca-2-LoRA-13B (LoRA model)](https://huggingface.co/hfl/chinese-alpaca-2-lora-13b)


	# Description of Chinese-LLaMA-Alpaca-2
	This project is based on the Llama-2, released by Meta, and it is the second generation of the Chinese LLaMA & Alpaca LLM project. We open-source Chinese LLaMA-2 (foundation model) and Alpaca-2 (instruction-following model). These models have been expanded and optimized with Chinese vocabulary beyond the original Llama-2. We used large-scale Chinese data for incremental pre-training, which further improved the fundamental semantic understanding of the Chinese language, resulting in a significant performance improvement compared to the first-generation models. The relevant models support a 4K context and can be expanded up to 18K+ using the NTK method.

	The main contents of this project include:

	* 🚀 New extended Chinese vocabulary beyond Llama-2, open-sourcing the Chinese LLaMA-2 and Alpaca-2 LLMs.
	* 🚀 Open-sourced the pre-training and instruction finetuning (SFT) scripts for further tuning on user's data
	* 🚀 Quickly deploy and experience the quantized LLMs on CPU/GPU of personal PC
	* 🚀 Support for LLaMA ecosystems like 🤗transformers, llama.cpp, text-generation-webui, LangChain, vLLM etc.

	Please refer to [https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/) for details.
	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_ziqingyang__chinese-alpaca-2-13b)

	\| Metric \|Value\|
	\|---------------------------------\|----:\|
	\|Avg. \|54.99\|
	\|AI2 Reasoning Challenge (25-Shot)\|58.70\|
	\|HellaSwag (10-Shot) \|79.74\|
	\|MMLU (5-Shot) \|55.10\|
	\|TruthfulQA (0-shot) \|50.22\|
	\|Winogrande (5-shot) \|75.69\|
	\|GSM8k (5-shot) \|10.46\|