Adding Evaluation Results

2a11eb0 verified 9 months ago

5.62 kB

	---
	license: llama2
	model-index:
	- name: speechless-mistral-7b-dare-0.85
	results:
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: AI2 Reasoning Challenge (25-Shot)
	type: ai2_arc
	config: ARC-Challenge
	split: test
	args:
	num_few_shot: 25
	metrics:
	- type: acc_norm
	value: 63.31
	name: normalized accuracy
	source:
	url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=speechlessai/speechless-mistral-7b-dare-0.85
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: HellaSwag (10-Shot)
	type: hellaswag
	split: validation
	args:
	num_few_shot: 10
	metrics:
	- type: acc_norm
	value: 84.93
	name: normalized accuracy
	source:
	url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=speechlessai/speechless-mistral-7b-dare-0.85
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: MMLU (5-Shot)
	type: cais/mmlu
	config: all
	split: test
	args:
	num_few_shot: 5
	metrics:
	- type: acc
	value: 64.22
	name: accuracy
	source:
	url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=speechlessai/speechless-mistral-7b-dare-0.85
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: TruthfulQA (0-shot)
	type: truthful_qa
	config: multiple_choice
	split: validation
	args:
	num_few_shot: 0
	metrics:
	- type: mc2
	value: 50.68
	source:
	url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=speechlessai/speechless-mistral-7b-dare-0.85
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: Winogrande (5-shot)
	type: winogrande
	config: winogrande_xl
	split: validation
	args:
	num_few_shot: 5
	metrics:
	- type: acc
	value: 79.32
	name: accuracy
	source:
	url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=speechlessai/speechless-mistral-7b-dare-0.85
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: GSM8k (5-shot)
	type: gsm8k
	config: main
	split: test
	args:
	num_few_shot: 5
	metrics:
	- type: acc
	value: 19.86
	name: accuracy
	source:
	url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=speechlessai/speechless-mistral-7b-dare-0.85
	name: Open LLM Leaderboard
	---
	* [AWQ model(s) for GPU inference.](https://huggingface.co/TheBloke/speechless-mistral-7B-dare-0.85-AWQ)
	* [GPTQ models for GPU inference, with multiple quantisation parameter options.](https://huggingface.co/TheBloke/speechless-mistral-7B-dare-0.85-GPTQ)
	* [2, 3, 4, 5, 6 and 8-bit GGUF models for CPU+GPU inference](https://huggingface.co/TheBloke/speechless-mistral-7B-dare-0.85-GGUF)

	Experiment for DARE(Drop and REscale), most of the delta parameters can be directly set to zeros without affecting the capabilities of SFT LMs and larger models can tolerate a higher proportion of discarded parameters.

	Merged with below DARE models.

	weight_mask_rate: 0.85 / use_weight_rescale: True / mask_stratery: random / scaling_coefficient: 1.0

	\| Model \| Average \| ARC \| HellaSwag \| MMLU \| TruthfulQA \| Winogrande \| GSM8K \|
	\| ------ \| ------ \| ------ \| ------ \| ------ \| ------ \| ------ \| ------ \|
	\| Intel/neural-chat-7b-v3-1 \| 61.59 \| 66.21 \| 83.64 \| 62.37 \| 59.65 \| 78.14 \| 19.56 \|
	\| migtissera/SynthIA-7B-v1.3 \| 59.34 \| 62.12 \| 83.45 \| 62.65 \| 51.37 \| 78.85 \| 17.59 \|
	\| bhenrym14/mistral-7b-platypus-fp16 \| 58.71 \| 63.05 \| 84.15 \| 64.11 \| 45.07 \| 78.53 \| 17.36 \|
	\| jondurbin/airoboros-m-7b-3.1.2 \| 58.75 \| 61.86 \| 83.51 \| 61.91 \| 53.75 \| 77.58 \| 13.87 \|
	\| teknium/CollectiveCognition-v1.1-Mistral-7B \| 62.92 \| 62.12 \| 84.17 \| 62.35 \| 57.62 \| 75.37 \| 15.62 \|
	\| uukuguy/speechless-mistral-dolphin-orca-platypus-samantha-7b \| 62.06 \| 64.33 \| 84.4 \| 63.72 \| 52.52 \| 78.37 \| 21.38 \|
	\| \| \| \| \| \| \| \| \|
	\| speechless-mistral-7b-dare-0.85 (Merge 6 DARE models) \| 64.69 \| 63.57 \| 84.82 \| 64.29 \| 50.66 \| 79.24 \| 45.56 \|

	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_speechlessai__speechless-mistral-7b-dare-0.85)

	\| Metric \|Value\|
	\|---------------------------------\|----:\|
	\|Avg. \|60.39\|
	\|AI2 Reasoning Challenge (25-Shot)\|63.31\|
	\|HellaSwag (10-Shot) \|84.93\|
	\|MMLU (5-Shot) \|64.22\|
	\|TruthfulQA (0-shot) \|50.68\|
	\|Winogrande (5-shot) \|79.32\|
	\|GSM8k (5-shot) \|19.86\|