---
language:
- en
license: apache-2.0
model-index:
- name: WestSeverus-7B-DPO-v2
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 71.42
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=FelixChao/WestSeverus-7B-DPO-v2
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 88.27
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=FelixChao/WestSeverus-7B-DPO-v2
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 64.79
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=FelixChao/WestSeverus-7B-DPO-v2
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 72.37
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=FelixChao/WestSeverus-7B-DPO-v2
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 83.27
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=FelixChao/WestSeverus-7B-DPO-v2
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 71.65
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=FelixChao/WestSeverus-7B-DPO-v2
      name: Open LLM Leaderboard
---
# WestSeverus - 7B - DPO - v2

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64a53b0747a04f0512941b6f/-_CvSGuu-kQ1GDNzVMYjg.png)

## ☘️ Model Description

WestSeverus-7B-DPO-v2 is a WestLake-family model trained on top of [WestSeverus-7B](https://huggingface.co/FelixChao/WestSeverus-7B).

The model was trained on several DPO datasets and performs well on basic math problems.

WestSeverus-7B-DPO-v2 can be used for mathematics, chemistry, physics, and even coding, for further research and reference.

## 📖 Table of Contents
1. [Nous Benchmark Results](#🪄-nous-benchmark-results)
    - AGIEval
    - GPT4All
    - TruthfulQA Scores
    - BigBench

2. [Open LLM Leaderboard](#🏆-open-llm-leaderboard)
    - ARC
    - HellaSwag
    - MMLU
    - TruthfulQA
    - Winogrande
    - GSM8K
3. [EvalPlus Leaderboard](#⚡-evalplus-leaderboard)
    - HumanEval
    - HumanEval_Plus
    - MBPP
    - MBPP_Plus
4. [Prompt Format](#⚗️-prompt-format)
5. [Quantized Models](#🛠️-quantized-models)
6. [Gratitude](#🙏-gratitude)
   
## 🪄 Nous Benchmark Results

WestSeverus-7B-DPO-v2 currently tops the [YALL - Yet Another LLM Leaderboard](https://huggingface.co/spaces/CultriX/Yet_Another_LLM_Leaderboard) created by CultriX, with the highest TruthfulQA and BigBench scores among the models listed below.

| Model | Average | AGIEval | GPT4All | TruthfulQA | BigBench |
|---|---:|---:|---:|---:|---:|
| [**WestSeverus-7B-DPO-v2**](https://huggingface.co/FelixChao/WestSeverus-7B-DPO-v2) | **60.98** | 45.29 | 77.2 | **72.72** | **48.71** |
| [CultriX/Wernicke-7B-v1](https://huggingface.co/CultriX/Wernicke-7B-v1) | 60.73 | 45.59 | 77.36 | 71.46 | 48.49 |
| [mlabonne/NeuralBeagle14-7B](https://huggingface.co/mlabonne/NeuralBeagle14-7B) | 60.25 | 46.06 | 76.77 | 70.32 | 47.86 |
| [CultriX/MistralTrix-v1](https://huggingface.co/CultriX/MistralTrix-v1)  | 60.05 | 44.98 | 76.62 | 71.44 | 47.17 |
| [senseable/WestLake-7B-v2](https://huggingface.co/senseable/WestLake-7B-v2)  | 59.42 | 44.27 | 77.86 | 67.46 | 48.09 |
| [mlabonne/Daredevil-7B](https://huggingface.co/mlabonne/Daredevil-7B)  | 58.22 | 44.85 | 76.07 | 64.89 | 47.07 |
| [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) | 44.61 | 27.96 | 70.84 | 44.46 | 35.17 |
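
The Average column above appears to be the plain arithmetic mean of the four suite scores. The following sketch checks that reading against a few rows copied from the table; the variable names are illustrative, and the averaging rule is an assumption rather than something this card states.

```python
# Rows copied from the table above:
# model -> (reported average, AGIEval, GPT4All, TruthfulQA, BigBench)
rows = {
    "WestSeverus-7B-DPO-v2": (60.98, 45.29, 77.2, 72.72, 48.71),
    "CultriX/Wernicke-7B-v1": (60.73, 45.59, 77.36, 71.46, 48.49),
    "senseable/WestLake-7B-v2": (59.42, 44.27, 77.86, 67.46, 48.09),
    "microsoft/phi-2": (44.61, 27.96, 70.84, 44.46, 35.17),
}

# Assumption: the reported average is the unweighted mean of the four suites.
computed = {}
for model, (reported, *suites) in rows.items():
    computed[model] = sum(suites) / len(suites)
    print(f"{model}: reported={reported:.2f} computed={computed[model]:.3f}")
```

Each computed mean agrees with the reported average to within rounding, which is consistent with an unweighted mean.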

## 🏆 Open LLM Leaderboard

WestSeverus-7B-DPO-v2 is one of the top 7B models on the Open LLM Leaderboard, and it scores particularly well on TruthfulQA and GSM8K.

|             Metric              |Value|
|---------------------------------|----:|
|Avg.                             |75.29|
|AI2 Reasoning Challenge (25-Shot)|71.42|
|HellaSwag (10-Shot)              |88.27|
|MMLU (5-Shot)                    |64.79|
|TruthfulQA (0-shot)              |72.37|
|Winogrande (5-shot)              |83.27|
|GSM8k (5-shot)                   |71.65|

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_FelixChao__WestSeverus-7B-DPO-v2).
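
As a quick sanity check, the `Avg.` row can be reproduced from the six benchmark scores, assuming it is their unweighted mean. The dictionary below is a hypothetical container for the values listed in the table.

```python
# Benchmark scores copied from the Open LLM Leaderboard table above.
open_llm_scores = {
    "ARC (25-shot)": 71.42,
    "HellaSwag (10-shot)": 88.27,
    "MMLU (5-shot)": 64.79,
    "TruthfulQA (0-shot)": 72.37,
    "Winogrande (5-shot)": 83.27,
    "GSM8k (5-shot)": 71.65,
}

# Assumption: "Avg." is the unweighted mean of the six benchmarks.
average = sum(open_llm_scores.values()) / len(open_llm_scores)
print(f"Mean of six benchmarks: {average:.3f} (table reports 75.29)")
```

The small gap between the computed mean and the reported 75.29 is within what rounding of the per-benchmark scores would explain.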

## ⚡ EvalPlus Leaderboard

| Model | HumanEval | HumanEval_Plus| MBPP | MBPP_Plus |
|---|---:|---:|---:|---:|
| phi-2-2.7B |48.2|43.3|61.9|51.4|
| **WestSeverus-7B-DPO-v2**| 43.3 | 34.1 |TBD |TBD |
| SOLAR-10.7B-Instruct-v1.0 |  42.1   |  34.3    |   42.9  |  34.6   |
| CodeLlama-7B| 37.8| 34.1 | 57.6 |45.4 |

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64a53b0747a04f0512941b6f/lL72F41NUueFMP7p-fPl7.png)

## ⚗️ Prompt Format

WestSeverus-7B-DPO-v2 was trained using the ChatML prompt template with a system prompt. An example follows:

```
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```
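
The template above can be filled in with plain string formatting. This is a minimal sketch: `build_chatml_prompt` is a hypothetical helper name, and the trailing newline after the opening `assistant` tag is an assumption about how the template is normally terminated.

```python
def build_chatml_prompt(system_message: str, prompt: str) -> str:
    """Fill the ChatML template and leave the assistant turn open for generation."""
    return (
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        f"<|im_start|>user\n{prompt}<|im_end|>\n"
        # Assumption: a newline follows the assistant tag before generation starts.
        "<|im_start|>assistant\n"
    )


chatml_text = build_chatml_prompt(
    system_message="You are a helpful assistant.",
    prompt="What is 12 * 7?",
)
print(chatml_text)
```

The resulting string can be passed to whichever inference stack is in use; if the tokenizer shipped with the model defines a chat template, that template may be a more reliable way to build the same prompt.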

## 🛠️ Quantized Models 

### Another version of the WestSeverus model

* [**PetroGPT/WestSeverus-7B-DPO**](https://huggingface.co/PetroGPT/WestSeverus-7B-DPO)

* **GGUF**: https://huggingface.co/TheBloke/WestSeverus-7B-DPO-GGUF
* **GGUF**: https://huggingface.co/s3nh/WestSeverus-7B-DPO-GGUF
* **GPTQ**: https://huggingface.co/TheBloke/WestSeverus-7B-DPO-GPTQ
* **AWQ**: https://huggingface.co/TheBloke/WestSeverus-7B-DPO-AWQ

### MaziyarPanahi/WestSeverus-7B-DPO-v2-GGUF

* **GGUF**: https://huggingface.co/MaziyarPanahi/WestSeverus-7B-DPO-v2-GGUF

## 🙏 Gratitude

* Thanks to @senseable for [senseable/WestLake-7B-v2](https://huggingface.co/senseable/WestLake-7B-v2).
* Thanks to @jondurbin for [jondurbin/truthy-dpo-v0.1 dataset](https://huggingface.co/datasets/jondurbin/truthy-dpo-v0.1).
* Thanks to @Charles Goddard for MergeKit.
* Thanks to @TheBloke, @s3nh, and @MaziyarPanahi for the quantized models.
* Thanks to @mlabonne, @CultriX for YALL - Yet Another LLM Leaderboard.
* Thanks to everyone else in the open-source AI community who has used this model for further research and improvement.
