Update README.md

README.md CHANGED

@@ -17,14 +17,16 @@ base_model:

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/9XVgxKyuXTQVO5mO-EOd4.jpeg)

-# 🔮 Beyonder-4x7B-v3

-Beyonder-4x7B-v3 is an improvement over the popular [Beyonder-4x7B-v2](https://huggingface.co/mlabonne/Beyonder-4x7B-v2). It's a Mixture of Experts (MoE) made with the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
* [mlabonne/AlphaMonarch-7B](https://huggingface.co/mlabonne/AlphaMonarch-7B)
* [beowolx/CodeNinja-1.0-OpenChat-7B](https://huggingface.co/beowolx/CodeNinja-1.0-OpenChat-7B)
* [SanjiWatsuki/Kunoichi-DPO-v2-7B](https://huggingface.co/SanjiWatsuki/Kunoichi-DPO-v2-7B)
* [mlabonne/NeuralDaredevil-7B](https://huggingface.co/mlabonne/NeuralDaredevil-7B)

## 🔍 Applications

This model uses a context window of 8k. I recommend using it with the Mistral Instruct chat template (works perfectly with LM Studio).

@@ -35,10 +37,16 @@ Thanks to its four experts, it's a well-rounded model, capable of achieving most

## ⚡ Quantized models

* **GGUF**: https://huggingface.co/mlabonne/Beyonder-4x7B-v3-GGUF

## 🏆 Evaluation

### Nous

Beyonder-4x7B-v3 is one of the best models on Nous' benchmark suite (evaluation performed using [LLM AutoEval](https://github.com/mlabonne/llm-autoeval)) and significantly outperforms the v2. See the entire leaderboard [here](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard).

@@ -48,11 +56,21 @@ Beyonder-4x7B-v3 is one of the best models on Nous' benchmark suite (evaluation

| [mlabonne/AlphaMonarch-7B](https://huggingface.co/mlabonne/AlphaMonarch-7B) [📄](https://gist.github.com/mlabonne/1d33c86824b3a11d2308e36db1ba41c1) | 62.74 | 45.37 | 77.01 | 78.39 | 50.2 |
| [**mlabonne/Beyonder-4x7B-v3**](https://huggingface.co/mlabonne/Beyonder-4x7B-v3) [📄](https://gist.github.com/mlabonne/3740020807e559f7057c32e85ce42d92) | **61.91** | **45.85** | **76.67** | **74.98** | **50.12** |
| [mlabonne/NeuralDaredevil-7B](https://huggingface.co/mlabonne/NeuralDaredevil-7B) [📄](https://gist.github.com/mlabonne/cbeb077d1df71cb81c78f742f19f4155) | 59.39 | 45.23 | 76.2 | 67.61 | 48.52 |
| [mlabonne/Beyonder-4x7B-v2](https://huggingface.co/mlabonne/Beyonder-4x7B-v2) [📄](https://gist.github.com/mlabonne/f73baa140a510a676242f8a4496d05ca) | 57.13 | 45.29 | 75.95 | 60.86 | 46.4 |

### Open LLM Leaderboard

-

## 🧩 Configuration

@@ -89,29 +107,6 @@ experts:

- "count"
```

-## 💻 Usage
-
-```python
-!pip install -qU transformers bitsandbytes accelerate
-
-from transformers import AutoTokenizer
-import transformers
-import torch
-
-model = "mlabonne/Beyonder-4x7B-v3"
-
-tokenizer = AutoTokenizer.from_pretrained(model)
-pipeline = transformers.pipeline(
-    "text-generation",
-    model=model,
-    model_kwargs={"torch_dtype": torch.float16, "load_in_4bit": True},
-)
-
-messages = [{"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
-prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
-outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
-print(outputs[0]["generated_text"])
-```
-Output:
-

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/9XVgxKyuXTQVO5mO-EOd4.jpeg)

+# 🔮 Beyonder-4x7B-v3 GGUF

+[Beyonder-4x7B-v3](https://huggingface.co/mlabonne/Beyonder-4x7B-v3) is an improvement over the popular [Beyonder-4x7B-v2](https://huggingface.co/mlabonne/Beyonder-4x7B-v2). It's a Mixture of Experts (MoE) made with the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
* [mlabonne/AlphaMonarch-7B](https://huggingface.co/mlabonne/AlphaMonarch-7B)
* [beowolx/CodeNinja-1.0-OpenChat-7B](https://huggingface.co/beowolx/CodeNinja-1.0-OpenChat-7B)
* [SanjiWatsuki/Kunoichi-DPO-v2-7B](https://huggingface.co/SanjiWatsuki/Kunoichi-DPO-v2-7B)
* [mlabonne/NeuralDaredevil-7B](https://huggingface.co/mlabonne/NeuralDaredevil-7B)
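For intuition, an MoE layer routes each token to a few of these experts through a learned gate. The toy top-2 softmax gate below is purely illustrative (hypothetical logits and helper, not the actual mergekit/Mixtral routing code):

```python
import math

# The four experts listed above; the router logits are made up for illustration.
EXPERTS = ["AlphaMonarch-7B", "CodeNinja-1.0-OpenChat-7B",
           "Kunoichi-DPO-v2-7B", "NeuralDaredevil-7B"]

def route(logits, k=2):
    """Pick the top-k experts by softmax weight and renormalize."""
    weights = [math.exp(l) for l in logits]
    total = sum(weights)
    probs = [w / total for w in weights]
    top = sorted(range(len(probs)), key=lambda i: -probs[i])[:k]
    norm = sum(probs[i] for i in top)
    return [(EXPERTS[i], probs[i] / norm) for i in top]

# A token whose (hypothetical) router logits favor the code expert:
print(route([0.1, 2.0, 0.3, 1.2]))
```

Only the selected experts' feed-forward blocks run for that token, which is why a 4x7B MoE is much cheaper per token than a dense model of the same total size.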

+Special thanks to [beowolx](https://huggingface.co/beowolx) for making the best Mistral-based code model and to [SanjiWatsuki](https://huggingface.co/SanjiWatsuki) for creating one of the very best RP models.
+
## 🔍 Applications

This model uses a context window of 8k. I recommend using it with the Mistral Instruct chat template (works perfectly with LM Studio).
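For reference, the Mistral Instruct template wraps each user turn in `[INST]` tags. A minimal sketch of the format (illustrative only; in real code the tokenizer's `apply_chat_template` builds this string for you):

```python
# Toy renderer for the Mistral Instruct chat format (illustration only;
# prefer tokenizer.apply_chat_template() in practice).
def mistral_instruct(messages):
    prompt = "<s>"
    for m in messages:
        if m["role"] == "user":
            prompt += f"[INST] {m['content']} [/INST]"
        else:  # assistant turn
            prompt += f"{m['content']}</s>"
    return prompt

print(mistral_instruct([{"role": "user", "content": "Hello!"}]))
# -> <s>[INST] Hello! [/INST]
```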

## ⚡ Quantized models

+Thanks to [bartowski](https://huggingface.co/bartowski) for quantizing this model.
+
* **GGUF**: https://huggingface.co/mlabonne/Beyonder-4x7B-v3-GGUF
+* **More GGUF**: https://huggingface.co/bartowski/Beyonder-4x7B-v3-GGUF
+* **ExLlamaV2**: https://huggingface.co/bartowski/Beyonder-4x7B-v3-exl2

## 🏆 Evaluation

+This model is not designed to excel in traditional benchmarks, as the code and role-playing experts are rarely exercised in those contexts. Nonetheless, it performs remarkably well thanks to its strong general-purpose experts.
+
### Nous

Beyonder-4x7B-v3 is one of the best models on Nous' benchmark suite (evaluation performed using [LLM AutoEval](https://github.com/mlabonne/llm-autoeval)) and significantly outperforms the v2. See the entire leaderboard [here](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard).

| [mlabonne/AlphaMonarch-7B](https://huggingface.co/mlabonne/AlphaMonarch-7B) [📄](https://gist.github.com/mlabonne/1d33c86824b3a11d2308e36db1ba41c1) | 62.74 | 45.37 | 77.01 | 78.39 | 50.2 |
| [**mlabonne/Beyonder-4x7B-v3**](https://huggingface.co/mlabonne/Beyonder-4x7B-v3) [📄](https://gist.github.com/mlabonne/3740020807e559f7057c32e85ce42d92) | **61.91** | **45.85** | **76.67** | **74.98** | **50.12** |
| [mlabonne/NeuralDaredevil-7B](https://huggingface.co/mlabonne/NeuralDaredevil-7B) [📄](https://gist.github.com/mlabonne/cbeb077d1df71cb81c78f742f19f4155) | 59.39 | 45.23 | 76.2 | 67.61 | 48.52 |
+| [SanjiWatsuki/Kunoichi-DPO-v2-7B](https://huggingface.co/SanjiWatsuki/Kunoichi-DPO-v2-7B) [📄](https://gist.github.com/mlabonne/895ff5171e998abfdf2a41a4f9c84450) | 58.29 | 44.79 | 75.05 | 65.68 | 47.65 |
| [mlabonne/Beyonder-4x7B-v2](https://huggingface.co/mlabonne/Beyonder-4x7B-v2) [📄](https://gist.github.com/mlabonne/f73baa140a510a676242f8a4496d05ca) | 57.13 | 45.29 | 75.95 | 60.86 | 46.4 |
+| [beowolx/CodeNinja-1.0-OpenChat-7B](https://huggingface.co/beowolx/CodeNinja-1.0-OpenChat-7B) [📄](https://gist.github.com/mlabonne/08b5280c221fbd7f98eb27561ae902a3) | 50.35 | 39.98 | 71.77 | 48.73 | 40.92 |

+### EQ-Bench

+Beyonder-4x7B-v3 is the best 4x7B model on the EQ-Bench leaderboard, outperforming older versions of ChatGPT and Llama-2-70b-chat. It is very close to Mixtral-8x7B-Instruct-v0.1 and Gemini Pro. Thanks to [Sam Paech](https://huggingface.co/sam-paech) for running the eval.

+![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/-OSHe2ImrxN8wAREnSZAZ.png)

### Open LLM Leaderboard

+It's also a strong performer on the Open LLM Leaderboard, significantly outperforming the v2 model.
+
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/NFRYqzwuy9TB-s-Hy3gRy.png)

## 🧩 Configuration

- "count"
```

+## 🌳 Model Family Tree

+![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/zQi5VgmdqJv6pFaGoQ2AL.png)