---

license: llama3.2
language:
- en
base_model:
- meta-llama/Llama-3.2-3B-Instruct
tags:
- merge

---

[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)


# QuantFactory/Eximius_Persona_5B-GGUF
This is a quantized version of [SicariusSicariiStuff/Eximius_Persona_5B](https://huggingface.co/SicariusSicariiStuff/Eximius_Persona_5B), created using llama.cpp.

# Original Model Card


<div align="center">
  <b style="font-size: 40px;">Eximius_Persona_5B</b>


</div>


<img src="https://huggingface.co/SicariusSicariiStuff/Eximius_Persona_5B/resolve/main/Images/Eximius_Persona_5B.png" alt="Eximius_Persona_5B" style="width: 70%; min-width: 500px; display: block; margin: auto;">


---

<a href="https://huggingface.co/SicariusSicariiStuff/Eximius_Persona_5B#tldr" style="color: purple; font-weight: bold; font-size: 48px; text-decoration: none; display: block; text-align: center;">Click here for TL;DR</a>

---


I wanted to create a model with an **exceptional** capacity for using varied speech patterns and **fresh** role-play takes. The model had to have a unique personality, not on a surface level but on the inside, **for real**. Unfortunately, SFT alone just didn't cut it. And I had only 16GB of VRAM at the time. Oh, and I wanted it to be small enough to be viable for phones and to be able to put up a fight against larger models while at it. If only there were a magical way to do it.

**Merges**. Merges are quite unique. In the early days, they were considered "fake." Clearly, there's no such thing as merges. Where are the papers? No papers? Then it's clearly impossible. "Mathematically impossible." Simply preposterous. To mix layers and hope for a coherent output? What nonsense!

And yet, they were **real**. <a href="https://huggingface.co/Undi95">Undi95</a> made some of the earliest merges I can remember, and the "LLAMA2 Era" was truly amazing and innovative thanks to them. Cool stuff like <a href="https://huggingface.co/KoboldAI/LLaMA2-13B-TiefighterLR">Tiefighter</a> was being made, and eventually the time-tested <a href="https://huggingface.co/sophosympatheia/Midnight-Miqu-70B-v1.5">Midnight-Miqu-70B (v1.5 is my personal favorite)</a>.

Merges are an interesting thing, as they affect LLMs in a way that is currently **impossible** to reproduce using **SFT** (or any 'SOTA' technique). One of the plagues we have today, even though our LLMs are orders of magnitude smarter, is **GPTisms** and **predictability**. Merges can potentially 'solve' that. How? In short, if you physically tear neurons apart (**passthrough** brain surgery) while somehow managing to keep the model coherent enough, and, if you're lucky, it can even follow instructions, then magical stuff begins to happen.

Magic, because it's **not** an exact science, there's some art to it, as it is done with a lot of **intuition**. GPTisms are patterns that the model really **really** "wants" to follow, it's quite hard to dissuade it. But if you yeet a couple of layers and rearrange them, boy does it get hard to spew those shivers down the spine... and instead the model starts spewing stuff that it was never intended to. It breaks its patterns and introduces some healthy chaos into the mix.
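To make the "yeet a couple of layers and rearrange them" idea concrete, here is a minimal sketch of how a passthrough-style stack can be described as a list of layer slices. It assumes a 28-layer base; the slice boundaries are hypothetical and chosen only so the result has 50 layers. They are not the recipe used for this model, and the function name is made up for illustration.

```python
# Illustrative only: these slice boundaries are hypothetical and are
# NOT the actual recipe behind Eximius_Persona_5B.

def passthrough_plan(slices):
    """Return the source-layer index for every layer of the stacked model.

    Each (start, end) pair copies layers start..end-1 from the base model;
    the copies are concatenated in the order given, without averaging.
    """
    plan = []
    for start, end in slices:
        plan.extend(range(start, end))
    return plan


BASE_LAYERS = 28  # assumed depth of the base model
hypothetical_slices = [(0, 20), (8, 24), (14, 28)]

plan = passthrough_plan(hypothetical_slices)

# Every entry must point at a layer that exists in the base model.
assert all(0 <= index < BASE_LAYERS for index in plan)

print(len(plan))    # total number of layers in the stacked model
print(plan[18:24])  # the seam where the first slice ends and the second begins
```

The seam is where the "healthy chaos" comes from: a layer suddenly receives activations from a layer it was never trained to follow.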

This model, **Eximius_Persona_5B**, is the result of multiple merges that were tuned, then merged again, then tuned again, and so on, over many iterations. The base was LLAMA 3.2 3B and I focused on achieving the following **4 traits**, in that specific order:

- Varied speech patterns

- Roleplay ability

- Long context coherency

- Instruction following

For me, getting varied speech patterns was more important than instruction following. For instruction following we've got API models, or LLAMA 3.3. Many models are excellent assistants, yet they all sound pretty much the same.

I also wanted to make use of my **4090m 16GB** while my workstation crunches **Phi-4's** brain. Making a nice 5B model aligns with my goal of making AI accessible and fun for everyone, and hence **Eximius_Persona_5B** was born. Let this also be a call to action for more people to make AI models. You don't have to have multiple GPUs or spend a fortune on the cloud (although that definitely opens up options); you can do plenty with a mere 16GB of VRAM. And in case 16GB seems out of reach too, I should mention that Google Colab gives access to a free T4.

I uploaded a more funky, less stable, and thiccer version of Eximius_Persona to my prototyping org here:

[Eximius_Persona with 84 Layers from various checkpoints](https://huggingface.co/Sicarius-Prototyping/Eximius_Persona_84L)

(In some early tests, it occasionally output stories that GPTZERO scored as written by a human: **60% human**, 40% AI, with a lucky roll.)

<details>
<summary><b>See example:</b></summary>

<img src="https://huggingface.co/SicariusSicariiStuff/Eximius_Persona_5B/resolve/main/Images/Eximius_Persona_5B_GPTZERO.png" alt="GPTZERO Example" style="width: 100%; min-width: 600px; display: block; margin: auto;">

</details>


---

### TL;DR
- **Fun & Fresh Roleplay** flavour.
- **Interesting speech patterns** in creative writing.
- **Good long context coherency** in Roleplay.
- **Occasionally** outputs quite **human like** stories.
- **50 Layers** LLAMA 3.2, fully coherent.
- **Strong performance** in general for a **5B model**.

### Important: Make sure to use the correct settings!

[Assistant settings](https://huggingface.co/SicariusSicariiStuff/Eximius_Persona_5B#recommended-settings-for-assistant-mode)

[Roleplay settings](https://huggingface.co/SicariusSicariiStuff/Eximius_Persona_5B#recommended-settings-for-roleplay-mode)


---

## Eximius_Persona_5B is available at the following quantizations:

- Original: [FP16](https://huggingface.co/SicariusSicariiStuff/Eximius_Persona_5B)
- GGUF: [Static Quants](https://huggingface.co/SicariusSicariiStuff/Eximius_Persona_5B_GGUF) | [iMatrix_GGUF](https://huggingface.co/SicariusSicariiStuff/Eximius_Persona_5B_iMatrix)
- EXL2: [3.5 bpw](https://huggingface.co/SicariusSicariiStuff/Eximius_Persona_5B-3.5bpw) | [4.0 bpw](https://huggingface.co/SicariusSicariiStuff/Eximius_Persona_5B-4.0bpw) | [5.0 bpw](https://huggingface.co/SicariusSicariiStuff/Eximius_Persona_5B-5.0bpw) | [6.0 bpw](https://huggingface.co/SicariusSicariiStuff/Eximius_Persona_5B-6.0bpw) | [7.0 bpw](https://huggingface.co/SicariusSicariiStuff/Eximius_Persona_5B-7.0bpw) | [8.0 bpw](https://huggingface.co/SicariusSicariiStuff/Eximius_Persona_5B-8.0bpw)
- Specialized: [FP8](https://huggingface.co/SicariusSicariiStuff/Eximius_Persona_5B_FP8)

---

## Model Details

- Intended use: **Role-Play**, **Creative Writing**, General Tasks.

- Censorship level: <b>Medium</b>

- **5 / 10** (10 completely uncensored)


## UGI score:


  <img src="https://huggingface.co/SicariusSicariiStuff/Eximius_Persona_5B/resolve/main/Images/Eximius_Persona_5B_UGI.png" alt="UGI Score" style="width: 100%; min-width: 700px; display: block;">

### Don't use it for coding :)
---


# Model instruction template: Llama-3-Instruct

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{output}<|eot_id|>
```
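As a minimal sketch, the template above can be assembled in Python like this. The function name and the `include_bos` switch are illustrative assumptions, not part of any library. Some runtimes prepend `<|begin_of_text|>` on their own, in which case it should be left out here.

```python
def build_prompt(system_prompt: str, user_input: str, include_bos: bool = True) -> str:
    """Format one system + user turn in the Llama-3-Instruct layout.

    The returned string ends with the assistant header, so the model's
    reply (the {output} slot of the template) is generated right after it.
    """
    bos = "<|begin_of_text|>" if include_bos else ""
    return (
        f"{bos}<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )


prompt = build_prompt(
    "You are a grumpy tavern keeper.",
    "*walks in* Got any rooms left?",
)
print(prompt)
```

Generation should stop at `<|eot_id|>`, which closes the assistant turn.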

---
<h2 style="color: darkorange; font-weight: bold; font-size: 55px; text-align: center;">Roleplay format: Classic Internet RP</h2>

```
*action* speech *narration*
```
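For front ends that want to style actions and narration differently from speech, a rough sketch of splitting a message in this format might look like the following. The function name and the `"starred"` / `"speech"` labels are made up for illustration. Since actions and narration share the same asterisk markup, the sketch does not try to tell them apart.

```python
import re


def split_rp(message: str):
    """Split a classic internet RP message into (kind, text) pairs.

    Text wrapped in *asterisks* is labelled "starred" (action or narration);
    everything else is labelled "speech".
    """
    parts = []
    # The capturing group keeps the starred chunks in the split result.
    for chunk in re.split(r"(\*[^*]+\*)", message):
        text = chunk.strip()
        if not text:
            continue
        if len(text) > 2 and text.startswith("*") and text.endswith("*"):
            parts.append(("starred", text[1:-1].strip()))
        else:
            parts.append(("speech", text))
    return parts


for kind, text in split_rp("*waves* Hello there. *The tavern falls quiet.*"):
    print(f"{kind:8} {text}")
```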

### The model is pretty smart, so it might handle other formats as well, but it was trained and tested specifically with the classic internet RP style in mind.

## Recommended settings for assistant mode
<details>
<summary>Full generation settings: <b>Debug Deterministic</b>.</summary>

<img src="https://huggingface.co/SicariusSicariiStuff/Dusk_Rainbow/resolve/main/Presets/Debug-deterministic.png" alt="Negative_LLAMA_70B_Settings" style="width: 100%; min-width: 600px; display: block; margin: auto;">

</details>

<details>
<summary>Full generation settings: <b>min_p</b>.</summary>

<img src="https://huggingface.co/SicariusSicariiStuff/Dusk_Rainbow/resolve/main/Presets/min_p.png" alt="Negative_LLAMA_70B_Settings" style="width: 100%; min-width: 600px; display: block; margin: auto;">

</details>

---

## Recommended settings for Roleplay mode

<details>
<summary><b>Roleplay settings:</b></summary>
A good repetition_penalty range is <b>between 1.12 and 1.15</b>; feel free to experiment.

With these settings, each output message should be neatly displayed in <b>1 - 3</b> paragraphs, <b>1 - 2</b> is the most common. A single paragraph will be output as a response to a simple message ("What was your name again?").

<b>min_P</b> for RP works too, but is more likely to put everything into one large paragraph instead of a neatly formatted short one. Feel free to switch between the two.

<b>(Open the image in a new window to better see the full details)</b>
<img src="https://huggingface.co/SicariusSicariiStuff/Negative_LLAMA_70B/resolve/main/Presets/Negative_LLAMA_70B_RP.png" alt="Negative_LLAMA_70B_Settings" style="width: 100%; min-width: 600px; display: block; margin: auto;">

```
temperature:  0.8
top_p:  0.95
top_k:  25
typical_p:  1
min_p:  0
repetition_penalty: 1.12
repetition_penalty_range: 1024
```

</details>
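Since this repository hosts GGUF quants, here is a small sketch that turns the roleplay values above into llama.cpp-style sampler flags. The mapping is an assumption on my part, in particular treating `repetition_penalty_range` as `--repeat-last-n`; check the flag names against `--help` for your llama.cpp build before relying on them.

```python
# Roleplay values copied from the settings listed above.
ROLEPLAY_SETTINGS = {
    "temperature": 0.8,
    "top_p": 0.95,
    "top_k": 25,
    "typical_p": 1,
    "min_p": 0,
    "repetition_penalty": 1.12,
    "repetition_penalty_range": 1024,
}

# Assumed mapping from the preset names to llama.cpp command-line flags.
FLAG_FOR = {
    "temperature": "--temp",
    "top_p": "--top-p",
    "top_k": "--top-k",
    "typical_p": "--typical",
    "min_p": "--min-p",
    "repetition_penalty": "--repeat-penalty",
    "repetition_penalty_range": "--repeat-last-n",
}


def sampler_args(settings):
    """Flatten a settings dict into a list of command-line arguments."""
    args = []
    for name, value in settings.items():
        args += [FLAG_FOR[name], str(value)]
    return args


print(" ".join(sampler_args(ROLEPLAY_SETTINGS)))
```

The resulting list can be appended to whatever command you already use to load the GGUF file.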


---

**Other recommended generation Presets:**

<details>
<summary><b>Midnight Enigma</b></summary>

```
max_new_tokens: 512
temperature: 0.98
top_p: 0.37
top_k: 100
typical_p: 1
min_p: 0
repetition_penalty: 1.18
do_sample: True
```


</details>


<details>
<summary><b>Divine Intellect</b></summary>

```
max_new_tokens: 512
temperature: 1.31
top_p: 0.14
top_k: 49
typical_p: 1
min_p: 0
repetition_penalty: 1.17
do_sample: True
```


</details>

<details>
<summary><b>simple-1</b></summary>

```
max_new_tokens: 512
temperature: 0.7
top_p: 0.9
top_k: 20
typical_p: 1
min_p: 0
repetition_penalty: 1.15
do_sample: True
```


</details>

---

<h2 style="color: green; font-weight: bold; font-size: 65px; text-align: center;">Your support = more models</h2>
<a href="https://ko-fi.com/sicarius" style="color: pink; font-weight: bold; font-size: 48px; text-decoration: none; display: block; text-align: center;">My Ko-fi page (Click here)</a>

---

## Benchmarks

|      Metric       |Value|
|-------------------|----:|
|Avg.               |21.78|
|IFEval (0-Shot)    |65.60|
|BBH (3-Shot)       |22.20|
|MATH Lvl 5 (4-Shot)| 9.89|
|GPQA (0-shot)      | 1.90|
|MuSR (0-shot)      | 7.33|
|MMLU-PRO (5-shot)  |23.78|
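
As a quick sanity check, the "Avg." row appears to be the plain unweighted mean of the six scores below it. That is an assumption about how the average was computed, but the arithmetic agrees:

```python
# Scores copied from the table above.
scores = {
    "IFEval (0-Shot)": 65.60,
    "BBH (3-Shot)": 22.20,
    "MATH Lvl 5 (4-Shot)": 9.89,
    "GPQA (0-shot)": 1.90,
    "MuSR (0-shot)": 7.33,
    "MMLU-PRO (5-shot)": 23.78,
}

# Unweighted mean across all six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # compare with the Avg. row
```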

---

## Other stuff
- [SLOP_Detector](https://github.com/SicariusSicariiStuff/SLOP_Detector) Nuke GPTisms with SLOP_Detector.
- [LLAMA-3_8B_Unaligned](https://huggingface.co/SicariusSicariiStuff/LLAMA-3_8B_Unaligned) The grand project that started it all.
- [Blog and updates (Archived)](https://huggingface.co/SicariusSicariiStuff/Blog_And_Updates) Some updates, some rambles, sort of a mix between a diary and a blog.