---
license: gemma
language:
- en
base_model:
- google/gemma-3-12b-it
tags:
- not-for-all-audiences
pipeline_tag: text2text-generation
datasets:
- SicariusSicariiStuff/UBW_Tapestries
- SicariusSicariiStuff/Synth_Usernames
---

<div align="center">
  <b style="font-size: 40px;">Oni_Mitsubishi_12B</b>


</div>



<img src="https://huggingface.co/SicariusSicariiStuff/Oni_Mitsubishi_12B/resolve/main/Images/Oni_Mitsubishi_12B.png" alt="Oni_Mitsubishi_12B" style="width: 70%; min-width: 500px; display: block; margin: auto;">

---

<a href="https://huggingface.co/SicariusSicariiStuff/Oni_Mitsubishi_12B#tldr" style="color: purple; font-weight: bold; font-size: 48px; text-decoration: none; display: block; text-align: center;">Click here for TL;DR</a>

---
**It happened**. The long-awaited **Gemma-3** is here, and not only are the model sizes really good (**1, 4, 12, 27**), but the **128k** context (except for the 1B, which gets 32k) is exactly what the Open-Source community wanted and asked for. My only issue with Gemma models in general is the VRAM requirement for **tuning them**, but that's a "me problem." End users will probably be very happy with Gemma-3 in terms of the VRAM requirement for **running it**.

On the **12th** of March, the Gemma-3 family of models was released. So I decided to go **full superstitious**, and took this omen as a divine calling to finetune the **12B** model first. This is how **Oni_Mitsubishi_12B** was born.

Before starting the actual training run, I used the following command, which I believe has helped the model to converge "better":
```
for i in {1..666}; do nvidia-smi; done
```


Gemma is known for its "**Gemma knowledge**": fandom and/or other obscure knowledge that even larger LLMs often do not possess. It gets even better: this time we also got a **vision model** embedded into all the Gemma-3 models except for the 1B. I wonder what the possibilities are for the vision part if the text layers are uncensored?

I have used brand new **long context markdown data**, some **deslopped** instruct data (very lightly deslopped, it's very time-consuming to get right), **and more than 50%** of highly curated and filtered organic human data, meticulously cleaned, and parsed into obedience. A new stack of organic and data-engineered text was used **for the first time** for **Oni_Mitsubishi_12B**. I truly hope creating it was worth the effort.

At **NO POINT** was ChatGPT used for data generation. However, the new **Claude 3.7 Sonnet** was used **VERY** sparingly for the **specific task** of creating a small number of humorous datasets (very human-like, done with a decent amount of prompt engineering). I've meticulously checked them for slop, and it is **minimal**. The goal of said data was to imitate human text using the **4chan vernacular**.

Speaking of which, I've published a highly curated, SFT-ready 4chan dataset here: [UBW_Tapestries](https://huggingface.co/datasets/SicariusSicariiStuff/UBW_Tapestries); naturally, I have included it in the dataset used for this model as well.
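
If you want to inspect that dataset yourself, here is a minimal sketch using the `datasets` library; the `train` split name is an assumption, so check the dataset card for the actual splits and columns.

```python
# Minimal sketch: peek at the UBW_Tapestries dataset.
# The "train" split name is an assumption; check the dataset card if it differs.
from datasets import load_dataset

ds = load_dataset("SicariusSicariiStuff/UBW_Tapestries", split="train")
print(ds)     # row count and column names
print(ds[0])  # first example
```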

---

# Technical details

I've used the "ancient" **Alpaca chat template** because the **Gemma-3 chat template** was behaving funkily, and I didn't want to waste precious time, and instead give the community a more uncensored finetune to play with, as fast as possible (I saw this requested a lot on both Reddit and discord, understandable). In my opinion, it's silly to let perfect be an enemy of the good. Anyway, I had to use both bleeding edge **Transformers** and **Axolotl**, and modify stuff **that wasn't even supposed to work** (like the model's config.json).

Since it's a hybrid model, training its text-only part is a bit problematic, so I hacked a config.json that gaslights the model into thinking it's only a text model, and got some warnings like:

```
'vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.weight', 'vision_tower.vision_model.encoder.layers.10.mlp.fc1.bias'}
- This IS expected if you are initializing Gemma3ForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Gemma3ForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
```

Then I saw it trains.

>The absolute state when you can train a model before you can actually inference it.
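
For anyone curious what "gaslighting the config" can look like in practice, below is a rough, hypothetical sketch of the idea, not my exact edit. It assumes the Gemma-3 config.json nests the language-model settings under `text_config` and that the text-only model type is `gemma3_text`; key names may differ between Transformers versions.

```python
# Hypothetical sketch of a text-only config.json hack (illustration only).
# It promotes the nested text_config to the top level and declares the model
# text-only, so Gemma3ForCausalLM can load the checkpoint without the vision tower.
import json

with open("config.json") as f:
    cfg = json.load(f)

text_cfg = cfg.get("text_config", cfg)             # Gemma-3 keeps LM settings nested here (assumption)
text_cfg["architectures"] = ["Gemma3ForCausalLM"]  # the text-only class named in the warnings above
text_cfg["model_type"] = "gemma3_text"             # assumed text-only model_type

with open("config.json", "w") as f:
    json.dump(text_cfg, f, indent=2)
```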

## Feedback, as always, is very much welcomed (even if it's negative).

---

# Included Character cards in this repo:

- [Takai_Puraisu](https://huggingface.co/SicariusSicariiStuff/Oni_Mitsubishi_12B/resolve/main/Character_Cards/Takai_Puraisu.png) (Car dealership simulator)

---

# Other character cards:
- [Vesper](https://huggingface.co/SicariusSicariiStuff/Phi-Line_14B/resolve/main/Character_Cards/Vesper.png) (Schizo **Space Adventure**)
- [Nina_Nakamura](https://huggingface.co/SicariusSicariiStuff/Phi-Line_14B/resolve/main/Character_Cards/Nina_Nakamura.png) (The **sweetest** dorky co-worker)
- [Employee#11](https://huggingface.co/SicariusSicariiStuff/Phi-Line_14B/resolve/main/Character_Cards/Employee%2311.png) (**Schizo workplace** with a **schizo worker**)

---

### TL;DR
<details>
<summary><b>First Gemma-3 Tune in the world</b></summary>

<img src="https://huggingface.co/SicariusSicariiStuff/Oni_Mitsubishi_12B/resolve/main/Images/1st.png" alt="Oni_Mitsubishi_12B_Tune" style="width: 100%; min-width: 600px; display: block; margin: auto;">

</details>

- **Excellent Roleplay** abilities. Like Gemma-2, but better in every way. Probably. More testing is needed.
- **Short to Medium length** responses (1-4 paragraphs, usually 1-2).
- **Schizo assistant** with an exceptional understanding of tables and markdown.
- Strong **Creative writing** abilities due to a huge chunk of organic creative writing data. Will obey requests regarding formatting (markdown headlines for paragraphs, etc.).
- **LOW refusals** - Total freedom in RP, can do things other RP models won't, and I'll leave it at that. Low refusals in assistant tasks as well.
- **VERY good** at following the **character card**. Based on the best RP datasets I have available.
- **4chan hard bias** can be either good or bad.
- **Unhinged** to the point it made me worry at first.

### Important: Make sure to use the correct settings!
[Assistant settings](https://huggingface.co/SicariusSicariiStuff/Oni_Mitsubishi_12B#recommended-settings-for-assistant-mode)

[Roleplay settings](https://huggingface.co/SicariusSicariiStuff/Oni_Mitsubishi_12B#rp-settings-below-)


---

## Oni_Mitsubishi_12B is available at the following quantizations:

- Original: [FP16](https://huggingface.co/SicariusSicariiStuff/Oni_Mitsubishi_12B)
- GGUF & iMatrix: [GGUF](https://huggingface.co/SicariusSicariiStuff/Oni_Mitsubishi_12B_GGUF) | [iMatrix](https://huggingface.co/SicariusSicariiStuff/Oni_Mitsubishi_12B_iMatrix)
- Mobile (ARM): [Q4_0](https://huggingface.co/SicariusSicariiStuff/Oni_Mitsubishi_12B_ARM)
---


# Vision, model variations, etc

- As mentioned above, this model was hacked together quickly, so the embedded vision model was removed. This makes it both lighter and more accessible compliance-wise (due to certain EU laws restricting the use of multimodal models, etc.).  
- The **full model**, with vision embedded, is available here: [Oni_Mitsubishi_12B_Vision](https://huggingface.co/Sicarius-Prototyping/Oni_Mitsubishi_12B_Vision).  
- The **vision model alone**, without the language model, is available here: [Gemma-3_12B_Vision_Only](https://huggingface.co/Sicarius-Prototyping/Gemma-3_12B_Vision_Only).  
- **Regarding NSFW and vision:** Testing shows that the model behaves in alignment with its UGI score: it is moderately censored. It will not generate graphic depictions of certain body parts, but it will provide more detailed descriptions than the stock Gemma.  
- **Was the vision model fine-tuned?** No.  

---

## Model Details

- Intended use: **Role-Play**, **Creative Writing**, **General Tasks**.

- Censorship level: <b>Medium</b>

- **4.5 / 10** (10 completely uncensored)


## UGI score:


<img src="https://huggingface.co/SicariusSicariiStuff/Oni_Mitsubishi_12B/resolve/main/Images/UGI.png" style="width: 100%; min-width: 600px; display: block; margin: auto;">



---


## Recommended settings for assistant mode
<details>
<summary>Full generation settings: <b>Debug Deterministic</b>.</summary>

<img src="https://huggingface.co/SicariusSicariiStuff/Dusk_Rainbow/resolve/main/Presets/Debug-deterministic.png" alt="Oni_Mitsubishi_12B_Settings" style="width: 100%; min-width: 600px; display: block; margin: auto;">

</details>

<details>
<summary>Full generation settings: <b>min_p</b>.</summary>

<img src="https://huggingface.co/SicariusSicariiStuff/Dusk_Rainbow/resolve/main/Presets/min_p.png" alt="Oni_Mitsubishi_12B_Settings" style="width: 100%; min-width: 600px; display: block; margin: auto;">

</details>

## RP settings below-
---

<h2 style="color: green; font-weight: bold; font-size: 36px; text-align: center;">Settings for RP, click below to expand:</h2>

<details>
<summary><b>Roleplay settings</b>.</summary>
A good repetition_penalty range is <b>between 1.12 and 1.15</b>; feel free to experiment.

With these settings, each output message should be neatly displayed in <b>1 - 5</b> paragraphs, with <b>2 - 3</b> being the most common. A single paragraph will be output in response to a simple message ("What was your name again?").

<b>min_P</b> for RP works too, but it is more likely to put everything into one large paragraph instead of neatly formatted short ones. Feel free to switch between the two.

<b>(Open the image in a new window to better see the full details)</b>
<img src="https://huggingface.co/SicariusSicariiStuff/Oni_Mitsubishi_12B/resolve/main/Presets/Oni_Mitsubishi_12B_RP.png" alt="Oni_Mitsubishi_12B_Settings" style="width: 100%; min-width: 600px; display: block; margin: auto;">

```
temperature:  0.8
top_p:  0.95
top_k:  25
typical_p:  1
min_p:  0
repetition_penalty: 1.12
repetition_penalty_range: 1024
```

</details>


<h2 style="color: darkorange; font-weight: bold; font-size: 65px; text-align: center;">Roleplay format: Classic Internet RP</h2>

```
*action* speech *narration*
```

- **min_p** will bias towards a **single big paragraph**.
- The recommended RP settings will bias towards **1-3 small paragraphs** (on some occasions 4-5).

---



# Model instruction template: Alpaca

```
### Instruction:
{prompt}

### Response:
```
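
For convenience, here is a minimal inference sketch that wraps a prompt in this template using the Transformers library. The prompt text is just an example, the sampling values mirror the recommended RP settings above, and `device_map="auto"` assumes `accelerate` is installed.

```python
# Minimal sketch: prompting Oni_Mitsubishi_12B with the Alpaca template above.
# Sampling values mirror the recommended RP settings; adjust to taste.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SicariusSicariiStuff/Oni_Mitsubishi_12B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

prompt = "### Instruction:\nWrite a short scene in classic internet RP format.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
    top_k=25,
    repetition_penalty=1.12,
)

# Print only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```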

---

**Other recommended generation Presets:**

<details>
<summary><b>Midnight Enigma</b></summary>

```
max_new_tokens: 512
temperature: 0.98
top_p: 0.37
top_k: 100
typical_p: 1
min_p: 0
repetition_penalty: 1.18
do_sample: True
```


</details>


<details>
<summary><b>Divine Intellect</b></summary>

```
max_new_tokens: 512
temperature: 1.31
top_p: 0.14
top_k: 49
typical_p: 1
min_p: 0
repetition_penalty: 1.17
do_sample: True
```


</details>

<details>
<summary><b>simple-1</b></summary>

```
max_new_tokens: 512
temperature: 0.7
top_p: 0.9
top_k: 20
typical_p: 1
min_p: 0
repetition_penalty: 1.15
do_sample: True
```


</details>

---

<h2 style="color: green; font-weight: bold; font-size: 65px; text-align: center;">Your support = more models</h2>
<a href="https://ko-fi.com/sicarius" style="color: pink; font-weight: bold; font-size: 48px; text-decoration: none; display: block; text-align: center;">My Ko-fi page (Click here)</a>

---

## Safety update


While the model is very **overtly** toxic, its evaluation on the [UGI leaderboard](https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard) found it to be only moderately uncensored. It seems that this aggressiveness and overt toxicity is indeed due to the 4chan dataset used for training. Still, use your judgment when using it. My thanks to the UGI leaderboard for helping me verify that the model is more tame than initially thought.


## No Liability

The creators, distributors, and hosts of this model:
- Accept NO LIABILITY for any misuse of this model
- Make NO WARRANTIES regarding its performance or safety
- Do NOT endorse any content the model may generate

## Potential Risks

This model may:
- Generate toxic, offensive, or harmful content
- Exhibit biases present in the training data
- Produce outputs that violate ethical standards or terms of service on various platforms

## Responsible Usage

Researchers using this model should implement appropriate safeguards, content filtering, and human oversight when conducting experiments.

## Citation Information

```
@llm{Oni_Mitsubishi_12B,
  author = {SicariusSicariiStuff},
  title = {Oni_Mitsubishi_12B},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/SicariusSicariiStuff/Oni_Mitsubishi_12B}
}
```

---

## Benchmarks


Never mind; HF closed the leaderboard, probably due to too many people benchmaxxing using merges. Probably the right call; it's about time.

---

## Other stuff
- [SLOP_Detector](https://github.com/SicariusSicariiStuff/SLOP_Detector) Nuke GPTisms with the SLOP detector.
- [LLAMA-3_8B_Unaligned](https://huggingface.co/SicariusSicariiStuff/LLAMA-3_8B_Unaligned) The grand project that started it all.
- [Blog and updates (Archived)](https://huggingface.co/SicariusSicariiStuff/Blog_And_Updates) Some updates, some rambles, sort of a mix between a diary and a blog.