---
license: apache-2.0
base_model: Qwen/Qwen3-8B-Base
library_name: transformers
tags:
- mergekit
- axolotl
- unsloth
- roleplay
- conversational
datasets:
- PygmalionAI/PIPPA
- Alfitaria/nemotron-ultra-reasoning-synthkink
- PocketDoc/Dans-Prosemaxx-Gutenberg
- FreedomIntelligence/Medical-R1-Distill-Data
- cognitivecomputations/SystemChat-2.0
- allenai/tulu-3-sft-personas-instruction-following
- kalomaze/Opus_Instruct_25k
- simplescaling/s1K-claude-3-7-sonnet
- ai2-adapt-dev/flan_v2_converted
- grimulkan/theory-of-mind
- grimulkan/physical-reasoning
- nvidia/HelpSteer3
- nbeerbower/gutenberg2-dpo
- nbeerbower/gutenberg-moderne-dpo
- nbeerbower/Purpura-DPO
- antiven0m/physical-reasoning-dpo
- allenai/tulu-3-IF-augmented-on-policy-70b
- NobodyExistsOnTheInternet/system-message-DPO
---

# Q3-8B-Kintsugi

![Sketch of a kitsune hugging a fox plushie on her bed. Generated with Midjourney v7](https://cdn-uploads.huggingface.co/production/uploads/634262af8d8089ebaefd410e/o_fhP0riFrKh-5XyPxQyk.png)
<small><i>get it? because kintsugi sounds like kitsune? hahaha-</i></small>

# Overview

***Q3-8B-Kintsugi*** is a roleplaying model finetuned from [Qwen3-8B-Base](https://huggingface.co/Qwen/Qwen3-8B-Base).

During testing, Kintsugi punched well above its weight class, especially for 1-on-1 roleplaying and general storywriting.

# Quantizations

EXL3:
- [Official EXL3 quant repo](https://huggingface.co/allura-quants/allura-org_Q3-8B-Kintsugi-EXL3)

GGUF:
- [Official static GGUF quants](https://huggingface.co/allura-quants/allura-org_Q3-8B-Kintsugi-GGUF)

MLX:
- [8, 6, and 4bpw MLX-format quants by soundTeam](https://huggingface.co/collections/allura-quants/q3-8b-kintsugi-mlx-684fc48444f1214749f538c4)
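
If you'd rather pull a single GGUF file programmatically, `huggingface_hub` can fetch one directly. This is just a convenience sketch; the filename below is hypothetical, so check the repo's file list for the real quant names.

```python
# Fetch one GGUF quant from the official repo via huggingface_hub.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="allura-quants/allura-org_Q3-8B-Kintsugi-GGUF",
    filename="Q3-8B-Kintsugi-Q4_K_M.gguf",  # hypothetical filename
)
print(path)  # local cache path, usable with llama.cpp and friends
```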
  
# Usage

- Format is plain old ChatML (note that, unlike regular Qwen 3, you do *not* need to prefill empty think tags to keep it from reasoning -- see below).

- Settings used by testers varied, but we generally stayed around 0.9 temperature and 0.1 min-p. Do *not* use repetition penalties (DRY included); they break the model. A generation sketch with these settings follows this list.

- Any system prompt can likely be used, but testing was done with the Shingame system prompt (link will be added later, I promise).

- The official instruction-following version of Qwen3-8B was not used as a base. Instruction following was trained in post-hoc, and "thinking" traces were not included. __As a result, "thinking" will not function.__
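
Below is a minimal generation sketch with Transformers that applies the ChatML chat template and the settings above. The repo id `allura-org/Q3-8B-Kintsugi` is inferred from the quant repo names, and the prompts are placeholders; treat both as assumptions, not the exact tested setup.

```python
# Minimal sketch (assumed repo id): ChatML via the tokenizer's chat template,
# ~0.9 temperature / 0.1 min-p, and no repetition penalties.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "allura-org/Q3-8B-Kintsugi"  # assumption: inferred from the quant repos

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

messages = [
    {"role": "system", "content": "You are the narrator of an ongoing roleplay."},  # any system prompt
    {"role": "user", "content": "The tavern door creaks open..."},
]

# Renders plain ChatML; no empty think-tag prefill is needed (the model does not reason).
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.9,
    min_p=0.1,
    repetition_penalty=1.0,  # leave rep pen (and DRY, in other backends) off
)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```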

# Training Process

1. The [base model](https://huggingface.co/Qwen/Qwen3-8B-Base) first went through a supervised finetune on a corpus of instruction following data, roleplay conversations, and human writing based on the [Ink](https://huggingface.co/collections/allura-org/ink-6772fd1442308781594bbabb)/[Bigger Body](https://huggingface.co/collections/allura-org/bigger-body-67b277af0861cec33b54745d)/[Remnant](https://huggingface.co/collections/allura-org/remnant-6817c2113bbb2aed501513d0) lineage.

2. A KTO reinforcement learning phase then steered the model away from the very purple prose of the initial finetune, and improved its logical and spatial reasoning and overall sense of "intelligence".

Both stages mirror those of [Q3-30B-A3B-Designant](https://huggingface.co/allura-org/Q3-30B-A3B-Designant), which went through the same process with the same data.
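
For the curious, the KTO stage can be approximated with TRL's `KTOTrainer`. The sketch below uses assumed hyperparameters and a placeholder dataset path; it is illustrative only, not the actual Kintsugi training config (the real run used Unsloth + TRL, see Credits).

```python
# Illustrative KTO sketch with TRL -- NOT the actual Kintsugi config.
# KTO learns from unpaired feedback: each row is (prompt, completion, label),
# with label=True for desirable outputs and label=False for undesirable ones.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

SFT_CHECKPOINT = "path/to/kintsugi-sft"  # hypothetical SFT-stage checkpoint

tokenizer = AutoTokenizer.from_pretrained(SFT_CHECKPOINT)
model = AutoModelForCausalLM.from_pretrained(SFT_CHECKPOINT)

# Placeholder file; any dataset with "prompt"/"completion"/"label" columns works.
dataset = load_dataset("json", data_files="kto_feedback.jsonl", split="train")

args = KTOConfig(
    output_dir="kintsugi-kto",
    per_device_train_batch_size=4,
    learning_rate=5e-7,  # assumed; preference-tuning phases typically use tiny LRs
    beta=0.1,            # strength of the KL penalty toward the starting policy
)

trainer = KTOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```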

# Credits

- Fizz - Training, Data Wrangling

- Toaster, Mango, Bot, probably others I forgot ;-; - Testing

- inflatebot - original Designant model card that this one was yoinked from

- Artus - Funding

- Alibaba - Making the original model

- Axolotl, Unsloth, Huggingface - Making the frameworks used to train this model (Axolotl was used for the SFT process, and Unsloth+TRL was used for the KTO process)

- All quanters, inside and outside the org, specifically Artus, Lyra, and soundTeam/Heni

We would like to thank the Allura community on Discord, especially Curse, Heni, Artus and Mawnipulator, for their companionship and moral support. You all mean the world to us <3