---
|
license: apache-2.0 |
|
base_model: Qwen/Qwen3-8B-Base |
|
library_name: transformers |
|
tags: |
|
- mergekit |
|
- axolotl |
|
- unsloth |
|
- roleplay |
|
- conversational |
|
datasets: |
|
- PygmalionAI/PIPPA |
|
- Alfitaria/nemotron-ultra-reasoning-synthkink |
|
- PocketDoc/Dans-Prosemaxx-Gutenberg |
|
- FreedomIntelligence/Medical-R1-Distill-Data |
|
- cognitivecomputations/SystemChat-2.0 |
|
- allenai/tulu-3-sft-personas-instruction-following |
|
- kalomaze/Opus_Instruct_25k |
|
- simplescaling/s1K-claude-3-7-sonnet |
|
- ai2-adapt-dev/flan_v2_converted |
|
- grimulkan/theory-of-mind |
|
- grimulkan/physical-reasoning |
|
- nvidia/HelpSteer3 |
|
- nbeerbower/gutenberg2-dpo |
|
- nbeerbower/gutenberg-moderne-dpo |
|
- nbeerbower/Purpura-DPO |
|
- antiven0m/physical-reasoning-dpo |
|
- allenai/tulu-3-IF-augmented-on-policy-70b |
|
- NobodyExistsOnTheInternet/system-message-DPO |
|
--- |
|
|
|
# Q3-8B-Kintsugi |
|
|
|
 |
|
<small><i>get it? because kintsugi sounds like kitsune? hahaha-</i></small> |
|
|
|
# Overview |
|
|
|
***Q3-8B-Kintsugi*** is a roleplaying model finetuned from [Qwen3-8B-Base](https://huggingface.co/Qwen/Qwen3-8B-Base). |
|
|
|
During testing, Kintsugi punched well above its weight for its parameter count, especially in 1-on-1 roleplaying and general storywriting.
|
|
|
# Quantizations |
|
|
|
|
|
EXL3: |
|
- [Official EXL3 quant repo](https://huggingface.co/allura-quants/allura-org_Q3-8B-Kintsugi-EXL3) |
|
|
|
GGUF: |
|
- [Official static GGUF quants](https://huggingface.co/allura-quants/allura-org_Q3-8B-Kintsugi-GGUF) |
|
|
|
MLX: |
|
- [8, 6, and 4bpw MLX-format quants by soundTeam](https://huggingface.co/collections/allura-quants/q3-8b-kintsugi-mlx-684fc48444f1214749f538c4)
|
|
|
# Usage |
|
|
|
- Format is plain-old ChatML. Please note that, unlike regular Qwen 3, you do *not* need to prefill empty think tags to suppress reasoning -- see below.
|
|
|
- Settings used by testers varied, but we generally stayed around 0.9 temperature and 0.1 min-p. Do *not* use repetition penalties (DRY included); they break the model.
|
|
|
- Any system prompt can likely be used, but I used the Shingame system prompt (link will be added later i promise) |
|
|
|
- The official instruction-following version of Qwen3-8B was *not* used as a base. Instruction following was trained in post-hoc, and "thinking" traces were not included. __As a result, "thinking" will not function.__
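To make the prompt format concrete, here is a minimal sketch of a ChatML prompt for this model. The `to_chatml` helper and the system prompt text are illustrative placeholders (this is not the Shingame prompt, and most frontends build this string for you via their chat template support):

```python
# Minimal sketch of a ChatML-formatted prompt for Kintsugi.
# The helper and system prompt below are illustrative, not official tooling.

def to_chatml(messages):
    """Render a list of {role, content} dicts as a ChatML string,
    ending with an open assistant turn for the model to complete."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    # No empty <think> prefill is needed here, unlike instruct Qwen 3.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are Kintsugi, a creative roleplay partner."},
    {"role": "user", "content": "*waves* Hi there!"},
])
print(prompt)
```

When sampling a completion from this prompt, the settings above apply: roughly 0.9 temperature, 0.1 min-p, and no repetition penalties.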
|
|
|
# Training Process |
|
|
|
1. The [base model](https://huggingface.co/Qwen/Qwen3-8B-Base) first went through a supervised finetune on a corpus of instruction following data, roleplay conversations, and human writing based on the [Ink](https://huggingface.co/collections/allura-org/ink-6772fd1442308781594bbabb)/[Bigger Body](https://huggingface.co/collections/allura-org/bigger-body-67b277af0861cec33b54745d)/[Remnant](https://huggingface.co/collections/allura-org/remnant-6817c2113bbb2aed501513d0) lineage. |
|
|
|
2. Finally, a KTO reinforcement learning phase steered the model away from the overly purple prose of the initial finetune, and improved its logical and spatial reasoning and its overall sense of "intelligence".
|
|
|
Both stages mirror those of [Q3-30B-A3B-Designant](https://huggingface.co/allura-org/Q3-30B-A3B-Designant), which was trained with the same data.
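For readers curious what the SFT stage looks like in practice, here is an illustrative Axolotl-style config fragment. The dataset entry and every hyperparameter value are placeholders; the actual configuration used for Kintsugi was not published:

```yaml
# Illustrative Axolotl SFT config fragment -- values are assumptions,
# not the settings actually used to train Kintsugi.
base_model: Qwen/Qwen3-8B-Base
chat_template: chatml
datasets:
  - path: PygmalionAI/PIPPA
    type: chat_template
sequence_len: 8192
learning_rate: 2.0e-5
num_epochs: 2
output_dir: ./kintsugi-sft
```

The subsequent KTO phase was run separately with Unsloth and TRL rather than Axolotl.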
|
|
|
# Credits |
|
|
|
- Fizz - Training, Data Wrangling |
|
|
|
- Toaster, Mango, Bot, probably others I forgot ;-; - Testing |
|
|
|
- inflatebot - original Designant model card that this one was yoinked from |
|
|
|
- Artus - Funding |
|
|
|
- Alibaba - Making the original model |
|
|
|
- Axolotl, Unsloth, Hugging Face - Making the frameworks used to train this model (Axolotl was used for the SFT process, and Unsloth+TRL was used for the KTO process)
|
|
|
- All quanters, inside and outside the org, specifically Artus, Lyra, and soundTeam/Heni |
|
|
|
We would like to thank the Allura community on Discord, especially Curse, Heni, Artus and Mawnipulator, for their companionship and moral support. You all mean the world to us <3 |