---
|
license: apache-2.0 |
|
base_model: Qwen/Qwen3-8B-Base |
|
library_name: transformers |
|
tags: |
|
- mergekit |
|
- axolotl |
|
- unsloth |
|
- roleplay |
|
- conversational |
|
datasets: |
|
- PygmalionAI/PIPPA |
|
- Alfitaria/nemotron-ultra-reasoning-synthkink |
|
- PocketDoc/Dans-Prosemaxx-Gutenberg |
|
- FreedomIntelligence/Medical-R1-Distill-Data |
|
- cognitivecomputations/SystemChat-2.0 |
|
- allenai/tulu-3-sft-personas-instruction-following |
|
- kalomaze/Opus_Instruct_25k |
|
- simplescaling/s1K-claude-3-7-sonnet |
|
- ai2-adapt-dev/flan_v2_converted |
|
- grimulkan/theory-of-mind |
|
- grimulkan/physical-reasoning |
|
- nvidia/HelpSteer3 |
|
- nbeerbower/gutenberg2-dpo |
|
- nbeerbower/gutenberg-moderne-dpo |
|
- nbeerbower/Purpura-DPO |
|
- antiven0m/physical-reasoning-dpo |
|
- allenai/tulu-3-IF-augmented-on-policy-70b |
|
- NobodyExistsOnTheInternet/system-message-DPO |
|
--- |
|
|
|
# Q3-8B-Kintsugi |
|
|
|
 |
|
<small><i>get it? because kintsugi sounds like kitsune? hahaha-</i></small> |
|
|
|
# Overview |
|
|
|
***Q3-8B-Kintsugi*** is a roleplaying model finetuned from [Qwen3-8B-Base](https://huggingface.co/Qwen/Qwen3-8B-Base). |
|
|
|
During testing, Kintsugi punched well above its weight for its parameter count, especially in 1-on-1 roleplaying and general storywriting.
|
|
|
# Quantizations |
|
|
|
|
|
EXL3: |
|
- [Official EXL3 quant repo](https://huggingface.co/allura-quants/allura-org_Q3-8B-Kintsugi-EXL3) |
|
|
|
GGUF: |
|
- [Official static GGUF quants](https://huggingface.co/allura-quants/allura-org_Q3-8B-Kintsugi-GGUF) |
|
|
|
MLX: |
|
- [8, 6, and 4bpw MLX-format quants by soundTeam](https://huggingface.co/collections/allura-quants/q3-8b-kintsugi-mlx-684fc48444f1214749f538c4)
|
|
|
# Usage |
|
|
|
- Format is plain-old ChatML. Please note that, unlike regular Qwen 3, you do *not* need to prefill empty think tags to suppress reasoning -- see below.
|
|
|
- Settings used by testers varied, but we generally stayed around 0.9 temperature and 0.1 min-p. Do *not* use repetition penalties (DRY included); they break the model.
|
|
|
- Any system prompt can likely be used, but I used the Shingame system prompt (link will be added later i promise) |
|
|
|
- The official instruction-following version of Qwen3-8B was *not* used as a base. Instruction following was trained in post-hoc, and "thinking" traces were not included. __As a result, "thinking" will not function.__
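To make the prompt format concrete, here is a minimal sketch of a ChatML prompt for this model. The `to_chatml` helper and the system prompt text are illustrative placeholders (this is not the Shingame prompt, and most frontends build this string for you via their chat template support):

```python
# Minimal sketch of a ChatML-formatted prompt for Kintsugi.
# The helper and system prompt below are illustrative, not official tooling.

def to_chatml(messages):
    """Render a list of {role, content} dicts as a ChatML string,
    ending with an open assistant turn for the model to complete."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    # No empty <think> prefill is needed here, unlike instruct Qwen 3.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are Kintsugi, a creative roleplay partner."},
    {"role": "user", "content": "*waves* Hi there!"},
])
print(prompt)
```

When sampling a completion from this prompt, the settings above apply: roughly 0.9 temperature, 0.1 min-p, and no repetition penalties.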
|
|
|
# Training Process |
|
|
|
1. The [base model](https://huggingface.co/Qwen/Qwen3-8B-Base) first went through a supervised finetune on a corpus of instruction following data, roleplay conversations, and human writing based on the [Ink](https://huggingface.co/collections/allura-org/ink-6772fd1442308781594bbabb)/[Bigger Body](https://huggingface.co/collections/allura-org/bigger-body-67b277af0861cec33b54745d)/[Remnant](https://huggingface.co/collections/allura-org/remnant-6817c2113bbb2aed501513d0) lineage. |
|
|
|
2. Finally, a KTO reinforcement learning phase steered the model away from the overly purple prose of the initial finetune, and improved its logical and spatial reasoning and its overall sense of "intelligence".
|
|
|
Both stages mirror those of [Q3-30B-A3B-Designant](https://huggingface.co/allura-org/Q3-30B-A3B-Designant), which was trained with the same data.
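For readers curious what the SFT stage looks like in practice, here is an illustrative Axolotl-style config fragment. The dataset entry and every hyperparameter value are placeholders; the actual configuration used for Kintsugi was not published:

```yaml
# Illustrative Axolotl SFT config fragment -- values are assumptions,
# not the settings actually used to train Kintsugi.
base_model: Qwen/Qwen3-8B-Base
chat_template: chatml
datasets:
  - path: PygmalionAI/PIPPA
    type: chat_template
sequence_len: 8192
learning_rate: 2.0e-5
num_epochs: 2
output_dir: ./kintsugi-sft
```

The subsequent KTO phase was run separately with Unsloth and TRL rather than Axolotl.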
|
|
|
# Credits |
|
|
|
- Fizz - Training, Data Wrangling |
|
|
|
- Toaster, Mango, Bot, probably others I forgot ;-; - Testing |
|
|
|
- inflatebot - original Designant model card that this one was yoinked from |
|
|
|
- Artus - Funding |
|
|
|
- Alibaba - Making the original model |
|
|
|
- Axolotl, Unsloth, Hugging Face - Making the frameworks used to train this model (Axolotl was used for the SFT process, and Unsloth+TRL was used for the KTO process)
|
|
|
- All quanters, inside and outside the org, specifically Artus, Lyra, and soundTeam/Heni |
|
|
|
We would like to thank the Allura community on Discord, especially Curse, Heni, Artus and Mawnipulator, for their companionship and moral support. You all mean the world to us <3 |