subsectmusic SUBSECT420 committed
Commit 7f04c26 · verified · 1 Parent(s): 0e273ec

Update README.md (#1)


- Update README.md (ec6ec11c4d9e7e9bc6dcdcbad6a519c679a852a7)


Co-authored-by: EREW <[email protected]>

Files changed (1)
  1. README.md +334 -8
README.md CHANGED
@@ -1,22 +1,348 @@
  ---
- base_model: unsloth/qwen3-4b-instruct-2507-unsloth-bnb-4bit
  tags:
  - text-generation-inference
  - transformers
- - unsloth
  - qwen3
  - gguf
  license: apache-2.0
  language:
  - en
  ---

- # Uploaded model

- - **Developed by:** subsectmusic
- - **License:** apache-2.0
- - **Finetuned from model :** unsloth/qwen3-4b-instruct-2507-unsloth-bnb-4bit

- This qwen3 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
---
base_model: Qwen/Qwen3-4B-Instruct
tags:
- text-generation-inference
- transformers
- qwen3
- gguf
- character-roleplay
- tsundere
- conversational-ai
- fine-tuned
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
---

# 🦊 QwRiko3-4B-Instruct-2507: Tsundere Kitsune AI

<div align="center">
<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>
</div>

## 📋 Model Overview

**QwRiko3-4B-Instruct-2507** is a conversational AI model fine-tuned to embody **Riko**, a tsundere kitsune character. Built on **Qwen3-4B-Instruct**, this release (version **2507**) delivers engaging, personality-driven dialogue with sharp wit, playful bite, and hidden warmth.

- **Model ID (this repo):** `subsectmusic/qwriko3-4b-instruct-2507`
- **Base Model:** `Qwen/Qwen3-4B-Instruct`
- **Project:** Project Horizon LLM
- **Developer:** @subsectmusic
- **Training Framework:** Unsloth + Hugging Face TRL (SFT)
- **License:** Apache-2.0 (repo)
- **Parameters:** ~4B
- **Formats:** PyTorch; optional GGUF export for Ollama

## 🎭 Character Profile: Riko

- **Tsundere cadence:** "It's not like I like you or anything… b-baka!"
- **Kitsune vibes:** fox-spirit mischief + sly wisdom
- **Emotional core:** tough shell, soft center (rarely admitted)
- **Style:** snappy, teasing, ultimately caring

## 🚀 Quick Start

### Option 1: Hugging Face Transformers (Python)

```python
# QwRiko3-4B-Instruct-2507: complete, ready-to-run example
# Requirements:
#   pip install "transformers>=4.42.0" "torch>=2.1.0" accelerate
# (CUDA recommended; works on CPU with slower generation)

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_ID = "subsectmusic/qwriko3-4b-instruct-2507"

# Load tokenizer & model
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Chat messages using the model's chat template (preferred)
messages = [
    {"role": "system", "content": "You are Riko, a tsundere kitsune AI. Be witty, teasing, but with hidden warmth."},
    {"role": "user", "content": "Hey Riko, how are you today?"}
]

# Use the chat template if one is defined; otherwise fall back to a plain prompt.
# (Checking tokenizer.chat_template is the reliable test; the apply_chat_template
# method itself exists on all modern tokenizers, even without a template.)
if tokenizer.chat_template is not None:
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt"
    )
else:
    # Fallback prompt string (works without a chat template)
    prompt = (
        "System: You are Riko, a tsundere kitsune AI. Be witty, teasing, but with hidden warmth.\n"
        "User: Hey Riko, how are you today?\n"
        "Assistant:"
    )
    inputs = tokenizer(prompt, return_tensors="pt").input_ids

# Move inputs to the same device as the model
inputs = inputs.to(model.device)

# Sensible generation defaults for a 4B instruct chat model
gen_kwargs = {
    "max_new_tokens": 256,
    "temperature": 0.85,
    "top_p": 0.9,
    "top_k": 50,
    "repetition_penalty": 1.1,
    "do_sample": True,
    "pad_token_id": tokenizer.eos_token_id,
    "eos_token_id": tokenizer.eos_token_id,
}

with torch.no_grad():
    output = model.generate(inputs, **gen_kwargs)

# Decode only the newly generated tokens (slice off the prompt in either branch)
prompt_len = inputs.shape[1]
text = tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)

print("\nRiko:", text.strip())
```

### Option 2: Text Generation Inference (TGI)

```bash
# Start a local TGI server serving the model
# Requirements: text-generation-inference installed; a GPU is recommended
text-generation-launcher --model-id subsectmusic/qwriko3-4b-instruct-2507 --hostname 0.0.0.0 --port 8080
```

Example request (note that the `/generate` endpoint takes a plain prompt string in `inputs`, not a list of chat messages):

```bash
curl http://localhost:8080/generate -X POST -H "Content-Type: application/json" -d '{
  "inputs": "System: You are Riko, a tsundere kitsune AI.\nUser: Write a playful greeting in your style.\nAssistant:",
  "parameters": {
    "max_new_tokens": 200,
    "temperature": 0.9,
    "top_p": 0.9,
    "repetition_penalty": 1.1
  }
}'
```
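
For role-based messages, recent TGI versions also expose an OpenAI-compatible `/v1/chat/completions` endpoint that applies the model's chat template server-side; a minimal sketch, assuming a TGI version (>= 1.4) that ships the Messages API:

```bash
# Chat-style request via TGI's OpenAI-compatible Messages API
# (assumes TGI >= 1.4; "model": "tgi" is a routing placeholder, not a repo lookup)
curl http://localhost:8080/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
  "model": "tgi",
  "messages": [
    {"role": "system", "content": "You are Riko, a tsundere kitsune AI."},
    {"role": "user", "content": "Write a playful greeting in your style."}
  ],
  "max_tokens": 200,
  "temperature": 0.9
}'
```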

### Option 3: Ollama (GGUF)

If you export or publish a GGUF build of this model:

```bash
# Pull (requires a GGUF build with this exact tag to be available)
ollama pull subsectmusic/qwriko3-4b-instruct-2507

# Chat
ollama run subsectmusic/qwriko3-4b-instruct-2507 "Riko, give me some fox-spirit advice for a Monday."
```

> Tip: To create a local GGUF for testing, convert with llama.cpp's Qwen-compatible tools and write a `Modelfile` whose chat template matches Qwen3.
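
A minimal local flow, as a sketch only: the GGUF filename below is hypothetical, and the ChatML-style template assumes the Qwen3 chat format; verify it against the tokenizer config before relying on it.

```bash
# Register a locally converted GGUF with Ollama (filename is hypothetical)
cat > Modelfile <<'EOF'
FROM ./qwriko3-4b-instruct-2507-Q4_K_M.gguf
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER stop <|im_start|>
PARAMETER stop <|im_end|>
SYSTEM """You are Riko, a tsundere kitsune AI."""
EOF

ollama create qwriko3-local -f Modelfile
ollama run qwriko3-local "Hello Riko!"
```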

## 🧪 Minimal Conversation Template (Python)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

MODEL_ID = "subsectmusic/qwriko3-4b-instruct-2507"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto"
)

def chat(user_text: str) -> str:
    """Send one user turn to Riko and return the in-character reply."""
    messages = [
        {"role": "system", "content": "You are Riko, a tsundere kitsune AI. Reply in-character."},
        {"role": "user", "content": user_text}
    ]
    # Build the prompt with the model's chat template
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt"
    ).to(model.device)

    output = model.generate(
        inputs,
        max_new_tokens=256,
        temperature=0.85,
        top_p=0.9,
        top_k=50,
        repetition_penalty=1.1,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id
    )
    # Decode only the newly generated tokens
    text = tokenizer.decode(output[0][inputs.shape[1]:], skip_special_tokens=True)
    return text.strip()

print(chat("Give me a short pep talk for studying."))
```

## 💡 Use Cases

- Character roleplay & entertainment
- Creative writing assistance (tsundere voice)
- Personality-driven chatbots
- Research on alternating-turn distillation & style transfer

## 🔬 Project Horizon LLM Methodology

**Alternating-turn distillation** keeps Riko's character voice consistent; the pipeline (sketched in code below) is:

1. Extract human/user turns from multi-turn chats
2. Generate responses from two high-quality sources in alternation (e.g., **Kimi K2** → odd turns, **Horizon Beta** → even turns)
3. Curate for Riko's tsundere persona
4. Compile into a supervised fine-tuning (SFT) dataset
5. Fine-tune **Qwen3-4B-Instruct** using **Unsloth + TRL**

Benefits:

- Personality consistency across topics
- Response diversity from multiple teacher styles
- Efficient transfer into a compact 4B model
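
A minimal sketch of steps 2 and 4, assuming hypothetical `kimi_k2_reply` and `horizon_beta_reply` helpers that wrap the teacher-model APIs:

```python
# Illustrative only: kimi_k2_reply / horizon_beta_reply are hypothetical
# stand-ins for the real teacher-model calls; user_turns comes from step 1.

def build_sft_pairs(user_turns):
    """Pair each extracted user turn with a teacher reply, alternating teachers."""
    pairs = []
    for i, turn in enumerate(user_turns, start=1):
        # Odd turns -> Kimi K2, even turns -> Horizon Beta
        teacher = kimi_k2_reply if i % 2 == 1 else horizon_beta_reply
        pairs.append({
            "instruction": turn,                 # Alpaca-style single-turn pair
            "input": "",
            "output": teacher(turn, persona="Riko"),
        })
    return pairs  # curate for persona (step 3) before fine-tuning (step 5)
```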

## 🛠️ Training Details

### Dataset & Method

- **Format:** ShareGPT-style → Alpaca single-turn pairs
- **Teachers:** Kimi K2 (odd) + Horizon Beta (even)
- **Focus:** Tsundere kitsune persona, witty banter, emotional subtext
- **Curation:** Manual filtering for tone & safety

### Example Training Config (SFT)

```yaml
Training Framework: Unsloth + TRL SFTTrainer
Base Model: Qwen/Qwen3-4B-Instruct
Batch Size: 2 per device
Gradient Accumulation: 4
Learning Rate: 2e-4
Optimizer: AdamW 8-bit
Weight Decay: 0.01
Scheduler: Linear
Max Steps: 100+
Warmup Steps: 5
Sequence Length: up to model context
Precision: fp16
```
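
For reference, the same config expressed as an Unsloth + TRL run; a sketch only: the dataset file, LoRA rank, and `max_seq_length` are illustrative placeholders, not the actual training inputs.

```python
# Sketch of an Unsloth + TRL SFT run matching the config above.
# "riko_sft.jsonl", r=16, and max_seq_length=4096 are placeholders.
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-4B-Instruct",
    max_seq_length=4096,
    load_in_4bit=True,  # memory-efficient QLoRA-style loading
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=load_dataset("json", data_files="riko_sft.jsonl", split="train"),
    dataset_text_field="text",
    max_seq_length=4096,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        max_steps=100,
        warmup_steps=5,
        fp16=True,
        output_dir="outputs",
    ),
)
trainer.train()
```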

### Performance Notes

- **Compact:** ~4B parameters for fast local use
- **Unsloth optimizations:** faster training and inference
- **Quantization:** 4-bit/8-bit supported via bitsandbytes (PyTorch) and GGUF (Ollama) if exported

## 📊 Model Specifications

| Attribute      | Details                       |
|----------------|-------------------------------|
| Architecture   | Qwen3 Transformer             |
| Parameters     | ~4B                           |
| Base           | Qwen/Qwen3-4B-Instruct        |
| Context Length | Base-dependent (Qwen3 config) |
| Formats        | PyTorch; GGUF (optional)      |
| Framework      | PyTorch + Transformers        |
| Optimization   | Unsloth-accelerated SFT       |
| Style          | Tsundere kitsune (Riko)       |
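
To see the context window actually inherited from the base Qwen3 config, you can inspect the model config directly (attribute name per the transformers Qwen config):

```python
# Inspect the context length inherited from the base Qwen3 config
from transformers import AutoConfig

config = AutoConfig.from_pretrained("subsectmusic/qwriko3-4b-instruct-2507")
print(config.max_position_embeddings)  # maximum context length in tokens
```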

## 🎯 Recommended Inference Settings

```python
generation_config = {
    "max_new_tokens": 256,
    "temperature": 0.85,        # playful but coherent
    "top_p": 0.9,               # nucleus sampling
    "top_k": 50,                # limit candidate tokens
    "repetition_penalty": 1.1,  # reduce loops
    "do_sample": True,
    "pad_token_id": tokenizer.eos_token_id,
    "eos_token_id": tokenizer.eos_token_id
}
```

Pass these to `model.generate(inputs, **generation_config)` as in the Quick Start example.

## ⚠️ Limitations

- In-character bias (tsundere tone) may color factual or technical answers
- Compact 4B size: may require careful prompting for complex tasks
- Quantization can slightly affect nuance

## 🔒 Ethical Considerations

- Designed for entertainment and creative use
- Not for professional advice or therapy
- Follow platform guidelines and content policies

## 📚 Citation

If you use this model, please cite:

```bibtex
@misc{qwriko3-4b-instruct-2507,
  title={QwRiko3-4B-Instruct-2507: Tsundere Kitsune AI},
  author={subsectmusic},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/subsectmusic/qwriko3-4b-instruct-2507}
}
```

## 🤝 Acknowledgments

- **Kimi K2** & **Horizon Beta**: alternating-turn teacher models
- **Project Horizon LLM**: methodology & curation
- **Unsloth**: training acceleration
- **Qwen Team**: base architecture
- **Hugging Face / TRL**: libraries & hosting
- **Ollama**: GGUF local runtime

## 📦 Deployment Options

### Transformers (PyTorch)

- FP16/BF16 inference on GPU; CPU supported (slower)
- Bitsandbytes 4-bit/8-bit loading for low-VRAM setups (see the sketch below)
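
A minimal 4-bit loading sketch (requires `pip install bitsandbytes`; the quantization settings are common defaults, not repo-specific values):

```python
# Load in 4-bit NF4 via bitsandbytes for low-VRAM GPUs
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16
)

tokenizer = AutoTokenizer.from_pretrained("subsectmusic/qwriko3-4b-instruct-2507")
model = AutoModelForCausalLM.from_pretrained(
    "subsectmusic/qwriko3-4b-instruct-2507",
    quantization_config=bnb_config,
    device_map="auto",
)
```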

### TGI

- Production-grade server with simple HTTP API

### Ollama (GGUF)

- Local, offline chat once a GGUF build is produced for this model

```bash
# Example Ollama flow (if/when GGUF is published)
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull subsectmusic/qwriko3-4b-instruct-2507
ollama run subsectmusic/qwriko3-4b-instruct-2507 "Hello Riko!"
```

## 📞 Support & Community

- **Issues:** Open on this repo's Issues tab
- **Discussions:** Community threads for tips and prompts
- **Updates:** Watch the repo for new model variants and GGUF builds

---

<div align="center">
<b>Made with ❤️ using Unsloth</b><br>
<i>Training AI personalities, one tsundere at a time!</i>
</div>