subsectmusic committed on
Commit 19feb4e · verified · 1 Parent(s): 7f04c26

Update README.md

Files changed (1)
  1. README.md +120 -196
README.md CHANGED
@@ -5,6 +5,9 @@ tags:
5
  - transformers
6
  - qwen3
7
  - gguf
 
 
 
8
  - character-roleplay
9
  - tsundere
10
  - conversational-ai
@@ -16,7 +19,7 @@ pipeline_tag: text-generation
16
  library_name: transformers
17
  ---
18
 
19
- # 🦊 QwRiko3-4B-Instruct-2507: Tsundere Kitsune AI
20
 
21
  <div align="center">
22
  <img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>
@@ -24,143 +27,107 @@ library_name: transformers
24
 
25
  ## 📋 Model Overview
26
 
27
- **QwRiko3-4B-Instruct-2507** is a conversational AI model fine-tuned to embody **Riko**, a tsundere kitsune character. Built on **Qwen3-4B-Instruct**, this release (version **2507**) delivers engaging, personality-driven dialogue with sharp wit, playful bite, and hidden warmth.
28
 
29
  - **Model ID (this repo):** `subsectmusic/qwriko3-4b-instruct-2507`
 
 
30
  - **Base Model:** `Qwen/Qwen3-4B-Instruct`
31
- - **Project:** Project Horizon LLM
32
- - **Developer:** @subsectmusic
33
- - **Training Framework:** Unsloth + Hugging Face TRL (SFT)
34
- - **License:** Apache-2.0 (repo)
35
  - **Parameters:** ~4B
36
- - **Formats:** PyTorch; optional GGUF export for Ollama
 
 
 
37
 
38
  ## 🎭 Character Profile: Riko
39
 
40
- - **Tsundere cadence:** “It’s not like I like you or anything… b-baka!”
41
- - **Kitsune vibes:** fox-spirit mischief + sly wisdom
42
- - **Emotional core:** tough shell, soft center (rarely admitted)
43
  - **Style:** snappy, teasing, ultimately caring
44
 
45
- ## 🚀 Quick Start
46
-
47
- ### Option 1: Hugging Face Transformers (Python)
48
-
49
- ```python
50
- # QwRiko3-4B-Instruct-2507: Complete, ready-to-run example
51
- # Requirements:
52
- # pip install transformers>=4.42.0 torch>=2.1.0 accelerate
53
- # (CUDA recommended; works on CPU with slower generation)
54
-
55
- import torch
56
- from transformers import AutoTokenizer, AutoModelForCausalLM
57
-
58
- MODEL_ID = "subsectmusic/qwriko3-4b-instruct-2507"
59
-
60
- # Load tokenizer & model
61
- tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=True)
62
- model = AutoModelForCausalLM.from_pretrained(
63
- MODEL_ID,
64
- torch_dtype=torch.float16,
65
- device_map="auto"
66
- )
67
-
68
- # Chat messages using the model's chat template (preferred)
69
- messages = [
70
- {"role": "system", "content": "You are Riko, a tsundere kitsune AI. Be witty, teasing, but with hidden warmth."},
71
- {"role": "user", "content": "Hey Riko, how are you today?"}
72
- ]
73
 
74
- # Apply chat template if available; otherwise fall back to a plain prompt
75
- if hasattr(tokenizer, "apply_chat_template"):
76
- inputs = tokenizer.apply_chat_template(
77
- messages,
78
- tokenize=True,
79
- add_generation_prompt=True,
80
- return_tensors="pt"
81
- )
82
- else:
83
- # Fallback prompt string (works without chat template)
84
- prompt = (
85
- "System: You are Riko, a tsundere kitsune AI. Be witty, teasing, but with hidden warmth.\n"
86
- "User: Hey Riko, how are you today?\n"
87
- "Assistant:"
88
- )
89
- inputs = tokenizer(prompt, return_tensors="pt").input_ids
90
 
91
- # Move inputs to the same device as model
92
- if hasattr(inputs, "to"):
93
- inputs = inputs.to(model.device)
94
 
95
- # Sensible generation defaults for a 4B instruct chat model
96
- gen_kwargs = {
97
- "max_new_tokens": 256,
98
- "temperature": 0.85,
99
- "top_p": 0.9,
100
- "top_k": 50,
101
- "repetition_penalty": 1.1,
102
- "do_sample": True,
103
- "pad_token_id": tokenizer.eos_token_id,
104
- "eos_token_id": tokenizer.eos_token_id,
105
- }
106
 
107
- with torch.no_grad():
108
- output = model.generate(inputs, **gen_kwargs)
 
 
 
 
109
 
110
- # If we used the chat template, slice after the prompt tokens
111
- if hasattr(tokenizer, "apply_chat_template"):
112
- prompt_len = inputs.shape[1]
113
- text = tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)
114
- else:
115
- text = tokenizer.decode(output[0], skip_special_tokens=True)
116
 
117
- print("\nRiko:", text.strip())
 
118
  ```
119
 
120
- ### Option 2: Text Generation Inference (TGI)
121
 
122
  ```bash
123
- # Start a local TGI server serving the model
124
- # Requirements: text-generation-inference installed and a GPU is recommended
125
- text-generation-launcher --model-id subsectmusic/qwriko3-4b-instruct-2507 --hostname 0.0.0.0 --port 8080
126
  ```
127
 
128
- Example request:
129
 
130
  ```bash
131
- curl http://localhost:8080/generate -X POST -H "Content-Type: application/json" -d '{
132
- "inputs": [
133
- {"role":"system","content":"You are Riko, a tsundere kitsune AI."},
134
- {"role":"user","content":"Write a playful greeting in your style."}
135
- ],
136
- "parameters": {
137
- "max_new_tokens": 200,
138
- "temperature": 0.9,
139
- "top_p": 0.9,
140
- "repetition_penalty": 1.1
141
  }
142
- }'
 
143
  ```
144
 
145
- ### Option 3: Ollama (GGUF)
146
 
147
- If you export or publish a GGUF build of this model:
148
 
149
  ```bash
150
- # Pull (requires a GGUF build with this exact tag to be available)
151
- ollama pull subsectmusic/qwriko3-4b-instruct-2507
152
-
153
- # Chat
154
- ollama run subsectmusic/qwriko3-4b-instruct-2507 "Riko, give me some fox-spirit advice for a Monday."
155
  ```
156
 
157
- > Tip: To create a local GGUF for testing, convert via llama.cpp/Qwen-compatible tools and set an `Modelfile` with the chat template matching Qwen3.
158
 
159
- ## 🧪 Minimal Conversation Template (Python)
160
 
161
  ```python
162
- from transformers import AutoTokenizer, AutoModelForCausalLM
 
 
 
163
  import torch
 
164
 
165
  MODEL_ID = "subsectmusic/qwriko3-4b-instruct-2507"
166
 
@@ -171,66 +138,58 @@ model = AutoModelForCausalLM.from_pretrained(
171
  device_map="auto"
172
  )
173
 
174
- def chat(user_text: str) -> str:
175
- messages = [
176
- {"role": "system", "content": "You are Riko, a tsundere kitsune AI. Reply in-character."},
177
- {"role": "user", "content": user_text}
178
- ]
 
179
  inputs = tokenizer.apply_chat_template(
180
  messages,
181
  tokenize=True,
182
  add_generation_prompt=True,
183
  return_tensors="pt"
184
  ).to(model.device)
185
-
186
- output = model.generate(
187
- inputs,
188
- max_new_tokens=256,
189
- temperature=0.85,
190
- top_p=0.9,
191
- top_k=50,
192
- repetition_penalty=1.1,
193
- do_sample=True,
194
- pad_token_id=tokenizer.eos_token_id,
195
- eos_token_id=tokenizer.eos_token_id
196
  )
197
- text = tokenizer.decode(output[0][inputs.shape[1]:], skip_special_tokens=True)
198
- return text.strip()
199
-
200
- print(chat("Give me a short pep talk for studying."))
201
  ```
202
 
 
 
203
  ## 💡 Use Cases
204
 
205
  - Character roleplay & entertainment
206
- - Creative writing assistance (tsundere voice)
207
  - Personality-driven chatbots
208
  - Research on alternating-turn distillation & style transfer
209
 
210
- ## 🔬 Project Horizon LLM Methodology
211
-
212
- **Alternating-turn distillation** to preserve consistent character voice:
213
-
214
- 1. Extract human/user turns from multi-turn chats
215
- 2. Generate responses from two high-quality sources in alternation (e.g., **Kimi K2** → odd turns, **Horizon Beta** → even turns)
216
- 3. Curate for Riko’s tsundere persona
217
- 4. Compile into supervised fine-tuning (SFT) dataset
218
- 5. Fine-tune **Qwen3-4B-Instruct** using **Unsloth + TRL**
219
-
220
- Benefits:
221
- - Personality consistency across topics
222
- - Response diversity from multiple teacher styles
223
- - Efficient transfer into a compact 4B model
224
 
225
- ## 🛠️ Training Details
226
-
227
- ### Dataset & Method
228
- - **Format:** ShareGPT-style → Alpaca single-turn pairs
229
- - **Teachers:** Kimi K2 (odd) + Horizon Beta (even)
230
- - **Focus:** Tsundere kitsune persona, witty banter, emotional subtext
231
  - **Curation:** Manual filtering for tone & safety
232
 
233
- ### Example Training Config (SFT)
234
 
235
  ```yaml
236
  Training Framework: Unsloth + TRL SFTTrainer
@@ -247,12 +206,7 @@ Sequence Length: up to model context
247
  Precision: fp16
248
  ```
249
 
250
- ### Performance Notes
251
- - **Compact:** ~4B parameters for fast local use
252
- - **Unsloth optimizations:** faster training/inference
253
- - **Quantization:** 4-bit/8-bit supported via bitsandbytes (PyTorch) and GGUF (Ollama) if exported
254
-
255
- ## 📊 Model Specifications
256
 
257
  | Attribute | Details |
258
  |------------------|-------------------------------|
@@ -260,7 +214,7 @@ Precision: fp16
260
  | Parameters | ~4B |
261
  | Base | Qwen/Qwen3-4B-Instruct |
262
  | Context Length | Base-dependent (Qwen3 config) |
263
- | Formats | PyTorch; GGUF (optional) |
264
  | Framework | PyTorch + Transformers |
265
  | Optimization | Unsloth-accelerated SFT |
266
  | Style | Tsundere kitsune (Riko) |
@@ -270,32 +224,29 @@ Precision: fp16
270
  ```python
271
  generation_config = {
272
  "max_new_tokens": 256,
273
- "temperature": 0.85, # playful but coherent
274
- "top_p": 0.9, # nucleus sampling
275
- "top_k": 50, # limit candidate tokens
276
- "repetition_penalty": 1.1, # reduce loops
277
  "do_sample": True,
278
  "pad_token_id": tokenizer.eos_token_id,
279
  "eos_token_id": tokenizer.eos_token_id
280
  }
281
  ```
282
 
283
- ## ⚠️ Limitations
284
 
285
- - In-character bias (tsundere tone) may color factual or technical answers
286
- - Compact 4B size: may require careful prompting for complex tasks
287
  - Quantization can slightly affect nuance
288
 
289
- ## 🔒 Ethical Considerations
290
 
291
- - Designed for entertainment and creative use
292
- - Not for professional advice or therapy
293
- - Follow platform guidelines and content policies
294
 
295
  ## 📚 Citation
296
 
297
- If you use this model, please cite:
298
-
299
  ```bibtex
300
  @model{qwriko3-4b-instruct-2507,
301
  title={QwRiko3-4B-Instruct-2507: Tsundere Kitsune AI},
@@ -308,37 +259,10 @@ If you use this model, please cite:
308
 
309
  ## 🀝 Acknowledgments
310
 
311
- - **Kimi K2** & **Horizon Beta**: alternating-turn teacher models
312
- - **Project Horizon LLM**: methodology & curation
313
- - **Unsloth**: training acceleration
314
- - **Qwen Team**: base architecture
315
- - **Hugging Face / TRL**: libraries & hosting
316
- - **Ollama**: GGUF local runtime
317
-
318
- ## 📦 Deployment Options
319
-
320
- ### Transformers (PyTorch)
321
- - FP16/BF16 inference on GPU; CPU supported (slower)
322
- - Bitsandbytes 4-bit/8-bit loading for low-VRAM setups
323
-
324
- ### TGI
325
- - Production-grade server with simple HTTP API
326
-
327
- ### Ollama (GGUF)
328
- - Local, offline chat once a GGUF build is produced for this model
329
-
330
- ```bash
331
- # Example Ollama flow (if/when GGUF is published)
332
- curl -fsSL https://ollama.ai/install.sh | sh
333
- ollama pull subsectmusic/qwriko3-4b-instruct-2507
334
- ollama run subsectmusic/qwriko3-4b-instruct-2507 "Hello Riko!"
335
- ```
336
-
337
- ## 📞 Support & Community
338
-
339
- - **Issues:** Open on this repo’s Issues tab
340
- - **Discussions:** Community threads for tips and prompts
341
- - **Updates:** Watch the repo for new model variants and GGUF builds
342
 
343
  ---
344
 
 
5
  - transformers
6
  - qwen3
7
  - gguf
8
+ - ollama
9
+ - tools
10
+ - function-calling
11
  - character-roleplay
12
  - tsundere
13
  - conversational-ai
 
19
  library_name: transformers
20
  ---
21
 
22
+ # 🦊 QwRiko3-4B-Instruct-2507: Tsundere Kitsune AI (GGUF • Ollama • Tools)
23
 
24
  <div align="center">
25
  <img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>
 
27
 
28
  ## 📋 Model Overview
29
 
30
+ **QwRiko3-4B-Instruct-2507** is a conversational AI model fine-tuned to embody **Riko**, a tsundere kitsune character. This release targets **GGUF** for **Ollama** first, with solid **tool calling** support when run via Ollama’s tools API. A PyTorch build (Transformers) is also supported.
31
 
32
  - **Model ID (this repo):** `subsectmusic/qwriko3-4b-instruct-2507`
33
+ - **Primary format:** **GGUF** (Ollama-compatible)
34
+ - **Alt format:** PyTorch (Transformers)
35
  - **Base Model:** `Qwen/Qwen3-4B-Instruct`
 
 
 
 
36
  - **Parameters:** ~4B
37
+ - **License:** Apache-2.0 (repo)
38
+ - **Project:** Project Horizon LLM
39
+ - **Developer:** @subsectmusic
40
+ - **Training Framework:** Unsloth + TRL (SFT)
41
 
42
  ## 🎭 Character Profile: Riko
43
 
44
+ - **Tsundere cadence:** “It’s not like I like you or anything… b-baka!”
45
+ - **Kitsune vibes:** fox-spirit mischief + sly wisdom
46
+ - **Emotional core:** tough shell, soft center
47
  - **Style:** snappy, teasing, ultimately caring
48
 
49
+ ---
50
 
51
+ ## 🚀 Quick Start (Ollama • GGUF)
52
 
53
+ > These steps assume you have a local GGUF file named `qwriko3-4b-instruct-2507.Q4_K_M.gguf` in the working directory. If your filename differs, update the `FROM` path in the Modelfile accordingly.
 
 
54
 
55
+ 1) **Create a Modelfile** (exact content below is also saved as `Modelfile` in this package):
56
 
57
+ ```Dockerfile
58
+ # Modelfile
59
+ FROM ./qwriko3-4b-instruct-2507.Q4_K_M.gguf
60
+ PARAMETER num_ctx 8192
61
+ # (Optional) Set temperature/top_p/etc. with PARAMETER lines here, `/set parameter` in an interactive `ollama run` session, or `options` in the API.
62
+ ```
63
 
64
+ 2) **Create the Ollama model**:
 
 
 
 
 
65
 
66
+ ```bash
67
+ ollama create qwriko3-4b-instruct-2507 -f Modelfile
68
  ```
69
 
70
+ 3) **Chat**:
71
 
72
  ```bash
73
+ ollama run qwriko3-4b-instruct-2507 "Riko, give me a playful hello."
 
 
74
  ```
75
 
76
+ ### Tool Calling with Ollama (cURL)
77
 
78
  ```bash
79
+ curl http://localhost:11434/api/chat -d '{
80
+ "model": "qwriko3-4b-instruct-2507",
81
+ "messages": [
82
+ { "role": "user", "content": "What is the weather today in Toronto?" }
83
+ ],
84
+ "tools": [
85
+ {
86
+ "type": "function",
87
+ "function": {
88
+ "name": "get_current_weather",
89
+ "description": "Get the current weather for a location",
90
+ "parameters": {
91
+ "type": "object",
92
+ "properties": {
93
+ "location": {
94
+ "type": "string",
95
+ "description": "The location to get the weather for, e.g. Toronto"
96
+ },
97
+ "format": {
98
+ "type": "string",
99
+ "description": "Temperature units",
100
+ "enum": ["celsius", "fahrenheit"]
101
+ }
102
+ },
103
+ "required": ["location", "format"]
104
+ }
105
+ }
106
  }
107
+ ]
108
+ }'
109
  ```
110
 
111
+ ### Tool Calling with Ollama (Python)
112
 
113
+ A complete, ready-to-run example is saved as `tools_demo.py` in this package. It defines a couple of functions and lets the model call them. You can run it after installing the Python client:
114
 
115
  ```bash
116
+ pip install -U ollama
117
+ python tools_demo.py
 
 
 
118
  ```
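
The same tool-calling round trip can be sketched in Python with only the standard library, against the `/api/chat` endpoint used in the cURL example. This is a minimal sketch, not the contents of `tools_demo.py`: `get_current_weather` is a hypothetical stub that returns canned data, and the helper names (`build_request`, `run_tool_calls`, `post`, `chat`) are illustrative. It assumes an Ollama server on `localhost:11434` and that tool requests arrive under `message.tool_calls` with `arguments` already parsed into an object.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # assumed default local endpoint
MODEL = "qwriko3-4b-instruct-2507"              # name given to `ollama create`


def get_current_weather(location: str, format: str = "celsius") -> str:
    """Hypothetical stub; a real tool would query a weather service."""
    temp = 21 if format == "celsius" else 70
    return json.dumps({"location": location, "temperature": temp, "unit": format})


# Maps the tool names advertised to the model onto local callables.
TOOLS = {"get_current_weather": get_current_weather}

TOOL_SPECS = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "format": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location", "format"],
        },
    },
}]


def build_request(messages: list) -> dict:
    """Assemble a non-streaming chat request that advertises the tools."""
    return {"model": MODEL, "messages": messages, "tools": TOOL_SPECS, "stream": False}


def run_tool_calls(message: dict) -> list:
    """Run every tool the assistant message asks for; return `tool` role messages."""
    results = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        handler = TOOLS.get(fn["name"])
        if handler is None:
            content = json.dumps({"error": "unknown tool " + fn["name"]})
        else:
            content = handler(**fn["arguments"])
        results.append({"role": "tool", "content": content})
    return results


def post(payload: dict) -> dict:
    """POST a JSON payload to the Ollama server and decode the JSON reply."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def chat(user_text: str) -> str:
    """One user turn: let the model call tools once, then return its final text."""
    messages = [{"role": "user", "content": user_text}]
    reply = post(build_request(messages))["message"]
    tool_messages = run_tool_calls(reply)
    if tool_messages:
        messages += [reply] + tool_messages
        reply = post(build_request(messages))["message"]
    return reply["content"]
```

With the server running, `print(chat("What is the weather today in Toronto?"))` sends the question, executes any requested tool locally, and prints the model's follow-up answer. Keeping `run_tool_calls` separate from the network code makes the dispatch logic easy to check without a server.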
119
 
120
+ ---
121
 
122
+ ## 🧪 Quick Start (Transformers • PyTorch)
123
 
124
  ```python
125
+ # Requirements:
126
+ # pip install "transformers>=4.42.0" "torch>=2.1.0" accelerate
127
+ # (CUDA recommended; CPU works but is slower.)
128
+
129
  import torch
130
+ from transformers import AutoTokenizer, AutoModelForCausalLM
131
 
132
  MODEL_ID = "subsectmusic/qwriko3-4b-instruct-2507"
133
 
 
138
  device_map="auto"
139
  )
140
 
141
+ messages = [
142
+     {"role": "system", "content": "You are Riko, a tsundere kitsune AI. Be witty, teasing, but with hidden warmth."},
143
+     {"role": "user", "content": "Hey Riko, how are you today?"}
144
+ ]
145
+
146
+ if hasattr(tokenizer, "apply_chat_template"):
147
      inputs = tokenizer.apply_chat_template(
148
          messages,
149
          tokenize=True,
150
          add_generation_prompt=True,
151
          return_tensors="pt"
152
      ).to(model.device)
153
+ else:
154
+     prompt = (
155
+         "System: You are Riko, a tsundere kitsune AI. Be witty, teasing, but with hidden warmth.\n"
156
+         "User: Hey Riko, how are you today?\n"
157
+         "Assistant:"
158
      )
159
+     inputs = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
160
+
161
+ gen = model.generate(
162
+     inputs,
163
+     max_new_tokens=256,
164
+     temperature=0.85,
165
+     top_p=0.9,
166
+     top_k=50,
167
+     repetition_penalty=1.1,
168
+     do_sample=True,
169
+     pad_token_id=tokenizer.eos_token_id,
170
+     eos_token_id=tokenizer.eos_token_id,
171
+ )
172
+ out = tokenizer.decode(gen[0][inputs.shape[1]:], skip_special_tokens=True)
173
+ print("\nRiko:", out.strip())
174
  ```
175
 
176
+ ---
177
+
178
  ## 💡 Use Cases
179
 
180
  - Character roleplay & entertainment
181
+ - Creative writing in a tsundere voice
182
  - Personality-driven chatbots
183
  - Research on alternating-turn distillation & style transfer
184
 
185
+ ## 🔬 Training Summary (SFT)
186
 
187
+ - **Format:** ShareGPT-style → Alpaca single-turn pairs
188
+ - **Teachers:** Kimi K2 (odd) + Horizon Beta (even)
189
+ - **Focus:** Tsundere kitsune persona, witty banter, emotional subtext
 
 
 
190
  - **Curation:** Manual filtering for tone & safety
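
The data flow in the bullets above (ShareGPT-style chats turned into Alpaca single-turn pairs, with teachers alternating by turn) might look roughly like the sketch below. It is an illustration under stated assumptions, not the project's actual pipeline: the ShareGPT keys (`conversations`, `from`, `value`) and Alpaca keys (`instruction`, `input`, `output`) follow the common conventions, turns are assumed to be counted from 1, and `fake_generate` is a hypothetical stand-in for a real call to a teacher model.

```python
from typing import Callable, Dict, List

# Assumption: turns are counted from 1, so the first human turn is "odd".
TEACHER_BY_PARITY = {1: "Kimi K2", 0: "Horizon Beta"}


def human_turns(record: Dict) -> List[str]:
    """Pull the human/user messages out of one ShareGPT-style record."""
    return [
        msg["value"]
        for msg in record.get("conversations", [])
        if msg.get("from") in ("human", "user")
    ]


def to_alpaca_pairs(record: Dict, generate: Callable[[str, str], str]) -> List[Dict[str, str]]:
    """Turn each human message into one Alpaca-style single-turn pair."""
    pairs = []
    for turn_no, text in enumerate(human_turns(record), start=1):
        teacher = TEACHER_BY_PARITY[turn_no % 2]
        pairs.append({
            "instruction": text,
            "input": "",
            "output": generate(teacher, text),
            "teacher": teacher,  # bookkeeping only; not part of the Alpaca format
        })
    return pairs


def fake_generate(teacher: str, prompt: str) -> str:
    """Hypothetical placeholder; a real pipeline would query the teacher model."""
    return f"[{teacher}] reply to: {prompt}"


sample = {"conversations": [
    {"from": "human", "value": "Hi Riko!"},
    {"from": "gpt", "value": "(original reply, discarded)"},
    {"from": "human", "value": "Any advice for Monday?"},
]}

for pair in to_alpaca_pairs(sample, fake_generate):
    print(pair["teacher"], "<-", pair["instruction"])
```

Passing the generator in as a callable keeps the conversion logic independent of whichever teacher API is used, and the extra `teacher` field makes it easy to audit the odd/even split during manual curation.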
191
 
192
+ Example SFT settings:
193
 
194
  ```yaml
195
  Training Framework: Unsloth + TRL SFTTrainer
 
206
  Precision: fp16
207
  ```
208
 
209
+ ## 📊 Specs
 
 
 
 
 
210
 
211
  | Attribute | Details |
212
  |------------------|-------------------------------|
 
214
  | Parameters | ~4B |
215
  | Base | Qwen/Qwen3-4B-Instruct |
216
  | Context Length | Base-dependent (Qwen3 config) |
217
+ | Formats | **GGUF (Ollama)**; PyTorch |
218
  | Framework | PyTorch + Transformers |
219
  | Optimization | Unsloth-accelerated SFT |
220
  | Style | Tsundere kitsune (Riko) |
 
224
  ```python
225
  generation_config = {
226
  "max_new_tokens": 256,
227
+ "temperature": 0.85,
228
+ "top_p": 0.9,
229
+ "top_k": 50,
230
+ "repetition_penalty": 1.1,
231
  "do_sample": True,
232
  "pad_token_id": tokenizer.eos_token_id,
233
  "eos_token_id": tokenizer.eos_token_id
234
  }
235
  ```
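
When the model is served through Ollama rather than Transformers, roughly equivalent sampling settings go in the `options` field of an API request. The sketch below shows one possible mapping; the helper name `to_ollama_options` is illustrative, the option names (`num_predict`, `temperature`, `top_p`, `top_k`, `repeat_penalty`) are Ollama's, and the two runtimes may not implement each setting identically, so treat the result as a starting point.

```python
# Transformers generation argument -> Ollama option name.
HF_TO_OLLAMA = {
    "max_new_tokens": "num_predict",
    "temperature": "temperature",
    "top_p": "top_p",
    "top_k": "top_k",
    "repetition_penalty": "repeat_penalty",
}


def to_ollama_options(generation_config: dict) -> dict:
    """Keep only the settings Ollama understands, renamed to its option keys."""
    return {
        ollama_key: generation_config[hf_key]
        for hf_key, ollama_key in HF_TO_OLLAMA.items()
        if hf_key in generation_config
    }


# Same values as the recommended settings, minus the tokenizer-specific ids.
hf_settings = {
    "max_new_tokens": 256,
    "temperature": 0.85,
    "top_p": 0.9,
    "top_k": 50,
    "repetition_penalty": 1.1,
    "do_sample": True,
}

payload = {
    "model": "qwriko3-4b-instruct-2507",
    "messages": [{"role": "user", "content": "Riko, give me a playful hello."}],
    "options": to_ollama_options(hf_settings),
    "stream": False,
}
```

The resulting `payload` can be sent as the JSON body of a POST to `/api/chat`. Keys with no Ollama counterpart here, such as `do_sample` and the token ids, are simply dropped.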
236
 
237
+ ## ⚠️ Notes
238
 
239
+ - In-character style can color responses to factual queries
240
+ - Compact 4B size benefits from clear prompts for complex tasks
241
  - Quantization can slightly affect nuance
242
 
243
+ ## 🔒 Ethics
244
 
245
+ - Entertainment & creative use; not professional advice
246
+ - Follow platform/community guidelines
 
247
 
248
  ## 📚 Citation
249
 
 
 
250
  ```bibtex
251
  @model{qwriko3-4b-instruct-2507,
252
  title={QwRiko3-4B-Instruct-2507: Tsundere Kitsune AI},
 
259
 
260
  ## 🀝 Acknowledgments
261
 
262
+ - Kimi K2 & Horizon Beta (teachers)
263
+ - Project Horizon LLM (methodology)
264
+ - Unsloth, Qwen Team, Hugging Face / TRL
265
+ - Ollama (GGUF runtime)
266
 
267
  ---
268