subsectmusic SUBSECT420 committed
Commit 7f04c26 · verified · 1 Parent(s): 0e273ec

Update README.md (#1)


- Update README.md (ec6ec11c4d9e7e9bc6dcdcbad6a519c679a852a7)


Co-authored-by: EREW <[email protected]>

Files changed (1)
  1. README.md +334 -8
README.md CHANGED
@@ -1,22 +1,348 @@
  ---
- base_model: unsloth/qwen3-4b-instruct-2507-unsloth-bnb-4bit
  tags:
  - text-generation-inference
  - transformers
- - unsloth
  - qwen3
  - gguf
  license: apache-2.0
  language:
  - en
  ---

- # Uploaded model

- - **Developed by:** subsectmusic
- - **License:** apache-2.0
- - **Finetuned from model :** unsloth/qwen3-4b-instruct-2507-unsloth-bnb-4bit

- This qwen3 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
---
base_model: Qwen/Qwen3-4B-Instruct
tags:
- text-generation-inference
- transformers
- qwen3
- gguf
- character-roleplay
- tsundere
- conversational-ai
- fine-tuned
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
---

# 🦊 QwRiko3-4B-Instruct-2507: Tsundere Kitsune AI

<div align="center">
<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>
</div>

## 📋 Model Overview

**QwRiko3-4B-Instruct-2507** is a conversational AI model fine-tuned to embody **Riko**, a tsundere kitsune character. Built on **Qwen3-4B-Instruct**, this release (version **2507**) delivers engaging, personality-driven dialogue with sharp wit, playful bite, and hidden warmth.

- **Model ID (this repo):** `subsectmusic/qwriko3-4b-instruct-2507`
- **Base Model:** `Qwen/Qwen3-4B-Instruct`
- **Project:** Project Horizon LLM
- **Developer:** @subsectmusic
- **Training Framework:** Unsloth + Hugging Face TRL (SFT)
- **License:** Apache-2.0 (repo)
- **Parameters:** ~4B
- **Formats:** PyTorch; optional GGUF export for Ollama

## 🎭 Character Profile: Riko

- **Tsundere cadence:** "It's not like I like you or anything… b-baka!"
- **Kitsune vibes:** fox-spirit mischief + sly wisdom
- **Emotional core:** tough shell, soft center (rarely admitted)
- **Style:** snappy, teasing, ultimately caring

## 🚀 Quick Start

### Option 1: Hugging Face Transformers (Python)

```python
# QwRiko3-4B-Instruct-2507: complete, ready-to-run example
# Requirements:
#   pip install "transformers>=4.42.0" "torch>=2.1.0" accelerate
# (CUDA recommended; works on CPU with slower generation)

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_ID = "subsectmusic/qwriko3-4b-instruct-2507"

# Load tokenizer & model
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Chat messages using the model's chat template (preferred)
messages = [
    {"role": "system", "content": "You are Riko, a tsundere kitsune AI. Be witty, teasing, but with hidden warmth."},
    {"role": "user", "content": "Hey Riko, how are you today?"}
]

# Use the chat template if one is defined; otherwise fall back to a plain prompt.
# (Checking tokenizer.chat_template is the reliable test; the apply_chat_template
# method itself exists on all modern tokenizers, even without a template.)
if tokenizer.chat_template is not None:
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt"
    )
else:
    # Fallback prompt string (works without a chat template)
    prompt = (
        "System: You are Riko, a tsundere kitsune AI. Be witty, teasing, but with hidden warmth.\n"
        "User: Hey Riko, how are you today?\n"
        "Assistant:"
    )
    inputs = tokenizer(prompt, return_tensors="pt").input_ids

# Move inputs to the same device as the model
inputs = inputs.to(model.device)

# Sensible generation defaults for a 4B instruct chat model
gen_kwargs = {
    "max_new_tokens": 256,
    "temperature": 0.85,
    "top_p": 0.9,
    "top_k": 50,
    "repetition_penalty": 1.1,
    "do_sample": True,
    "pad_token_id": tokenizer.eos_token_id,
    "eos_token_id": tokenizer.eos_token_id,
}

with torch.no_grad():
    output = model.generate(inputs, **gen_kwargs)

# Decode only the newly generated tokens (slice off the prompt in either branch)
prompt_len = inputs.shape[1]
text = tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)

print("\nRiko:", text.strip())
```

### Option 2: Text Generation Inference (TGI)

```bash
# Start a local TGI server serving the model
# Requirements: text-generation-inference installed; a GPU is recommended
text-generation-launcher --model-id subsectmusic/qwriko3-4b-instruct-2507 --hostname 0.0.0.0 --port 8080
```

Example request (note that the `/generate` endpoint takes a plain prompt string in `inputs`, not a list of chat messages):

```bash
curl http://localhost:8080/generate -X POST -H "Content-Type: application/json" -d '{
  "inputs": "System: You are Riko, a tsundere kitsune AI.\nUser: Write a playful greeting in your style.\nAssistant:",
  "parameters": {
    "max_new_tokens": 200,
    "temperature": 0.9,
    "top_p": 0.9,
    "repetition_penalty": 1.1
  }
}'
```
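
For role-based messages, recent TGI versions also expose an OpenAI-compatible `/v1/chat/completions` endpoint that applies the model's chat template server-side; a minimal sketch, assuming a TGI version (>= 1.4) that ships the Messages API:

```bash
# Chat-style request via TGI's OpenAI-compatible Messages API
# (assumes TGI >= 1.4; "model": "tgi" is a routing placeholder, not a repo lookup)
curl http://localhost:8080/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
  "model": "tgi",
  "messages": [
    {"role": "system", "content": "You are Riko, a tsundere kitsune AI."},
    {"role": "user", "content": "Write a playful greeting in your style."}
  ],
  "max_tokens": 200,
  "temperature": 0.9
}'
```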

### Option 3: Ollama (GGUF)

If you export or publish a GGUF build of this model:

```bash
# Pull (requires a GGUF build with this exact tag to be available)
ollama pull subsectmusic/qwriko3-4b-instruct-2507

# Chat
ollama run subsectmusic/qwriko3-4b-instruct-2507 "Riko, give me some fox-spirit advice for a Monday."
```

> Tip: To create a local GGUF for testing, convert with llama.cpp's Qwen-compatible tools and write a `Modelfile` whose chat template matches Qwen3.
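
A minimal local flow, as a sketch only: the GGUF filename below is hypothetical, and the ChatML-style template assumes the Qwen3 chat format; verify it against the tokenizer config before relying on it.

```bash
# Register a locally converted GGUF with Ollama (filename is hypothetical)
cat > Modelfile <<'EOF'
FROM ./qwriko3-4b-instruct-2507-Q4_K_M.gguf
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER stop <|im_start|>
PARAMETER stop <|im_end|>
SYSTEM """You are Riko, a tsundere kitsune AI."""
EOF

ollama create qwriko3-local -f Modelfile
ollama run qwriko3-local "Hello Riko!"
```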

## 🧪 Minimal Conversation Template (Python)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

MODEL_ID = "subsectmusic/qwriko3-4b-instruct-2507"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto"
)

def chat(user_text: str) -> str:
    """Send one user turn to Riko and return the in-character reply."""
    messages = [
        {"role": "system", "content": "You are Riko, a tsundere kitsune AI. Reply in-character."},
        {"role": "user", "content": user_text}
    ]
    # Build the prompt with the model's chat template
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt"
    ).to(model.device)

    output = model.generate(
        inputs,
        max_new_tokens=256,
        temperature=0.85,
        top_p=0.9,
        top_k=50,
        repetition_penalty=1.1,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id
    )
    # Decode only the newly generated tokens
    text = tokenizer.decode(output[0][inputs.shape[1]:], skip_special_tokens=True)
    return text.strip()

print(chat("Give me a short pep talk for studying."))
```

## 💡 Use Cases

- Character roleplay & entertainment
- Creative writing assistance (tsundere voice)
- Personality-driven chatbots
- Research on alternating-turn distillation & style transfer

## 🔬 Project Horizon LLM Methodology

**Alternating-turn distillation** keeps Riko's character voice consistent; the pipeline (sketched in code below) is:

1. Extract human/user turns from multi-turn chats
2. Generate responses from two high-quality sources in alternation (e.g., **Kimi K2** → odd turns, **Horizon Beta** → even turns)
3. Curate for Riko's tsundere persona
4. Compile into a supervised fine-tuning (SFT) dataset
5. Fine-tune **Qwen3-4B-Instruct** using **Unsloth + TRL**

Benefits:

- Personality consistency across topics
- Response diversity from multiple teacher styles
- Efficient transfer into a compact 4B model
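
A minimal sketch of steps 2 and 4, assuming hypothetical `kimi_k2_reply` and `horizon_beta_reply` helpers that wrap the teacher-model APIs:

```python
# Illustrative only: kimi_k2_reply / horizon_beta_reply are hypothetical
# stand-ins for the real teacher-model calls; user_turns comes from step 1.

def build_sft_pairs(user_turns):
    """Pair each extracted user turn with a teacher reply, alternating teachers."""
    pairs = []
    for i, turn in enumerate(user_turns, start=1):
        # Odd turns -> Kimi K2, even turns -> Horizon Beta
        teacher = kimi_k2_reply if i % 2 == 1 else horizon_beta_reply
        pairs.append({
            "instruction": turn,                 # Alpaca-style single-turn pair
            "input": "",
            "output": teacher(turn, persona="Riko"),
        })
    return pairs  # curate for persona (step 3) before fine-tuning (step 5)
```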

## 🛠️ Training Details

### Dataset & Method

- **Format:** ShareGPT-style → Alpaca single-turn pairs
- **Teachers:** Kimi K2 (odd) + Horizon Beta (even)
- **Focus:** Tsundere kitsune persona, witty banter, emotional subtext
- **Curation:** Manual filtering for tone & safety

### Example Training Config (SFT)

```yaml
Training Framework: Unsloth + TRL SFTTrainer
Base Model: Qwen/Qwen3-4B-Instruct
Batch Size: 2 per device
Gradient Accumulation: 4
Learning Rate: 2e-4
Optimizer: AdamW 8-bit
Weight Decay: 0.01
Scheduler: Linear
Max Steps: 100+
Warmup Steps: 5
Sequence Length: up to model context
Precision: fp16
```
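
For reference, the same config expressed as an Unsloth + TRL run; a sketch only: the dataset file, LoRA rank, and `max_seq_length` are illustrative placeholders, not the actual training inputs.

```python
# Sketch of an Unsloth + TRL SFT run matching the config above.
# "riko_sft.jsonl", r=16, and max_seq_length=4096 are placeholders.
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-4B-Instruct",
    max_seq_length=4096,
    load_in_4bit=True,  # memory-efficient QLoRA-style loading
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=load_dataset("json", data_files="riko_sft.jsonl", split="train"),
    dataset_text_field="text",
    max_seq_length=4096,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        max_steps=100,
        warmup_steps=5,
        fp16=True,
        output_dir="outputs",
    ),
)
trainer.train()
```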

### Performance Notes

- **Compact:** ~4B parameters for fast local use
- **Unsloth optimizations:** faster training and inference
- **Quantization:** 4-bit/8-bit supported via bitsandbytes (PyTorch) and GGUF (Ollama) if exported

## 📊 Model Specifications

| Attribute      | Details                       |
|----------------|-------------------------------|
| Architecture   | Qwen3 Transformer             |
| Parameters     | ~4B                           |
| Base           | Qwen/Qwen3-4B-Instruct        |
| Context Length | Base-dependent (Qwen3 config) |
| Formats        | PyTorch; GGUF (optional)      |
| Framework      | PyTorch + Transformers        |
| Optimization   | Unsloth-accelerated SFT       |
| Style          | Tsundere kitsune (Riko)       |
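
To see the context window actually inherited from the base Qwen3 config, you can inspect the model config directly (attribute name per the transformers Qwen config):

```python
# Inspect the context length inherited from the base Qwen3 config
from transformers import AutoConfig

config = AutoConfig.from_pretrained("subsectmusic/qwriko3-4b-instruct-2507")
print(config.max_position_embeddings)  # maximum context length in tokens
```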

## 🎯 Recommended Inference Settings

```python
generation_config = {
    "max_new_tokens": 256,
    "temperature": 0.85,        # playful but coherent
    "top_p": 0.9,               # nucleus sampling
    "top_k": 50,                # limit candidate tokens
    "repetition_penalty": 1.1,  # reduce loops
    "do_sample": True,
    "pad_token_id": tokenizer.eos_token_id,
    "eos_token_id": tokenizer.eos_token_id
}
```

Pass these to `model.generate(inputs, **generation_config)` as in the Quick Start example.

## ⚠️ Limitations

- In-character bias (tsundere tone) may color factual or technical answers
- Compact 4B size: may require careful prompting for complex tasks
- Quantization can slightly affect nuance

## 🔒 Ethical Considerations

- Designed for entertainment and creative use
- Not for professional advice or therapy
- Follow platform guidelines and content policies

## 📚 Citation

If you use this model, please cite:

```bibtex
@misc{qwriko3-4b-instruct-2507,
  title={QwRiko3-4B-Instruct-2507: Tsundere Kitsune AI},
  author={subsectmusic},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/subsectmusic/qwriko3-4b-instruct-2507}
}
```

## 🤝 Acknowledgments

- **Kimi K2** & **Horizon Beta**: alternating-turn teacher models
- **Project Horizon LLM**: methodology & curation
- **Unsloth**: training acceleration
- **Qwen Team**: base architecture
- **Hugging Face / TRL**: libraries & hosting
- **Ollama**: GGUF local runtime

## 📦 Deployment Options

### Transformers (PyTorch)

- FP16/BF16 inference on GPU; CPU supported (slower)
- Bitsandbytes 4-bit/8-bit loading for low-VRAM setups (see the sketch below)
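
A minimal 4-bit loading sketch (requires `pip install bitsandbytes`; the quantization settings are common defaults, not repo-specific values):

```python
# Load in 4-bit NF4 via bitsandbytes for low-VRAM GPUs
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16
)

tokenizer = AutoTokenizer.from_pretrained("subsectmusic/qwriko3-4b-instruct-2507")
model = AutoModelForCausalLM.from_pretrained(
    "subsectmusic/qwriko3-4b-instruct-2507",
    quantization_config=bnb_config,
    device_map="auto",
)
```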

### TGI

- Production-grade server with simple HTTP API

### Ollama (GGUF)

- Local, offline chat once a GGUF build is produced for this model

```bash
# Example Ollama flow (if/when GGUF is published)
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull subsectmusic/qwriko3-4b-instruct-2507
ollama run subsectmusic/qwriko3-4b-instruct-2507 "Hello Riko!"
```

## 📞 Support & Community

- **Issues:** Open on this repo's Issues tab
- **Discussions:** Community threads for tips and prompts
- **Updates:** Watch the repo for new model variants and GGUF builds

---

<div align="center">
<b>Made with ❤️ using Unsloth</b><br>
<i>Training AI personalities, one tsundere at a time!</i>
</div>