Update README.md

README.md +120 -196

@@ -5,6 +5,9 @@ tags:
 - transformers
 - qwen3
 - gguf
 - character-roleplay
 - tsundere
 - conversational-ai
@@ -16,7 +19,7 @@ pipeline_tag: text-generation
 library_name: transformers
 ---
-# 🦊 QwRiko3-4B-Instruct-2507 — Tsundere Kitsune AI
 <div align="center">
   <img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>
@@ -24,143 +27,107 @@ library_name: transformers
 ## 📋 Model Overview
-**QwRiko3-4B-Instruct-2507** is a conversational AI model fine-tuned to embody **Riko**, a tsundere kitsune character. Built on **Qwen3-4B-Instruct**, this release (version **2507**) delivers engaging, personality-driven dialogue with sharp wit, playful bite, and hidden warmth.
 - **Model ID (this repo):** `subsectmusic/qwriko3-4b-instruct-2507`
 - **Base Model:** `Qwen/Qwen3-4B-Instruct`
-- **Project:** Project Horizon LLM
-- **Developer:** @subsectmusic
-- **Training Framework:** Unsloth + Hugging Face TRL (SFT)
-- **License:** Apache-2.0 (repo)
 - **Parameters:** ~4B
-- **Formats:** PyTorch; optional GGUF export for Ollama
 ## 🎭 Character Profile: Riko
-- **Tsundere cadence:** “It’s not like I like you or anything… b-baka!”
-- **Kitsune vibes:** fox-spirit mischief + sly wisdom
-- **Emotional core:** tough shell, soft center (rarely admitted)
 - **Style:** snappy, teasing, ultimately caring
-## 🚀 Quick Start
-### Option 1 — Hugging Face Transformers (Python)
-```python
-# QwRiko3-4B-Instruct-2507 — Complete, ready-to-run example
-# Requirements:
-#   pip install transformers>=4.42.0 torch>=2.1.0 accelerate
-#   (CUDA recommended; works on CPU with slower generation)
-import torch
-from transformers import AutoTokenizer, AutoModelForCausalLM
-MODEL_ID = "subsectmusic/qwriko3-4b-instruct-2507"
-# Load tokenizer & model
-tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=True)
-model = AutoModelForCausalLM.from_pretrained(
-    MODEL_ID,
-    torch_dtype=torch.float16,
-    device_map="auto"
-)
-# Chat messages using the model's chat template (preferred)
-messages = [
-    {"role": "system", "content": "You are Riko, a tsundere kitsune AI. Be witty, teasing, but with hidden warmth."},
-    {"role": "user", "content": "Hey Riko, how are you today?"}
-]
-# Apply chat template if available; otherwise fall back to a plain prompt
-if hasattr(tokenizer, "apply_chat_template"):
-    inputs = tokenizer.apply_chat_template(
-        messages,
-        tokenize=True,
-        add_generation_prompt=True,
-        return_tensors="pt"
-    )
-else:
-    # Fallback prompt string (works without chat template)
-    prompt = (
-        "System: You are Riko, a tsundere kitsune AI. Be witty, teasing, but with hidden warmth.\n"
-        "User: Hey Riko, how are you today?\n"
-        "Assistant:"
-    )
-    inputs = tokenizer(prompt, return_tensors="pt").input_ids
-# Move inputs to the same device as model
-if hasattr(inputs, "to"):
-    inputs = inputs.to(model.device)
-# Sensible generation defaults for a 4B instruct chat model
-gen_kwargs = {
-    "max_new_tokens": 256,
-    "temperature": 0.85,
-    "top_p": 0.9,
-    "top_k": 50,
-    "repetition_penalty": 1.1,
-    "do_sample": True,
-    "pad_token_id": tokenizer.eos_token_id,
-    "eos_token_id": tokenizer.eos_token_id,
-}
-with torch.no_grad():
-    output = model.generate(inputs, **gen_kwargs)
-# If we used the chat template, slice after the prompt tokens
-if hasattr(tokenizer, "apply_chat_template"):
-    prompt_len = inputs.shape[1]
-    text = tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)
-else:
-    text = tokenizer.decode(output[0], skip_special_tokens=True)
-print("\nRiko:", text.strip())
 ```
-### Option 2 — Text Generation Inference (TGI)
 ```bash
-# Start a local TGI server serving the model
-# Requirements: text-generation-inference installed and a GPU is recommended
-text-generation-launcher --model-id subsectmusic/qwriko3-4b-instruct-2507 --hostname 0.0.0.0 --port 8080
 ```
-Example request:
 ```bash
-curl http://localhost:8080/generate   -X POST   -H "Content-Type: application/json"   -d '{
-    "inputs": [
-      {"role":"system","content":"You are Riko, a tsundere kitsune AI."},
-      {"role":"user","content":"Write a playful greeting in your style."}
-    ],
-    "parameters": {
-      "max_new_tokens": 200,
-      "temperature": 0.9,
-      "top_p": 0.9,
-      "repetition_penalty": 1.1
     }
-  }'
 ```
-### Option 3 — Ollama (GGUF)
-If you export or publish a GGUF build of this model:
 ```bash
-# Pull (requires a GGUF build with this exact tag to be available)
-ollama pull subsectmusic/qwriko3-4b-instruct-2507
-# Chat
-ollama run subsectmusic/qwriko3-4b-instruct-2507 "Riko, give me some fox-spirit advice for a Monday."
 ```
-> Tip: To create a local GGUF for testing, convert via llama.cpp/Qwen-compatible tools and set a `Modelfile` with the chat template matching Qwen3.
-## 🧪 Minimal Conversation Template (Python)
 ```python
-from transformers import AutoTokenizer, AutoModelForCausalLM
 import torch
 MODEL_ID = "subsectmusic/qwriko3-4b-instruct-2507"
@@ -171,66 +138,58 @@ model = AutoModelForCausalLM.from_pretrained(
     device_map="auto"
 )
-def chat(user_text: str) -> str:
-    messages = [
-        {"role": "system", "content": "You are Riko, a tsundere kitsune AI. Reply in-character."},
-        {"role": "user", "content": user_text}
-    ]
     inputs = tokenizer.apply_chat_template(
         messages,
         tokenize=True,
         add_generation_prompt=True,
         return_tensors="pt"
     ).to(model.device)
-    output = model.generate(
-        inputs,
-        max_new_tokens=256,
-        temperature=0.85,
-        top_p=0.9,
-        top_k=50,
-        repetition_penalty=1.1,
-        do_sample=True,
-        pad_token_id=tokenizer.eos_token_id,
-        eos_token_id=tokenizer.eos_token_id
     )
-    text = tokenizer.decode(output[0][inputs.shape[1]:], skip_special_tokens=True)
-    return text.strip()
-print(chat("Give me a short pep talk for studying."))
 ```
 ## 💡 Use Cases
 - Character roleplay & entertainment
-- Creative writing assistance (tsundere voice)
 - Personality-driven chatbots
 - Research on alternating-turn distillation & style transfer
-## 🔬 Project Horizon LLM Methodology
-**Alternating-turn distillation** to preserve consistent character voice:
-1. Extract human/user turns from multi-turn chats
-2. Generate responses from two high-quality sources in alternation (e.g., **Kimi K2** → odd turns, **Horizon Beta** → even turns)
-3. Curate for Riko’s tsundere persona
-4. Compile into supervised fine-tuning (SFT) dataset
-5. Fine-tune **Qwen3-4B-Instruct** using **Unsloth + TRL**
-Benefits:
-- Personality consistency across topics
-- Response diversity from multiple teacher styles
-- Efficient transfer into a compact 4B model
-## 🛠️ Training Details
-### Dataset & Method
-- **Format:** ShareGPT-style → Alpaca single-turn pairs
-- **Teachers:** Kimi K2 (odd) + Horizon Beta (even)
-- **Focus:** Tsundere kitsune persona, witty banter, emotional subtext
 - **Curation:** Manual filtering for tone & safety
-### Example Training Config (SFT)
 ```yaml
 Training Framework: Unsloth + TRL SFTTrainer
@@ -247,12 +206,7 @@ Sequence Length: up to model context
 Precision: fp16
 ```
-### Performance Notes
-- **Compact:** ~4B parameters for fast local use
-- **Unsloth optimizations:** faster training/inference
-- **Quantization:** 4-bit/8-bit supported via bitsandbytes (PyTorch) and GGUF (Ollama) if exported
-## 📊 Model Specifications
 | Attribute        | Details                       |
 |------------------|-------------------------------|
@@ -260,7 +214,7 @@ Precision: fp16
 | Parameters       | ~4B                           |
 | Base             | Qwen/Qwen3-4B-Instruct        |
 | Context Length   | Base-dependent (Qwen3 config) |
-| Formats          | PyTorch; GGUF (optional)      |
 | Framework        | PyTorch + Transformers        |
 | Optimization     | Unsloth-accelerated SFT       |
 | Style            | Tsundere kitsune (Riko)       |
@@ -270,32 +224,29 @@ Precision: fp16
 ```python
 generation_config = {
     "max_new_tokens": 256,
-    "temperature": 0.85,       # playful but coherent
-    "top_p": 0.9,              # nucleus sampling
-    "top_k": 50,               # limit candidate tokens
-    "repetition_penalty": 1.1, # reduce loops
     "do_sample": True,
     "pad_token_id": tokenizer.eos_token_id,
     "eos_token_id": tokenizer.eos_token_id
 }
 ```
-## ⚠️ Limitations
-- In-character bias (tsundere tone) may color factual or technical answers
-- Compact 4B size: may require careful prompting for complex tasks
 - Quantization can slightly affect nuance
-## 🔒 Ethical Considerations
-- Designed for entertainment and creative use
-- Not for professional advice or therapy
-- Follow platform guidelines and content policies
 ## 📚 Citation
-If you use this model, please cite:
 ```bibtex
 @model{qwriko3-4b-instruct-2507,
   title={QwRiko3-4B-Instruct-2507: Tsundere Kitsune AI},
@@ -308,37 +259,10 @@ If you use this model, please cite:
 ## 🤝 Acknowledgments
-- **Kimi K2** & **Horizon Beta**: alternating-turn teacher models
-- **Project Horizon LLM**: methodology & curation
-- **Unsloth**: training acceleration
-- **Qwen Team**: base architecture
-- **Hugging Face / TRL**: libraries & hosting
-- **Ollama**: GGUF local runtime
-## 📦 Deployment Options
-### Transformers (PyTorch)
-- FP16/BF16 inference on GPU; CPU supported (slower)
-- Bitsandbytes 4-bit/8-bit loading for low-VRAM setups
-### TGI
-- Production-grade server with simple HTTP API
-### Ollama (GGUF)
-- Local, offline chat once a GGUF build is produced for this model
-```bash
-# Example Ollama flow (if/when GGUF is published)
-curl -fsSL https://ollama.ai/install.sh | sh
-ollama pull subsectmusic/qwriko3-4b-instruct-2507
-ollama run subsectmusic/qwriko3-4b-instruct-2507 "Hello Riko!"
-```
-## 📞 Support & Community
-- **Issues:** Open on this repo’s Issues tab
-- **Discussions:** Community threads for tips and prompts
-- **Updates:** Watch the repo for new model variants and GGUF builds
 ---

 - transformers
 - qwen3
 - gguf
+- ollama
+- tools
+- function-calling
 - character-roleplay
 - tsundere
 - conversational-ai
 library_name: transformers
 ---
+# 🦊 QwRiko3-4B-Instruct-2507 — Tsundere Kitsune AI (GGUF • Ollama • Tools)
 <div align="center">
   <img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>
 ## 📋 Model Overview
+**QwRiko3-4B-Instruct-2507** is a conversational AI model fine-tuned to embody **Riko**, a tsundere kitsune character. This release targets **GGUF** for **Ollama** first and supports **tool calling** through Ollama’s tools API. A PyTorch build (Transformers) is also supported.
 - **Model ID (this repo):** `subsectmusic/qwriko3-4b-instruct-2507`
+- **Primary format:** **GGUF** (Ollama-compatible)
+- **Alt format:** PyTorch (Transformers)
 - **Base Model:** `Qwen/Qwen3-4B-Instruct`
 - **Parameters:** ~4B
+- **License:** Apache-2.0 (repo)
+- **Project:** Project Horizon LLM
+- **Developer:** @subsectmusic
+- **Training Framework:** Unsloth + TRL (SFT)
 ## 🎭 Character Profile: Riko
+- **Tsundere cadence:** “It’s not like I like you or anything… b-baka!”
+- **Kitsune vibes:** fox-spirit mischief + sly wisdom
+- **Emotional core:** tough shell, soft center
 - **Style:** snappy, teasing, ultimately caring
+---
+## 🚀 Quick Start (Ollama • GGUF)
+> These steps assume you have a local GGUF file named `qwriko3-4b-instruct-2507.Q4_K_M.gguf` in the working directory. If your filename differs, update the `FROM` path in the Modelfile accordingly.
+1) **Create a Modelfile** (exact content below is also saved as `Modelfile` in this package):
+```Dockerfile
+# Modelfile
+FROM ./qwriko3-4b-instruct-2507.Q4_K_M.gguf
+PARAMETER num_ctx 8192
+# (Optional) set temperature/top_p/etc. with extra PARAMETER lines here, `/set parameter` in an interactive `ollama run` session, or the API `options` field.
+```
+2) **Create the Ollama model**:
+```bash
+ollama create qwriko3-4b-instruct-2507 -f Modelfile
 ```
+3) **Chat**:
 ```bash
+ollama run qwriko3-4b-instruct-2507 "Riko, give me a playful hello."
 ```
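The Modelfile comment mentions setting sampling parameters through the API. A minimal sketch of what that could look like from Python, assuming a local Ollama server on the default port and the model name created in step 2. The option values mirror the README's `generation_config`; mapping `repetition_penalty` to Ollama's `repeat_penalty` option is an assumption of this sketch, and `build_chat_request` / `send_chat` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumed default local endpoint; adjust if your server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/chat"


def build_chat_request(prompt: str) -> Dict[str, Any]:
    """Build a non-streaming /api/chat payload with per-request sampling options."""
    return {
        "model": "qwriko3-4b-instruct-2507",
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON object instead of a stream of chunks
        "options": {
            "temperature": 0.85,
            "top_p": 0.9,
            "top_k": 50,
            "repeat_penalty": 1.1,  # assumed counterpart of repetition_penalty
            "num_ctx": 8192,        # same context size as the Modelfile
        },
    }


def send_chat(payload: Dict[str, Any]) -> str:
    """POST the payload to a running Ollama server and return the reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]
```

With the server running, `send_chat(build_chat_request("Riko, give me a playful hello."))` should return the reply as a string. Options passed this way apply to that request only, which keeps the Modelfile free of sampling settings.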
+### Tool Calling with Ollama (cURL)
 ```bash
+curl http://localhost:11434/api/chat -d '{
+  "model": "qwriko3-4b-instruct-2507",
+  "messages": [
+    { "role": "user", "content": "What is the weather today in Toronto?" }
+  ],
+  "tools": [
+    {
+      "type": "function",
+      "function": {
+        "name": "get_current_weather",
+        "description": "Get the current weather for a location",
+        "parameters": {
+          "type": "object",
+          "properties": {
+            "location": {
+              "type": "string",
+              "description": "The location to get the weather for, e.g. Toronto"
+            },
+            "format": {
+              "type": "string",
+              "description": "Temperature units",
+              "enum": ["celsius", "fahrenheit"]
+            }
+          },
+          "required": ["location", "format"]
+        }
+      }
     }
+  ]
+}'
 ```
+### Tool Calling with Ollama (Python)
+A complete, ready-to-run example is saved as `tools_demo.py` in this package. It defines a couple of functions and lets the model call them. You can run it after installing the Python client:
 ```bash
+pip install -U ollama
+python tools_demo.py
 ```
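`tools_demo.py` itself is not shown in this diff. The following is only a rough sketch of the dispatch step such a script needs, not the packaged file. It assumes tool calls arrive in the shape the Ollama REST API returns (`message.tool_calls[].function.name` and `.arguments`); `get_current_weather` here is a hypothetical stand-in that returns canned data, and `run_tool_calls` is an illustrative helper name.

```python
import json
from typing import Any, Callable, Dict, List


def get_current_weather(location: str, format: str = "celsius") -> str:
    """Hypothetical tool: returns canned data instead of calling a weather service."""
    temperature = 21 if format == "celsius" else 70
    return json.dumps({"location": location, "temperature": temperature, "unit": format})


# Registry mapping the tool names advertised to the model onto local functions.
TOOLS: Dict[str, Callable[..., str]] = {"get_current_weather": get_current_weather}


def run_tool_calls(tool_calls: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Execute each requested tool and return `tool` role messages for the next turn."""
    results = []
    for call in tool_calls:
        function = call["function"]
        name = function["name"]
        arguments = function.get("arguments", {})
        handler = TOOLS.get(name)
        # Report unknown tools back to the model rather than raising.
        content = handler(**arguments) if handler else f"Unknown tool: {name}"
        results.append({"role": "tool", "content": content})
    return results
```

In a full loop, the messages returned by `run_tool_calls` would be appended to the conversation and sent back in a second chat request so the model can phrase the final answer in character.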
+---
+## 🧪 Quick Start (Transformers • PyTorch)
 ```python
+# Requirements:
+#   pip install "transformers>=4.42.0" "torch>=2.1.0" accelerate
+#   (CUDA recommended; CPU works but is slower.)
 import torch
+from transformers import AutoTokenizer, AutoModelForCausalLM
 MODEL_ID = "subsectmusic/qwriko3-4b-instruct-2507"
     device_map="auto"
 )
+messages = [
+    {"role": "system", "content": "You are Riko, a tsundere kitsune AI. Be witty, teasing, but with hidden warmth."},
+    {"role": "user", "content": "Hey Riko, how are you today?"}
+]
+# `apply_chat_template` exists on every tokenizer, so check that a template is actually set
+if getattr(tokenizer, "chat_template", None):
     inputs = tokenizer.apply_chat_template(
         messages,
         tokenize=True,
         add_generation_prompt=True,
         return_tensors="pt"
     ).to(model.device)
+else:
+    prompt = (
+        "System: You are Riko, a tsundere kitsune AI. Be witty, teasing, but with hidden warmth.\n"
+        "User: Hey Riko, how are you today?\n"
+        "Assistant:"
     )
+    inputs = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
+gen = model.generate(
+    inputs,
+    max_new_tokens=256,
+    temperature=0.85,
+    top_p=0.9,
+    top_k=50,
+    repetition_penalty=1.1,
+    do_sample=True,
+    pad_token_id=tokenizer.eos_token_id,
+    eos_token_id=tokenizer.eos_token_id,
+)
+out = tokenizer.decode(gen[0][inputs.shape[1]:], skip_special_tokens=True)
+print("\nRiko:", out.strip())
 ```
+---
 ## 💡 Use Cases
 - Character roleplay & entertainment
+- Creative writing in a tsundere voice
 - Personality-driven chatbots
 - Research on alternating-turn distillation & style transfer
+## 🔬 Training Summary (SFT)
+- **Format:** ShareGPT-style → Alpaca single-turn pairs
+- **Teachers:** Kimi K2 (odd) + Horizon Beta (even)
+- **Focus:** Tsundere kitsune persona, witty banter, emotional subtext
 - **Curation:** Manual filtering for tone & safety
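The data preparation listed above can be sketched roughly as follows. This is an illustration, not the project's actual pipeline: it assumes the common ShareGPT layout (a list of `{"from": "human" | "gpt", "value": ...}` turns) and the usual Alpaca keys (`instruction`, `input`, `output`). The 1-based turn numbering in `teacher_for_turn` is also an assumption, since the README only says "odd" and "even".

```python
from typing import Dict, List


def sharegpt_to_alpaca(conversation: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Pair each human turn with the assistant turn that directly follows it."""
    pairs = []
    for previous, current in zip(conversation, conversation[1:]):
        if previous["from"] == "human" and current["from"] == "gpt":
            pairs.append(
                {"instruction": previous["value"], "input": "", "output": current["value"]}
            )
    return pairs


def teacher_for_turn(turn_number: int) -> str:
    """Pick the teacher for a user turn: odd turns Kimi K2, even turns Horizon Beta (1-based, assumed)."""
    return "Kimi K2" if turn_number % 2 == 1 else "Horizon Beta"


example = [
    {"from": "human", "value": "Hey Riko, how are you today?"},
    {"from": "gpt", "value": "Hmph. Fine, not that you care."},
    {"from": "human", "value": "Any advice for Monday?"},
    {"from": "gpt", "value": "Drink water. And don't be late, baka."},
]
pairs = sharegpt_to_alpaca(example)
```

Flattening to single-turn pairs drops earlier context from each example, which is the trade-off implied by the "single-turn" format named in the list.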
+Example SFT settings:
 ```yaml
 Training Framework: Unsloth + TRL SFTTrainer
 Precision: fp16
 ```
+## 📊 Specs
 | Attribute        | Details                       |
 |------------------|-------------------------------|
 | Parameters       | ~4B                           |
 | Base             | Qwen/Qwen3-4B-Instruct        |
 | Context Length   | Base-dependent (Qwen3 config) |
+| Formats          | **GGUF (Ollama)**; PyTorch    |
 | Framework        | PyTorch + Transformers        |
 | Optimization     | Unsloth-accelerated SFT       |
 | Style            | Tsundere kitsune (Riko)       |
 ```python
 generation_config = {
     "max_new_tokens": 256,
+    "temperature": 0.85,
+    "top_p": 0.9,
+    "top_k": 50,
+    "repetition_penalty": 1.1,
     "do_sample": True,
     "pad_token_id": tokenizer.eos_token_id,
     "eos_token_id": tokenizer.eos_token_id
 }
 ```
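To make the interaction of `temperature`, `top_k`, and `top_p` concrete, here is a simplified pure-Python sketch of how those three settings narrow the next-token distribution. It is an illustration under stated assumptions, not the Transformers implementation: it works on a small dict of made-up logits, applies the filters in the order temperature, top-k, top-p, and leaves out `repetition_penalty`. `filter_distribution` is a hypothetical helper name.

```python
import math
from typing import Dict


def filter_distribution(
    logits: Dict[str, float],
    temperature: float = 0.85,
    top_k: int = 50,
    top_p: float = 0.9,
) -> Dict[str, float]:
    """Return the renormalized probabilities of the tokens that survive filtering."""
    # Temperature: scale logits before the softmax (lower values sharpen the distribution).
    scaled = {token: value / temperature for token, value in logits.items()}
    peak = max(scaled.values())
    weights = {token: math.exp(value - peak) for token, value in scaled.items()}

    # Top-k: keep only the k most likely tokens, then renormalize.
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)[:top_k]
    total = sum(weight for _, weight in ranked)
    ranked = [(token, weight / total) for token, weight in ranked]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept = []
    cumulative = 0.0
    for token, probability in ranked:
        kept.append((token, probability))
        cumulative += probability
        if cumulative >= top_p:
            break

    norm = sum(probability for _, probability in kept)
    return {token: probability / norm for token, probability in kept}
```

Sampling then draws the next token from the returned distribution, so tighter settings trade variety for consistency in Riko's replies.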
+## ⚠️ Notes
+- In-character style can color responses to factual queries
+- Compact 4B size benefits from clear prompts for complex tasks
 - Quantization can slightly affect nuance
+## 🔒 Ethics
+- Entertainment & creative use; not professional advice
+- Follow platform/community guidelines
 ## 📚 Citation
 ```bibtex
 @model{qwriko3-4b-instruct-2507,
   title={QwRiko3-4B-Instruct-2507: Tsundere Kitsune AI},
 ## 🤝 Acknowledgments
+- Kimi K2 & Horizon Beta (teachers)
+- Project Horizon LLM (methodology)
+- Unsloth, Qwen Team, Hugging Face / TRL
+- Ollama (GGUF runtime)
 ---