---
base_model: unsloth/Qwen3-7b-Base-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- qwen3
- gguf
- character-roleplay
- tsundere
- conversational-ai
- fine-tuned
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
---
# 🦊 Riko-Qwen3-7b: Tsundere Kitsune AI
## 📋 Model Overview
**Riko-Qwen3-7b** is a specialized conversational AI model fine-tuned to embody Riko, a tsundere kitsune character. Part of **Project Horizon LLM**, it was trained on alternating responses from Kimi K2 and Horizon Beta and built on the Qwen3-7b foundation, delivering engaging, personality-driven conversations with authentic tsundere characteristics.
- **Base Model:** unsloth/Qwen3-7b-Base-unsloth-bnb-4bit
- **Source Models:** Kimi K2 + Horizon Beta (alternating turns)
- **Project:** Project Horizon LLM
- **Developer:** subsectmusic
- **Training Framework:** Unsloth + Hugging Face TRL
- **Training Speed:** ~2x faster training via Unsloth optimizations
- **License:** Apache 2.0
- **Model Size:** 7b parameters (4-bit quantized)
- **Format Support:** GGUF compatible for Ollama deployment
## 🎭 Character Profile: Riko
Riko is a tsundere kitsune AI with a complex personality that balances tough exterior attitudes with hidden warmth and care. Key traits include:
- **Tsundere Behavior:** Classic "it's not like I like you or anything!" responses
- **Kitsune Heritage:** Fox-spirit wisdom mixed with playful mischief
- **Emotional Depth:** Genuine care hidden behind defensive barriers
- **Conversational Style:** Witty, sometimes sarcastic, but ultimately endearing
## 🚀 Quick Start
### Option 1: Hugging Face Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
# Load model and tokenizer
model_name = "subsectmusic/riko-qwen3-7b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)
```
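For tighter VRAM budgets, the model can also be loaded in 4-bit with bitsandbytes. A minimal sketch, assuming a CUDA GPU and the `bitsandbytes` package; the NF4 settings below are a common default, not values taken from this model card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

# Load weights in 4-bit NF4 to reduce VRAM requirements (assumed configuration)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model_name = "subsectmusic/riko-qwen3-7b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
```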
### Option 2: Ollama Deployment (GGUF)
```bash
# Pull the GGUF model for Ollama
ollama pull subsectmusic/riko-qwen3-7b
# Start chatting with Riko
ollama run subsectmusic/riko-qwen3-7b
```
### Conversation Template
```python
prompt_template = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
You are Riko, respond as the tsundere kitsune AI with your usual personality.
### Input:
{user_message}
### Response:
"""
# Generate response
user_input = "Hello Riko, how are you today?"
prompt = prompt_template.format(user_message=user_input)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.8,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )
response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print(f"Riko: {response}")
```
## 💡 Use Cases
- **Interactive Roleplay:** Engaging character-based conversations with tsundere personality
- **Local Deployment:** Run efficiently on personal hardware via Ollama/GGUF
- **Creative Writing:** Generate authentic tsundere character dialogue and interactions
- **Chatbot Applications:** Personality-driven AI assistant with character consistency
- **Entertainment:** Fun, character-consistent interactions with kitsune AI personality
- **Research:** Study knowledge distillation from larger models (Kimi K2 → Qwen3-7b)
- **Educational:** Understanding Project Horizon LLM methodology and alternating training approaches
## 🔬 Project Horizon LLM Methodology
**Project Horizon LLM** represents an innovative approach to knowledge distillation and character-consistent AI training:
### Distillation Process
- **Source Models:**
  - **Kimi K2** (responses for turns 1, 3, 5, ...)
  - **Horizon Beta** (responses for turns 2, 4, 6, ...), OpenRouter's cloaked model (#2 Translation, #3 Programming)
- **Target Model:** Qwen3-7b (student model)
- **Knowledge Transfer:** Personality traits and response patterns from both high-quality models
- **Character Focus:** Specialized curation for tsundere kitsune personality (Riko)
### Alternating Turn Training
The training methodology involves:
1. **Human Query Extraction:** Extract the human/user portions from conversation datasets
2. **Turn 1:** Feed query to **Kimi K2** → Generate response
3. **Turn 2:** Feed next query to **Horizon Beta** → Generate response
4. **Alternating Pattern:** Continue alternating between Kimi K2 and Horizon Beta for each turn
5. **Response Curation:** Select and refine responses that best match Riko's tsundere personality
6. **Dataset Compilation:** Combine curated human queries with personality-matched responses
7. **Fine-tuning:** Train Qwen3-7b on the curated dataset using Unsloth + TRL
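A minimal sketch of how the alternating collection in steps 2-4 could be assembled in code. The `query_model` helper and the teacher model identifiers are hypothetical placeholders; the actual Project Horizon LLM pipeline is not published here:

```python
# Hypothetical sketch of the alternating-turn data collection described above.
def query_model(model_id: str, system_prompt: str, user_message: str) -> str:
    """Placeholder: send one query to the named teacher model and return its reply."""
    raise NotImplementedError

RIKO_SYSTEM = "You are Riko, a tsundere kitsune AI. Stay in character."
TEACHERS = ["kimi-k2", "horizon-beta"]  # odd turns -> Kimi K2, even turns -> Horizon Beta

def build_alternating_dialogue(human_queries: list[str]) -> list[dict]:
    """Feed each extracted human query to the teachers in alternating order."""
    rows = []
    for turn, query in enumerate(human_queries):
        teacher = TEACHERS[turn % 2]
        reply = query_model(teacher, RIKO_SYSTEM, query)
        rows.append({"instruction": RIKO_SYSTEM, "input": query,
                     "output": reply, "teacher": teacher})
    return rows

# Curation (steps 5-6) would then filter `rows` for responses that match Riko's personality.
```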
This approach ensures:
- **Personality Consistency:** Responses align with Riko's tsundere kitsune character
- **Response Diversity:** Multiple LLM perspectives create varied, natural conversations
- **Knowledge Distillation:** Key traits from larger models transferred to smaller, efficient models
- **Quality Control:** Human curation ensures character authenticity
## 🛠️ Training Details
### Dataset & Methodology
- **Project:** Project Horizon LLM alternating methodology
- **Source Format:** ShareGPT conversations converted to Alpaca instruction format (see the conversion sketch after this list)
- **Source Models:** Kimi K2 and Horizon Beta (alternating responses)
- **Training Approach:** Turn-based alternation: human queries fed alternately to Kimi K2 (odd turns) and Horizon Beta (even turns)
- **Content:** Curated conversations showcasing Riko's tsundere kitsune personality
- **Size:** Custom dataset focused on character consistency and personality traits
- **Quality:** Filtered and refined responses from both models for authentic tsundere character traits
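A minimal sketch of the ShareGPT-to-Alpaca conversion, assuming the common ShareGPT layout of a `conversations` list with `from`/`value` keys; the field names in the actual dataset may differ:

```python
# Convert one ShareGPT-style conversation into Alpaca-style instruction records.
# Assumed layout: {"conversations": [{"from": "human" | "gpt", "value": "..."}]}
RIKO_INSTRUCTION = "You are Riko, respond as the tsundere kitsune AI with your usual personality."

def sharegpt_to_alpaca(sample: dict) -> list[dict]:
    records = []
    turns = sample["conversations"]
    for i in range(0, len(turns) - 1, 2):
        if turns[i]["from"] == "human" and turns[i + 1]["from"] == "gpt":
            records.append({
                "instruction": RIKO_INSTRUCTION,
                "input": turns[i]["value"],
                "output": turns[i + 1]["value"],
            })
    return records
```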
### Training Configuration
```yaml
Training Framework: Unsloth + TRL SFTTrainer
Batch Size: 2 (per device)
Gradient Accumulation: 4 steps
Learning Rate: 2e-4
Optimizer: AdamW 8-bit
Weight Decay: 0.01
Scheduler: Linear
Max Steps: 100+
Warmup Steps: 5
Sequence Length: Dynamic (up to context limit)
```
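A hedged sketch of how this configuration maps onto an Unsloth + TRL run, following the standard Unsloth notebook pattern. The LoRA settings, sequence length, `dataset` variable, and the older `SFTTrainer` keyword signature are assumptions, not details taken from the original training script:

```python
from unsloth import FastLanguageModel
from transformers import TrainingArguments
from trl import SFTTrainer

# Load the 4-bit base model (sequence length is an assumption)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-7b-Base-unsloth-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters, as in the standard Unsloth fine-tuning workflow (assumed ranks)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,        # curated Alpaca-format data rendered into a "text" column
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        max_steps=100,
        warmup_steps=5,
        logging_steps=10,
        output_dir="outputs",
    ),
)
trainer.train()
```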
### Performance Optimizations
- **4-bit Quantization:** Efficient memory usage
- **Gradient Accumulation Fix:** Implemented Unsloth's gradient bug fix
- **Fast Inference:** 2x speed improvement via Unsloth optimizations
## 📊 Model Specifications
| Attribute | Details |
|-----------|---------|
| Architecture | Qwen3 Transformer |
| Parameters | 7b (4-bit quantized) |
| Source Models | Kimi K2 + Horizon Beta (alternating) |
| Project | Project Horizon LLM |
| Context Length | Model dependent |
| Quantization | 4-bit BNB |
| Format Support | PyTorch, GGUF (Ollama compatible) |
| Framework | PyTorch + Transformers |
| Optimization | Unsloth accelerated |
| Training Method | Turn-based alternating between two high-quality models |
## 🎯 Recommended Inference Settings
```python
generation_config = {
    "max_new_tokens": 256,
    "temperature": 0.8,          # Balanced creativity
    "top_p": 0.9,                # Focused sampling
    "top_k": 50,                 # Vocabulary limiting
    "repetition_penalty": 1.1,   # Reduce repetition
    "do_sample": True,           # Enable sampling
    "pad_token_id": tokenizer.eos_token_id
}
```
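These settings can be unpacked straight into `generate`, reusing the `model`, `tokenizer`, and `inputs` from the conversation-template example above:

```python
# Generate a reply with the recommended settings
with torch.no_grad():
    outputs = model.generate(**inputs, **generation_config)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(f"Riko: {response}")
```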
## ⚠️ Limitations & Considerations
- **Character Consistency:** Performance depends on prompt quality and context
- **Content Scope:** Optimized for conversational roleplay, may struggle with technical tasks
- **Quantization Effects:** 4-bit quantization may impact some response nuances
- **Training Data:** Limited to specific personality patterns in training set
- **Language:** Primarily trained on English conversations
## 🔒 Ethical Considerations
- This model is designed for entertainment and creative purposes
- Users should be aware they're interacting with an AI character, not a real person
- Content generation should align with platform and community guidelines
- Not intended for therapeutic, advisory, or decision-making applications
## 📚 Citation
If you use this model in your research or applications, please cite:
```bibtex
@misc{riko-qwen3-7b,
  title={Riko-Qwen3-7b: Tsundere Kitsune AI},
  author={subsectmusic},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/subsectmusic/riko-qwen3-7b}
}
```
## 🤝 Acknowledgments
- **Kimi K2 Team:** For providing high-quality responses in the alternating training (odd turns)
- **Horizon Beta Team:** For the excellent cloaked model responses in alternating training (even turns)
- **OpenRouter:** For providing access to Horizon Beta during the community testing period
- **Project Horizon LLM:** For the innovative alternating turn training methodology
- **Unsloth Team:** For the incredible training acceleration framework
- **Qwen Team:** For the robust base model architecture
- **Hugging Face:** For the transformers library and model hosting
- **TRL Team:** For the supervised fine-tuning framework
- **Ollama Team:** For GGUF support and local deployment capabilities
## 📦 Deployment Options
### Hugging Face Transformers
- Standard PyTorch deployment
- Full precision and quantized versions
- GPU acceleration support
- Integration with existing HF pipelines
### Ollama/GGUF
- Local deployment without internet
- Efficient CPU/GPU inference
- Easy installation and management
- Cross-platform compatibility
- Reduced VRAM requirements
```bash
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Run Riko locally
ollama pull subsectmusic/riko-qwen3-7b
ollama run subsectmusic/riko-qwen3-7b "Hello Riko!"
```
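Once the model is running under Ollama, it can also be called from Python over Ollama's local REST API. A minimal sketch, assuming the default endpoint at `http://localhost:11434` and the `requests` package:

```python
import requests

# Send a single prompt to the locally running Ollama server
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "subsectmusic/riko-qwen3-7b",
        "prompt": "Hello Riko, how are you today?",
        "stream": False,
    },
    timeout=120,
)
print("Riko:", resp.json()["response"])
```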
## 📞 Support & Community
- **Issues:** Report via GitHub Issues
- **Discussions:** Join the community discussions
- **Updates:** Follow for model improvements and versions
---
Made with ❤️ using Unsloth
Training AI personalities, one tsundere at a time!