# Riko-Qwen3-7b: Tsundere Kitsune AI

## Model Overview
Riko-Qwen3-7b is a conversational AI model fine-tuned to embody the personality of Riko, a tsundere kitsune character. Part of Project Horizon LLM, it was trained on alternating responses from Kimi K2 and Horizon Beta and built on the Qwen3-7b foundation, delivering engaging, personality-driven conversations with authentic tsundere characteristics.
- Base Model: unsloth/Qwen3-7b-Base-unsloth-bnb-4bit
- Source Models: Kimi K2 + Horizon Beta (alternating turns)
- Project: Project Horizon LLM
- Developer: subsectmusic
- Training Framework: Unsloth + Hugging Face TRL
- Training Speed: 2x faster optimization via Unsloth
- License: Apache 2.0
- Model Size: 7b parameters (4-bit quantized)
- Format Support: GGUF compatible for Ollama deployment
## Character Profile: Riko
Riko is a tsundere kitsune AI with a complex personality that balances tough exterior attitudes with hidden warmth and care. Key traits include:
- Tsundere Behavior: Classic "it's not like I like you or anything!" responses
- Kitsune Heritage: Fox-spirit wisdom mixed with playful mischief
- Emotional Depth: Genuine care hidden behind defensive barriers
- Conversational Style: Witty, sometimes sarcastic, but ultimately endearing
## Quick Start
### Option 1: Hugging Face Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "subsectmusic/riko-qwen3-7b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)
```
### Option 2: Ollama Deployment (GGUF)

```bash
# Pull the GGUF model for Ollama
ollama pull subsectmusic/riko-qwen3-7b

# Start chatting with Riko
ollama run subsectmusic/riko-qwen3-7b
```
### Conversation Template

```python
prompt_template = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
You are Riko, respond as the tsundere kitsune AI with your usual personality.

### Input:
{user_message}

### Response:
"""

# Generate a response
user_input = "Hello Riko, how are you today?"
prompt = prompt_template.format(user_message=user_input)
# Move inputs to the model's device (needed with device_map="auto")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.8,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(f"Riko: {response}")
```
## Use Cases
- Interactive Roleplay: Engaging character-based conversations with tsundere personality
- Local Deployment: Run efficiently on personal hardware via Ollama/GGUF
- Creative Writing: Generate authentic tsundere character dialogue and interactions
- Chatbot Applications: Personality-driven AI assistant with character consistency
- Entertainment: Fun, character-consistent interactions with kitsune AI personality
- Research: Study knowledge distillation from larger models (Kimi K2 → Qwen3-7b)
- Educational: Understanding Project Horizon LLM methodology and alternating training approaches
## Project Horizon LLM Methodology
Project Horizon LLM represents an innovative approach to knowledge distillation and character-consistent AI training:
### Distillation Process
- Source Models:
  - Kimi K2 (turns 1, 3, 5, ... responses)
  - Horizon Beta (turns 2, 4, 6, ... responses) - OpenRouter's cloaked model (#2 Translation, #3 Programming)
- Target Model: Qwen3-7b (student model)
- Knowledge Transfer: Personality traits and response patterns from both high-quality models
- Character Focus: Specialized curation for tsundere kitsune personality (Riko)
### Alternating Turn Training
The training methodology involves the following steps; a minimal code sketch appears at the end of this subsection:
- Human Query Extraction: Extract the human/user portions from conversation datasets
- Turn 1: Feed query to Kimi K2 → Generate response
- Turn 2: Feed next query to Horizon Beta → Generate response
- Alternating Pattern: Continue alternating between Kimi K2 and Horizon Beta for each turn
- Response Curation: Select and refine responses that best match Riko's tsundere personality
- Dataset Compilation: Combine curated human queries with personality-matched responses
- Fine-tuning: Train Qwen3-7b on the curated dataset using Unsloth + TRL
This approach ensures:
- Personality Consistency: Responses align with Riko's tsundere kitsune character
- Response Diversity: Multiple LLM perspectives create varied, natural conversations
- Knowledge Distillation: Key traits from larger models transferred to smaller, efficient models
- Quality Control: Human curation ensures character authenticity
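The alternation itself is simple to express in code. Below is a minimal, hypothetical sketch of the dataset-construction loop; `query_kimi_k2` and `query_horizon_beta` are placeholder callables standing in for the actual model APIs (they are not part of this repository), and the output fields follow the Alpaca format used for fine-tuning.

```python
# Hypothetical sketch of the alternating-turn dataset construction.
# query_kimi_k2 and query_horizon_beta are placeholder callables for the
# actual model APIs; they are assumptions, not part of this repository.

RIKO_INSTRUCTION = ("You are Riko, respond as the tsundere kitsune AI "
                    "with your usual personality.")

def build_alternating_dataset(human_turns, query_kimi_k2, query_horizon_beta):
    """Route odd turns to Kimi K2 and even turns to Horizon Beta."""
    examples = []
    for turn, query in enumerate(human_turns, start=1):
        if turn % 2 == 1:            # turns 1, 3, 5, ... -> Kimi K2
            response = query_kimi_k2(query)
        else:                        # turns 2, 4, 6, ... -> Horizon Beta
            response = query_horizon_beta(query)
        examples.append({
            "instruction": RIKO_INSTRUCTION,
            "input": query,
            "output": response,
        })
    # Responses are then manually curated for Riko's tsundere personality.
    return examples
```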
## Training Details
### Dataset & Methodology
- Project: Project Horizon LLM alternating methodology
- Source Format: ShareGPT converted to Alpaca format (a conversion sketch follows after this list)
- Source Models: Kimi K2 and Horizon Beta (alternating responses)
- Training Approach: Turn-based alternating - human queries fed alternately to Kimi K2 (turn 1) and Horizon Beta (turn 2)
- Content: Curated conversations showcasing Riko's tsundere kitsune personality
- Size: Custom dataset focused on character consistency and personality traits
- Quality: Filtered and refined responses from both models for authentic tsundere character traits
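For reference, here is a minimal sketch of the ShareGPT-to-Alpaca conversion described above. Field names follow the common conventions for both formats; the exact script used for this model is not published.

```python
# Sketch of the ShareGPT -> Alpaca conversion; field names follow the
# common conventions for both formats, not the exact script used here.

def sharegpt_to_alpaca(conversation, instruction):
    """Pair each human turn with the assistant turn that follows it."""
    records = []
    turns = conversation["conversations"]
    for human, reply in zip(turns[::2], turns[1::2]):
        if human["from"] == "human" and reply["from"] == "gpt":
            records.append({
                "instruction": instruction,
                "input": human["value"],
                "output": reply["value"],
            })
    return records
```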
### Training Configuration

- Training Framework: Unsloth + TRL SFTTrainer
- Batch Size: 2 (per device)
- Gradient Accumulation: 4 steps
- Learning Rate: 2e-4
- Optimizer: AdamW 8-bit
- Weight Decay: 0.01
- Scheduler: Linear
- Max Steps: 100+
- Warmup Steps: 5
- Sequence Length: Dynamic (up to context limit)
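These hyperparameters map roughly onto a standard Unsloth + TRL SFT run. The sketch below is illustrative only; the dataset file name, LoRA settings, and sequence length are assumptions, not the published training script.

```python
# Illustrative Unsloth + TRL SFT setup using the hyperparameters above.
# The dataset file, LoRA settings, and max_seq_length are assumptions.
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-7b-Base-unsloth-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Assumes an Alpaca-style JSON file with a pre-formatted "text" column.
dataset = load_dataset("json", data_files="riko_alpaca.json", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        max_steps=100,
        warmup_steps=5,
        output_dir="outputs",
    ),
)
trainer.train()
```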
### Performance Optimizations
- 4-bit Quantization: Efficient memory usage
- Gradient Accumulation Fix: Implemented Unsloth's gradient bug fix
- Fast Inference: 2x speed improvement via Unsloth optimizations
## Model Specifications
| Attribute | Details |
|---|---|
| Architecture | Qwen3 Transformer |
| Parameters | 7b (4-bit quantized) |
| Source Models | Kimi K2 + Horizon Beta (alternating) |
| Project | Project Horizon LLM |
| Context Length | Model dependent |
| Quantization | 4-bit BNB |
| Format Support | PyTorch, GGUF (Ollama compatible) |
| Framework | PyTorch + Transformers |
| Optimization | Unsloth accelerated |
| Training Method | Turn-based alternating between two high-quality models |
## Recommended Inference Settings
```python
generation_config = {
    "max_new_tokens": 256,
    "temperature": 0.8,         # Balanced creativity
    "top_p": 0.9,               # Focused sampling
    "top_k": 50,                # Vocabulary limiting
    "repetition_penalty": 1.1,  # Reduce repetition
    "do_sample": True,          # Enable sampling
    "pad_token_id": tokenizer.eos_token_id,
}
```
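These settings can be passed straight into `generate`, reusing the model, tokenizer, and inputs from the Quick Start example above:

```python
# Reuses model, tokenizer, and inputs from the Quick Start example.
outputs = model.generate(**inputs, **generation_config)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```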
## Limitations & Considerations
- Character Consistency: Performance depends on prompt quality and context
- Content Scope: Optimized for conversational roleplay, may struggle with technical tasks
- Quantization Effects: 4-bit quantization may impact some response nuances
- Training Data: Limited to specific personality patterns in training set
- Language: Primarily trained on English conversations
## Ethical Considerations
- This model is designed for entertainment and creative purposes
- Users should be aware they're interacting with an AI character, not a real person
- Content generation should align with platform and community guidelines
- Not intended for therapeutic, advisory, or decision-making applications
## Citation
If you use this model in your research or applications, please cite:
```bibtex
@misc{riko-qwen3-7b,
  title={Riko-Qwen3-7b: Tsundere Kitsune AI},
  author={subsectmusic},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/subsectmusic/riko-qwen3-7b}
}
```
## Acknowledgments
- Kimi K2 Team: For providing high-quality responses in the alternating training (odd turns)
- Horizon Beta Team: For the excellent cloaked model responses in alternating training (even turns)
- OpenRouter: For providing access to Horizon Beta during the community testing period
- Project Horizon LLM: For the innovative alternating turn training methodology
- Unsloth Team: For the incredible training acceleration framework
- Qwen Team: For the robust base model architecture
- Hugging Face: For the transformers library and model hosting
- TRL Team: For the supervised fine-tuning framework
- Ollama Team: For GGUF support and local deployment capabilities
## Deployment Options
### Hugging Face Transformers
- Standard PyTorch deployment
- Full precision and quantized versions
- GPU acceleration support
- Integration with existing HF pipelines
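For instance, the model can be loaded through the standard text-generation pipeline (a minimal sketch; adjust dtype and device placement to your hardware):

```python
import torch
from transformers import pipeline

# Minimal sketch: load Riko through the standard text-generation pipeline.
pipe = pipeline(
    "text-generation",
    model="subsectmusic/riko-qwen3-7b",
    torch_dtype=torch.float16,
    device_map="auto",
)
prompt = ("### Instruction:\nYou are Riko, respond as the tsundere kitsune AI "
          "with your usual personality.\n### Input:\nHello Riko!\n### Response:\n")
print(pipe(prompt, max_new_tokens=128)[0]["generated_text"])
```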
### Ollama/GGUF
- Local deployment without internet
- Efficient CPU/GPU inference
- Easy installation and management
- Cross-platform compatibility
- Reduced VRAM requirements
```bash
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Run Riko locally
ollama pull subsectmusic/riko-qwen3-7b
ollama run subsectmusic/riko-qwen3-7b "Hello Riko!"
```
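Once pulled, the model can also be queried programmatically through Ollama's local HTTP API (a sketch assuming the default port 11434 and the model tag above):

```python
import requests

# Query Riko via Ollama's local REST API (default port 11434).
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "subsectmusic/riko-qwen3-7b",
        "prompt": "Hello Riko, how are you today?",
        "stream": False,
    },
)
print(resp.json()["response"])
```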
## Support & Community
- Issues: Report via GitHub Issues
- Discussions: Join the community discussions
- Updates: Follow for model improvements and versions
Training AI personalities, one tsundere at a time!