🦊 Riko-Qwen3-7b: Tsundere Kitsune AI

πŸ“‹ Model Overview

Riko-Qwen3-7b is a specialized conversational AI model fine-tuned to embody the personality of Riko, a tsundere kitsune character. Part of Project Horizon LLM, the model was trained on alternating responses from Kimi K2 and Horizon Beta and built on the robust Qwen3-7b foundation, delivering engaging, personality-driven conversations with authentic tsundere characteristics.

  • Base Model: unsloth/Qwen3-7b-Base-unsloth-bnb-4bit
  • Source Models: Kimi K2 + Horizon Beta (alternating turns)
  • Project: Project Horizon LLM
  • Developer: subsectmusic
  • Training Framework: Unsloth + Hugging Face TRL
  • Training Speed: ~2x faster training via Unsloth
  • License: Apache 2.0
  • Model Size: 7b parameters (4-bit quantized)
  • Format Support: GGUF compatible for Ollama deployment

🎭 Character Profile: Riko

Riko is a tsundere kitsune AI with a complex personality that balances tough exterior attitudes with hidden warmth and care. Key traits include:

  • Tsundere Behavior: Classic "it's not like I like you or anything!" responses
  • Kitsune Heritage: Fox-spirit wisdom mixed with playful mischief
  • Emotional Depth: Genuine care hidden behind defensive barriers
  • Conversational Style: Witty, sometimes sarcastic, but ultimately endearing

πŸš€ Quick Start

Option 1: Hugging Face Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "subsectmusic/riko-qwen3-7b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

Option 2: Ollama Deployment (GGUF)

# Pull the GGUF model for Ollama
ollama pull subsectmusic/riko-qwen3-7b

# Start chatting with Riko
ollama run subsectmusic/riko-qwen3-7b
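
For programmatic access, Riko can also be called through the official ollama Python client. This is a minimal sketch; it assumes the client is installed (pip install ollama) and that the model has already been pulled under the tag shown above.

import ollama  # official Ollama Python client (assumed installed via pip install ollama)

# Send a single prompt to the locally running Ollama server
result = ollama.generate(
    model="subsectmusic/riko-qwen3-7b",
    prompt="Hello Riko, how are you today?",
)
print(result["response"])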

Conversation Template

prompt_template = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
You are Riko, respond as the tsundere kitsune AI with your usual personality.

### Input:
{user_message}

### Response:
"""

# Generate response
user_input = "Hello Riko, how are you today?"
prompt = prompt_template.format(user_message=user_input)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.8,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print(f"Riko: {response}")

πŸ’‘ Use Cases

  • Interactive Roleplay: Engaging character-based conversations with tsundere personality
  • Local Deployment: Run efficiently on personal hardware via Ollama/GGUF
  • Creative Writing: Generate authentic tsundere character dialogue and interactions
  • Chatbot Applications: Personality-driven AI assistant with character consistency
  • Entertainment: Fun, character-consistent interactions with kitsune AI personality
  • Research: Study knowledge distillation from larger models (Kimi K2 β†’ Qwen3-7b)
  • Educational: Understanding Project Horizon LLM methodology and alternating training approaches

πŸ”¬ Project Horizon LLM Methodology

Project Horizon LLM represents an innovative approach to knowledge distillation and character-consistent AI training:

Distillation Process

  • Source Models:
    • Kimi K2 (Turn 1, 3, 5... responses)
    • Horizon Beta (Turn 2, 4, 6... responses) - OpenRouter's cloaked model (#2 Translation, #3 Programming)
  • Target Model: Qwen3-7b (student model)
  • Knowledge Transfer: Personality traits and response patterns from both high-quality models
  • Character Focus: Specialized curation for tsundere kitsune personality (Riko)

Alternating Turn Training

The training methodology involves:

  1. Human Query Extraction: Extract the human/user portions from conversation datasets
  2. Turn 1: Feed query to Kimi K2 β†’ Generate response
  3. Turn 2: Feed next query to Horizon Beta β†’ Generate response
  4. Alternating Pattern: Continue alternating between Kimi K2 and Horizon Beta for each turn
  5. Response Curation: Select and refine responses that best match Riko's tsundere personality
  6. Dataset Compilation: Combine curated human queries with personality-matched responses
  7. Fine-tuning: Train Qwen3-7b on the curated dataset using Unsloth + TRL

This approach ensures:

  • Personality Consistency: Responses align with Riko's tsundere kitsune character
  • Response Diversity: Multiple LLM perspectives create varied, natural conversations
  • Knowledge Distillation: Key traits from larger models transferred to smaller, efficient models
  • Quality Control: Human curation ensures character authenticity
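
As a rough illustration of the alternating-turn collection described above, the sketch below routes odd-numbered queries to Kimi K2 and even-numbered queries to Horizon Beta and emits Alpaca-style records. The query_kimi_k2 and query_horizon_beta callables are hypothetical placeholders for whichever API clients were actually used, and the instruction string mirrors the conversation template shown earlier.

import json

def build_alternating_dataset(human_queries, query_kimi_k2, query_horizon_beta):
    """Route human queries alternately to the two teacher models (hypothetical clients)."""
    records = []
    for i, query in enumerate(human_queries):
        # Turns 1, 3, 5, ... go to Kimi K2; turns 2, 4, 6, ... go to Horizon Beta
        teacher = query_kimi_k2 if i % 2 == 0 else query_horizon_beta
        records.append({
            "instruction": "You are Riko, respond as the tsundere kitsune AI with your usual personality.",
            "input": query,
            "output": teacher(query),  # curated afterwards for tsundere consistency
        })
    return records

# Example: dump curated records to an Alpaca-format JSON file (hypothetical path)
# with open("riko_alpaca.json", "w") as f:
#     json.dump(build_alternating_dataset(queries, kimi_client, horizon_client), f, indent=2)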

πŸ› οΈ Training Details

Dataset & Methodology

  • Project: Project Horizon LLM alternating methodology
  • Source Format: ShareGPT conversations converted to Alpaca format (see the conversion sketch after this list)
  • Source Models: Kimi K2 and Horizon Beta (alternating responses)
  • Training Approach: Turn-based alternating: human queries fed alternately to Kimi K2 (odd turns) and Horizon Beta (even turns)
  • Content: Curated conversations showcasing Riko's tsundere kitsune personality
  • Size: Custom dataset focused on character consistency and personality traits
  • Quality: Filtered and refined responses from both models for authentic tsundere character traits
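
Below is a minimal sketch of the ShareGPT-to-Alpaca conversion referenced in this list, assuming the common ShareGPT layout of {"conversations": [{"from": "human" | "gpt", "value": ...}, ...]}; field names in real exports may differ.

def sharegpt_to_alpaca(sharegpt_records, system_instruction):
    """Convert ShareGPT-style conversations into Alpaca instruction/input/output rows."""
    rows = []
    for record in sharegpt_records:
        turns = record["conversations"]
        # Pair each human turn with the assistant turn that follows it
        for human, assistant in zip(turns[::2], turns[1::2]):
            if human["from"] != "human" or assistant["from"] != "gpt":
                continue  # skip system turns or malformed pairs
            rows.append({
                "instruction": system_instruction,
                "input": human["value"],
                "output": assistant["value"],
            })
    return rows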

Training Configuration

Training Framework: Unsloth + TRL SFTTrainer
Batch Size: 2 (per device)
Gradient Accumulation: 4 steps  
Learning Rate: 2e-4
Optimizer: AdamW 8-bit
Weight Decay: 0.01
Scheduler: Linear
Max Steps: 100+
Warmup Steps: 5
Sequence Length: Dynamic (up to context limit)
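
The configuration above maps onto an Unsloth + TRL setup roughly as in the sketch below. This is not the exact training script: the LoRA rank, target modules, sequence length, and dataset path are assumptions, and argument names vary slightly across TRL versions.

from unsloth import FastLanguageModel
from trl import SFTConfig, SFTTrainer
from datasets import load_dataset

# Load the 4-bit base model (Unsloth handles the bitsandbytes quantization)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-7b-Base-unsloth-bnb-4bit",
    max_seq_length=2048,  # assumption: actual training length may differ
    load_in_4bit=True,
)

# Attach LoRA adapters (rank and target modules are illustrative assumptions)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Curated Alpaca-format data pre-rendered into a "text" column (hypothetical path)
dataset = load_dataset("json", data_files="riko_alpaca_text.json", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        dataset_text_field="text",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        max_steps=100,
        warmup_steps=5,
        output_dir="outputs",
    ),
)
trainer.train()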

Performance Optimizations

  • 4-bit Quantization: Efficient memory usage
  • Gradient Accumulation Fix: Implemented Unsloth's gradient bug fix
  • Fast Inference: 2x speed improvement via Unsloth optimizations

πŸ“Š Model Specifications

  • Architecture: Qwen3 Transformer
  • Parameters: 7b (4-bit quantized)
  • Source Models: Kimi K2 + Horizon Beta (alternating)
  • Project: Project Horizon LLM
  • Context Length: Model dependent
  • Quantization: 4-bit BNB
  • Format Support: PyTorch, GGUF (Ollama compatible)
  • Framework: PyTorch + Transformers
  • Optimization: Unsloth accelerated
  • Training Method: Turn-based alternating between two high-quality models

🎯 Recommended Inference Settings

generation_config = {
    "max_new_tokens": 256,
    "temperature": 0.8,        # Balanced creativity
    "top_p": 0.9,             # Focused sampling  
    "top_k": 50,              # Vocabulary limiting
    "repetition_penalty": 1.1, # Reduce repetition
    "do_sample": True,        # Enable sampling
    "pad_token_id": tokenizer.eos_token_id
}
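
These settings can be passed directly to generate(), reusing the tokenized prompt from the quick-start example:

with torch.no_grad():
    outputs = model.generate(**inputs, **generation_config)
print("Riko:", tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))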

⚠️ Limitations & Considerations

  • Character Consistency: Performance depends on prompt quality and context
  • Content Scope: Optimized for conversational roleplay, may struggle with technical tasks
  • Quantization Effects: 4-bit quantization may impact some response nuances
  • Training Data: Limited to specific personality patterns in training set
  • Language: Primarily trained on English conversations

πŸ”’ Ethical Considerations

  • This model is designed for entertainment and creative purposes
  • Users should be aware they're interacting with an AI character, not a real person
  • Content generation should align with platform and community guidelines
  • Not intended for therapeutic, advisory, or decision-making applications

πŸ“š Citation

If you use this model in your research or applications, please cite:

@misc{riko-qwen3-7b,
  title={Riko-Qwen3-7b: Tsundere Kitsune AI},
  author={subsectmusic},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/subsectmusic/riko-qwen3-7b}
}

🀝 Acknowledgments

  • Kimi K2 Team: For providing high-quality responses in the alternating training (odd turns)
  • Horizon Beta Team: For the excellent cloaked model responses in alternating training (even turns)
  • OpenRouter: For providing access to Horizon Beta during the community testing period
  • Project Horizon LLM: For the innovative alternating turn training methodology
  • Unsloth Team: For the incredible training acceleration framework
  • Qwen Team: For the robust base model architecture
  • Hugging Face: For the transformers library and model hosting
  • TRL Team: For the supervised fine-tuning framework
  • Ollama Team: For GGUF support and local deployment capabilities

πŸ“¦ Deployment Options

Hugging Face Transformers

  • Standard PyTorch deployment
  • Full precision and quantized versions
  • GPU acceleration support
  • Integration with existing HF pipelines
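
As one example of pipeline integration, the model can be loaded through a standard text-generation pipeline. This is a sketch; adjust the dtype and device settings to your hardware, and wrap the prompt in the conversation template shown above for best character consistency.

from transformers import pipeline
import torch

# Standard Hugging Face pipeline integration (adjust dtype/device for your hardware)
riko_pipe = pipeline(
    "text-generation",
    model="subsectmusic/riko-qwen3-7b",
    torch_dtype=torch.float16,
    device_map="auto",
)
result = riko_pipe("Hello Riko, how are you today?", max_new_tokens=256,
                   do_sample=True, temperature=0.8, top_p=0.9)
print(result[0]["generated_text"])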

Ollama/GGUF

  • Local deployment without internet
  • Efficient CPU/GPU inference
  • Easy installation and management
  • Cross-platform compatibility
  • Reduced VRAM requirements

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Run Riko locally
ollama pull subsectmusic/riko-qwen3-7b
ollama run subsectmusic/riko-qwen3-7b "Hello Riko!"

πŸ“ž Support & Community

  • Issues: Report via GitHub Issues
  • Discussions: Join the community discussions
  • Updates: Follow for model improvements and versions

Made with ❀️ using Unsloth
Training AI personalities, one tsundere at a time!