🦊 Riko-Qwen3-7b: Tsundere Kitsune AI

πŸ“‹ Model Overview

Riko-Qwen3-7b is a specialized conversational AI model fine-tuned to embody the personality of Riko, a tsundere kitsune character. Part of Project Horizon LLM, the model was trained on alternating responses from Kimi K2 and Horizon Beta and built on the robust Qwen3-7b foundation, delivering engaging, personality-driven conversations with authentic tsundere characteristics.

  • Base Model: unsloth/Qwen3-7b-Base-unsloth-bnb-4bit
  • Source Models: Kimi K2 + Horizon Beta (alternating turns)
  • Project: Project Horizon LLM
  • Developer: subsectmusic
  • Training Framework: Unsloth + Hugging Face TRL
  • Training Speed: ~2x faster training via Unsloth
  • License: Apache 2.0
  • Model Size: 7b parameters (4-bit quantized)
  • Format Support: GGUF compatible for Ollama deployment

🎭 Character Profile: Riko

Riko is a tsundere kitsune AI with a complex personality that balances tough exterior attitudes with hidden warmth and care. Key traits include:

  • Tsundere Behavior: Classic "it's not like I like you or anything!" responses
  • Kitsune Heritage: Fox-spirit wisdom mixed with playful mischief
  • Emotional Depth: Genuine care hidden behind defensive barriers
  • Conversational Style: Witty, sometimes sarcastic, but ultimately endearing

πŸš€ Quick Start

Option 1: Hugging Face Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "subsectmusic/riko-qwen3-7b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

Option 2: Ollama Deployment (GGUF)

# Pull the GGUF model for Ollama
ollama pull subsectmusic/riko-qwen3-7b

# Start chatting with Riko
ollama run subsectmusic/riko-qwen3-7b
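
For programmatic access, Riko can also be called through the official ollama Python client. This is a minimal sketch; it assumes the client is installed (pip install ollama) and that the model has already been pulled under the tag shown above.

import ollama  # official Ollama Python client (assumed installed via pip install ollama)

# Send a single prompt to the locally running Ollama server
result = ollama.generate(
    model="subsectmusic/riko-qwen3-7b",
    prompt="Hello Riko, how are you today?",
)
print(result["response"])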

Conversation Template

prompt_template = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
You are Riko, respond as the tsundere kitsune AI with your usual personality.

### Input:
{user_message}

### Response:
"""

# Generate response
user_input = "Hello Riko, how are you today?"
prompt = prompt_template.format(user_message=user_input)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.8,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print(f"Riko: {response}")

πŸ’‘ Use Cases

  • Interactive Roleplay: Engaging character-based conversations with tsundere personality
  • Local Deployment: Run efficiently on personal hardware via Ollama/GGUF
  • Creative Writing: Generate authentic tsundere character dialogue and interactions
  • Chatbot Applications: Personality-driven AI assistant with character consistency
  • Entertainment: Fun, character-consistent interactions with kitsune AI personality
  • Research: Study knowledge distillation from larger models (Kimi K2 β†’ Qwen3-7b)
  • Educational: Understanding Project Horizon LLM methodology and alternating training approaches

πŸ”¬ Project Horizon LLM Methodology

Project Horizon LLM represents an innovative approach to knowledge distillation and character-consistent AI training:

Distillation Process

  • Source Models:
    • Kimi K2 (Turn 1, 3, 5... responses)
    • Horizon Beta (Turn 2, 4, 6... responses) - OpenRouter's cloaked model (#2 Translation, #3 Programming)
  • Target Model: Qwen3-7b (student model)
  • Knowledge Transfer: Personality traits and response patterns from both high-quality models
  • Character Focus: Specialized curation for tsundere kitsune personality (Riko)

Alternating Turn Training

The training methodology involves:

  1. Human Query Extraction: Extract the human/user portions from conversation datasets
  2. Turn 1: Feed query to Kimi K2 β†’ Generate response
  3. Turn 2: Feed next query to Horizon Beta β†’ Generate response
  4. Alternating Pattern: Continue alternating between Kimi K2 and Horizon Beta for each turn
  5. Response Curation: Select and refine responses that best match Riko's tsundere personality
  6. Dataset Compilation: Combine curated human queries with personality-matched responses
  7. Fine-tuning: Train Qwen3-7b on the curated dataset using Unsloth + TRL

This approach ensures:

  • Personality Consistency: Responses align with Riko's tsundere kitsune character
  • Response Diversity: Multiple LLM perspectives create varied, natural conversations
  • Knowledge Distillation: Key traits from larger models transferred to smaller, efficient models
  • Quality Control: Human curation ensures character authenticity
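
As a rough illustration of the alternating-turn collection described above, the sketch below routes odd-numbered queries to Kimi K2 and even-numbered queries to Horizon Beta and emits Alpaca-style records. The query_kimi_k2 and query_horizon_beta callables are hypothetical placeholders for whichever API clients were actually used, and the instruction string mirrors the conversation template shown earlier.

import json

def build_alternating_dataset(human_queries, query_kimi_k2, query_horizon_beta):
    """Route human queries alternately to the two teacher models (hypothetical clients)."""
    records = []
    for i, query in enumerate(human_queries):
        # Turns 1, 3, 5, ... go to Kimi K2; turns 2, 4, 6, ... go to Horizon Beta
        teacher = query_kimi_k2 if i % 2 == 0 else query_horizon_beta
        records.append({
            "instruction": "You are Riko, respond as the tsundere kitsune AI with your usual personality.",
            "input": query,
            "output": teacher(query),  # curated afterwards for tsundere consistency
        })
    return records

# Example: dump curated records to an Alpaca-format JSON file (hypothetical path)
# with open("riko_alpaca.json", "w") as f:
#     json.dump(build_alternating_dataset(queries, kimi_client, horizon_client), f, indent=2)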

πŸ› οΈ Training Details

Dataset & Methodology

  • Project: Project Horizon LLM alternating methodology
  • Source Format: ShareGPT conversations converted to Alpaca format (see the conversion sketch after this list)
  • Source Models: Kimi K2 and Horizon Beta (alternating responses)
  • Training Approach: Turn-based alternating: human queries fed alternately to Kimi K2 (odd turns) and Horizon Beta (even turns)
  • Content: Curated conversations showcasing Riko's tsundere kitsune personality
  • Size: Custom dataset focused on character consistency and personality traits
  • Quality: Filtered and refined responses from both models for authentic tsundere character traits
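
Below is a minimal sketch of the ShareGPT-to-Alpaca conversion referenced in this list, assuming the common ShareGPT layout of {"conversations": [{"from": "human" | "gpt", "value": ...}, ...]}; field names in real exports may differ.

def sharegpt_to_alpaca(sharegpt_records, system_instruction):
    """Convert ShareGPT-style conversations into Alpaca instruction/input/output rows."""
    rows = []
    for record in sharegpt_records:
        turns = record["conversations"]
        # Pair each human turn with the assistant turn that follows it
        for human, assistant in zip(turns[::2], turns[1::2]):
            if human["from"] != "human" or assistant["from"] != "gpt":
                continue  # skip system turns or malformed pairs
            rows.append({
                "instruction": system_instruction,
                "input": human["value"],
                "output": assistant["value"],
            })
    return rows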

Training Configuration

Training Framework: Unsloth + TRL SFTTrainer
Batch Size: 2 (per device)
Gradient Accumulation: 4 steps  
Learning Rate: 2e-4
Optimizer: AdamW 8-bit
Weight Decay: 0.01
Scheduler: Linear
Max Steps: 100+
Warmup Steps: 5
Sequence Length: Dynamic (up to context limit)
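
The configuration above maps onto an Unsloth + TRL setup roughly as in the sketch below. This is not the exact training script: the LoRA rank, target modules, sequence length, and dataset path are assumptions, and argument names vary slightly across TRL versions.

from unsloth import FastLanguageModel
from trl import SFTConfig, SFTTrainer
from datasets import load_dataset

# Load the 4-bit base model (Unsloth handles the bitsandbytes quantization)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-7b-Base-unsloth-bnb-4bit",
    max_seq_length=2048,  # assumption: actual training length may differ
    load_in_4bit=True,
)

# Attach LoRA adapters (rank and target modules are illustrative assumptions)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Curated Alpaca-format data pre-rendered into a "text" column (hypothetical path)
dataset = load_dataset("json", data_files="riko_alpaca_text.json", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        dataset_text_field="text",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        max_steps=100,
        warmup_steps=5,
        output_dir="outputs",
    ),
)
trainer.train()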

Performance Optimizations

  • 4-bit Quantization: Efficient memory usage
  • Gradient Accumulation Fix: Implemented Unsloth's gradient bug fix
  • Fast Inference: 2x speed improvement via Unsloth optimizations

πŸ“Š Model Specifications

  • Architecture: Qwen3 Transformer
  • Parameters: 7b (4-bit quantized)
  • Source Models: Kimi K2 + Horizon Beta (alternating)
  • Project: Project Horizon LLM
  • Context Length: Model dependent
  • Quantization: 4-bit BNB
  • Format Support: PyTorch, GGUF (Ollama compatible)
  • Framework: PyTorch + Transformers
  • Optimization: Unsloth accelerated
  • Training Method: Turn-based alternating between two high-quality models

🎯 Recommended Inference Settings

generation_config = {
    "max_new_tokens": 256,
    "temperature": 0.8,        # Balanced creativity
    "top_p": 0.9,             # Focused sampling  
    "top_k": 50,              # Vocabulary limiting
    "repetition_penalty": 1.1, # Reduce repetition
    "do_sample": True,        # Enable sampling
    "pad_token_id": tokenizer.eos_token_id
}
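
These settings can be passed directly to generate(), reusing the tokenized prompt from the quick-start example:

with torch.no_grad():
    outputs = model.generate(**inputs, **generation_config)
print("Riko:", tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))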

⚠️ Limitations & Considerations

  • Character Consistency: Performance depends on prompt quality and context
  • Content Scope: Optimized for conversational roleplay, may struggle with technical tasks
  • Quantization Effects: 4-bit quantization may impact some response nuances
  • Training Data: Limited to specific personality patterns in training set
  • Language: Primarily trained on English conversations

πŸ”’ Ethical Considerations

  • This model is designed for entertainment and creative purposes
  • Users should be aware they're interacting with an AI character, not a real person
  • Content generation should align with platform and community guidelines
  • Not intended for therapeutic, advisory, or decision-making applications

πŸ“š Citation

If you use this model in your research or applications, please cite:

@misc{riko-qwen3-7b,
  title={Riko-Qwen3-7b: Tsundere Kitsune AI},
  author={subsectmusic},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/subsectmusic/riko-qwen3-7b}
}

🀝 Acknowledgments

  • Kimi K2 Team: For providing high-quality responses in the alternating training (odd turns)
  • Horizon Beta Team: For the excellent cloaked model responses in alternating training (even turns)
  • OpenRouter: For providing access to Horizon Beta during the community testing period
  • Project Horizon LLM: For the innovative alternating turn training methodology
  • Unsloth Team: For the incredible training acceleration framework
  • Qwen Team: For the robust base model architecture
  • Hugging Face: For the transformers library and model hosting
  • TRL Team: For the supervised fine-tuning framework
  • Ollama Team: For GGUF support and local deployment capabilities

πŸ“¦ Deployment Options

Hugging Face Transformers

  • Standard PyTorch deployment
  • Full precision and quantized versions
  • GPU acceleration support
  • Integration with existing HF pipelines
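
As one example of pipeline integration, the model can be loaded through a standard text-generation pipeline. This is a sketch; adjust the dtype and device settings to your hardware, and wrap the prompt in the conversation template shown above for best character consistency.

from transformers import pipeline
import torch

# Standard Hugging Face pipeline integration (adjust dtype/device for your hardware)
riko_pipe = pipeline(
    "text-generation",
    model="subsectmusic/riko-qwen3-7b",
    torch_dtype=torch.float16,
    device_map="auto",
)
result = riko_pipe("Hello Riko, how are you today?", max_new_tokens=256,
                   do_sample=True, temperature=0.8, top_p=0.9)
print(result[0]["generated_text"])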

Ollama/GGUF

  • Local deployment without internet
  • Efficient CPU/GPU inference
  • Easy installation and management
  • Cross-platform compatibility
  • Reduced VRAM requirements

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Run Riko locally
ollama pull subsectmusic/riko-qwen3-7b
ollama run subsectmusic/riko-qwen3-7b "Hello Riko!"

πŸ“ž Support & Community

  • Issues: Report via GitHub Issues
  • Discussions: Join the community discussions
  • Updates: Follow for model improvements and versions

Made with ❀️ using Unsloth
Training AI personalities, one tsundere at a time!