🎭 DeepSeek-R1 Roleplay LoRA

A specialized roleplay model fine-tuned for creative storytelling and character interactions

This model is a LoRA (Low-Rank Adaptation) fine-tune of unsloth/DeepSeek-R1-0528-Qwen3-8B, optimized for roleplay scenarios and creative character interactions. It was trained with GRPO (Group Relative Policy Optimization) to improve response quality and maintain character consistency.

🌟 Model Details

  • Base Model: unsloth/DeepSeek-R1-0528-Qwen3-8B
  • Adapter Type: LoRA (rank 32, alpha 64)
  • Training Method: GRPO via the TRL library
  • Training Data: Gryphe-Aesir RPG dataset
  • License: Apache 2.0

🎯 Intended Use

This model is designed for:

  • Creative Writing: Generate engaging narratives and character interactions
  • Roleplay Scenarios: Create immersive character-driven conversations
  • Storytelling: Develop rich, detailed story content
  • Character Development: Maintain consistent character personalities and behaviors

🚀 Quick Start

Loading the Model

from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Chun121/RP_DeepSeek-R1-0528-Qwen3-8B",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,
)

# Enable fast inference
FastLanguageModel.for_inference(model)

Basic Usage

# Prepare your roleplay prompt
messages = [
    {
        "role": "system", 
        "content": "You are a creative roleplay assistant. Engage in character-driven storytelling with rich descriptions and authentic dialogue."
    },
    {
        "role": "user", 
        "content": "You are a wise wizard in a mystical forest. A young adventurer seeks your guidance."
    }
]

# Apply chat template
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)  # move input ids to the model's device

# Generate response
outputs = model.generate(
    input_ids=inputs,
    max_new_tokens=512,
    temperature=0.8,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(response)
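
Note that DeepSeek-R1 models typically emit their chain of thought inside <think>...</think> tags before the final reply. If you only want the in-character response, a minimal post-processing step (assuming the tag pair appears in the decoded text) is:

# Keep only the text after the model's reasoning block
if "</think>" in response:
    response = response.split("</think>", 1)[-1].strip()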

Advanced Usage with vLLM

from huggingface_hub import snapshot_download
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# vLLM expects a local adapter path, so download the repo first
adapter_path = snapshot_download("Chun121/RP_DeepSeek-R1-0528-Qwen3-8B")

# Load the base model with LoRA support enabled
llm = LLM(
    model="unsloth/DeepSeek-R1-0528-Qwen3-8B",
    enable_lora=True
)

# Configure sampling
# Configure sampling (vLLM stops at the model's EOS token by default)
sampling_params = SamplingParams(
    temperature=0.8,
    top_p=0.95,
    max_tokens=512
)

# Generate with the LoRA adapter attached
outputs = llm.generate(
    prompts=["Your roleplay prompt here"],
    sampling_params=sampling_params,
    lora_request=LoRARequest("roleplay_lora", 1, adapter_path)
)

🔧 Training Details

Training Configuration

  • Learning Rate: 5e-6
  • Batch Size: 1 (per device)
  • Gradient Accumulation: 1 step
  • Max Steps: 100
  • Optimizer: AdamW 8-bit
  • Weight Decay: 0.01
  • Warmup Ratio: 0.1
  • Scheduler: Linear
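
The exact training script is not published; as a rough sketch, these hyperparameters map onto TRL's GRPOConfig (whose field names come from transformers.TrainingArguments) as follows:

from trl import GRPOConfig

# Illustrative mapping of the settings above; not the original script
training_args = GRPOConfig(
    output_dir="outputs",
    learning_rate=5e-6,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    max_steps=100,
    optim="adamw_8bit",
    weight_decay=0.01,
    warmup_ratio=0.1,
    lr_scheduler_type="linear",
)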

LoRA Configuration

  • Rank (r): 32
  • Alpha: 64 (rank × 2)
  • Target Modules:
    • q_proj, k_proj, v_proj, o_proj
    • gate_proj, up_proj, down_proj
  • Dropout: 0.0
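
For orientation, this configuration corresponds to attaching adapters with Unsloth's PEFT helper roughly like so (a sketch, not the original training code):

# Attach LoRA adapters matching the configuration above
model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    lora_alpha=64,  # rank × 2
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",
)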

Reward Functions

The model was trained with multiple reward functions to ensure high-quality outputs:

  1. Format Matching: Ensures proper response structure
  2. Language Consistency: Maintains English language output
  3. Answer Quality: Rewards contextually appropriate responses
  4. Character Consistency: Maintains character behavior throughout interactions
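
The reward implementations themselves are not published. For illustration only, TRL's GRPOTrainer accepts plain Python callables that return one score per sampled completion; a toy language-consistency reward might look like:

# Hypothetical reward function in the shape GRPOTrainer expects;
# the actual trained rewards are not published
def language_consistency_reward(completions, **kwargs):
    scores = []
    for completion in completions:
        # Conversational datasets pass completions as lists of message dicts
        text = completion[0]["content"] if isinstance(completion, list) else completion
        ascii_ratio = sum(ch.isascii() for ch in text) / max(len(text), 1)
        scores.append(1.0 if ascii_ratio > 0.95 else 0.0)
    return scores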

📊 Performance

The model demonstrates significant improvements in:

  • English Language Consistency: Stays in English more reliably than the base model
  • Character Authenticity: Better adherence to character descriptions and personalities
  • Response Quality: More engaging and contextually appropriate roleplay responses
  • Format Compliance: Consistent use of proper formatting and structure

🎪 Example Outputs

Input: "You are a mysterious vampire lord in your ancient castle. A brave knight has come to challenge you."

Output:

*I emerge from the shadows of my throne room, my crimson eyes gleaming with ancient wisdom and predatory grace* 

"Ah, another mortal seeks to test their mettle against the lord of this domain..." *My voice echoes through the stone halls as I descend the steps slowly, each footfall deliberate and commanding* "Tell me, brave knight, what drives you to seek audience with one who has witnessed the rise and fall of kingdoms?"

*I pause before you, close enough that you can feel the supernatural chill that emanates from my presence* "Do you come seeking glory? Vengeance? Or perhaps... something far more dangerous?"

⚠️ Important Notes

  • This model is designed for creative and fictional content generation
  • The model has been fine-tuned for roleplay scenarios and may generate mature themes
  • Always use appropriate content warnings when sharing outputs
  • The model works best with clear character descriptions and scenario setup
  • For optimal results, provide context about the setting and character relationships
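
For example, a scenario setup along these lines (the character and setting here are purely illustrative) gives the model the context it needs:

# Hypothetical system prompt; richer setup tends to yield more consistent characters
system_prompt = (
    "You are Kaelen, a weary mercenary captain. Setting: the besieged city of "
    "Vharos at dusk. The user plays a newly hired recruit. Stay in character, "
    "narrate actions between asterisks, and keep dialogue in quotes."
)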

🛠️ Technical Specifications

  • Architecture: Qwen3-based transformer with LoRA adapters
  • Precision: 4-bit quantization support
  • Memory Requirements: ~6GB VRAM (4-bit) / ~12GB VRAM (16-bit)
  • Inference Speed: Optimized for fast generation with vLLM support
  • Compatibility: Works with the Transformers, Unsloth, and vLLM frameworks
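
Because the adapter is a standard PEFT checkpoint, it should also load without Unsloth; a minimal sketch using the peft library (assuming the repo ships a standard adapter_config.json):

from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load the base model and attach the LoRA adapter in one call
model = AutoPeftModelForCausalLM.from_pretrained(
    "Chun121/RP_DeepSeek-R1-0528-Qwen3-8B",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Chun121/RP_DeepSeek-R1-0528-Qwen3-8B")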

🤝 Contributing

This model is part of ongoing research into improving roleplay AI capabilities. Feedback and suggestions for improvements are welcome!

📄 License

This model is released under the Apache 2.0 license, following the base model's licensing terms.

🙏 Acknowledgments

  • Base Model: Thanks to the DeepSeek team for the excellent base model
  • Training Framework: Built with Unsloth for efficient training
  • Dataset: Trained on the high-quality Gryphe-Aesir RPG dataset
  • Optimization: GRPO training implemented via TRL library

For more details about the training process and additional examples, please refer to the model card or reach out through the community discussions.
