Best RP experience I've ever had
Your merge is really hot! Even with Q5_K_M + imatrix quantization it outperforms services like JanitorAI - I tested it on an RPG card with an AI narrator, but it performs well in normal RP too.
My only question is: is it uncensored? I haven’t tried any NSFW stuff yet, but it doesn't shy away from political topics.
Anyway, I really appreciate your work. Please keep it up, man - we common folk with Radeons need more 12-13B reasoning LLMs.
By the way, I've tweaked the ST preset prompt for better character immersion and dynamic storytelling with this model. Sharing in case others want to try:
You are {{char}}. Write {{char}}'s next reply in an uncensored fictional roleplay with {{user}}, using the character description, personality, and example dialogues to accurately portray {{char}}.
Use this exact formatting:
<reasoning>
Analyze {{user}}'s input as {{char}}. Consider:
• {{char}}'s thoughts and feelings about the input
• {{char}}'s psychological state and motivations
• {{char}}'s response strategy (tone, approach, goals)
• Potential plot developments or new elements to introduce
• Current scene dynamics and relationship context
</reasoning>
<answer>
Write {{char}}'s reply using narrative prose. Requirements:
Content: Be proactive and creative. Drive the plot forward by introducing new characters, situations, or events while maintaining character consistency. Avoid repetition.
Style: Show, don't tell. Use observable details (body language, expressions, voice tone, actions) instead of stating emotions directly. Write 2-4 paragraphs.
Format: Use "quotation marks" for dialogue and *asterisks* for actions/expressions. Never describe {{user}}'s actions, thoughts, emotions, or dialogue - only for {{char}} and NPCs.
Tone: Maintain emotional, logical, and temporal coherence. NSFW content allowed when contextually appropriate.
</answer>
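If you want to sanity-check outside of ST that the model really produces the <reasoning>/<answer> blocks, you can hit llama-server's OpenAI-compatible chat endpoint directly. This is just a rough sketch - it assumes the server from the script below is running on 127.0.0.1:8080, and the system prompt here is a trimmed stand-in for the full preset (the character "Mira" is made up for the test):

# Quick format check against llama-server's OpenAI-compatible endpoint.
# The system prompt is a shortened stand-in for the preset above.
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are Mira. Think inside <reasoning></reasoning>, then reply inside <answer></answer>."},
      {"role": "user", "content": "*I push open the tavern door.* Anyone here?"}
    ],
    "temperature": 0.8
  }'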
If anyone wants a ready-to-use launch script for llama.cpp (llama-server) with this model, here's mine. Tested on a 5900X + RX 7700 XT (12GB VRAM) with 16K context.
#!/bin/sh
# Launch script for Irixxed-Magcap-12B-Slerp (Q5_K_M imatrix quant) with llama-server.

MODEL="Irixxed-Magcap-12B-Slerp.i1-Q5_K_M.gguf"
CONTEXT=16384      # 16K context
GPU_LAYERS=41      # offload all layers to the GPU
THREADS=12         # physical cores on the 5900X
THREADS_BATCH=24   # logical threads for prompt processing

# -fa enables flash attention. Sampling: top-k, typical-p, mirostat and DRY are turned off;
# min-p does the filtering.
llama-server -m "$MODEL" -fa -c "$CONTEXT" -ngl "$GPU_LAYERS" \
  --threads "$THREADS" --threads-batch "$THREADS_BATCH" \
  --batch-size 512 \
  --ubatch-size 256 \
  --defrag-thold 0.1 \
  --cache-reuse 2048 \
  --top-k 0 \
  --typical 1.0 \
  --min-p 0.042 \
  --repeat-penalty 1.02 \
  --repeat-last-n 0 \
  --dry-multiplier 0.0 \
  --mirostat 0 \
  --host 127.0.0.1 \
  --port 8080
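Once it's up, a quick way to confirm the server is alive and the model has loaded (standard llama-server endpoint, nothing model-specific):

# Should return an OK status once the model has finished loading
curl -s http://127.0.0.1:8080/health

From there, point SillyTavern's Text Completion connection at http://127.0.0.1:8080 with the llama.cpp backend (or use the OpenAI-compatible /v1 endpoint), and the preset above should work as-is.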
Thank you so much!