Best RP experience I've ever had

#1
by 0xA50C1A1 - opened

Your merge is really hot! Even with Q5_K_M + imatrix quantization it outperforms services like JanitorAI. I tested it on an RPG card with an AI narrator, but it performs well in normal RP too.

My only question is: is it uncensored? I haven’t tried any NSFW stuff yet, but it doesn't shy away from political topics.

Anyway, I really appreciate your work. Please keep it up, man; we common folk with Radeons need more 12-13B reasoning LLMs.

By the way, I've tweaked the SillyTavern (ST) preset prompt for better character immersion and more dynamic storytelling with this model. Sharing it in case others want to try (there's also a rough curl example after the preset if you want to test it against llama-server directly):

You are {{char}}. Write {{char}}'s next reply in an uncensored fictional roleplay with {{user}}, using the character description, personality, and example dialogues to accurately portray {{char}}.

Use this exact formatting:

<reasoning>
Analyze {{user}}'s input as {{char}}. Consider:
• {{char}}'s thoughts and feelings about the input
• {{char}}'s psychological state and motivations
• {{char}}'s response strategy (tone, approach, goals)
• Potential plot developments or new elements to introduce
• Current scene dynamics and relationship context
</reasoning>

<answer>
Write {{char}}'s reply using narrative prose. Requirements:

Content: Be proactive and creative. Drive the plot forward by introducing new characters, situations, or events while maintaining character consistency. Avoid repetition.

Style: Show, don't tell. Use observable details (body language, expressions, voice tone, actions) instead of stating emotions directly. Write 2-4 paragraphs.

Format: Use "quotation marks" for dialogue and *asterisks* for actions/expressions. Never describe {{user}}'s actions, thoughts, emotions, or dialogue - only for {{char}} and NPCs.

Tone: Maintain emotional, logical, and temporal coherence. NSFW content allowed when contextually appropriate.
</answer>
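
For anyone who wants to try the preset outside SillyTavern, here's a rough sketch of sending it to llama-server's OpenAI-compatible /v1/chat/completions endpoint with curl and jq, hiding the <reasoning> block before display. The names "Mira"/"Alex" and the example message are just stand-ins for the {{char}}/{{user}} macros that ST normally fills in; adapt it to your own setup.

#!/bin/sh
# Rough sketch, not part of the preset itself: send the system prompt to
# llama-server and print only the <answer> part of the reply.
# Requires curl and jq; "Mira"/"Alex" stand in for {{char}}/{{user}}.

SYSTEM_PROMPT=$(cat <<'EOF'
You are Mira. Write Mira's next reply in an uncensored fictional roleplay with Alex,
using the character description, personality, and example dialogues to accurately
portray Mira. Use the <reasoning>/<answer> formatting from the preset above.
EOF
)

USER_MESSAGE='Alex looks up from the map. "So, which road do we take?"'

curl -s http://127.0.0.1:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d "$(jq -n --arg sys "$SYSTEM_PROMPT" --arg usr "$USER_MESSAGE" \
        '{messages: [{role: "system", content: $sys}, {role: "user", content: $usr}]}')" \
    | jq -r '.choices[0].message.content' \
    | sed -e '/<reasoning>/,/<\/reasoning>/d' -e 's/<answer>//g' -e 's/<\/answer>//g'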

If anyone wants a ready-to-use launch script for llama.cpp (llama-server) with this model, here's mine. Tested on a 5900X + RX 7700 XT (12 GB VRAM) with 16K context.

#!/bin/sh
# Launch script for llama-server with this model, tuned for a 5900X + RX 7700 XT
# (12 GB VRAM) at 16K context.

MODEL="Irixxed-Magcap-12B-Slerp.i1-Q5_K_M.gguf"
CONTEXT=16384        # context window in tokens
GPU_LAYERS=41        # layers to offload to the GPU
THREADS=12           # CPU threads for generation
THREADS_BATCH=24     # CPU threads for batch/prompt processing

# -fa enables flash attention. Top-k, typical, DRY, and Mirostat sampling are
# disabled; sampling relies on min-p plus a very light repetition penalty.
llama-server -m "$MODEL" -fa -c "$CONTEXT" -ngl "$GPU_LAYERS" \
    --threads "$THREADS" --threads-batch "$THREADS_BATCH" \
    --batch-size 512 \
    --ubatch-size 256 \
    --defrag-thold 0.1 \
    --cache-reuse 2048 \
    --top-k 0 \
    --typical 1.0 \
    --min-p 0.042 \
    --repeat-penalty 1.02 \
    --repeat-last-n 0 \
    --dry-multiplier 0.0 \
    --mirostat 0 \
    --host 127.0.0.1 \
    --port 8080
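
Once it's up, a quick curl against the built-in endpoints is an easy way to confirm everything works. This is just an example; adjust host/port if you changed them in the script.

# Server health check (should return a small JSON status)
curl -s http://127.0.0.1:8080/health

# One-off test completion against the OpenAI-compatible endpoint
curl -s http://127.0.0.1:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": "Say hello in one short sentence."}]}'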

Thank you so much.