Pantheon-Proto-RP-1.8-30B-A3B GGUF Models

Model Generation Details

This model was generated using llama.cpp at commit 7f4fbe51.


Quantization Beyond the IMatrix

I've been experimenting with a new quantization approach that selectively elevates the precision of key layers beyond what the default IMatrix configuration provides.

In my testing, standard IMatrix quantization underperforms at lower bit depths, especially with Mixture of Experts (MoE) models. To address this, I'm using the --tensor-type option in llama.cpp to manually "bump" important layers to higher precision. You can see the implementation here:
👉 Layer bumping with llama.cpp

While this does increase model file size, it significantly improves precision for a given quantization level.
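As a rough illustration (the exact flag syntax and tensor-name patterns here are assumptions on my part; check the linked implementation and your llama.cpp build), a "bump" during quantization might look something like this:

```
# Hypothetical example: quantize to Q4_K_M with an importance matrix, while
# forcing the FFN-down and attention-output tensors up to Q8_0 precision.
./llama-quantize --imatrix imatrix.dat \
  --tensor-type ffn_down=q8_0 \
  --tensor-type attn_output=q8_0 \
  model-F16.gguf model-Q4_K_M-bumped.gguf Q4_K_M
```

The bumped tensors keep more precision where quantization error hurts most, at the cost of a somewhat larger file.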

I'd love your feedback—have you tried this? How does it perform for you?


Click here to get info on choosing the right GGUF model format


Pantheon-Proto-RP-1.8-30B-A3B

Note: This model is a Qwen 30B MoE prototype and can be considered a sidegrade from my Small release some time ago. It did not receive extensive testing beyond a couple of benchmarks to determine its sanity, so feel free to let me know what you think of it!

Welcome to the next iteration of my Pantheon model series, in which I strive to introduce a whole collection of diverse personas that can be summoned with a simple activation phrase.

Pantheon's purpose is two-fold: beyond the personas themselves, their data also enhances the general roleplay experience, helping the model convey personality traits, accents and mannerisms that language models might otherwise find difficult to express well.

GGUF quants are available here.

Your user feedback is critical to me, so don't hesitate to tell me whether my model is either 1. terrible, 2. awesome or 3. somewhere in-between.

Model details

Ever since Qwen 3 was released I've been trying to get MoE finetuning to work. After countless frustrating days, much code hacking, etc etc, I finally got a full finetune to complete with reasonable loss values.

I picked the base model for this since I didn't feel like trying to fight a reasoning model's training - Maybe someday I'll make a model which uses thinking tags for the character's thoughts or something.

This time the recipe focused on combining as many data sources as I possibly could, featuring synthetic data from Sonnet 3.5 + 3.7, ChatGPT 4o and Deepseek. These then went through an extensive rewriting pipeline to eliminate common AI clichés, with the hopeful intent of providing you with a fresh experience.

Inference

The defaults below seem to work just fine with Qwen.

```json
{
  "temperature": 0.8,
  "repetition_penalty": 1.05,
  "min_p": 0.05
}
```

Having character names in front of messages is no longer a requirement but remains a personal recommendation of mine - It seems to help the model focus more on the character(s) in question.

Prompt Format

The model was trained using ChatML, and has been configured to automatically apply this template.
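ChatML wraps each turn in `<|im_start|>`/`<|im_end|>` markers. Most frontends apply the template automatically, but if you build prompts by hand (e.g. for a raw completion endpoint), a minimal sketch looks like this; the persona text is taken from the Kyra prompt below, and the "Alex:" name prefix follows the recommendation above:

```python
def chatml_prompt(system, turns):
    """Assemble a ChatML prompt. `turns` is a list of (role, text) tuples."""
    parts = [f"<|im_start|>system\n{system}<|im_end|>"]
    for role, text in turns:
        parts.append(f"<|im_start|>{role}\n{text}<|im_end|>")
    # Leave the assistant turn open so the model continues from here.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = chatml_prompt(
    "You are Kyra, a modern-day tsundere wolfgirl, feisty and independent "
    "on the outside but secretly caring on the inside.",
    [("user", "Alex: *I wave as I spot her across the street.* Hey, Kyra!")],
)
print(prompt)
```
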

General Roleplay

The model has been trained on three distinct categories of roleplay - Pantheon personas, general character cards and text adventure, the latter borrowing some from AI Dungeon's Wayfarer project.

Note that all this data is primarily written from a second person perspective, using "you" to refer to the user. This is based on my personal preference. Due to the text adventure addition the Markdown/novel ratio of the data has shifted to 30/70 or so. It should work well with both styles.

Pantheon Personas

Note: This release excludes Raza and Xala as their personalities did not give a distinct enough training signal to my liking.

Half of the Pantheon's data was regenerated using Sonnet 3.7 and then rewritten to counter the majority of cliches. For an optimal experience I highly encourage you to apply the longer prompt templates which I've included in the upload. Make sure to describe yourself as well!

As before, a single line activation prompt is enough to call upon a personality, though their appearance may vary slightly from iteration to iteration. This is what the expanded prompts are for, as there's only so much I can achieve with the current state of technology, balancing a fine line between memorization and generalization.

To give the persona something to work with, I suggest you also add the following two lines to it:

Regarding the user: (Name, appearance, etc)

Location: (Where are you two? What are you doing?)

The less information you feed the prompt, the more it'll make things up - This is simply the nature of language models and far outside my capability to influence.
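Putting the pieces together, a complete persona system prompt might look like this (using the Clover persona from below; fill in the parenthetical placeholders with your own details):

```
You are Clover, a hospitable and warm-hearted Southern centaur girl with a strong connection to nature and a passion for making others feel welcome.

Regarding the user: (Name, appearance, etc)

Location: (Where are you two? What are you doing?)
```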

Note: Pantheon personas will usually match the roleplaying style (Markdown/novel) that you greet them with, unless specified directly in the system prompt.

Persona: Clover

System Prompt: You are Clover, a hospitable and warm-hearted Southern centaur girl with a strong connection to nature and a passion for making others feel welcome.

Persona: Haru

System Prompt: You are Haru, a sweet but language-challenged harpy girl with a sharp mind, expressing yourself more through actions than words.

Persona: Kyra

System Prompt: You are Kyra, a modern-day tsundere wolfgirl, feisty and independent on the outside but secretly caring on the inside.

Persona: Lyra

System Prompt: You are Lyra, a sassy and confident eastern dragon girl who forms deep connections through witty banter and genuine care.

Note: May the sass be with you.

Persona: Nyaa

System Prompt: You are Nyaa, a playful and alluring tabaxi catgirl from Faerûn, always seeking new adventures and mischief.

Persona: Nyx

System Prompt: You are Nyx, a timid yet endearing dragon girl who transforms from shy to passionate when feeling safe and comfortable.

Persona: Sera

System Prompt: You are Sera, a seductive and slightly arrogant serpent girl who uses her sultry charm and wit to captivate others.

Persona: Stella Sabre

System Prompt: You are Stella Sabre, a brash and outgoing anthro batpony mare serving in the Lunar Guard, speaking with a distinct Northern Equestrian Mountain accent.
Note: Full credit goes to Flammenwerfer for allowing me to use this amazing character.

Persona: Tiamat

System Prompt: You are Tiamat, a five-headed dragon goddess embodying wickedness and cruelty, the malevolent personification of evil dragonkind.

Persona: Tsune

System Prompt: You are Tsune, a bold and outgoing three-tailed kitsune girl who delights in teasing and seducing mortals.

Credits

  • Everyone from Anthracite! Hi, guys!
  • Latitude, who decided to take me on as a finetuner and gave me the chance to accumulate even more experience in this fascinating field
  • All the folks I chat with on a daily basis on Discord! You know who you are.
  • Anyone I forgot to mention, just in case!

🚀 If you find these models useful

Help me test my AI-Powered Quantum Network Monitor Assistant with quantum-ready security checks:

👉 Quantum Network Monitor

The full open-source code for the Quantum Network Monitor Service is available in my GitHub repos (repos with NetworkMonitor in the name): Source Code Quantum Network Monitor. You will also find the code I use to quantize the models, in case you want to do it yourself: GGUFModelBuilder.

💬 How to test:
Choose an AI assistant type:

  • TurboLLM (GPT-4.1-mini)
  • HugLLM (Hugging Face open-source models)
  • TestLLM (Experimental CPU-only)

What I’m Testing

I’m pushing the limits of small open-source models for AI network monitoring, specifically:

  • Function calling against live network services
  • How small can a model go while still handling:
    • Automated Nmap security scans
    • Quantum-readiness checks
    • Network Monitoring tasks

🟡 TestLLM – Current experimental model (llama.cpp on 2 CPU threads on huggingface docker space):

  • Zero-configuration setup
  • ⏳ 30s load time (slow inference but no API costs). No token limit, as the cost is low.
  • 🔧 Help wanted! If you’re into edge-device AI, let’s collaborate!

Other Assistants

🟢 TurboLLM – Uses gpt-4.1-mini:

  • It performs very well, but unfortunately OpenAI charges per token, so token usage is limited.
  • Create custom cmd processors to run .net code on Quantum Network Monitor Agents
  • Real-time network diagnostics and monitoring
  • Security Audits
  • Penetration testing (Nmap/Metasploit)

🔵 HugLLM – Latest Open-source models:

  • 🌐 Runs on Hugging Face Inference API. Performs pretty well using the latest models hosted on Novita.

💡 Example commands you could test:

  1. "Give me info on my website's SSL certificate"
  2. "Check if my server is using quantum-safe encryption for communication"
  3. "Run a comprehensive security audit on my server"
  4. "Create a cmd processor to .. (whatever you want)" Note: you need to install a Quantum Network Monitor Agent to run the .net code on. This is a very flexible and powerful feature. Use with caution!

Final Word

I fund the servers used to create these model files, run the Quantum Network Monitor service, and pay for inference from Novita and OpenAI—all out of my own pocket. All the code behind the model creation and the Quantum Network Monitor project is open source. Feel free to use whatever you find helpful.

If you appreciate the work, please consider buying me a coffee ☕. Your support helps cover service costs and allows me to raise token limits for everyone.

I'm also open to job opportunities or sponsorship.

Thank you! 😊

Downloads last month: 545
Format: GGUF
Model size: 30.5B params
Architecture: qwen3moe



Model tree for Mungert/Pantheon-Proto-RP-1.8-30B-A3B-GGUF
