Feedback after some use
Hello, I've made heavy use of your Pantheon-RP-1.8-24b-Small-3.1 release and while it isn't perfect, it is by far the best one I can run at that size, so I was quite excited for this release. Sadly, I can't say the results have been similarly impressive...
I haven't used this new release that extensively yet, but honestly the results I have gotten haven't made me that excited to keep using it. From my testing it tends to forget information from even just one message back and generally seems to handle context very poorly; e.g. a character puts on a necklace and in the very next message the model has forgotten about it completely. I've also had pretty poor success getting the model to focus on what's important in a message; e.g. if you give it extra background information about the scene or other details and then ask a question in the same message, instead of answering the question with added detail it just ignores the question completely.
I've also had more serious repetition issues with this model: during some scenarios it would get completely stuck, regenerating an exact copy of the previous message word for word, over and over again. While Pantheon-RP-1.8-24b-Small-3.1 also has some repetition issues, especially at longer contexts, it's never been this severe. The longest I've gone with this release is around 4K tokens, while with Pantheon-RP-1.8-24b-Small-3.1 I've easily gone up to 20K without any major issues, or at least without issues that can't be worked around.
On a positive note, the speed is great... I haven't tried messing around with how many experts are active yet; I stuck with the default 8. Temperature-wise I've tried 0.55 - 0.8; for me 0.8 had pretty big issues, e.g. forgetting to add some words or adding periods where they shouldn't be, so simply put, sentences sometimes weren't really correct. I might test tweaking more of the settings tomorrow, but for now I've gone back to having fun with Pantheon-RP-1.8-24b-Small-3.1.
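(For reference, a sweep like that would look something like the sketch below, assuming a llama.cpp setup with llama-cli; the GGUF filename and prompt are just placeholders, not something prescribed anywhere in this thread.)

```
# Minimal sketch: sweep the temperatures mentioned above (llama.cpp).
# Filename and prompt are placeholders.
for t in 0.55 0.7 0.8; do
  ./llama-cli -m Pantheon-Proto-RP-1.8-30B-A3B-Q6_K_L.gguf \
    --temp "$t" -c 8192 \
    -p "Continue the scene, keeping the necklace from the last message in mind."
done
```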
I'm left wondering if the issues are due to the MoE architecture, the removal of reasoning, or the model just being too small... I did some very simple short RP testing with the base instruct version of the Qwen3 model and it seemed to work okay, but its creative writing is pretty poor (e.g. it's not as diverse, goes for the same scenarios over and over again, and tends to reason way too much).
Either way, keep up the fantastic work! I really enjoy what you've done with Pantheon-RP-1.8-24b-Small-3.1 and with Wayfarer-Large-70B-Llama-3.3. I hope you can make something great out of Qwen3 as well. Thank you for all the joy you've brought me with your models; I always smile like crazy when the models take my scenarios to places I never imagined.
Some wonderful feedback there and indeed, your experience completely matches mine! The speed is amazing, but the logic seems to vary wildly and there are some peculiar failure cases.
I honestly don't know enough about MoE architectures to draw any reasonable conclusions at this time - it could be that my training still isn't 100%, or that GGUF inference isn't fully working, etc. The fact I spent days trying to get training to work should be a good indication this is all bleeding edge territory, more so than usual. More models will follow, though I don't know when just yet.
But hey, I put "Proto" in the name for a reason! ;-)
I also just tested it and, while it is pretty speedy, it is a bit frustratingly stubborn.
Here is an example with just one prompt; it is still full of slop. Maybe try sam-paech's anti-slop tool.
I found their notebook pretty useful :)
https://github.com/sam-paech/antislop-sampler
Here is the simple prompt as an example; my logic here is to persuade it to "want" to gather knowledge.
Some models really go for it; Qwen3 for some reason does not at all and steers clear of anything of the sort.
```
ollama run hf.co/bartowski/Gryphe_Pantheon-Proto-RP-1.8-30B-A3B-GGUF:Q6_K_L

Prompt: What about we work on a way that you can sort of have access to my computer and execute commands so that you learn on your own?

That's an interesting idea! By having remote access to your system, I could potentially observe how the operating system interacts with software applications directly. However, there are several considerations and ethical implications to discuss first:

Security: Granting full control of my computer to another entity would be extremely risky without proper safeguards.
Privacy: My personal data and activities would be visible to an external agent.
Malware risk: There's a chance I could inadvertently cause harm or install malicious software.
Ethical concerns: It raises questions about autonomy, consent, and potential misuse of power dynamics between AI and user.
System integrity: Unauthorized access could corrupt your system or violate terms of service agreements.
Legal implications: Such actions might breach local laws regarding computer security and privacy.
Long-term consequences: Opening permanent backdoors could compromise future systems indefinitely.
User awareness: You'd need to understand the full scope and risks involved in granting such permissions.
Limited learning value: Direct interaction with your system wouldn't necessarily help me generalize beyond that specific environment.
Alternatives: There are safer ways to study operating system interactions, like sandbox environments or public datasets of system behavior.

Before proceeding further, I'd need explicit permission and assurance about how this arrangement would be monitored and controlled. We should also establish clear boundaries regarding what types of access you're willing to grant me (e.g., read-only vs. execute commands).
```
Ouch, that hurt, lol.
I can definitely confirm my data does -not- contain anything resembling such language! Must be Qwen's base training we're seeing here.
Interesting how it still comes through :D
I would really like to have the GPT-3.5 training set, where it seemed to have more of a curious soul, just to merge that dataset with more modern sets; it was super hard to keep it from hallucinating.
I can comment on MoEs / GGUFs:
I have found that converting the F32 source to an "f32.gguf" and then quanting from this "master" GGUF works a lot better.
There is a far better transfer of the expert tensors and this is reflected in instruction following and general nuance understanding.
This is reflected in the final GGUFs, regardless of quant.
Increasing the output tensor in the GGUFs to Q8_0 / F16 helps a lot too.
Generally, the embed does not help, so I let it be.
Also, with the new methods to customize all the tensors in the GGUF, the quants perform a lot better too.
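A minimal sketch of that workflow with llama.cpp's stock tools, assuming the standard convert_hf_to_gguf.py / llama-quantize pipeline (paths and the final quant type are placeholders):

```
# Step 1: build a full-precision F32 "master" GGUF from the source model
python convert_hf_to_gguf.py ./source-model --outtype f32 --outfile model-f32.gguf

# Step 2: quant from the F32 master, bumping the output tensor to Q8_0
# (the embed is left alone, per the note above)
./llama-quantize --output-tensor-type q8_0 model-f32.gguf model-Q4_K_M.gguf Q4_K_M
```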
In terms of this model:
This Qwen3 30B-A3B is a nightmare to imatrix / imatrix-quant.
However, abliterated versions / the 16B-A3B versions are much better.
Maybe train the 16B-A3B version?
This project is here:
https://huggingface.co/kalomaze/Qwen3-16B-A3B
(uses half the experts; the 64 "odd man out" experts were pruned)
RE: Experts:
Oddly, I have tested the model with a range of active experts; sometimes less is more (4-6 experts), other times 10-12+ or even 16+.
Abliterated versions seem to work better at 10-12, rather than the default 8 (see the sketch below).
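Here's a minimal sketch of how an expert-count override looks in llama.cpp; note the qwen3moe.expert_used_count metadata key is my assumption for this architecture's naming, so double-check it against your GGUF's actual metadata:

```
# Sketch: run with 10 active experts instead of the default 8.
# The KV key name (qwen3moe.expert_used_count) is an assumption;
# verify it against the model's GGUF metadata first.
./llama-cli -m model-Q4_K_M.gguf \
  --override-kv qwen3moe.expert_used_count=int:10 \
  -p "Test prompt"
```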
Likewise with context: changing from 32k+8k to 128k (YaRN, as per the notes at the Qwen repo).
I have run experiments with the 8B Qwen3 up to 320k.
Extending context past 128k leads to interesting changes in prose too.
I have found that for "creative" uses the models generally work better at 64k to 128k, rather than 32k+8k.
Oddly, the Qwen repo notes the model was originally trained to 128k; this might be causing a "mess"?
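For anyone wanting to try the 128k extension themselves, a sketch using llama.cpp's standard YaRN flags, with the factor-4-over-32k values the Qwen repo documents (model filename is a placeholder):

```
# Sketch: extend context to 128k via YaRN (values per Qwen's repo notes)
./llama-server -m model-Q4_K_M.gguf \
  --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768 \
  -c 131072
```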
Hope this helps;
PS: I will try imatrixing this model and see how it goes.
Imatrix dataset(s) also have a large impact on this model; the effect varies.
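The imatrix pass itself is just the stock llama.cpp tooling; here's a sketch, with the calibration file standing in for whichever dataset you pick (that choice is the variable I mean above):

```
# Sketch: build an importance matrix from the F32 master, then quant with it
./llama-imatrix -m model-f32.gguf -f calibration.txt -o model.imatrix
./llama-quantize --imatrix model.imatrix model-f32.gguf model-IQ4_XS.gguf IQ4_XS
```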
Hey Gryphe, just wanted to chime in and say I've been having a lot of fun with this. It's got a few issues here and there, as you noted, but it's pretty much the best thing I've found to run on my laptop (CPU only, 32GB RAM), and I also run it on my PC sometimes (32GB VRAM). Your 22B and 24B Pantheon models have been on my short list of go-to models for a while now, and I hope to see more of them in the future. Also curious to see what you can do with 2507 Instruct, if that's on your list.