Is this a finetune?
Hello hello. In the model card, you call this a finetune, but the hashes for this model and Qwen 30B A3B are identical. Did you upload the wrong files, or is this just a config edit?
Hey;
Config adjustment; this is why "finetune" is in quotes.
This repo was created to make quanting easy (by quanters and using GGUF My Repo), and to allow people who cannot, or cannot easily, adjust the number of experts themselves.
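For anyone curious what the "config adjustment" amounts to, here is a rough sketch of that kind of edit (not the exact script used here), assuming the base model's standard `num_experts_per_tok` key in config.json; the local path is just an example.

```python
# Rough sketch of the kind of config-only edit involved (not the exact script used).
# Assumes the standard Qwen3-MoE key "num_experts_per_tok" in config.json,
# i.e. how many experts are activated per token; the path is just an example.
import json

CONFIG_PATH = "Qwen3-30B-A3B/config.json"  # local copy of the base model

with open(CONFIG_PATH) as f:
    cfg = json.load(f)

print("experts activated per token (before):", cfg.get("num_experts_per_tok"))

cfg["num_experts_per_tok"] = 16  # double the base model's 8 active experts -> "A6B"

with open(CONFIG_PATH, "w") as f:
    json.dump(cfg, f, indent=2)
```

No weights change, which is why the file hashes match the base model.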
NICE work! Any comparisons (on common benchmarks) available between 30B-A6B and 30B-A3B?
Benchmarks will differ based on the complexity of the prompt(s).
Some will go up, others down.
The reason is how a MOE operates as you increase/decrease the number of experts: each layer in the model contains all the experts,
and each activated expert has a say in the token generation.
If there are too many experts, and/or they are too diverse (not experts on the topic, or not even general experts on it), then they will bring down the results (average them out);
likewise, if the prompt/problem is complex, more experts can better solve/answer the query.
IE: If you are trying to solve a basic math problem, too many experts may be overkill.
Likewise, if you are trying to solve a physics question related to multiple planetary movements/objects, more experts may be able to solve it, or solve it to a better degree.
A lot of this depends on how the gating was set up in the model, and how extensive it is - including positive and negative gating.
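To make the "every activated expert has a say" point concrete, here is a toy top-k routing sketch (not the model's actual gating code): the gate scores all experts, only the top k contribute, weighted by their renormalized scores, so raising k mixes more experts into the same token.

```python
# Toy illustration of top-k MoE routing (not the model's actual gating code).
# Each layer holds all the experts; a gate scores them per token, and the top-k
# "activated" experts each contribute, weighted by their renormalized gate score.
# Raising k mixes more experts into the same token: the "averaging out" effect.
import torch

def moe_layer(x, experts, gate, k):
    scores = torch.softmax(gate(x), dim=-1)        # how relevant each expert looks
    weights, idx = torch.topk(scores, k)           # keep only the k best-scoring experts
    weights = weights / weights.sum()              # renormalize over the chosen ones
    out = torch.zeros_like(x)
    for w, i in zip(weights.tolist(), idx.tolist()):
        out = out + w * experts[i](x)              # every activated expert has a say
    return out

hidden, n_experts = 64, 8
torch.manual_seed(0)
experts = [torch.nn.Sequential(torch.nn.Linear(hidden, hidden), torch.nn.GELU(),
                               torch.nn.Linear(hidden, hidden)) for _ in range(n_experts)]
gate = torch.nn.Linear(hidden, n_experts)
x = torch.randn(hidden)

for k in (2, 4, 8):
    print(f"{k} experts activated -> output norm {moe_layer(x, experts, gate, k).norm().item():.3f}")
```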
One place where more experts may be of benefit is multi-turn conversations: as the complexity of the convo grows, more experts may be able to understand
the entire convo at a higher level and provide better, more insightful answers.
So to answer your question:
More complex prompts will likely lead to higher benchmark scores, whereas simpler questions may lower them.
As stated on the repo card, your use case(s) may determine if this model with 16 experts is a good fit, or if fewer experts are better.
All of this is complicated by the reasoning systems embedded in the model.
In testing I found that more experts have some effect on the reasoning, but a far greater effect on the output after reasoning, on additional multi-turn/step convos,
and on additional reasoning/thinking blocks in the same convo.
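If you want to test this on your own use cases, you can also override the active expert count at load time on a GGUF quant instead of switching repos. A hedged sketch with llama-cpp-python, assuming its `kv_overrides` parameter and the GGUF metadata key `qwen3moe.expert_used_count` (check the real key with a GGUF metadata dump of your quant); the filename is hypothetical.

```python
# Hedged sketch for comparing different active-expert counts on your own prompts,
# using a GGUF quant with llama-cpp-python's kv_overrides. The metadata key
# "qwen3moe.expert_used_count" and the quant filename are assumptions; verify the
# key name against a metadata dump of the quant you actually downloaded.
from llama_cpp import Llama

PROMPT = "Explain how tides work in two sentences."

for n_experts in (4, 8, 16):
    llm = Llama(
        model_path="Qwen3-30B-A6B-16-Extreme-Q4_K_M.gguf",  # hypothetical quant name
        n_ctx=4096,
        kv_overrides={"qwen3moe.expert_used_count": n_experts},
        verbose=False,
    )
    out = llm(PROMPT, max_tokens=128)
    print(f"--- {n_experts} active experts ---")
    print(out["choices"][0]["text"].strip())
    del llm  # reloads the model each pass; slow, but fine for a quick comparison
```

If you prefer the command line, llama.cpp's `--override-kv` flag does the same kind of override on its CLI tools.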