imatrix-eaddario-combined-all-medium-Qwen3-235B-A22B-Instruct-2507-BF16.dat ?
Curious to know what this imatrix is all about. :)
Which one is best to use?
@Thireus hey buddy! it's been busy lately huh? haha... check here: https://huggingface.co/datasets/eaddario/imatrix-calibration
@eaddario gave me the secret invocation to use them:
apt-get install duckdb
duckdb -ascii -c "SELECT * FROM read_parquet('file.parquet');" > file.txt
{
"name": "pure-IQ4_KS",
"ppl": "4.4156 +/- 0.02624",
"size": 116.994,
"bpw": 4.275,
"legend": "pure",
"comment": "iq4_k token_embd, iq6_k output, ubergarm-imatrix-calibration-corpus-v02.txt"
},
{
"name": "eaddario-imat-pure-IQ4_KS",
"ppl": "4.4175 +/- 0.02628",
"size": 116.994,
"bpw": 4.275,
"legend": "pure",
"comment": "iq4_k token_embd, iq6_k output, eaddario-imatrix-corpus-combined-all imatrix corpus"
}
at least on wiki.test.raw (all English, pretty sure) the additional imatrix coverage isn't showing up. i'll keep these quants around and might try KLD or PPL with different kinds of corpora at some point
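for reference, this is roughly the kind of invocation i use for those measurements (the model and logits file names here are just placeholders, and flags can differ a bit between builds, so double check --help on yours):

# perplexity over wiki.test.raw for a given quant
./llama-perplexity -m Qwen3-235B-A22B-pure-IQ4_KS.gguf -f wiki.test.raw

# KLD: save baseline logits from the full-precision model first, then compare the quant against them
./llama-perplexity -m Qwen3-235B-A22B-BF16.gguf -f wiki.test.raw --kl-divergence-base logits-bf16.bin
./llama-perplexity -m Qwen3-235B-A22B-pure-IQ4_KS.gguf -f wiki.test.raw --kl-divergence-base logits-bf16.bin --kl-divergence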
it took much longer to calculate, but none of the routed experts ended up dropping data for the imatrix. with my own corpus, the first 50 chunks had some dropped routed experts.
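for anyone curious, computing the imatrix itself is just the stock tool run over the calibration text, roughly like this (the corpus .txt name below is a placeholder for whatever you dumped out of the parquet):

# compute the importance matrix over the calibration corpus
./llama-imatrix -m Qwen3-235B-A22B-Instruct-2507-BF16.gguf -f calibration-corpus.txt -o imatrix-eaddario-combined-all-medium-Qwen3-235B-A22B-Instruct-2507-BF16.dat
# watch the logs for routed expert tensors reported with partial or missing data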
interestingly this 235B was much more finicky with routed experts during imatrix than the newer, bigger 480B!
haven't had time to think about it much more though given i've been rushing the new 480B quants out the door lmao...
keep me posted if you find anything. the theory goes that having the bigger imatrix corpus might improve coding stuff etc. my own corpus has some of that too. here is mine explained for reference: https://gist.github.com/ubergarm/edfeb3ff9c6ec8b49e88cdf627b0711a (comments at bottom)
Got it, thanks for the info. Are you planning to use it for ubergarm/Qwen3-Coder-480B-A35B-Instruct-GGUF then, if it's supposed to improve coding performance?
excellent question and one i was asking myself while waiting for the safetensors to download lol...
given the size of qwen3-coder-480b i ended up making the imatrix with a q8_0 instead of the native bf16 to save some time, and also used my own corpus because i wanted to release a quant before going to bed haha...
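in practice that just means quantizing to q8_0 once and pointing llama-imatrix at that instead of the bf16, roughly like this (file names are placeholders again):

# quantize bf16 -> q8_0 once, then compute the imatrix against the q8_0
./llama-quantize Qwen3-Coder-480B-A35B-Instruct-BF16.gguf Qwen3-Coder-480B-A35B-Instruct-Q8_0.gguf Q8_0
./llama-imatrix -m Qwen3-Coder-480B-A35B-Instruct-Q8_0.gguf -f my-corpus.txt -o imatrix-Qwen3-Coder-480B-A35B-Instruct.dat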
interestingly qwen3-235b was much more finicky about getting enough data into the routed experts than the bigger coder-480b, and my smaller corpus didn't drop any data at all and was silent after 10 chunks...
there is plenty of room for research here, but it hasn't been my personal interest to chase gains by tinkering with the imatrix corpus or methodology so much. i'm having more fun mixing the quant types, but work on both fronts can push the pareto curve ever downward!