From Ether to Syntax: A Meta-Analytic Exploration of Linguistic Algorithmic Landscapes

#6
by mradermacher - opened

continued....

mradermacher changed discussion status to closed

Here is a complete list of the newly added architectures.

The non-mm-archs are picked up automatically when llama is updated (rather, nothing checks for these archs, other than the script that shows me daily models).

Nice. Will do, in case you forgot any vision/audio architecture.

In case you need it, the list/regex is currently in /llmjob/share/llmjob.pm - search for is_vision.

Also, vision is mradermacher code for multi-modal from now on.
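For anyone curious what that check boils down to, here is a minimal Python sketch of an is_vision-style regex test. The real thing is Perl in /llmjob/share/llmjob.pm, and the architecture names below are purely illustrative, not the actual list:

```python
import re

# Illustrative only: the real pattern lives in llmjob.pm and covers far more archs.
IS_VISION_RE = re.compile(
    r"(Llava|Qwen2VL|Qwen2_5_VL|Gemma3|InternVL|Mllama)",
    re.IGNORECASE,
)

def is_vision(architecture: str) -> bool:
    """Return True if the architecture name looks multi-modal ("vision")."""
    return bool(IS_VISION_RE.search(architecture))

print(is_vision("Qwen2VLForConditionalGeneration"))  # True
print(is_vision("LlamaForCausalLM"))                 # False
```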

BERT-based architectures seem to be incredible

I might exclude them from the daily list for that reason, and because they are likely not popular with the people who consume GGUFs (and most fail because small models tend to have custom tokenizers).

Nice, I just discovered an easy way to requeue previously failed architectures:

Yup, shell-greppable logs for the win.

Update: oh, it's not even the real log file, "just" the llmc why transform of it.
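To illustrate what "shell-greppable logs" buys you, here is a rough Python sketch of pulling failed model names out of a plain-text log so they can be requeued by hand. The log path and the failure pattern are assumptions for the example, not the real llmc why output format:

```python
import re
from pathlib import Path

# Placeholder path and pattern - adjust to whatever the actual log looks like.
LOG = Path("/tmp/llmc-why.log")
FAILED = re.compile(r"^(?P<model>\S+)\s+.*error loading model", re.MULTILINE)

# Collect the first token of every matching line as the model name.
failed_models = sorted({m.group("model") for m in FAILED.finditer(LOG.read_text())})
for model in failed_models:
    print(model)  # feed these back into the queueing tooling manually
```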

@RichardErkhov vision models should not be queued to rich1 unless they are not being detected as such (and then no vision extraction should happen).

The non-vision jobs are limited to 32GB RAM, too. No clue what happened. Very troubling.

However, this morning, only besteffort models were queued on rich1. Who knows what nico queued...

Well, good to know. Usually you take like 4-8 GB, but something went wrong today. Peak recorded by Proxmox was 24 GB (so I assume it was even higher, but due to the total OOM it might not have recorded the full number). I added swap on root just in case this happens again, so at least other things on the server don't die haha

llmc audit besteffort skips the besteffort models for me.

Please restart the Audio-Reasoner imatrix computation. I killed it earlier today because it ran on the CPU. I'm still not sure what makes GPUs occasionally disappear temporarily, but it seems related to them being used in a different container.
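A quick sanity check one could run before (re)starting such a job, so it fails fast instead of silently falling back to CPU. Using PyTorch here is just an assumption for the sketch; the actual imatrix computation is a llama.cpp binary:

```python
import torch

# Refuse to start if the GPUs have "disappeared" (e.g. grabbed by another container).
if not torch.cuda.is_available() or torch.cuda.device_count() == 0:
    raise SystemExit("No CUDA devices visible - refusing to run on CPU")

for i in range(torch.cuda.device_count()):
    print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
```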

llmc audit besteffort skips the besteffort models for me.

Right, arguments were not passed to llmjob audit. Should be fixed now.

@RichardErkhov

Peak recorded by Proxmox was 24 GB

Well, given that I was officially allowed to use 64GB, 24GB seems absolutely normal. So what is the new limit? 24GB will only allow one quant, and maybe not even that.

[calm]

Wow that statement aged so poorly.

Can't see that. While you may be busy updating llama.cpp, w.r.t. releases it's still extremely calm, with almost no releases per day compared to the first half of this year.

btw., we have these in the queue as well:

0 1030 si Kimi-K2-Instruct                            
0 1030 si Kimi-K2-Base                                

llama is updated

Failed to load model config from LFM2-700M: The checkpoint you are trying to load has model type lfm2 but Transformers does not recognize this architecture.

Well, llama.cpp is beating transformers to the release, it seems... :)
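For reference, the error above can be reproduced (and checked after a transformers upgrade) with a tiny AutoConfig probe. The local path just mirrors the one in the log:

```python
from transformers import AutoConfig

# If the installed transformers release does not yet know the "lfm2" model type,
# this raises the same "does not recognize this architecture" ValueError.
try:
    cfg = AutoConfig.from_pretrained("LFM2-700M")
    print(cfg.model_type, cfg.architectures)
except ValueError as e:
    print(f"transformers too old for this checkpoint: {e}")
```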


btw., we have these in the queue as well:

0 1030 si Kimi-K2-Instruct                            
0 1030 si Kimi-K2-Base

I saw it and already downloaded the model and tried manually providing GGUFs, but convert_hf_to_gguf.py unfortunately does not yet seem to support the Kimi variant of the DeepseekV3ForCausalLM architecture:

root@AI:/apool/llama.cpp# venv/bin/python convert_hf_to_gguf.py --outfile /transfer/Kimi-K2-Instruct.gguf /cpool/Kimi-K2-Instruct
INFO:hf-to-gguf:Loading model: Kimi-K2-Instruct
WARNING:hf-to-gguf:Failed to load model config from /cpool/Kimi-K2-Instruct: Loading /cpool/Kimi-K2-Instruct requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error.
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:hf-to-gguf:Model architecture: DeepseekV3ForCausalLM
WARNING:hf-to-gguf:Failed to load model config from /cpool/Kimi-K2-Instruct: Loading /cpool/Kimi-K2-Instruct requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error.
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-1-of-61.safetensors'
INFO:hf-to-gguf:token_embd.weight,            torch.bfloat16 --> F16, shape = {7168, 163840}
INFO:hf-to-gguf:blk.0.attn_norm.weight,       torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.0.ffn_down.weight,        torch.float8_e4m3fn --> F16, shape = {18432, 7168}
Traceback (most recent call last):
  File "/apool/llama.cpp/convert_hf_to_gguf.py", line 7411, in <module>
    main()
  File "/apool/llama.cpp/convert_hf_to_gguf.py", line 7405, in main
    model_instance.write()
  File "/apool/llama.cpp/convert_hf_to_gguf.py", line 410, in write
    self.prepare_tensors()
  File "/apool/llama.cpp/convert_hf_to_gguf.py", line 5679, in prepare_tensors
    super().prepare_tensors()
  File "/apool/llama.cpp/convert_hf_to_gguf.py", line 277, in prepare_tensors
    for new_name, data_torch in (self.modify_tensors(data_torch, name, bid)):
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/apool/llama.cpp/convert_hf_to_gguf.py", line 5676, in modify_tensors
    return [(self.map_tensor_name(name), data_torch)]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/apool/llama.cpp/convert_hf_to_gguf.py", line 236, in map_tensor_name
    raise ValueError(f"Can not map tensor {name!r}")
ValueError: Can not map tensor 'model.layers.0.mlp.down_proj.weight_scale_inv'

If you want to try as well you can find the SafeTensors variant on nico1 under /cpool/Kimi-K2-Instruct.

Ah, I see what's going on. They uploaded the experts of the model in float8, like official DeepSeek, so we first have to convert it to BF16. It being far larger than DeepSeek made me wrongly think that it already was in BF16.

Edit: It is currently converting...
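For the curious, a rough sketch of what that float8-to-BF16 step does, assuming DeepSeek-V3-style block scaling where each tile of the 2D weight is multiplied by its stored weight_scale_inv entry. The 128 block size and the multiply convention are assumptions carried over from the official DeepSeek release, not verified against the Kimi checkpoint:

```python
import torch

def fp8_block_to_bf16(weight: torch.Tensor, scale_inv: torch.Tensor, block: int = 128) -> torch.Tensor:
    """Dequantize a float8_e4m3fn weight with per-block scales to BF16.

    Assumes one scale_inv entry per (block x block) tile of the weight.
    """
    rows, cols = weight.shape
    w = weight.to(torch.float32)
    # Expand each per-tile scale to cover its block x block region, then crop.
    s = scale_inv.to(torch.float32)
    s = s.repeat_interleave(block, dim=0)[:rows]
    s = s.repeat_interleave(block, dim=1)[:, :cols]
    return (w * s).to(torch.bfloat16)
```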

Bamba-9B-v1 also isn't as well supported as we would want:

llama_model_load: error loading model: error loading model hyperparameters: key not found in model: granitehybrid.context_length

I wonder what they tested BambaForCausalLM with.
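If anyone wants to see which granitehybrid.* keys the converter actually wrote (and confirm context_length really is missing), the gguf-py reader that ships with llama.cpp can dump the metadata. The file path below is just a placeholder:

```python
from gguf import GGUFReader  # gguf-py, bundled with llama.cpp

# Placeholder path - point it at the converted Bamba GGUF.
reader = GGUFReader("/transfer/Bamba-9B-v1.gguf")
for name in reader.fields:
    if name.startswith("granitehybrid."):
        print(name)
```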
