Model Request

#1323
by ThatHungarian - opened

Could you convert my new model (Aurora-10M) into GGUF and quantize it? Thanks in advance.

It's queued! :D

You can check for progress at http://hf.tst.eu/status.html or regularly check the model
summary page at https://hf.tst.eu/model#Aurora-10M-GGUF for quants to appear.

It looks like it failed. Is there something wrong with it?

Just the same issue we had last time: basically, you modified the pre-tokenizer instead of using the GPT-2 pre-tokenizer. This just requires me to use a special build of llama.cpp that has your pre-tokenizer hash hardcoded so the model can convert into GGUF. It is a bit more manual work than usual, but such tiny models are super rare, so it is definitely worth it. https://huggingface.co/mradermacher/Aurora-6M-GGUF already has 170 downloads, which is crazy.
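
For reference, convert_hf_to_gguf.py identifies the pre-tokenizer by hashing the tokenizer's output on a fixed probe string and looking that hash up in a table; an unknown hash aborts the conversion, which is what the special build works around. A minimal sketch of that fingerprinting (the repo id and probe string below are placeholders, not the converter's actual probe text):

```python
from hashlib import sha256
from transformers import AutoTokenizer

# Assumption: the repo id is illustrative; point it at the actual Aurora-10M files.
tokenizer = AutoTokenizer.from_pretrained("ThatHungarian/Aurora-10M")

# The real converter uses a much longer multilingual probe string; any fixed
# text shows the idea: identical tokenizer behaviour -> identical hash.
probe = "Hello, world! 3.14 \u00e9\u00e8 \U0001f600"
chkhsh = sha256(str(tokenizer.encode(probe)).encode()).hexdigest()
print(chkhsh)
# A one-off build then maps this hash inside get_vocab_base_pre(), roughly:
#   if chkhsh == "<printed value>":
#       res = "gpt-2"
```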

Yes, I saw that it has 170 downloads too, I was kinda surprised. Anyway, thanks a lot, and tell me what I should do next time so it's easier for you.

Wow, took me ages to find the issue for this model. Unfortunately, it failed:

llama_model_load: error loading model: check_tensor_dims: tensor 'position_embd.weight' has wrong shape; expected   256,   128, got   256,   256,     1,     1

@nicoboss your llama.cpp could provide some environment-variable override for the tokenizer in convert_hf_to_gguf.py, and that could be set for the job. Theoretically speaking.
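
Purely hypothetical, since no such environment variable exists in convert_hf_to_gguf.py today, but the override could look roughly like this (the variable name is made up):

```python
import os

# Hypothetical helper: check an override before the usual hash lookup, so a
# queue job could export e.g. GGUF_PRETOK_OVERRIDE=gpt-2 for a single model.
def resolve_pretokenizer(chkhsh: str, known: dict[str, str]) -> str:
    override = os.environ.get("GGUF_PRETOK_OVERRIDE")  # made-up variable name
    if override:
        return override
    if chkhsh in known:
        return known[chkhsh]
    raise NotImplementedError(f"unknown pre-tokenizer hash: {chkhsh}")
```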

@ThatHungarian it is quite normal for ultra-small models to require their own specific tokenizer config, so there probably isn't much you can do about the pre-tokenizer mismatch.

The tensor shape mismatch seems to be a model bug, though (or possibly a conversion bug).
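
One quick way to see which side disagrees is to compare the stored position embedding with config.json. A sketch, assuming the standard HF GPT-2 layout (transformer.wpe.weight in model.safetensors); adjust the names if the checkpoint differs:

```python
import json
from safetensors import safe_open

with open("config.json") as f:
    cfg = json.load(f)
print("n_positions:", cfg.get("n_positions"), "n_embd:", cfg.get("n_embd"))

with safe_open("model.safetensors", framework="pt") as st:
    for name in st.keys():
        if "wpe" in name:  # GPT-2's learned position embedding
            print(name, tuple(st.get_tensor(name).shape))
# llama.cpp expects a plain 2-D matrix whose sizes match n_embd and the
# declared context length; the error above shows a 4-D 256 x 256 x 1 x 1
# tensor where a 256 x 128 one was expected.
```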

I don't know; the only difference between this model and the last one, in terms of stuff that could affect it, is that it was trained on a GPU instead of a CPU. Maybe that could be it?

So is there nothing that can be done to quantize it?

It can be quantized, but nobody can use the quants, so they are pointless. You shouldn't have modified the shape of the position_embd.weight tensor, as doing so broke llama.cpp compatibility. Either you make the model use the same tensor shapes as the original GPT-2, or you create a pull request to llama.cpp in which you implement llama.cpp support for your model. While you are at it, you could also fix the crash that occurs after running your 6M model for a while. I think it consistently crashes after exceeding a certain number of tokens.

I'm not sure I changed it, so I don't know how I could change it back (perhaps it's because of some leftover from my previous model). However, I think the crashing is fixed with the 10M model, at least in its bin format, as I am able to run it without crashes.

Also, I don't think the crash with the 6M model is fixable, as I think that's just due to exceeding token limits (just like you said), since I didn't really implement a lot of features into that one. However, I'll try reshaping the 10M model later and see if I can get it back to the original form.

Could the bin file actually make things easier? I still have it.

I wasn't able to meaningfully change anything, so I just uploaded the bin file for it so people can run it that way; I can't really seem to figure out the reshaping.

Actually, I was able to reduce it down to 256 256 (from 256 256 1 1). However, reducing it down further to 256 128 would lead to data loss. Is there any way that you could try putting a flag or something while converting to make it recognize the correct shape? Anyway, I uploaded the new model to the Aurora-10M repo under the name model_shape.safetensors.
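
For what it's worth, the 256 256 1 1 to 256 256 step is just dropping the two size-1 dimensions, which is lossless. A sketch with safetensors/torch, with the tensor and file names as assumptions:

```python
from safetensors.torch import load_file, save_file

state = load_file("model.safetensors")
for name, tensor in list(state.items()):
    if "wpe" in name and tensor.dim() > 2:
        state[name] = tensor.squeeze().contiguous()  # removes only the size-1 dims
save_file(state, "model_shape.safetensors")
# Going from 256 positions down to 128 is a different operation: slicing off
# rows 128..255 throws away learned embeddings, which is the data loss
# mentioned above, unless those positions were never trained or used.
```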

The problem is not conversion and quantization. We can produce static quants of Aurora-10M, but what is the point if nobody can use them? If llama.cpp doesn't support your architecture, it cannot load the model. The proper fix would be to stop using the "GPT2LMHeadModel" architecture, instead naming it something like "AuroraGPT2LMHeadModel" or "AuroraGPT2ForCausalLM", and then follow https://github.com/ggml-org/llama.cpp/blob/master/docs/development/HOWTO-add-model.md to create a pull request adding llama.cpp support for it. The problem is that by changing the tensor shape you started creating your own architecture, which is incompatible with the "GPT2LMHeadModel" architecture supported by llama.cpp.
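
A sketch of what the config side of that rename could look like, assuming the standard HF GPT-2 config.json layout (the class name comes from the suggestion above; the model_type value is just an example):

```python
import json

with open("config.json") as f:
    cfg = json.load(f)

cfg["architectures"] = ["AuroraGPT2ForCausalLM"]  # was ["GPT2LMHeadModel"]
cfg["model_type"] = "aurora-gpt2"                 # was "gpt2"

with open("config.json", "w") as f:
    json.dump(cfg, f, indent=2)
# Note: transformers will then only load the model through a matching custom
# class (auto_map / trust_remote_code), and llama.cpp will only accept the new
# name once a pull request adds support for it.
```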

I'll try working on that; however, in the meantime the bin files are usable and without any crashes.

I think it should work now. I didn't make a pull request, but I changed it to "AuroraGPT2ForCausalLM" and "aurora-gpt2" and provided the aurora_model.py file, so I think it should be ready. I'll be doing it for the 30M version too in a second.

llama.cpp will not recognize your architecture if you don't create a PR implementing it. Doing so should be quite straightforward, as you can just take the same code path as GPT-2 and then branch wherever your architecture differs.
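
The converter-side half of such a PR is usually small. A rough sketch of how convert_hf_to_gguf.py registers architectures; the decorator and base-class names change between llama.cpp versions, and in a real PR this class is added inside the script rather than imported, so treat this as the shape of the change only:

```python
import gguf
# Assumption: these names approximate convert_hf_to_gguf.py at the time of
# writing; in a real PR the class lives inside that script instead.
from convert_hf_to_gguf import GPT2Model, ModelBase

@ModelBase.register("AuroraGPT2ForCausalLM")
class AuroraGPT2Model(GPT2Model):
    # Reuse the GPT-2 conversion path; the existing GPT-2 mapping already knows
    # how to write position_embd.weight, token embeddings, and the blocks.
    model_arch = gguf.MODEL_ARCH.GPT2
    # Override set_gguf_parameters() / modify_tensors() here only where the
    # Aurora variant actually differs from stock GPT-2.
```

The matching C++ half, the architecture enum and tensor loading in llama.cpp itself, is what the HOWTO-add-model.md guide linked above walks through.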

Alright, I'll try doing that.

I'm not sure how to do it, actually.
