Model Request

#1323
by ThatHungarian - opened

Could you convert my new model (Aurora-10M) into GGUF and quantize it? Thanks in advance.

It's queued! :D

You can check for progress at http://hf.tst.eu/status.html or regularly check the model
summary page at https://hf.tst.eu/model#Aurora-10M-GGUF for quants to appear.

It looks like it failed. Is there something wrong with it?

Just the same issue we had last time: basically, you modified the pre-tokenizer instead of using the GPT-2 pre-tokenizer. This just requires me to use a special build of llama.cpp that has your pre-tokenizer hash hardcoded so the model can convert into GGUF. It is a bit more manual work than usual, but such tiny models are super rare, so it is definitely worth it. https://huggingface.co/mradermacher/Aurora-6M-GGUF already has 170 downloads, which is crazy.
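
For reference, convert_hf_to_gguf.py identifies the pre-tokenizer by hashing the tokenizer's output on a fixed probe string and looking that hash up in a table; an unknown hash aborts the conversion, which is what the special build works around. A minimal sketch of that fingerprinting (the repo id and probe string below are placeholders, not the converter's actual probe text):

```python
from hashlib import sha256
from transformers import AutoTokenizer

# Assumption: the repo id is illustrative; point it at the actual Aurora-10M files.
tokenizer = AutoTokenizer.from_pretrained("ThatHungarian/Aurora-10M")

# The real converter uses a much longer multilingual probe string; any fixed
# text shows the idea: identical tokenizer behaviour -> identical hash.
probe = "Hello, world! 3.14 \u00e9\u00e8 \U0001f600"
chkhsh = sha256(str(tokenizer.encode(probe)).encode()).hexdigest()
print(chkhsh)
# A one-off build then maps this hash inside get_vocab_base_pre(), roughly:
#   if chkhsh == "<printed value>":
#       res = "gpt-2"
```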

Yes, I saw that it has 170 downloads too, I was kinda surprised. Anyway, thanks a lot, and tell me what I should do next time so it's easier for you.

Wow, took me ages to find the issue for this model. Unfortunately, it failed:

llama_model_load: error loading model: check_tensor_dims: tensor 'position_embd.weight' has wrong shape; expected   256,   128, got   256,   256,     1,     1

@nicoboss your llama.cpp could provide some environment-variable override for the tokenizer in convert_hf_to_gguf.py, and that could be set for the job. Theoretically speaking.
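
Purely hypothetical, since no such environment variable exists in convert_hf_to_gguf.py today, but the override could look roughly like this (the variable name is made up):

```python
import os

# Hypothetical helper: check an override before the usual hash lookup, so a
# queue job could export e.g. GGUF_PRETOK_OVERRIDE=gpt-2 for a single model.
def resolve_pretokenizer(chkhsh: str, known: dict[str, str]) -> str:
    override = os.environ.get("GGUF_PRETOK_OVERRIDE")  # made-up variable name
    if override:
        return override
    if chkhsh in known:
        return known[chkhsh]
    raise NotImplementedError(f"unknown pre-tokenizer hash: {chkhsh}")
```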

@ThatHungarian it is quite normal for ultra-small models to require their own specific tokenizer config, so there probably isn't much you can do about the pre-tokenizer mismatch.

The tensor shape mismatch seems to be a model bug, though (or possibly a conversion bug).
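
One quick way to see which side disagrees is to compare the stored position embedding with config.json. A sketch, assuming the standard HF GPT-2 layout (transformer.wpe.weight in model.safetensors); adjust the names if the checkpoint differs:

```python
import json
from safetensors import safe_open

with open("config.json") as f:
    cfg = json.load(f)
print("n_positions:", cfg.get("n_positions"), "n_embd:", cfg.get("n_embd"))

with safe_open("model.safetensors", framework="pt") as st:
    for name in st.keys():
        if "wpe" in name:  # GPT-2's learned position embedding
            print(name, tuple(st.get_tensor(name).shape))
# llama.cpp expects a plain 2-D matrix whose sizes match n_embd and the
# declared context length; the error above shows a 4-D 256 x 256 x 1 x 1
# tensor where a 256 x 128 one was expected.
```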

I don't know; the only difference between this model and the last one, in terms of stuff that could affect it, is that it was trained on a GPU instead of a CPU. Maybe that could be it?

So is there nothing that can be done to quantize it?

It can be quantized, but nobody can use the quants, so they are pointless. You shouldn't have modified the shape of the position_embd.weight tensor, as doing so broke llama.cpp compatibility. Either you make the model use the same tensor shapes as the original GPT-2, or you create a pull request to llama.cpp in which you implement llama.cpp support for your model. While you are at it, you could also fix the crash that occurs after running your 6M model for a while. I think it consistently crashes after exceeding a certain number of tokens.

I'm not sure I changed it, so I don't know how I could change it back (perhaps it's because of some leftover from my previous model). However, I think the crashing is fixed with the 10M model, at least in its bin format, as I am able to run it without crashes.

Also, I don't think the crash with the 6M model is fixable, as I think that's just due to exceeding token limits (just like you said), since I didn't really implement a lot of features into that one. However, I'll try reshaping the 10M model later and see if I can get it back to the original form.

Could the bin file actually make things easier? I still have it.

I wasn't able to meaningfully change anything, so I just uploaded the bin file for it so people can run it that way; I can't really seem to figure out the reshaping.

Actually, I was able to reduce it down to 256 256 (from 256 256 1 1). However, reducing it down further to 256 128 would lead to data loss. Is there any way that you could try putting a flag or something while converting to make it recognize the correct shape? Anyway, I uploaded the new model to the Aurora-10M repo under the name model_shape.safetensors.
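
For what it's worth, the 256 256 1 1 to 256 256 step is just dropping the two size-1 dimensions, which is lossless. A sketch with safetensors/torch, with the tensor and file names as assumptions:

```python
from safetensors.torch import load_file, save_file

state = load_file("model.safetensors")
for name, tensor in list(state.items()):
    if "wpe" in name and tensor.dim() > 2:
        state[name] = tensor.squeeze().contiguous()  # removes only the size-1 dims
save_file(state, "model_shape.safetensors")
# Going from 256 positions down to 128 is a different operation: slicing off
# rows 128..255 throws away learned embeddings, which is the data loss
# mentioned above, unless those positions were never trained or used.
```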

The problem is not conversion and quantization. We can produce static quants of Aurora-10M, but what is the point if nobody can use them? If llama.cpp doesn't support your architecture, it cannot load the model. The proper fix would be to stop using the "GPT2LMHeadModel" architecture, instead naming it something like "AuroraGPT2LMHeadModel" or "AuroraGPT2ForCausalLM", and then follow https://github.com/ggml-org/llama.cpp/blob/master/docs/development/HOWTO-add-model.md to create a pull request adding llama.cpp support for it. The problem is that by changing the tensor shape you started creating your own architecture, which is incompatible with the "GPT2LMHeadModel" architecture supported by llama.cpp.
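
A sketch of what the config side of that rename could look like, assuming the standard HF GPT-2 config.json layout (the class name comes from the suggestion above; the model_type value is just an example):

```python
import json

with open("config.json") as f:
    cfg = json.load(f)

cfg["architectures"] = ["AuroraGPT2ForCausalLM"]  # was ["GPT2LMHeadModel"]
cfg["model_type"] = "aurora-gpt2"                 # was "gpt2"

with open("config.json", "w") as f:
    json.dump(cfg, f, indent=2)
# Note: transformers will then only load the model through a matching custom
# class (auto_map / trust_remote_code), and llama.cpp will only accept the new
# name once a pull request adds support for it.
```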

I'll try working on that; however, in the meantime the bin files are usable and without any crashes.

I think it should work now. I didn't make a pull request, but I changed it to "AuroraGPT2ForCausalLM" and "aurora-gpt2" and provided the aurora_model.py file, so I think it should be ready. I'll be doing it for the 30M version too in a second.

llama.cpp will not recognize your architecture if you don't create a PR implementing it. Doing so should be quite straightforward, as you can just take the same code path as GPT-2 and then branch wherever your architecture differs.
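
The converter-side half of such a PR is usually small. A rough sketch of how convert_hf_to_gguf.py registers architectures; the decorator and base-class names change between llama.cpp versions, and in a real PR this class is added inside the script rather than imported, so treat this as the shape of the change only:

```python
import gguf
# Assumption: these names approximate convert_hf_to_gguf.py at the time of
# writing; in a real PR the class lives inside that script instead.
from convert_hf_to_gguf import GPT2Model, ModelBase

@ModelBase.register("AuroraGPT2ForCausalLM")
class AuroraGPT2Model(GPT2Model):
    # Reuse the GPT-2 conversion path; the existing GPT-2 mapping already knows
    # how to write position_embd.weight, token embeddings, and the blocks.
    model_arch = gguf.MODEL_ARCH.GPT2
    # Override set_gguf_parameters() / modify_tensors() here only where the
    # Aurora variant actually differs from stock GPT-2.
```

The matching C++ half, the architecture enum and tensor loading in llama.cpp itself, is what the HOWTO-add-model.md guide linked above walks through.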

Alright, I'll try doing that.

I'm not sure how to do it, actually.
