Other model updates and ggufs

#1 opened by Rybens

When will the rest of the models be updated like this one? And when ggufs?

I make no secret that I am most interested in the Llama 3.1 8b Instruct.

By the way, great job and I'm keeping my fingers firmly crossed for your research on improving the performance of the released models.

This is basic research into LLMs.

So far, the method seems to work best with larger models, although I am waiting for the results from the Open LLM Leaderboard.

I have limited compute, so I need to focus on creating great models before I can spend that compute on generating GGUFs.

Once the top models are found, that will be the time to invest in GGUF generation.

I don't get paid for this research, but I have been sponsored with some GPU time from the appliedAI Institute.

Thank you for your reply. Once again, great job.

Could I suggest a model for you to modify?

Gemma2-2B It performs very well in my project, so if you have the ability, compute, and time for this model (it is quite small), I would be very grateful. I am curious how it will perform after your modification.

Owner

I would be happy to try when I have time again.

OK, try dnhkng/RYS-gemma-2-2b-it

It tests well, but I haven't done a vibe test with it yet, so it might be unusable!

Please report your findings in the discussion page of that model!

I need to dust off my model conversion script for ggufs; I wonder if it still works after a year, hehe.
My project uses ggufs to run on almost any modern computer even without a graphics card, so small models are very useful for it.
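
For reference, the conversion itself is usually just a call into llama.cpp's convert_hf_to_gguf.py. A minimal sketch of a wrapper, where the paths are placeholders for a local setup and the flags can shift between llama.cpp versions:

```python
# Hypothetical wrapper around llama.cpp's convert_hf_to_gguf.py; LLAMA_CPP
# and MODEL_DIR are placeholder paths for a local checkout and a downloaded
# model snapshot.
import subprocess
from pathlib import Path

LLAMA_CPP = Path("llama.cpp")          # local clone of ggerganov/llama.cpp
MODEL_DIR = Path("RYS-gemma-2-2b-it")  # snapshot of the HF repo

subprocess.run(
    [
        "python", str(LLAMA_CPP / "convert_hf_to_gguf.py"),
        str(MODEL_DIR),
        "--outfile", "rys-gemma-2-2b-it-f16.gguf",
        "--outtype", "f16",  # keep full precision here; quantize in a later step
    ],
    check=True,  # raise if the conversion script exits non-zero
)
```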

When I test the RYS-modified Gemma2 2B It in my project, I will let you know.

Thank you very much!

Owner

It may not work; I had a report that the other gemma2 model wasn't compatible, as the number of layers has changed.

Let me know.

Right, my script gives this error:

INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-00002.safetensors'
Traceback (most recent call last):
  File "/mnt/c/Users/ryben/Documents/Nextcloud/Projects/easy-gguf/llama.cpp/convert_hf_to_gguf.py", line 3953, in <module>
    main()
  File "/mnt/c/Users/ryben/Documents/Nextcloud/Projects/easy-gguf/llama.cpp/convert_hf_to_gguf.py", line 3947, in main
    model_instance.write()
  File "/mnt/c/Users/ryben/Documents/Nextcloud/Projects/easy-gguf/llama.cpp/convert_hf_to_gguf.py", line 387, in write
    self.prepare_tensors()
  File "/mnt/c/Users/ryben/Documents/Nextcloud/Projects/easy-gguf/llama.cpp/convert_hf_to_gguf.py", line 262, in prepare_tensors
    for name, data_torch in self.get_tensors():
  File "/mnt/c/Users/ryben/Documents/Nextcloud/Projects/easy-gguf/llama.cpp/convert_hf_to_gguf.py", line 151, in get_tensors
    ctx = cast(ContextManager[Any], safe_open(self.dir_model / part_name, framework="pt", device="cpu"))
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge

I have no idea if it is the fault of my script or the converting script.
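
For what it's worth, HeaderTooLarge usually points at the file rather than the script: a .safetensors file begins with an 8-byte little-endian integer giving the size of the JSON header, and an incomplete download or a Git LFS pointer file makes that number come out absurd. A quick sanity check, with the filename taken from the traceback above:

```python
# Inspect the safetensors header: the format starts with an 8-byte
# little-endian unsigned integer (the JSON header size), followed by the
# JSON header itself. A huge declared size usually means the file is
# truncated, an HTML error page, or a Git LFS pointer, not real weights.
import json
import struct

def inspect_safetensors(path: str) -> None:
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        print(f"declared header size: {header_len} bytes")
        if header_len > 100 * 1024 * 1024:  # arbitrary sanity bound
            print("implausible header size -- re-download the file")
            return
        header = json.loads(f.read(header_len))
        n_tensors = len(header) - ("__metadata__" in header)
        print(f"tensors in file: {n_tensors}")

inspect_safetensors("model-00001-of-00002.safetensors")
```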

@bartowski I don't want to bother you, but maybe you will know something more?

Header too large is a curious one that I've never come across, which makes me think it's something wrong with the model itself.

I can try conversion as well to make sure I get the same error, but I doubt it's a llama.cpp issue.

Oh wait, this one won't work either way, because Gemma 2 models are hard-coded to be recognized by layer count and this one altered the 27b layer count.

@bartowski Oh, sorry, I didn't mention this in my previous post, but this error pops up when I convert RYS Gemma2 2B It from this repository:
https://huggingface.co/dnhkng/RYS-gemma-2-2b-it

But it's likely that what you wrote about llama.cpp having hardcoded the number of layers for Gemma2 models in the conversion script is the reason for the error.

@Rybens I think you have some other issue present; perhaps you need to update your environment? I was able to convert, but then on generation I get the issue I expected (unable to recognize the gemma model because of this assert: https://github.com/ggerganov/llama.cpp/blob/98a532d474c73d3494a5353024cb6a4fbbabbb35/src/llama.cpp#L11815)
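
For anyone following along: as far as I can tell, llama.cpp keys the Gemma 2 size off the exact layer count (26 for 2B, 42 for 9B, 46 for 27B), and an unmatched count leaves the model type unknown, which is what later trips that assert. A rough Python sketch of the C++ logic:

```python
# Rough Python sketch of llama.cpp's Gemma 2 size detection (the real code
# is C++ in src/llama.cpp). The size is used later in build_gemma2 to pick
# the attention scaling, so an unknown size ends in the fatal assert.
GEMMA2_LAYERS = {26: "2B", 42: "9B", 46: "27B"}

def detect_gemma2_type(n_layer: int) -> str:
    return GEMMA2_LAYERS.get(n_layer, "UNKNOWN")

print(detect_gemma2_type(26))  # "2B" (stock gemma-2-2b)
print(detect_gemma2_type(30))  # "UNKNOWN" -> aborts at generation time
                               # (30 is just an illustrative merged depth)
```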

@bartowski Yes, there was indeed something wrong on my side.

I have now managed to convert the model, but I get this error when using Guidance in my project:

/tmp/pip-install-_tq76j88/llama-cpp-python_d7535f9fa2044e6b8f0d0909decb79d5/vendor/llama.cpp/src/llama.cpp:11815: fatal error
/home/rybens/miniconda3/envs/guidance/lib/python3.11/site-packages/llama_cpp/lib/libggml.so(+0xe204)[0xff9d68c8e204]
/home/rybens/miniconda3/envs/guidance/lib/python3.11/site-packages/llama_cpp/lib/libggml.so(ggml_abort+0x140)[0xff9d68c8f3d0]
/home/rybens/miniconda3/envs/guidance/lib/python3.11/site-packages/llama_cpp/lib/libllama.so(_ZN17llm_build_context12build_gemma2Ev+0xaf4)[0xff9d68e892c4]
/home/rybens/miniconda3/envs/guidance/lib/python3.11/site-packages/llama_cpp/lib/libllama.so(+0x52588)[0xff9d68e12588]
/home/rybens/miniconda3/envs/guidance/lib/python3.11/site-packages/llama_cpp/lib/libllama.so(llama_new_context_with_model+0xf68)[0xff9d68e31d98]
/home/rybens/miniconda3/envs/guidance/lib/python3.11/lib-dynload/../../libffi.so.8(+0xc050)[0xff9d9251c050]
/home/rybens/miniconda3/envs/guidance/lib/python3.11/lib-dynload/../../libffi.so.8(+0x9580)[0xff9d92519580]
/home/rybens/miniconda3/envs/guidance/lib/python3.11/lib-dynload/_ctypes.cpython-311-aarch64-linux-gnu.so(+0x14b6c)[0xff9d92554b6c]
/home/rybens/miniconda3/envs/guidance/lib/python3.11/lib-dynload/_ctypes.cpython-311-aarch64-linux-gnu.so(+0xd784)[0xff9d9254d784]
/home/rybens/miniconda3/envs/guidance/bin/python(_PyObject_MakeTpCall+0x98)[0x48549c]
/home/rybens/miniconda3/envs/guidance/bin/python(_PyEval_EvalFrameDefault+0x31c8)[0x4292f8]
/home/rybens/miniconda3/envs/guidance/bin/python[0x570304]
/home/rybens/miniconda3/envs/guidance/bin/python(_PyObject_FastCallDictTstate+0x100)[0x485710]
/home/rybens/miniconda3/envs/guidance/bin/python(_PyObject_Call_Prepend+0x13c)[0x4859f0]
/home/rybens/miniconda3/envs/guidance/bin/python[0x4f7960]
/home/rybens/miniconda3/envs/guidance/bin/python[0x4efa14]
/home/rybens/miniconda3/envs/guidance/bin/python(_PyObject_MakeTpCall+0x98)[0x48549c]
/home/rybens/miniconda3/envs/guidance/bin/python(_PyEval_EvalFrameDefault+0x31c8)[0x4292f8]
/home/rybens/miniconda3/envs/guidance/bin/python[0x570304]
/home/rybens/miniconda3/envs/guidance/bin/python(_PyObject_FastCallDictTstate+0x100)[0x485710]
/home/rybens/miniconda3/envs/guidance/bin/python(_PyObject_Call_Prepend+0x13c)[0x4859f0]
/home/rybens/miniconda3/envs/guidance/bin/python[0x4f7960]
/home/rybens/miniconda3/envs/guidance/bin/python[0x4efa14]
/home/rybens/miniconda3/envs/guidance/bin/python(_PyObject_Call+0x68)[0x485268]
/home/rybens/miniconda3/envs/guidance/bin/python(_PyEval_EvalFrameDefault+0x4b64)[0x42ac94]
/home/rybens/miniconda3/envs/guidance/bin/python[0x570304]
/home/rybens/miniconda3/envs/guidance/bin/python(_PyObject_FastCallDictTstate+0x100)[0x485710]
/home/rybens/miniconda3/envs/guidance/bin/python(_PyObject_Call_Prepend+0x13c)[0x4859f0]
/home/rybens/miniconda3/envs/guidance/bin/python[0x4f7960]
/home/rybens/miniconda3/envs/guidance/bin/python[0x4efa14]
/home/rybens/miniconda3/envs/guidance/bin/python(_PyObject_Call+0x68)[0x485268]
/home/rybens/miniconda3/envs/guidance/bin/python(_PyEval_EvalFrameDefault+0x4b64)[0x42ac94]
/home/rybens/miniconda3/envs/guidance/bin/python[0x570304]
/home/rybens/miniconda3/envs/guidance/bin/python(_PyObject_FastCallDictTstate+0x100)[0x485710]
/home/rybens/miniconda3/envs/guidance/bin/python(_PyObject_Call_Prepend+0x13c)[0x4859f0]
/home/rybens/miniconda3/envs/guidance/bin/python[0x4f7960]
/home/rybens/miniconda3/envs/guidance/bin/python[0x4efa14]
/home/rybens/miniconda3/envs/guidance/bin/python(_PyObject_MakeTpCall+0x98)[0x48549c]
/home/rybens/miniconda3/envs/guidance/bin/python(_PyEval_EvalFrameDefault+0x31c8)[0x4292f8]
/home/rybens/miniconda3/envs/guidance/bin/python[0x570304]
/home/rybens/miniconda3/envs/guidance/bin/python(PyEval_EvalCode+0xa8)[0x5703b8]
/home/rybens/miniconda3/envs/guidance/bin/python[0x5b77bc]
/home/rybens/miniconda3/envs/guidance/bin/python[0x5b7b04]
/home/rybens/miniconda3/envs/guidance/bin/python[0x5b7c34]
/home/rybens/miniconda3/envs/guidance/bin/python(_PyRun_SimpleFileObject+0x144)[0x5bad74]
/home/rybens/miniconda3/envs/guidance/bin/python(_PyRun_AnyFileObject+0x9c)[0x5bb350]
/home/rybens/miniconda3/envs/guidance/bin/python(Py_RunMain+0x7b0)[0x5db710]
/home/rybens/miniconda3/envs/guidance/bin/python(Py_BytesMain+0x64)[0x5dbd94]
/lib/aarch64-linux-gnu/libc.so.6(+0x273fc)[0xff9db05673fc]
/lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0x98)[0xff9db05674cc]
/home/rybens/miniconda3/envs/guidance/bin/python[0x4323a0]
Aborted

So it looks like llama.cpp is to blame? Or the convert script?

Edit: Oh, so this is the fault of llama.cpp not recognizing gemma2; now I understand!

Yeah, basically because of how the sliding attention in Gemma 2 works: if llama.cpp detects the gemma2 arch, it then attempts to recognize which size (2b vs 9b vs 27b) based on the layer count. This would need to be updated to... something... to fix it. Maybe ranges? Maybe something more logical: if greater than 27b's minimum layers, it's 27b; else if greater than 9b's, it's 9b; else it's 2b? Maybe I'll look at making a PR.
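
A minimal sketch of that threshold idea, in Python for illustration (the actual fix would be a C++ change in llama.cpp, and the 26/42/46 boundaries assume the stock layer counts):

```python
# Threshold-based detection as proposed above (illustrative Python; the
# real check lives in llama.cpp's C++). Stock Gemma 2 layer counts are
# 2B = 26, 9B = 42, 27B = 46; a self-merge with extra layers then maps to
# the largest stock size at or below its depth.
def detect_gemma2_type(n_layer: int) -> str:
    if n_layer >= 46:
        return "27B"
    if n_layer >= 42:
        return "9B"
    return "2B"

print(detect_gemma2_type(30))  # "2B": a deepened 2B self-merge still resolves
```

One rough edge: a deepened 9B that crosses 46 layers would be typed as 27B, so any pure layer-count heuristic remains a guess.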

@bartowski Super, thank you so much

Owner

Please try this out before submitting a PR to modify llama.cpp!

I'm not sure this method helps with smaller models. I generally see generalized performance decrease with small models vs. generalized increases on very large models.

Install the transformers library and CUDA and test it a bit first.
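
Something like this works as a smoke test (a minimal sketch using the standard transformers API; with no GPU it loads on the CPU and runs, just slowly):

```python
# Minimal CPU smoke test for the model discussed in this thread; the
# prompt is just a placeholder. With no GPU, everything stays on the CPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "dnhkng/RYS-gemma-2-2b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)

messages = [{"role": "user", "content": "Say hello in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```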

I will try to do it this weekend, but I promise nothing, because I am on vacation and only have access to a laptop without a graphics card. I will have to run the Transformers library on the CPU. Maybe it will work :D

Anyway, I will post the report in the discussion on the model page.

@dnhkng it's only a matter of time before people self-merge Gemma 2 anyway, so it's probably better to try to catch it in a less rigidly defined way.
