https://huggingface.co/Downtown-Case/Star-Command-R-Lite-32B-v1

#417
by DazzlingXeno - opened

Hmm, already tried that in September. Let's try again and see what the problem was.

mradermacher changed discussion status to closed

Yup, seems broken: ValueError: Can not map tensor 'lm_head.weight'

I think this can be resolved in two ways. I haven't tested either myself, but it makes sense given what is said here, given that this raw model is larger than the original Command-R, and given that the only difference is the existence of lm_head.weight when you compare this model's model.safetensors.index.json to that of Command-R (or Star-Command-R-32B-v1).
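You can check this yourself by looking the tensor up in the index. A rough, untested sketch (the local path is just an example, assuming you've downloaded the repo):

import json

# Example path, assuming a local download of the repo.
with open("Star-Command-R-Lite-32B-v1/model.safetensors.index.json") as f:
    index = json.load(f)

# weight_map maps each tensor name to the shard file that stores it.
print(index["weight_map"].get("lm_head.weight", "no lm_head.weight in this checkpoint"))

On base Command-R this should report that lm_head.weight is absent; here it should name the shard that holds it.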

Option 1:
Edit convert_hf_to_gguf.py to ignore lm_head.weight, the same way it already does for Gemma models.
Add this to the @Model.register("CohereForCausalLM") section of convert_hf_to_gguf.py:

def modify_tensors(self, data_torch: Tensor, name: str, bid: int | None) -> Iterable[tuple[str, Tensor]]:
    del bid  # unused

    # lm_head is not used in llama.cpp, while autoawq will include this tensor in model
    # To prevent errors, skip loading lm_head.weight.
    if name == "lm_head.weight":
        logger.debug(f"Skipping get tensor {name!r} in safetensors so that convert can end normally.")
        return []

    return [(self.map_tensor_name(name), data_torch)]

Option 2:
Delete model-00001-of-00015.safetensors and remove the reference to it in model.safetensors.index.json.
This works because model-00001-of-00015.safetensors contains lm_head.weight and nothing else. If that were not the case, you would have to edit the safetensors file itself, using the method described in the link above.
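If you want to script the index edit, here's a rough sketch (again untested, and the path is just an example). It drops the lm_head.weight entry and confirms the shard holds nothing else before you delete the file:

import json

index_path = "Star-Command-R-Lite-32B-v1/model.safetensors.index.json"  # example path, adjust to your download

with open(index_path) as f:
    index = json.load(f)

# Drop the lm_head.weight entry and note which shard held it.
shard = index["weight_map"].pop("lm_head.weight")

# Sanity check: the shard must hold nothing else before it is safe to delete.
leftovers = [name for name, file in index["weight_map"].items() if file == shard]
assert not leftovers, f"{shard} also holds {leftovers}; edit the shard instead"

with open(index_path, "w") as f:
    json.dump(index, f, indent=2)

print(f"Index updated; {shard} can now be deleted")
# Note: metadata["total_size"] is now stale; I don't know whether anything
# checks it, so adjust it too if you want to be thorough.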

I'm not normally a friend of invasive changes to a model like this, but since I got such a great set of instructions I'll have to do it :) Thanks @tdh111

imatrix is generating, all looks good. You can watch here: http://hf.tst.eu/status.html

I'm not normally a friend of invasive changes to a model like this

Sorry, I should have gone to the model maker first to ask them to fix it.

I'm glad it worked. Just out of curiosity, which option did you use?

#2 - it's easier to let the model fail and fix it than to patch and update all servers

But maybe #1 would be a good patch for upstream llama.cpp?
