Script adjustment suggestion: use llama-gguf-split

#301
by patf82 - opened

The current split files like "part1of3" aren't directly loadable by llama.cpp.

If the splits were created with the llama-gguf-split utility, using the "-00001-of-00005.gguf" naming convention (and split at tensor boundaries), then llama.cpp could load the files directly as they are.
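To make the naming convention concrete, here is a minimal sketch that builds and recognizes shard names of the "-00001-of-00005.gguf" form. The helper names `split_name` and `parse_split_name` are made up for illustration and are not part of llama.cpp; the only thing taken from the post is the zero-padded five-digit pattern.

```python
import re

# Pattern for names like "model-00001-of-00005.gguf" (five-digit, zero-padded).
SPLIT_RE = re.compile(r"^(?P<prefix>.+)-(?P<index>\d{5})-of-(?P<total>\d{5})\.gguf$")


def split_name(prefix: str, index: int, total: int) -> str:
    """Build a shard file name, e.g. split_name("model", 1, 5)."""
    if not 1 <= index <= total:
        raise ValueError("index must be between 1 and total")
    return f"{prefix}-{index:05d}-of-{total:05d}.gguf"


def parse_split_name(name: str):
    """Return (prefix, index, total), or None if the name does not match."""
    m = SPLIT_RE.match(name)
    if m is None:
        return None
    return m["prefix"], int(m["index"]), int(m["total"])
```

A name like "model.gguf.part1of3" does not match this pattern, which is one quick way to tell the two kinds of split apart by file name alone.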

I know adjusting scripts is always super annoying (for the dumbest reasons), but it'd be a nice touch of extra convenience.

Unfortunately, that's not possible; see the FAQ (the model card), where this is addressed. Besides, llama.cpp could just as well load the files directly as they are: it was a deliberate choice on their side to create a new file format that is incompatible with all existing split quants on HF.
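For readers who just want a loadable file, here is a rough sketch of joining the parts back together. It assumes the "part1of3"-style files are plain byte-wise slices of a single GGUF file and that they are named with a `.partNofM` suffix; both are assumptions to check against the model card. The function name `join_parts` is hypothetical.

```python
import re
import shutil
from pathlib import Path

# Assumed suffix form: "<name>.part1of3", "<name>.part2of3", ...
PART_RE = re.compile(r"\.part(\d+)of(\d+)$")


def join_parts(first_part: str, output: str) -> None:
    """Concatenate all parts, in numeric order, into one output file."""
    first = Path(first_part)
    m = PART_RE.search(first.name)
    if m is None:
        raise ValueError(f"not a .partNofM file name: {first.name}")
    total = int(m.group(2))
    base = first.name[: m.start()]  # file name without the .partNofM suffix
    with open(output, "wb") as out:
        for i in range(1, total + 1):
            part = first.with_name(f"{base}.part{i}of{total}")
            with open(part, "rb") as src:
                # Stream the bytes across so large parts are not held in memory.
                shutil.copyfileobj(src, out)
```

If the assumption holds, the joined file should be an ordinary single-file GGUF that can be passed to llama.cpp like any other.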

mradermacher changed discussion status to closed
