Unknown Architecture 'hunyuan-moe' (using --jinja with llama-server)

#4
by x-polyglot-x - opened

Hi unsloth!

Thanks for uploading this model. I can't wait to try it out!

I'm running into the unknown architecture error above with this command:

  • llama-server -m Documents/llm/hunyuan_a13b/Hunyuan-A13B-Instruct-Q8_0-00001-of-00002.gguf -ngl 99 --jinja --temp 0.7 --top-k 20 --top-p 0.8 --repeat-penalty 1.05

I also tried again using the direct line of code you mentioned at the top of the model card page:

  • llama-cli -hf unsloth/Hunyuan-A13B-Instruct-GGUF:Q4_K_XL -ngl 99 --jinja --temp 0.7 --top-k 20 --top-p 0.8 --repeat-penalty 1.05

Any advice on what to try? I am on an M4 Max if that makes any difference.

Thanks again!

What build of llama.cpp are you using? Support for the model was added in b5843.
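If it helps, you can read the build number straight from the version banner. This is a sketch assuming llama-server is on your PATH and that it prints a line of the form `version: 5840 (abc1234)` when given `--version` (current llama.cpp binaries do, but the exact format may vary by build):

```shell
# Grab the "version: NNNN (hash)" line; some builds print it to stderr,
# so merge stderr into stdout before filtering.
ver_line=$(llama-server --version 2>&1 | grep -m1 '^version:')

build=${ver_line#version: }   # strip the "version: " prefix -> "5840 (abc1234)"
build=${build%% *}            # strip the trailing "(hash)"  -> "5840"

if [ "$build" -ge 5843 ]; then
  echo "build $build: hunyuan-moe should be supported"
else
  echo "build $build: too old, need b5843 or newer"
fi
```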

I was on an older version. I ran brew upgrade llama.cpp, which got me to build 5840 (the latest stable Homebrew release). Sadly, that still isn't b5843. I downloaded the zip file for Mac ARM and am now trying to figure out how to install the newest version.

Thanks for this tip!!

I was able to successfully run the model after using cmake to build the latest llama.cpp.

The following command now works: llama.cpp/build/bin/llama-server -m "Documents/llm/hunyuan_a13b/Hunyuan-A13B-Instruct-Q8_0-00001-of-00002.gguf" -ngl 99 --jinja --temp 0.7 --top-k 20 --top-p 0.8 --repeat-penalty 1.05

Thank you very much for your assistance!!

x-polyglot-x changed discussion status to closed
x-polyglot-x changed discussion status to open

I am facing the same problem. I git-cloned the llama.cpp repo, but after building with cmake, llama-server reports version 5830. git branch says I am on master.
The releases are already at b5854: https://github.com/ggml-org/llama.cpp/releases
How can I update my checkout to that? Do I have to git checkout a different branch? And if so, which one?
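For what it's worth, the bNNNN release numbers are tags on master, not separate branches, so you shouldn't need to switch branches. A sketch of updating an existing clone (b5854 is the release tag mentioned above):

```shell
cd llama.cpp
git checkout master        # stay on master; releases are tags, not branches
git pull origin master     # fast-forward to the latest commits
git fetch --tags origin    # make sure release tags like b5854 are present
git checkout b5854         # optional: build exactly that tagged release
```

After updating, rebuild with cmake/make as before so the binaries pick up the new code.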

I followed a guide from a Medium article (link here: https://medium.com/@jackcheang5/running-llama-cpp-in-mac-22e71123b811)

NOTE: This is for Apple silicon devices!!

First, clone the GitHub repository:

git clone https://github.com/ggerganov/llama.cpp

then run these steps:

cd llama.cpp
mkdir build
cd build
cmake .. -DCMAKE_APPLE_SILICON_PROCESSOR=arm64
make -j

After that, launch llama-server from the build directory (in other words, point to the binary directly):

llama.cpp/build/bin/llama-server -m "Documents/llm/hunyuan_a13b/Hunyuan-A13B-Instruct-Q8_0-00001-of-00002.gguf" -ngl 99 --jinja --temp 0.7 --top-k 20 --top-p 0.8 --repeat-penalty 1.05

The code above assumes you are launching Terminal with the 'llama.cpp' folder in your user directory (the default) and that your model is stored in your Documents folder. You will likely need to change both of these paths.
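Once the server is up, you can sanity-check it from a second terminal. This assumes the default host and port (127.0.0.1:8080); llama-server exposes a /health endpoint and an OpenAI-compatible chat endpoint, though your port may differ if you passed --port:

```shell
# Should return a JSON status once the model has finished loading.
curl -s http://127.0.0.1:8080/health

# Minimal smoke-test request against the chat completions endpoint.
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello"}],"temperature":0.7}'
```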
