Running this model requires a fork of llama.cpp.

install

git clone https://github.com/Noeda/llama.cpp
cd llama.cpp
git checkout dots1
cmake -B build    # your usual cmake parameters / see the official documentation
cmake --build build

run

Use the following CLI arguments to override the chat template and the special tokens:

./llama-cli -m ./dots.llm1.inst-GGUF/dots.1.instruct.q4_k.gguf-00001-of-00002.gguf \
  --ctx-size 8192 --n-gpu-layers 64 -t 16 --temp 0.3 \
  --jinja \
  --chat-template "{% if messages[0]['role'] == 'system' %}<|system|>{{ messages[0]['content'] }}<|endofsystem|>{% set start_idx = 1 %}{% else %}<|system|>You are a helpful assistant.<|endofsystem|>{% set start_idx = 0 %}{% endif %}{% for idx in range(start_idx, messages|length) %}{% if messages[idx]['role'] == 'user' %}<|userprompt|>{{ messages[idx]['content'] }}<|endofuserprompt|>{% elif messages[idx]['role'] == 'assistant' %}<|response|>{{ messages[idx]['content'] }}<|endofresponse|>{% endif %}{% endfor %}{% if add_generation_prompt and messages[-1]['role'] == 'user' %}<|response|>{% endif %}" \
  --override-kv tokenizer.ggml.bos_token_id=int:-1 \
  --override-kv tokenizer.ggml.eos_token_id=int:151645 \
  --override-kv tokenizer.ggml.pad_token_id=int:151645 \
  --override-kv tokenizer.ggml.eot_token_id=int:151649 \
  --override-kv tokenizer.ggml.eog_token_id=int:151649
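To preview exactly what prompt string that Jinja template produces, here is a minimal Python sketch that re-implements its logic in plain code (the function name `render_dots1_prompt` is my own; the actual rendering at inference time is done by llama.cpp via `--jinja`):

```python
def render_dots1_prompt(messages, add_generation_prompt=True):
    """Plain-Python re-implementation of the dots1 chat template above.

    `messages` is a list of {"role": ..., "content": ...} dicts, as in the
    OpenAI-style chat format. For preview/debugging only.
    """
    parts = []
    # A leading system message is used verbatim; otherwise a default is injected.
    if messages and messages[0]["role"] == "system":
        parts.append(f"<|system|>{messages[0]['content']}<|endofsystem|>")
        start = 1
    else:
        parts.append("<|system|>You are a helpful assistant.<|endofsystem|>")
        start = 0
    # Each turn is wrapped in role-specific special tokens.
    for m in messages[start:]:
        if m["role"] == "user":
            parts.append(f"<|userprompt|>{m['content']}<|endofuserprompt|>")
        elif m["role"] == "assistant":
            parts.append(f"<|response|>{m['content']}<|endofresponse|>")
    # Open a response turn so the model starts generating the assistant reply.
    if add_generation_prompt and messages and messages[-1]["role"] == "user":
        parts.append("<|response|>")
    return "".join(parts)


if __name__ == "__main__":
    print(render_dots1_prompt([{"role": "user", "content": "Hello!"}]))
    # <|system|>You are a helpful assistant.<|endofsystem|><|userprompt|>Hello!<|endofuserprompt|><|response|>
```

Note how `<|endofresponse|>` (id 151649, overridden as the end-of-turn/end-of-generation token above) closes each assistant turn, which is why those `--override-kv` flags are needed for generation to stop correctly.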

Thanks to @shigureui for posting this here.

Format: GGUF · Model size: 143B params · Architecture: dots1