barandinho/TDM-8b-v0.1-Q8_0-GGUF
This model was converted to GGUF format from barandinho/TDM-8b-v0.1
using llama.cpp via ggml.ai's GGUF-my-repo space.
Refer to the original model card for more details on the model.
IMPORTANT NOTE:
Use this model with temperature=0.6 and top_p=0.95 for better performance.
You can also try top_k=20 and min_p=0.01.
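If you run the model through llama.cpp directly (see the section below), the same settings can be passed as sampler flags. A minimal sketch, assuming a current llama-cli build; flag names may differ in older versions:
llama-cli --hf-repo barandinho/TDM-8b-v0.1-Q8_0-GGUF --hf-file tdm-8b-v0.1-q8_0.gguf --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.01 -p "Merhaba"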
The recommended way to use the model is through a local LLM client such as LM Studio or Ollama (a sample Ollama Modelfile follows the chat template below).
Apply the following settings before using the model:
Set this as the system prompt (Turkish; it roughly translates to "You are a helpful AI model named TÜDÜM (TÜrkçe Düşünen Üretken Model, i.e. Turkish-Thinking Generative Model). Answer in Turkish and complete your answer."):
Sen TÜDÜM (TÜrkçe Düşünen Üretken Model) isimli yardımsever bir yapay zeka modelisin.
Türkçe cevap ver ve cevabını tamamla.
To enable multi-turn conversation, paste this Jinja template as the chat template:
{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='', is_first_sp=true, is_last_user=false) %}{%- for message in messages %}{%- if message['role'] == 'system' %}{%- if ns.is_first_sp %}{% set ns.system_prompt = ns.system_prompt + message['content'] %}{% set ns.is_first_sp = false %}{%- else %}{% set ns.system_prompt = ns.system_prompt + '
' + message['content'] %}{%- endif %}{%- endif %}{%- endfor %}{{ bos_token }}{{ ns.system_prompt }}{%- for message in messages %}{% set content = message['content'] %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{%- set ns.is_first = false -%}{%- set ns.is_last_user = true -%}{{'<|User|>' + content + '<|Assistant|>'}}{%- endif %}{%- if message['role'] == 'assistant' %}{%- set content = (message.content.split('</think>')|last).lstrip() %}{%- endif %}{%- if message['role'] == 'assistant' and message['tool_calls'] is defined and message['tool_calls'] is not none %}{%- set ns.is_last_user = false -%}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{%- endif %}{%- set ns.is_first = false %}{%- set ns.is_tool = false -%}{%- set ns.is_output_first = true %}{%- for tool in message['tool_calls'] %}{%- if not ns.is_first %}{%- if content is none %}{{'<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '
' + '```json' + '
' + tool['function']['arguments'] + '
' + '```' + '<|tool▁call▁end|>'}}{%- else %}{{content + '<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '
' + '```json' + '
' + tool['function']['arguments'] + '
' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- set ns.is_first = true -%}{%- else %}{{'
' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '
' + '```json' + '
' + tool['function']['arguments'] + '
' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- endfor %}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}{%- endif %}{%- if message['role'] == 'assistant' and (message['tool_calls'] is not defined or message['tool_calls'] is none)%}{%- set ns.is_last_user = false -%}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + content + '<|end▁of▁sentence|>'}}{%- set ns.is_tool = false -%}{%- else %}{{content + '<|end▁of▁sentence|>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_last_user = false -%}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + content + '<|tool▁output▁end|>'}}{%- set ns.is_output_first = false %}{%- else %}{{'
<|tool▁output▁begin|>' + content + '<|tool▁output▁end|>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{% endif %}{% if add_generation_prompt and not ns.is_last_user and not ns.is_tool %}{{'<|Assistant|>'}}{% endif %}
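For Ollama, one way to apply the system prompt and sampler settings is a Modelfile. This is a sketch under assumptions: the GGUF path and the model name tdm-8b are placeholders for your local setup, min_p support depends on your Ollama version, and Ollama's TEMPLATE directive uses Go-style templates, so the Jinja template above is intended for llama.cpp-style clients rather than for pasting into a Modelfile.
# Write a Modelfile next to the downloaded GGUF file (paths/names are illustrative).
cat > Modelfile <<'EOF'
FROM ./tdm-8b-v0.1-q8_0.gguf
PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER top_k 20
PARAMETER min_p 0.01
SYSTEM """Sen TÜDÜM (TÜrkçe Düşünen Üretken Model) isimli yardımsever bir yapay zeka modelisin. Türkçe cevap ver ve cevabını tamamla."""
EOF
# Register the model under a local name and start chatting.
ollama create tdm-8b -f Modelfile
ollama run tdm-8b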
Use with llama.cpp
Below is a generic, auto-generated way of using the model with llama.cpp.
Install llama.cpp through brew (works on Mac and Linux)
brew install llama.cpp
Invoke the llama.cpp server or the CLI.
CLI:
llama-cli --hf-repo barandinho/TDM-8b-v0.1-Q8_0-GGUF --hf-file tdm-8b-v0.1-q8_0.gguf -p "The meaning to life and the universe is"
Server:
llama-server --hf-repo barandinho/TDM-8b-v0.1-Q8_0-GGUF --hf-file tdm-8b-v0.1-q8_0.gguf -c 2048
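Once the server is running (default port 8080), you can query its OpenAI-compatible chat endpoint with the recommended settings and the system prompt from above. A sketch; the user message is just a sample Turkish greeting:
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "messages": [
    {"role": "system", "content": "Sen TÜDÜM (TÜrkçe Düşünen Üretken Model) isimli yardımsever bir yapay zeka modelisin. Türkçe cevap ver ve cevabını tamamla."},
    {"role": "user", "content": "Merhaba, kendini tanıtır mısın?"}
  ],
  "temperature": 0.6,
  "top_p": 0.95
}'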
Note: You can also use this checkpoint directly by following the usage steps listed in the llama.cpp repo.
Step 1: Clone llama.cpp from GitHub.
git clone https://github.com/ggerganov/llama.cpp
Step 2: Move into the llama.cpp folder and build it with the LLAMA_CURL=1
flag along with other hardware-specific flags (e.g., LLAMA_CUDA=1 for Nvidia GPUs on Linux).
cd llama.cpp && LLAMA_CURL=1 make
Step 3: Run inference through the main binary.
./llama-cli --hf-repo barandinho/TDM-8b-v0.1-Q8_0-GGUF --hf-file tdm-8b-v0.1-q8_0.gguf -p "The meaning to life and the universe is"
or
./llama-server --hf-repo barandinho/TDM-8b-v0.1-Q8_0-GGUF --hf-file tdm-8b-v0.1-q8_0.gguf -c 2048
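To serve the model with the chat template and sampler settings from above, recent llama.cpp builds can load a template file. This is a sketch assuming the Jinja template has been saved locally as tdm_chat_template.jinja and that your build provides the --jinja and --chat-template-file options:
./llama-server --hf-repo barandinho/TDM-8b-v0.1-Q8_0-GGUF --hf-file tdm-8b-v0.1-q8_0.gguf -c 2048 --jinja --chat-template-file tdm_chat_template.jinja --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.01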