Test on 3090 + Tesla P40 (48gb vram total) + 64gb ram (Q2K)

#6 opened by roadtoagi

Getting around 2-3 t/s with llama.cpp. It works great! Insane progress.

Waiting for UD quants.

I'm getting an error with the prompt template, though:

```
common_chat_templates_init: failed to parse chat template (defaulting to chatml): Expected value expression at row 18, column 30:
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
                             ^
{%- set index = (messages|length - 1) - loop.index0 %}
```
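
The `messages[::-1]` slice seems to be what trips it up: llama.cpp ships its own minimal Jinja implementation for chat templates, and older builds apparently don't handle extended slice syntax. The construct itself is valid Jinja. Quick sanity check with the jinja2 package (this is just a stripped-down excerpt around the offending slice, not the model's full template):

```python
# Sanity check: the slice syntax from the failing template is valid standard Jinja.
# This is only the offending construct, not the model's full chat template.
import jinja2

snippet = (
    "{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}"
    "{%- for message in messages[::-1] %}"
    "{{ message.role }},"
    "{% endfor %}"
)

tmpl = jinja2.Environment().from_string(snippet)  # parses without error
print(tmpl.render(messages=[{"role": "user"}, {"role": "assistant"}]))
# -> assistant,user,
```

So the template parses fine in real Jinja; the failure is in the parser bundled with my llama.cpp build.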

Here's the rest of the startup log, with the chatml template it falls back to:

```
srv init: initializing slots, n_slots = 1
slot init: id 0 | task -1 | new slot n_ctx_slot = 8192
main: model loaded
main: chat template, chat_template: {%- for message in messages -%}
{{- '<|im_start|>' + message.role + '
' + message.content + '<|im_end|>
' -}}
{%- endfor -%}
{%- if add_generation_prompt -%}
{{- '<|im_start|>assistant
' -}}
{%- endif -%},
```

Maybe it will be fixed if I update llama.cpp?
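
That would also explain why generation still works: the chatml fallback it prints uses the same <|im_start|>/<|im_end|> turn format, just without the thinking/tool handling from the real template. Another workaround is to render the template client-side and send the raw prompt to the server. Untested sketch; the repo id and port below are placeholders for my setup:

```python
# Workaround sketch: apply the chat template client-side with transformers,
# then POST the rendered prompt to llama-server's /completion endpoint.
# ASSUMPTIONS: the repo id is a placeholder for this model's actual repo,
# and llama-server is listening on the default port 8080.
import requests
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-235B-A22B")  # placeholder repo id

prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Hello!"}],
    tokenize=False,
    add_generation_prompt=True,  # appends the assistant turn header
)

resp = requests.post(
    "http://127.0.0.1:8080/completion",
    json={"prompt": prompt, "n_predict": 256},
)
print(resp.json()["content"])
```

This sidesteps the server-side template entirely, so it works regardless of what the built-in parser supports.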

Q2_K seems better than the free OpenRouter version.
