Tool calls do not work in llama.cpp build 6108 (20638e4f)

#1
by jkrauss82 - opened

Thanks for providing these quants @bartowski !

This seems to be a capable model, judging from its outputs in Roo Code. However, tool calling is currently failing, probably due to Jinja template issues (a minimal test request is sketched below, after the server info).

llama-server startup:

LLAMA_SET_ROWS=1 ./llama-server --port 5001 --jinja --host 0.0.0.0 --split_mode none --n_gpu_layers 256 -c 131072 --flash_attn --slots -ctk q8_0 -ctv q8_0 -np 1 --model models/aws-prototyping_codefu-7b-v0.1-Q6_K_L.gguf -dev CUDA0 --defrag-thold 0.1 -b 2048 -ub 512 --threads 5

llama.cpp info:

$ ./llama-server
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA RTX A6000, compute capability 8.6, VMM: yes
build: 6108 (20638e4f) with cc (Debian 12.2.0-14+deb12u1) 12.2.0 for x86_64-linux-gnu
system info: n_threads = 6, n_threads_batch = 6, total_threads = 12

system_info: n_threads = 6 (n_threads_batch = 6) / 12 | CUDA : ARCHS = 860,890,1200 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
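To reproduce the failure without Roo Code in the loop, a minimal request along these lines should do. The get_weather tool is a made-up example; when tool calling works, choices[0].message in the response should contain a tool_calls array instead of plain content:

curl http://localhost:5001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "What is the weather in Berlin?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'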

It may be that we need to extract the template and pass it manually, as has been done with some other models.

I don't personally understand WHY that's sometimes necessary, but I know Mistral has needed it, for example.
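A sketch of that workflow: dump the embedded template from the GGUF, save it to a file (codefu.jinja is just a placeholder name here), and hand it back to llama-server explicitly. The gguf-dump tool comes with the gguf Python package and its flags may vary between versions:

pip install gguf
# dump only the metadata and look for the embedded chat template
gguf-dump --no-tensors models/aws-prototyping_codefu-7b-v0.1-Q6_K_L.gguf | grep chat_template
# save the template text to a file, then start the server with it
./llama-server --jinja --chat-template-file codefu.jinja ...   # plus the remaining flags from the startup command above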

I have tried supplying the chat template found on the model card as a separate file, but this also did not make llama.cpp fill the tool_calls list.

I can see these log lines in the llama.cpp output; could the double-BOS warning be a problem? I'm not sure at this point whether the extra BOS is coming from the template or from Roo Code.

srv  log_server_r: request: POST /v1/chat/completions 192.168.100.107 200
check_double_bos_eos: Added a BOS token to the prompt as specified by the model but the prompt also starts with a BOS token. So now the final prompt starts with 2 BOS tokens. Are you sure this is what you want?
srv  params_from_: Chat format: DeepSeek R1
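If I understand that warning correctly, it means the chat template itself emits a BOS token while the model metadata (tokenizer.ggml.add_bos_token) also tells llama.cpp to prepend one; it should be mostly harmless but can degrade output quality. The metadata side can be checked with the same dump tool as above (assumed invocation, flags may differ by version):

gguf-dump --no-tensors models/aws-prototyping_codefu-7b-v0.1-Q6_K_L.gguf | grep -i bos

The "Chat format: DeepSeek R1" line just shows which built-in tool-call handler llama.cpp matched the template to, so that part looks expected rather than wrong.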

Examining the model description in the original AWS repo, I think it was never trained for tool calling, so I won't investigate this further for now. Thanks for your answer, @bartowski.

jkrauss82 changed discussion status to closed
