Prompt time (Ollama) on 22-core Xeon, 5070 Ti, 128 GB RAM (Q6_K_L) (#12, opened 25 days ago by MikeZeroTango)
Template bug fixed in llama.cpp (#11, opened about 1 month ago by matteogeniaccio, 5 replies)
vllm deployment error (#10, opened about 1 month ago by Saicy, 1 reply)
Higher than usual refusal rate with Q6_K_L quant GGUF (#9, opened about 1 month ago by smcleod, 3 replies)
Tool use? (#8, opened about 1 month ago by johnpyp, 2 replies)
llama.cpp fixes have just been merged (#5, opened about 1 month ago by Mushoz, 21 replies)
LM Studio: unknown model architecture: 'glm4'? (#4, opened about 1 month ago by DrNicefellow, 5 replies)
Please regenerate GGUFs (#3, opened about 1 month ago by jacek2024, 1 reply)
Broken results (#2, opened about 2 months ago by RamoreRemora, 8 replies)
YaRN quantization for long context (#1, opened about 2 months ago by sovetboga, 1 reply)