Qwen3 GGUF Models
A collection of 7 LlamaEdge-compatible GGUF quants for the Qwen3 models.
LlamaEdge version:
Prompt template
Prompt type: chatml (thinking mode)
Prompt string
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
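For illustration, a filled-in chatml prompt (with a hypothetical system message and user question, not part of the template) looks like this:

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is the capital of France?<|im_end|>
<|im_start|>assistant

In thinking mode, Qwen3 typically opens its reply with a <think> ... </think> block containing its reasoning before the final answer.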
Prompt type: qwen3-no-think (non-thinking mode)
Prompt string
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{user_message_1}<|im_end|>
<|im_start|>assistant
{assistant_message_1}<|im_end|>
<|im_start|>user
{user_message_2}<|im_end|>
<|im_start|>assistant
<think>
</think>
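The empty <think> ... </think> block pre-fills the assistant turn, so the model answers directly without emitting a reasoning trace. For example, a single-turn prompt in non-thinking mode (again with hypothetical messages) would be assembled as:

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is the capital of France?<|im_end|>
<|im_start|>assistant
<think>
</think>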
Context size: 128000
Run as LlamaEdge service
wasmedge --dir .:. --nn-preload default:GGML:AUTO:Qwen3-32B-Q5_K_M.gguf \
  llama-api-server.wasm \
  --model-name Qwen3-32B \
  --prompt-template chatml \
  --ctx-size 128000
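Once started, llama-api-server exposes an OpenAI-compatible HTTP API. A minimal request sketch, assuming the server listens on the default port 8080:

curl -X POST http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "Qwen3-32B",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'

To serve the model in non-thinking mode instead, start the server with --prompt-template qwen3-no-think in place of chatml.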
Quantized with llama.cpp b5097
Quantization levels: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit, 16-bit
Base model: Qwen/Qwen3-32B