Llama-4-Scout-17B-16E-Instruct-GGUF

Only the text mode of Llama 4 is supported for now!

Original Model

unsloth/Llama-4-Scout-17B-16E-Instruct

Run with LlamaEdge

  • LlamaEdge version: v0.16.16 and above

  • Prompt template

    • Prompt type: llama-4-chat

    • Prompt string

      <|begin_of_text|><|header_start|>system<|header_end|>
      
      {system_prompt}<|eot|><|header_start|>user<|header_end|>
      
      {user_message_1}<|eot|><|header_start|>assistant<|header_end|>
      
      {assistant_message_1}<|eot|><|header_start|>user<|header_end|>
      
      {user_message_2}<|eot|>
      <|header_start|>assistant<|header_end|>
      
    • Example: tool use (a runnable client sketch follows this list)

      • Prompt with user question and tool info

        <|begin_of_text|><|header_start|>system<|header_end|>
        
        You are a helpful assistant.<|eot|>
        <|header_start|>user<|header_end|>
        
        Given the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt.
        
        Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.
        Do not use variables.
        
        [
          {
            "type": "function",
            "function": {
              "name": "get_current_weather",
              "description": "Get the current weather in a given location",
              "parameters": {
                "type": "object",
                "properties": {
                  "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                  },
                  "unit": {
                    "type": "string",
                    "description": "The temperature unit to use. Infer this from the users location.",
                    "enum": [
                      "celsius",
                      "fahrenheit"
                    ]
                  }
                },
                "required": [
                  "location",
                  "unit"
                ]
              }
            }
          },
          {
            "type": "function",
            "function": {
              "name": "predict_weather",
              "description": "Predict the weather in 24 hours",
              "parameters": {
                "type": "object",
                "properties": {
                  "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                  },
                  "unit": {
                    "type": "string",
                    "description": "The temperature unit to use. Infer this from the users location.",
                    "enum": [
                      "celsius",
                      "fahrenheit"
                    ]
                  }
                },
                "required": [
                  "location",
                  "unit"
                ]
              }
            }
          },
          {
            "type": "function",
            "function": {
              "name": "sum",
              "description": "Calculate the sum of two numbers",
              "parameters": {
                "type": "object",
                "properties": {
                  "a": {
                    "type": "integer",
                    "description": "the left hand side number"
                  },
                  "b": {
                    "type": "integer",
                    "description": "the right hand side number"
                  }
                },
                "required": [
                  "a",
                  "b"
                ]
              }
            }
          }
        ]
        
        Question: How is the weather of Beijing, China in celsius?<|eot|><|header_start|>assistant<|header_end|>
        
      • Prompt with tool results

        <|begin_of_text|><|header_start|>system<|header_end|>
        
        You are a helpful assistant.<|eot|>
        <|header_start|>user<|header_end|>
        
        How is the weather of Beijing, China in celsius?<|eot|>
        <|header_start|>assistant<|header_end|>
        
        {"name":"get_current_weather","arguments":"{\"location\":\"Beijing, China\",\"unit\":\"celsius\"}"}<|eot|>
        <|header_start|>ipython<|header_end|>
        
        {"temperature":"30","unit":"celsius"}<|eot|><|header_start|>assistant<|header_end|>
        
  • Context size: 10M

  • Run as LlamaEdge service

    wasmedge --dir .:. --nn-preload default:GGML:AUTO:Llama-4-Scout-17B-16E-Instruct-Q5_K_M.gguf \
      llama-api-server.wasm \
      --prompt-template llama-4-chat \
      --ctx-size 1000000 \
      --model-name Llama-4-Scout
    
  • Run as LlamaEdge command app

    wasmedge --dir .:. --nn-preload default:GGML:AUTO:Llama-4-Scout-17B-16E-Instruct-Q5_K_M.gguf \
      llama-chat.wasm \
      --prompt-template llama-4-chat \
      --ctx-size 1000000
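
  The API server speaks an OpenAI-compatible chat completions protocol, so the tool-use prompts above can be driven from any OpenAI-style client. Below is a minimal Python sketch of the two-turn tool-call flow from the example; it assumes the server listens on its default address (http://localhost:8080) and uses the Llama-4-Scout model name registered via --model-name above. Adjust both to your deployment.

    # Minimal sketch of the tool-call round trip shown in the example prompts,
    # assuming the LlamaEdge API server's default address (http://localhost:8080)
    # and the model name registered with --model-name above.
    import json
    import requests

    URL = "http://localhost:8080/v1/chat/completions"

    tools = [{
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location", "unit"],
            },
        },
    }]

    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "How is the weather of Beijing, China in celsius?"},
    ]

    # Turn 1: the model should reply with a get_current_weather call.
    assistant = requests.post(URL, json={
        "model": "Llama-4-Scout",
        "messages": messages,
        "tools": tools,
    }).json()["choices"][0]["message"]

    # Turn 2: append the tool result (the `tool` role is what the prompt
    # template renders with the `ipython` header shown above) and ask again.
    messages.append(assistant)
    messages.append({
        "role": "tool",
        "tool_call_id": assistant["tool_calls"][0]["id"],
        "content": json.dumps({"temperature": "30", "unit": "celsius"}),
    })
    final = requests.post(URL, json={
        "model": "Llama-4-Scout",
        "messages": messages,
        "tools": tools,
    }).json()
    print(final["choices"][0]["message"]["content"])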
    

Quantized GGUF Models

| Name | Quant method | Bits | Size | Use case |
| ---- | ------------ | ---- | ---- | -------- |
| Llama-4-Scout-17B-16E-Instruct-Q2_K.gguf | Q2_K | 2 | 39.6 GB | smallest, significant quality loss - not recommended for most purposes |
| Llama-4-Scout-17B-16E-Instruct-Q3_K_L-00001-of-00002.gguf | Q3_K_L | 3 | 29.8 GB | small, substantial quality loss |
| Llama-4-Scout-17B-16E-Instruct-Q3_K_L-00002-of-00002.gguf | Q3_K_L | 3 | 26.1 GB | small, substantial quality loss |
| Llama-4-Scout-17B-16E-Instruct-Q3_K_M-00001-of-00002.gguf | Q3_K_M | 3 | 29.8 GB | very small, high quality loss |
| Llama-4-Scout-17B-16E-Instruct-Q3_K_M-00002-of-00002.gguf | Q3_K_M | 3 | 21.9 GB | very small, high quality loss |
| Llama-4-Scout-17B-16E-Instruct-Q3_K_S-00001-of-00002.gguf | Q3_K_S | 3 | 29.7 GB | very small, high quality loss |
| Llama-4-Scout-17B-16E-Instruct-Q3_K_S-00002-of-00002.gguf | Q3_K_S | 3 | 17.0 GB | very small, high quality loss |
| Llama-4-Scout-17B-16E-Instruct-Q4_0-00001-of-00003.gguf | Q4_0 | 4 | 30.0 GB | legacy; small, very high quality loss - prefer using Q3_K_M |
| Llama-4-Scout-17B-16E-Instruct-Q4_0-00002-of-00003.gguf | Q4_0 | 4 | 29.7 GB | legacy; small, very high quality loss - prefer using Q3_K_M |
| Llama-4-Scout-17B-16E-Instruct-Q4_0-00003-of-00003.gguf | Q4_0 | 4 | 1.20 GB | legacy; small, very high quality loss - prefer using Q3_K_M |
| Llama-4-Scout-17B-16E-Instruct-Q4_K_M-00001-of-00003.gguf | Q4_K_M | 4 | 29.9 GB | medium, balanced quality - recommended |
| Llama-4-Scout-17B-16E-Instruct-Q4_K_M-00002-of-00003.gguf | Q4_K_M | 4 | 29.8 GB | medium, balanced quality - recommended |
| Llama-4-Scout-17B-16E-Instruct-Q4_K_M-00003-of-00003.gguf | Q4_K_M | 4 | 5.66 GB | medium, balanced quality - recommended |
| Llama-4-Scout-17B-16E-Instruct-Q4_K_S-00001-of-00003.gguf | Q4_K_S | 4 | 29.7 GB | small, greater quality loss |
| Llama-4-Scout-17B-16E-Instruct-Q4_K_S-00002-of-00003.gguf | Q4_K_S | 4 | 29.7 GB | small, greater quality loss |
| Llama-4-Scout-17B-16E-Instruct-Q4_K_S-00003-of-00003.gguf | Q4_K_S | 4 | 2.04 GB | small, greater quality loss |
| Llama-4-Scout-17B-16E-Instruct-Q5_0-00001-of-00003.gguf | Q5_0 | 5 | 29.9 GB | legacy; medium, balanced quality - prefer using Q4_K_M |
| Llama-4-Scout-17B-16E-Instruct-Q5_0-00002-of-00003.gguf | Q5_0 | 5 | 29.8 GB | legacy; medium, balanced quality - prefer using Q4_K_M |
| Llama-4-Scout-17B-16E-Instruct-Q5_0-00003-of-00003.gguf | Q5_0 | 5 | 14.6 GB | legacy; medium, balanced quality - prefer using Q4_K_M |
| Llama-4-Scout-17B-16E-Instruct-Q5_K_M-00001-of-00003.gguf | Q5_K_M | 5 | 29.8 GB | large, very low quality loss - recommended |
| Llama-4-Scout-17B-16E-Instruct-Q5_K_M-00002-of-00003.gguf | Q5_K_M | 5 | 29.8 GB | large, very low quality loss - recommended |
| Llama-4-Scout-17B-16E-Instruct-Q5_K_M-00003-of-00003.gguf | Q5_K_M | 5 | 16.9 GB | large, very low quality loss - recommended |
| Llama-4-Scout-17B-16E-Instruct-Q5_K_S-00001-of-00003.gguf | Q5_K_S | 5 | 29.9 GB | large, low quality loss - recommended |
| Llama-4-Scout-17B-16E-Instruct-Q5_K_S-00002-of-00003.gguf | Q5_K_S | 5 | 29.8 GB | large, low quality loss - recommended |
| Llama-4-Scout-17B-16E-Instruct-Q5_K_S-00003-of-00003.gguf | Q5_K_S | 5 | 14.6 GB | large, low quality loss - recommended |
| Llama-4-Scout-17B-16E-Instruct-Q6_K-00001-of-00003.gguf | Q6_K | 6 | 30.0 GB | very large, extremely low quality loss |
| Llama-4-Scout-17B-16E-Instruct-Q6_K-00002-of-00003.gguf | Q6_K | 6 | 29.6 GB | very large, extremely low quality loss |
| Llama-4-Scout-17B-16E-Instruct-Q6_K-00003-of-00003.gguf | Q6_K | 6 | 28.9 GB | very large, extremely low quality loss |
| Llama-4-Scout-17B-16E-Instruct-Q8_0-00001-of-00004.gguf | Q8_0 | 8 | 29.5 GB | very large, extremely low quality loss - not recommended |
| Llama-4-Scout-17B-16E-Instruct-Q8_0-00002-of-00004.gguf | Q8_0 | 8 | 29.7 GB | very large, extremely low quality loss - not recommended |
| Llama-4-Scout-17B-16E-Instruct-Q8_0-00003-of-00004.gguf | Q8_0 | 8 | 29.7 GB | very large, extremely low quality loss - not recommended |
| Llama-4-Scout-17B-16E-Instruct-Q8_0-00004-of-00004.gguf | Q8_0 | 8 | 25.7 GB | very large, extremely low quality loss - not recommended |
| Llama-4-Scout-17B-16E-Instruct-f16-00001-of-00008.gguf | f16 | 16 | 30.0 GB | |
| Llama-4-Scout-17B-16E-Instruct-f16-00002-of-00008.gguf | f16 | 16 | 29.5 GB | |
| Llama-4-Scout-17B-16E-Instruct-f16-00003-of-00008.gguf | f16 | 16 | 29.1 GB | |
| Llama-4-Scout-17B-16E-Instruct-f16-00004-of-00008.gguf | f16 | 16 | 29.5 GB | |
| Llama-4-Scout-17B-16E-Instruct-f16-00005-of-00008.gguf | f16 | 16 | 29.5 GB | |
| Llama-4-Scout-17B-16E-Instruct-f16-00006-of-00008.gguf | f16 | 16 | 29.1 GB | |
| Llama-4-Scout-17B-16E-Instruct-f16-00007-of-00008.gguf | f16 | 16 | 29.5 GB | |
| Llama-4-Scout-17B-16E-Instruct-f16-00008-of-00008.gguf | f16 | 16 | 9.41 GB | |

Quantized with llama.cpp b5074.
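
Every quantization level larger than a single ~30 GB file is split into numbered shards, and all shards of a level must be present together. Below is a minimal download sketch using the huggingface_hub Python package; the Q5_K_M pattern matches the variant used in the run commands above, so swap the glob for whichever quant you want.

    # Minimal sketch: fetch all shards of one quantization level.
    # Swap the allow_patterns glob for the quant you want.
    from huggingface_hub import snapshot_download

    snapshot_download(
        repo_id="second-state/Llama-4-Scout-17B-16E-Instruct-GGUF",
        allow_patterns=["*Q5_K_M*.gguf"],
        local_dir=".",
    )

If you keep the shards split, pointing the --nn-preload path at the first file (the one ending in -00001-of-000NN.gguf) typically lets llama.cpp-based runtimes locate and load the remaining shards automatically.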

Model size: 108B params
Architecture: llama4