# NVIDIA-Nemotron-Nano-9B-v2-GGUF
## Original Model

[nvidia/NVIDIA-Nemotron-Nano-9B-v2](https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-9B-v2)
## Run with LlamaEdge
- LlamaEdge version: coming soon
### Prompt template

Prompt type: `nemotron-2-chat`

Prompt string:
**Thinking mode**

```text
<SPECIAL_10>System
You are a helpful assistant.
<SPECIAL_11>User
Hi, can you explain what quantum computing is? /think
<SPECIAL_11>Assistant
<think>
The user is asking about quantum computing, I should give a simple explanation.
</think>
Quantum computing uses principles of quantum mechanics, such as superposition and entanglement, to perform computations that would be hard for classical computers.
<SPECIAL_12>
<SPECIAL_11>User
That sounds complicated. Can you give me an example? /think
<SPECIAL_11>Assistant
<think>
I should give a concrete example, like Shor's algorithm or Grover's algorithm, but in simple terms.
</think>
For example, a quantum computer can factor very large numbers much faster than classical computers, which could break some encryption methods.
<SPECIAL_12>
<SPECIAL_11>User
Okay, explain it in a simpler way for a 10-year-old. /think
<SPECIAL_11>Assistant
<think>
```
**No-thinking mode**

```text
<SPECIAL_10>System
You are a helpful assistant. /no_think
<SPECIAL_11>User
Hi, can you explain what quantum computing is? /no_think
<SPECIAL_11>Assistant
<think></think>
Quantum computing is a type of computing that uses the rules of quantum mechanics to solve certain problems faster than regular computers.
<SPECIAL_12>
<SPECIAL_11>User
That sounds complicated. Can you give me an example? /no_think
<SPECIAL_11>Assistant
<think></think>
For example, quantum computers could quickly factor very large numbers, which is important for cryptography.
<SPECIAL_12>
<SPECIAL_11>User
Okay, explain it even more simply. /no_think
<SPECIAL_11>Assistant
<think></think>
```
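If you need to drive the model in raw-completion mode rather than through a chat template engine, a single-turn prompt can be assembled by hand. The sketch below is a minimal illustration; the exact marker and newline layout is reconstructed from the examples above, so verify it against the upstream chat template before relying on it.

```bash
# Minimal sketch: assemble a single-turn thinking-mode prompt by hand.
# The marker/newline layout is reconstructed from the examples above,
# not taken from the canonical chat template.
SYSTEM="You are a helpful assistant."
USER_MSG="Hi, can you explain what quantum computing is?"

printf '<SPECIAL_10>System\n%s\n<SPECIAL_11>User\n%s /think\n<SPECIAL_11>Assistant\n<think>\n' \
  "$SYSTEM" "$USER_MSG" > prompt.txt
```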
### Context size

`128000`
### Run as LlamaEdge service

```bash
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:NVIDIA-Nemotron-Nano-9B-v2-Q5_K_M.gguf \
  llama-api-server.wasm \
  --prompt-template nemotron-2-chat \
  --ctx-size 128000 \
  --model-name nemotron-nano-v2
```
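Once the service is up, it exposes an OpenAI-compatible API. A quick smoke test with `curl`, assuming the server is listening on the default `8080` port (adjust the address if you configured a different socket):

```bash
# Send a chat request to the OpenAI-compatible endpoint.
# The model name matches the --model-name flag used above.
curl -s http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "nemotron-nano-v2",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hi, can you explain what quantum computing is? /think"}
    ]
  }'
```

Append `/no_think` to the user message instead of `/think` to suppress the reasoning trace, as in the no-thinking example above.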
## Quantized GGUF Models

| Name | Quant method | Bits | Size | Use case |
| ---- | ------------ | ---- | ---- | -------- |
| NVIDIA-Nemotron-Nano-9B-v2-Q2_K.gguf | Q2_K | 2 | 5.01 GB | smallest, significant quality loss - not recommended for most purposes |
| NVIDIA-Nemotron-Nano-9B-v2-Q3_K_L.gguf | Q3_K_L | 3 | 5.49 GB | small, substantial quality loss |
| NVIDIA-Nemotron-Nano-9B-v2-Q3_K_M.gguf | Q3_K_M | 3 | 5.38 GB | very small, high quality loss |
| NVIDIA-Nemotron-Nano-9B-v2-Q3_K_S.gguf | Q3_K_S | 3 | 5.13 GB | very small, high quality loss |
| NVIDIA-Nemotron-Nano-9B-v2-Q4_0.gguf | Q4_0 | 4 | 5.31 GB | legacy; small, very high quality loss - prefer using Q3_K_M |
| NVIDIA-Nemotron-Nano-9B-v2-Q4_K_M.gguf | Q4_K_M | 4 | 6.53 GB | medium, balanced quality - recommended |
| NVIDIA-Nemotron-Nano-9B-v2-Q4_K_S.gguf | Q4_K_S | 4 | 6.21 GB | small, greater quality loss |
| NVIDIA-Nemotron-Nano-9B-v2-Q5_0.gguf | Q5_0 | 5 | 6.35 GB | legacy; medium, balanced quality - prefer using Q4_K_M |
| NVIDIA-Nemotron-Nano-9B-v2-Q5_K_M.gguf | Q5_K_M | 5 | 7.07 GB | large, very low quality loss - recommended |
| NVIDIA-Nemotron-Nano-9B-v2-Q5_K_S.gguf | Q5_K_S | 5 | 6.78 GB | large, low quality loss - recommended |
| NVIDIA-Nemotron-Nano-9B-v2-Q6_K.gguf | Q6_K | 6 | 9.14 GB | very large, extremely low quality loss |
| NVIDIA-Nemotron-Nano-9B-v2-Q8_0.gguf | Q8_0 | 8 | 17.8 GB | very large, extremely low quality loss - not recommended |
| NVIDIA-Nemotron-Nano-9B-v2-f16.gguf | f16 | 16 | 30.0 GB | |
Quantized with llama.cpp b6315.
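To fetch a single quantization without cloning the whole repository, the `huggingface-cli` tool (installed with `pip install huggingface_hub`) works. The example below pulls the Q5_K_M file used in the service command above:

```bash
# Download only the Q5_K_M quantization into the current directory.
huggingface-cli download second-state/NVIDIA-Nemotron-Nano-9B-v2-GGUF \
  NVIDIA-Nemotron-Nano-9B-v2-Q5_K_M.gguf \
  --local-dir .
```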
## Model tree for second-state/NVIDIA-Nemotron-Nano-9B-v2-GGUF

- Base model: nvidia/NVIDIA-Nemotron-Nano-12B-v2-Base
  - Finetuned: nvidia/NVIDIA-Nemotron-Nano-12B-v2
    - Finetuned: nvidia/NVIDIA-Nemotron-Nano-9B-v2