basiphobe committed
Commit 6253946 · verified · 1 Parent(s): 3341050

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +15 -9
README.md CHANGED
@@ -111,18 +111,21 @@ This repository also includes GGUF format models optimized for use with **llama.
 | File | Size | Format | Use Case | RAM Required |
 |------|------|--------|----------|--------------|
 | `merged-sci-model.gguf` | 14GB | F16 | Maximum quality inference | ~16GB |
+| `merged-sci-model-q6_k.gguf` | 5.6GB | Q6_K | High quality with good compression | ~8GB |
+| `merged-sci-model-q5_k_m.gguf` | 4.8GB | Q5_K_M | Excellent quality/size balance | ~7GB |
+| `merged-sci-model-q5_k_s.gguf` | 4.7GB | Q5_K_S | Good quality, slightly smaller | ~7GB |
 | `merged-sci-model-q4_k_m.gguf` | 4.1GB | Q4_K_M | Balanced quality/performance | ~6GB |
 
 ### Usage with Ollama
 
 **1. Download and create Modelfile:**
 ```bash
-# Download the quantized model (recommended)
-wget https://huggingface.co/basiphobe/sci-assistant/resolve/main/merged-sci-model-q4_k_m.gguf
+# Download the Q5_K_M model (recommended balance of quality/size)
+wget https://huggingface.co/basiphobe/sci-assistant/resolve/main/merged-sci-model-q5_k_m.gguf
 
 # Create Modelfile
 cat > Modelfile << 'EOF'
-FROM ./merged-sci-model-q4_k_m.gguf
+FROM ./merged-sci-model-q5_k_m.gguf
 TEMPLATE """<|im_start|>system
 You are a specialized medical assistant for people with spinal cord injuries. Your responses should always consider the unique needs, challenges, and medical realities of individuals living with SCI.<|im_end|>
 <|im_start|>user
@@ -152,12 +155,12 @@ cd llama.cpp
 make
 
 # Download model
-wget https://huggingface.co/basiphobe/sci-assistant/resolve/main/merged-sci-model-q4_k_m.gguf
+wget https://huggingface.co/basiphobe/sci-assistant/resolve/main/merged-sci-model-q5_k_m.gguf
 ```
 
 **2. Interactive chat:**
 ```bash
-./main -m merged-sci-model-q4_k_m.gguf \
+./main -m merged-sci-model-q5_k_m.gguf \
 --temp 0.7 \
 --repeat_penalty 1.1 \
 -c 4096 \
@@ -168,7 +171,7 @@ wget https://huggingface.co/basiphobe/sci-assistant/resolve/main/merged-sci-mode
 
 **3. Single prompt:**
 ```bash
-./main -m merged-sci-model-q4_k_m.gguf \
+./main -m merged-sci-model-q5_k_m.gguf \
 --temp 0.7 \
 -c 2048 \
 -p "<|im_start|>system\nYou are a specialized medical assistant for people with spinal cord injuries.<|im_end|>\n<|im_start|>user\nWhat exercises are good for someone with paraplegia?<|im_end|>\n<|im_start|>assistant\n"
@@ -176,10 +179,13 @@ wget https://huggingface.co/basiphobe/sci-assistant/resolve/main/merged-sci-mode
 
 ### Performance Comparison
 
-- **F16 Model** (`merged-sci-model.gguf`): Maximum quality, larger memory footprint
-- **Q4_K_M Model** (`merged-sci-model-q4_k_m.gguf`): 99%+ quality retention, 3.5x smaller size, recommended for most users
+- **F16 Model** (`merged-sci-model.gguf`): Maximum quality, largest memory footprint
+- **Q6_K Model** (`merged-sci-model-q6_k.gguf`): Near-maximum quality with 60% size reduction
+- **Q5_K_M Model** (`merged-sci-model-q5_k_m.gguf`): Excellent quality retention, good balance
+- **Q5_K_S Model** (`merged-sci-model-q5_k_s.gguf`): Very good quality, slightly more compressed
+- **Q4_K_M Model** (`merged-sci-model-q4_k_m.gguf`): Good quality, smallest size, recommended for resource-constrained environments
 
-Both models use the **ChatML** template format and support up to **32K context length**.
+All models use the **ChatML** template format and support up to **32K context length**.
 
 ## Intended Use
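
For context on the Ollama steps changed above: after the Modelfile is created, the usual follow-up is to register it and start chatting. A minimal sketch, assuming `ollama` is installed and the Modelfile sits in the current directory; the model name `sci-assistant` is illustrative, not something defined by this repository:

```shell
# Register the Modelfile as a local Ollama model, then chat with it.
# Assumes: ollama is on PATH, Modelfile exists in the working directory,
# and "sci-assistant" is a hypothetical local model name.
if command -v ollama >/dev/null 2>&1; then
  ollama create sci-assistant -f Modelfile
  ollama run sci-assistant
else
  echo "ollama not found on PATH; install it first"
fi
```

The `command -v` guard simply makes the snippet a no-op (with a message) on machines where Ollama is not installed.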