basiphobe committed
Commit 6253946 · verified · 1 Parent(s): 3341050

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +15 -9
README.md CHANGED
@@ -111,18 +111,21 @@ This repository also includes GGUF format models optimized for use with **llama.
 | File | Size | Format | Use Case | RAM Required |
 |------|------|--------|----------|--------------|
 | `merged-sci-model.gguf` | 14GB | F16 | Maximum quality inference | ~16GB |
+| `merged-sci-model-q6_k.gguf` | 5.6GB | Q6_K | High quality with good compression | ~8GB |
+| `merged-sci-model-q5_k_m.gguf` | 4.8GB | Q5_K_M | Excellent quality/size balance | ~7GB |
+| `merged-sci-model-q5_k_s.gguf` | 4.7GB | Q5_K_S | Good quality, slightly smaller | ~7GB |
 | `merged-sci-model-q4_k_m.gguf` | 4.1GB | Q4_K_M | Balanced quality/performance | ~6GB |
 
 ### Usage with Ollama
 
 **1. Download and create Modelfile:**
 ```bash
-# Download the quantized model (recommended)
-wget https://huggingface.co/basiphobe/sci-assistant/resolve/main/merged-sci-model-q4_k_m.gguf
+# Download the Q5_K_M model (recommended balance of quality/size)
+wget https://huggingface.co/basiphobe/sci-assistant/resolve/main/merged-sci-model-q5_k_m.gguf
 
 # Create Modelfile
 cat > Modelfile << 'EOF'
-FROM ./merged-sci-model-q4_k_m.gguf
+FROM ./merged-sci-model-q5_k_m.gguf
 TEMPLATE """<|im_start|>system
 You are a specialized medical assistant for people with spinal cord injuries. Your responses should always consider the unique needs, challenges, and medical realities of individuals living with SCI.<|im_end|>
 <|im_start|>user
@@ -152,12 +155,12 @@ cd llama.cpp
 make
 
 # Download model
-wget https://huggingface.co/basiphobe/sci-assistant/resolve/main/merged-sci-model-q4_k_m.gguf
+wget https://huggingface.co/basiphobe/sci-assistant/resolve/main/merged-sci-model-q5_k_m.gguf
 ```
 
 **2. Interactive chat:**
 ```bash
-./main -m merged-sci-model-q4_k_m.gguf \
+./main -m merged-sci-model-q5_k_m.gguf \
 --temp 0.7 \
 --repeat_penalty 1.1 \
 -c 4096 \
@@ -168,7 +171,7 @@ wget https://huggingface.co/basiphobe/sci-assistant/resolve/main/merged-sci-mode
 
 **3. Single prompt:**
 ```bash
-./main -m merged-sci-model-q4_k_m.gguf \
+./main -m merged-sci-model-q5_k_m.gguf \
 --temp 0.7 \
 -c 2048 \
 -p "<|im_start|>system\nYou are a specialized medical assistant for people with spinal cord injuries.<|im_end|>\n<|im_start|>user\nWhat exercises are good for someone with paraplegia?<|im_end|>\n<|im_start|>assistant\n"
@@ -176,10 +179,13 @@ wget https://huggingface.co/basiphobe/sci-assistant/resolve/main/merged-sci-mode
 
 ### Performance Comparison
 
-- **F16 Model** (`merged-sci-model.gguf`): Maximum quality, larger memory footprint
-- **Q4_K_M Model** (`merged-sci-model-q4_k_m.gguf`): 99%+ quality retention, 3.5x smaller size, recommended for most users
+- **F16 Model** (`merged-sci-model.gguf`): Maximum quality, largest memory footprint
+- **Q6_K Model** (`merged-sci-model-q6_k.gguf`): Near-maximum quality with 60% size reduction
+- **Q5_K_M Model** (`merged-sci-model-q5_k_m.gguf`): Excellent quality retention, good balance
+- **Q5_K_S Model** (`merged-sci-model-q5_k_s.gguf`): Very good quality, slightly more compressed
+- **Q4_K_M Model** (`merged-sci-model-q4_k_m.gguf`): Good quality, smallest size, recommended for resource-constrained environments
 
-Both models use the **ChatML** template format and support up to **32K context length**.
+All models use the **ChatML** template format and support up to **32K context length**.
 
 ## Intended Use
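
For context on the Ollama steps changed above: after the Modelfile is created, the usual follow-up is to register it and start chatting. A minimal sketch, assuming `ollama` is installed and the Modelfile sits in the current directory; the model name `sci-assistant` is illustrative, not something defined by this repository:

```shell
# Register the Modelfile as a local Ollama model, then chat with it.
# Assumes: ollama is on PATH, Modelfile exists in the working directory,
# and "sci-assistant" is a hypothetical local model name.
if command -v ollama >/dev/null 2>&1; then
  ollama create sci-assistant -f Modelfile
  ollama run sci-assistant
else
  echo "ollama not found on PATH; install it first"
fi
```

The `command -v` guard simply makes the snippet a no-op (with a message) on machines where Ollama is not installed.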