basiphobe committed on Commit 3341050 · verified · 1 Parent(s): acedfe5

Update README with GGUF format documentation and usage instructions

Files changed (1): README.md +79 -0

README.md CHANGED
@@ -102,6 +102,85 @@ response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_spec

- **LoRA Adapter**: Smaller adapter files (`adapter_model.safetensors`, `adapter_config.json`)
- **Tokenizer**: Shared tokenizer files for both options
## GGUF Format Models

This repository also includes GGUF format models optimized for use with **llama.cpp**, **Ollama**, and other GGUF-compatible inference engines. These formats offer excellent performance and compatibility across different platforms.

### Available GGUF Models

| File | Size | Format | Use Case | RAM Required |
|------|------|--------|----------|--------------|
| `merged-sci-model.gguf` | 14GB | F16 | Maximum quality inference | ~16GB |
| `merged-sci-model-q4_k_m.gguf` | 4.1GB | Q4_K_M | Balanced quality/performance | ~6GB |
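As a rough sanity check on the sizes above (a sketch, assuming a ~7B-parameter model, which the F16 file size implies at 2 bytes per weight, and an average of ~4.8 bits per weight for Q4_K_M):

```python
# Back-of-envelope check of the GGUF file sizes in the table above.
# Assumptions: F16 stores 2 bytes/weight; Q4_K_M averages ~4.8 bits/weight.
F16_FILE_GB = 14.0
Q4_FILE_GB = 4.1

params_billions = F16_FILE_GB / 2.0          # 2 bytes per weight -> ~7B params
q4_estimate_gb = params_billions * 4.8 / 8   # bits/weight -> GB
ratio = F16_FILE_GB / Q4_FILE_GB             # how much smaller Q4_K_M is

print(f"~{params_billions:.0f}B parameters")
print(f"estimated Q4_K_M size: {q4_estimate_gb:.1f} GB")
print(f"size ratio: {ratio:.1f}x")
```

The estimate lands close to the published 4.1GB file; the small gap comes from mixed-precision layers and metadata, and the exact ratio depends on the unrounded file sizes.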
### Usage with Ollama

**1. Download the model and create a Modelfile:**

```bash
# Download the quantized model (recommended)
wget https://huggingface.co/basiphobe/sci-assistant/resolve/main/merged-sci-model-q4_k_m.gguf

# Create the Modelfile
cat > Modelfile << 'EOF'
FROM ./merged-sci-model-q4_k_m.gguf
TEMPLATE """<|im_start|>system
You are a specialized medical assistant for people with spinal cord injuries. Your responses should always consider the unique needs, challenges, and medical realities of individuals living with SCI.<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF
```
**2. Create and run the model:**

```bash
ollama create sci-assistant -f Modelfile
ollama run sci-assistant "What are the signs of autonomic dysreflexia?"
```
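Once registered, the model can also be queried through Ollama's local REST API (served at `http://localhost:11434` by default). A minimal sketch of the request payload for the `/api/generate` endpoint, shown here without actually sending it:

```python
import json

# Payload for Ollama's /api/generate endpoint; the model name matches
# the `ollama create sci-assistant` step above.
payload = {
    "model": "sci-assistant",
    "prompt": "What are the signs of autonomic dysreflexia?",
    "stream": False,  # return one JSON response instead of a token stream
}

body = json.dumps(payload)
print(body)
# To send it: POST this body to http://localhost:11434/api/generate,
# e.g. curl http://localhost:11434/api/generate -d "$body"
```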
### Usage with llama.cpp

**1. Install and set up:**

```bash
# Clone and build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# Download the model
wget https://huggingface.co/basiphobe/sci-assistant/resolve/main/merged-sci-model-q4_k_m.gguf
```
**2. Interactive chat:**

```bash
./main -m merged-sci-model-q4_k_m.gguf \
  --temp 0.7 \
  --repeat_penalty 1.1 \
  -c 4096 \
  --interactive \
  --in-prefix "<|im_start|>user\n" \
  --in-suffix "<|im_end|>\n<|im_start|>assistant\n"
```
**3. Single prompt:**

```bash
./main -m merged-sci-model-q4_k_m.gguf \
  --temp 0.7 \
  -c 2048 \
  -p "<|im_start|>system\nYou are a specialized medical assistant for people with spinal cord injuries.<|im_end|>\n<|im_start|>user\nWhat exercises are good for someone with paraplegia?<|im_end|>\n<|im_start|>assistant\n"
```
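The `-p` string above is simply the ChatML layout written out by hand. A small helper (illustrative only, not part of this repository) makes that structure explicit:

```python
def chatml_prompt(system: str, user: str) -> str:
    """Assemble a ChatML prompt like the -p argument above."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = chatml_prompt(
    "You are a specialized medical assistant for people with spinal cord injuries.",
    "What exercises are good for someone with paraplegia?",
)
print(prompt)
```

Ending the prompt at the open `<|im_start|>assistant` tag is what cues the model to generate the assistant turn.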
### Performance Comparison

- **F16 Model** (`merged-sci-model.gguf`): Maximum quality, larger memory footprint
- **Q4_K_M Model** (`merged-sci-model-q4_k_m.gguf`): 99%+ quality retention, 3.5x smaller size, recommended for most users

Both models use the **ChatML** template format and support up to **32K context length**.
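With a 32K window it helps to budget tokens explicitly when packing in conversation history. A toy illustration (the reserve and system-prompt sizes here are made-up assumptions, not measured values):

```python
# Toy token budget for the 32K context window mentioned above.
# Everything except n_ctx is an illustrative assumption.
n_ctx = 32 * 1024        # 32K context window
reserve_output = 1024    # tokens held back for the model's reply
system_tokens = 60       # rough size of the system prompt

prompt_budget = n_ctx - reserve_output - system_tokens
print(f"tokens available for user input and history: {prompt_budget}")
```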
## Intended Use

This model is designed to: