Update README with GGUF format documentation and usage instructions
README.md
@@ -102,6 +102,85 @@ response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_spec
- **LoRA Adapter**: Smaller adapter files (`adapter_model.safetensors`, `adapter_config.json`)
- **Tokenizer**: Shared tokenizer files for both options

## GGUF Format Models

This repository also includes GGUF-format models for use with **llama.cpp**, **Ollama**, and other GGUF-compatible inference engines. GGUF files load directly in C/C++ runtimes, so they run efficiently across platforms without a Python environment.

### Available GGUF Models

| File | Size | Format | Use Case | RAM Required |
|------|------|--------|----------|--------------|
| `merged-sci-model.gguf` | 14 GB | F16 | Maximum-quality inference | ~16 GB |
| `merged-sci-model-q4_k_m.gguf` | 4.1 GB | Q4_K_M | Balanced quality/performance | ~6 GB |
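If you prefer the Hugging Face CLI to the plain `wget` commands shown below, a sketch (assumes the `huggingface_hub` package and its `huggingface-cli download` subcommand):

```bash
# Sketch: fetch the quantized model via the Hugging Face CLI instead of wget
pip install -U huggingface_hub
huggingface-cli download basiphobe/sci-assistant \
  merged-sci-model-q4_k_m.gguf --local-dir .
```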
### Usage with Ollama

**1. Download the model and create a Modelfile:**
```bash
# Download the quantized model (recommended)
wget https://huggingface.co/basiphobe/sci-assistant/resolve/main/merged-sci-model-q4_k_m.gguf

# Create Modelfile
cat > Modelfile << 'EOF'
FROM ./merged-sci-model-q4_k_m.gguf
TEMPLATE """<|im_start|>system
You are a specialized medical assistant for people with spinal cord injuries. Your responses should always consider the unique needs, challenges, and medical realities of individuals living with SCI.<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF
```
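The Modelfile above bakes the system prompt into the template. If you want clients to supply their own system message, a variant using Ollama's `SYSTEM` instruction and `{{ .System }}` template variable should behave the same by default (a sketch, not tested against this model; the `Modelfile.system` filename is just an example):

```bash
# Sketch: Modelfile variant with an overridable system prompt
cat > Modelfile.system << 'EOF'
FROM ./merged-sci-model-q4_k_m.gguf
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
SYSTEM """You are a specialized medical assistant for people with spinal cord injuries."""
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
EOF

ollama create sci-assistant -f Modelfile.system
```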
**2. Create and run the model:**
```bash
ollama create sci-assistant -f Modelfile
ollama run sci-assistant "What are the signs of autonomic dysreflexia?"
```
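Once created, the model is also reachable through Ollama's local REST API (port 11434 by default), which is handy for scripting; a minimal sketch:

```bash
# Sketch: query the model through Ollama's local HTTP API
curl http://localhost:11434/api/generate -d '{
  "model": "sci-assistant",
  "prompt": "What are the signs of autonomic dysreflexia?",
  "stream": false
}'
```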
### Usage with llama.cpp

**1. Install and set up:**
```bash
# Clone and build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make   # newer checkouts drop the Makefile; use: cmake -B build && cmake --build build

# Download the model
wget https://huggingface.co/basiphobe/sci-assistant/resolve/main/merged-sci-model-q4_k_m.gguf
```
**2. Interactive chat:**
```bash
# note: the `main` binary is named `llama-cli` in newer llama.cpp builds
./main -m merged-sci-model-q4_k_m.gguf \
  --temp 0.7 \
  --repeat-penalty 1.1 \
  -c 4096 \
  --interactive \
  --in-prefix "<|im_start|>user\n" \
  --in-suffix "<|im_end|>\n<|im_start|>assistant\n"
```
**3. Single prompt:**
```bash
./main -m merged-sci-model-q4_k_m.gguf \
  --temp 0.7 \
  -c 2048 \
  -p "<|im_start|>system\nYou are a specialized medical assistant for people with spinal cord injuries.<|im_end|>\n<|im_start|>user\nWhat exercises are good for someone with paraplegia?<|im_end|>\n<|im_start|>assistant\n"
```
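For an HTTP endpoint instead of the CLI, llama.cpp also ships a server binary (`./server` in older builds, `llama-server` in newer ones). A sketch, with the port choice as an assumption:

```bash
# Sketch: serve the model over HTTP and query the /completion endpoint
./server -m merged-sci-model-q4_k_m.gguf -c 4096 --port 8080 &
# wait for the model to finish loading before querying

curl http://localhost:8080/completion -d '{
  "prompt": "<|im_start|>user\nWhat are the signs of autonomic dysreflexia?<|im_end|>\n<|im_start|>assistant\n",
  "n_predict": 256,
  "temperature": 0.7
}'
```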
### Performance Comparison

- **F16 Model** (`merged-sci-model.gguf`): maximum quality, largest memory footprint
- **Q4_K_M Model** (`merged-sci-model-q4_k_m.gguf`): ~99% quality retention at ~3.5x smaller size; recommended for most users

Both models use the **ChatML** prompt template and support up to **32K tokens of context**.
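To confirm the template and context length from the file itself, the `gguf` Python package includes a metadata dump utility (a sketch; the exact script name and metadata keys may vary by package version and conversion tool):

```bash
# Sketch: inspect GGUF metadata for the context length and chat template
pip install gguf
gguf-dump merged-sci-model-q4_k_m.gguf | grep -iE 'context_length|chat_template'
```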
## Intended Use
This model is designed to:
|