+---
+base_model: google/txgemma-9b-chat
+language:
+- en
+library_name: transformers
+license: other
+license_name: health-ai-developer-foundations
+license_link: https://developers.google.com/health-ai-developer-foundations/terms
+pipeline_tag: text-generation
+tags:
+- therapeutics
+- drug-development
+- llama-cpp
+- matrixportal
+extra_gated_heading: Access TxGemma on Hugging Face
+extra_gated_prompt: To access TxGemma on Hugging Face, you're required to review and
+  agree to [Health AI Developer Foundation's terms of use](https://developers.google.com/health-ai-developer-foundations/terms).
+  To do this, please ensure you're logged in to Hugging Face and click below. Requests
+  are processed immediately.
+extra_gated_button_content: Acknowledge license
+---
+# matrixportal/txgemma-9b-chat-GGUF
+This model was converted to GGUF format from [`google/txgemma-9b-chat`](https://huggingface.co/google/txgemma-9b-chat) using llama.cpp via the [all-gguf-same-where](https://huggingface.co/spaces/matrixportal/all-gguf-same-where) space.
+Refer to the [original model card](https://huggingface.co/google/txgemma-9b-chat) for more details on the model.
+## ✅ Quantized Models Download List
+### 🔍 Recommended Quantizations
+- **✨ General CPU Use:** [`Q4_K_M`](https://huggingface.co/matrixportal/txgemma-9b-chat-GGUF/resolve/main/txgemma-9b-chat-q4_k_m.gguf) (Best balance of speed/quality)
+- **📱 ARM Devices:** [`Q4_0`](https://huggingface.co/matrixportal/txgemma-9b-chat-GGUF/resolve/main/txgemma-9b-chat-q4_0.gguf) (Optimized for ARM CPUs)
+- **🏆 Maximum Quality:** [`Q8_0`](https://huggingface.co/matrixportal/txgemma-9b-chat-GGUF/resolve/main/txgemma-9b-chat-q8_0.gguf) (Near-original quality)
+### 📦 Full Quantization Options
+| 🚀 Download | 🔢 Type | 📝 Notes |
+|:---------|:-----|:------|
+| [Download](https://huggingface.co/matrixportal/txgemma-9b-chat-GGUF/resolve/main/txgemma-9b-chat-q2_k.gguf) | ![Q2_K](https://img.shields.io/badge/Q2_K-1A73E8) | Basic quantization |
+| [Download](https://huggingface.co/matrixportal/txgemma-9b-chat-GGUF/resolve/main/txgemma-9b-chat-q3_k_s.gguf) | ![Q3_K_S](https://img.shields.io/badge/Q3_K_S-34A853) | Small size |
+| [Download](https://huggingface.co/matrixportal/txgemma-9b-chat-GGUF/resolve/main/txgemma-9b-chat-q3_k_m.gguf) | ![Q3_K_M](https://img.shields.io/badge/Q3_K_M-FBBC05) | Balanced quality |
+| [Download](https://huggingface.co/matrixportal/txgemma-9b-chat-GGUF/resolve/main/txgemma-9b-chat-q3_k_l.gguf) | ![Q3_K_L](https://img.shields.io/badge/Q3_K_L-4285F4) | Better quality |
+| [Download](https://huggingface.co/matrixportal/txgemma-9b-chat-GGUF/resolve/main/txgemma-9b-chat-q4_0.gguf) | ![Q4_0](https://img.shields.io/badge/Q4_0-EA4335) | Fast on ARM |
+| [Download](https://huggingface.co/matrixportal/txgemma-9b-chat-GGUF/resolve/main/txgemma-9b-chat-q4_k_s.gguf) | ![Q4_K_S](https://img.shields.io/badge/Q4_K_S-673AB7) | Fast, recommended |
+| [Download](https://huggingface.co/matrixportal/txgemma-9b-chat-GGUF/resolve/main/txgemma-9b-chat-q4_k_m.gguf) | ![Q4_K_M](https://img.shields.io/badge/Q4_K_M-673AB7) ⭐ | Best balance |
+| [Download](https://huggingface.co/matrixportal/txgemma-9b-chat-GGUF/resolve/main/txgemma-9b-chat-q5_0.gguf) | ![Q5_0](https://img.shields.io/badge/Q5_0-FF6D01) | Good quality |
+| [Download](https://huggingface.co/matrixportal/txgemma-9b-chat-GGUF/resolve/main/txgemma-9b-chat-q5_k_s.gguf) | ![Q5_K_S](https://img.shields.io/badge/Q5_K_S-0F9D58) | Balanced |
+| [Download](https://huggingface.co/matrixportal/txgemma-9b-chat-GGUF/resolve/main/txgemma-9b-chat-q5_k_m.gguf) | ![Q5_K_M](https://img.shields.io/badge/Q5_K_M-0F9D58) | High quality |
+| [Download](https://huggingface.co/matrixportal/txgemma-9b-chat-GGUF/resolve/main/txgemma-9b-chat-q6_k.gguf) | ![Q6_K](https://img.shields.io/badge/Q6_K-4285F4) 🏆 | Very good quality |
+| [Download](https://huggingface.co/matrixportal/txgemma-9b-chat-GGUF/resolve/main/txgemma-9b-chat-q8_0.gguf) | ![Q8_0](https://img.shields.io/badge/Q8_0-EA4335) ⚡ | Fast, best quality |
+| [Download](https://huggingface.co/matrixportal/txgemma-9b-chat-GGUF/resolve/main/txgemma-9b-chat-f16.gguf) | ![F16](https://img.shields.io/badge/F16-000000) | Maximum accuracy |
+💡 **Tip:** Use `F16` for maximum precision when quality is critical
+# GGUF Model Quantization & Usage Guide with llama.cpp
+## What is GGUF and Quantization?
+**GGUF** (GPT-Generated Unified Format) is an efficient model file format developed by the `llama.cpp` team that:
+- Supports multiple quantization levels
+- Works cross-platform
+- Enables fast loading and inference
+**Quantization** converts model weights to lower-precision data types (e.g., 4-bit integers instead of 16- or 32-bit floats), usually at a small cost in accuracy, to:
+- Reduce model size
+- Decrease memory usage
+- Speed up inference
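
To make the idea concrete, here is a minimal sketch of block-wise symmetric quantization in plain Python. It is a simplified illustration only, not the scheme llama.cpp actually uses (its K-quants are more elaborate); the function names and the sample weights are made up for this example.

```python
def quantize_block(weights, bits=4):
    """Map a block of floats to signed integers that share one scale factor."""
    qmax = 2 ** (bits - 1) - 1              # 7 for 4-bit values
    amax = max(abs(w) for w in weights)     # largest magnitude in the block
    scale = amax / qmax if amax > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize_block(q, scale):
    """Recover approximate floats from the integers and the shared scale."""
    return [v * scale for v in q]


# Hypothetical weights, chosen only to demonstrate the round trip.
block = [0.12, -0.50, 0.33, 0.07, -0.21, 0.49, -0.02, 0.28]
q, scale = quantize_block(block)
restored = dequantize_block(q, scale)
max_err = max(abs(a - b) for a, b in zip(block, restored))
print("quantized:", q)
print("scale:", scale)
print("max round-trip error:", max_err)
```

Each weight is stored as a small integer plus a per-block scale, which is where the size reduction comes from; the round-trip error is the accuracy cost.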
+## Step-by-Step Guide
+### 1. Prerequisites
+```bash
+# System updates
+sudo apt update && sudo apt upgrade -y
+# Dependencies
+sudo apt install -y build-essential cmake python3-pip
+# Clone and build llama.cpp
+git clone https://github.com/ggerganov/llama.cpp
+cd llama.cpp
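+# Recent llama.cpp versions build with CMake (the Makefile build is no longer supported)
+cmake -B build
+cmake --build build --config Release -j 4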
+```
+### 2. Using Quantized Models from Hugging Face
+My automated quantization script publishes model files whose URLs follow this pattern:
+```
+https://huggingface.co/matrixportal/txgemma-9b-chat-GGUF/resolve/main/txgemma-9b-chat-q4_k_m.gguf
+```
+Download your quantized model directly:
+```bash
+wget https://huggingface.co/matrixportal/txgemma-9b-chat-GGUF/resolve/main/txgemma-9b-chat-q4_k_m.gguf
+```
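
The URL pattern above can also be generated programmatically. The sketch below assumes the file naming seen in this repository (`<model-name>-<quant in lowercase>.gguf`); `gguf_url` is a hypothetical helper, not part of any library.

```python
def gguf_url(repo_id: str, quant: str) -> str:
    """Build the direct-download URL for one quantization of a GGUF repo.

    Assumes the repo is named '<model-name>-GGUF' and files are named
    '<model-name>-<quant>.gguf' with the quant type in lowercase.
    """
    model_name = repo_id.split("/")[-1].removesuffix("-GGUF")
    filename = f"{model_name}-{quant.lower()}.gguf"
    return f"https://huggingface.co/{repo_id}/resolve/main/{filename}"


print(gguf_url("matrixportal/txgemma-9b-chat-GGUF", "Q4_K_M"))
```

The printed URL can be passed to `wget` exactly as in the command above.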
+### 3. Running the Quantized Model
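+Recent llama.cpp builds name the CLI binary `llama-cli` (older builds called it `main`); adjust the path below to wherever your build placed it.
+
+Basic usage:
+```bash
+./build/bin/llama-cli -m txgemma-9b-chat-q4_k_m.gguf -p "Your prompt here" -n 128
+```
+Example with a creative writing prompt. Gemma-family chat models use `<start_of_turn>` markers rather than `[INST]` tags:
+```bash
+# -e turns the \n escapes into real newlines.
+# If your build applies the chat template automatically (conversation mode),
+# pass only the plain request text instead.
+./build/bin/llama-cli -m txgemma-9b-chat-q4_k_m.gguf \
+  -e -p "<start_of_turn>user\nWrite a short poem about AI quantization in the style of Shakespeare<end_of_turn>\n<start_of_turn>model\n" \
+  -n 256 -c 2048 -t 8 --temp 0.7
+```
+Advanced parameters:
+```bash
+./build/bin/llama-cli -m txgemma-9b-chat-q4_k_m.gguf \
+  -p "Question: What is the GGUF format?
+Answer:" \
+  -n 256 -c 2048 -t 8 --temp 0.7 --top-k 40 --top-p 0.9
+```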
+### 4. Python Integration
+Install the Python package:
+```bash
+pip install llama-cpp-python
+```
+Example script:
+```python
+from llama_cpp import Llama
+# Initialize the model
+llm = Llama(
+    model_path="txgemma-9b-chat-q4_k_m.gguf",
+    n_ctx=2048,
+    n_threads=8
+)
+# Run inference
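+# Run inference through the chat API so that the chat template stored in the
+# GGUF metadata is applied when available ([INST] tags are a Llama/Mistral
+# convention, not Gemma's)
+response = llm.create_chat_completion(
+    messages=[{"role": "user", "content": "Explain GGUF quantization to a beginner"}],
+    max_tokens=256,
+    temperature=0.7,
+    top_p=0.9
+)
+print(response["choices"][0]["message"]["content"])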
+```
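
For raw completion calls such as `llm(prompt)`, the prompt string itself has to carry the model's turn markers. The sketch below builds a single-turn prompt in the standard Gemma format; it assumes TxGemma's chat variant keeps that format (check the original model card), and `gemma_prompt` is a hypothetical helper name.

```python
def gemma_prompt(user_message: str) -> str:
    """Wrap one user message in Gemma-style turn markers.

    The beginning-of-sequence token is normally added by the tokenizer,
    so it is deliberately left out here.
    """
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )


prompt = gemma_prompt("Explain GGUF quantization to a beginner")
print(prompt)
```

The resulting string can be passed as the first argument to `llm(...)`; the model's reply is then read from `response["choices"][0]["text"]`.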
+## Performance Tips
+1. **Hardware Utilization**:
+   - Set thread count with `-t` (typically CPU core count)
+   - Compile with CUDA/OpenCL for GPU support, then offload layers to the GPU with `-ngl`
+2. **Memory Optimization**:
+   - Lower-bit quantizations (like q4_k_m) use less RAM than higher-bit ones (like q8_0)
+   - Adjust context size with `-c` parameter
+3. **Speed/Accuracy Balance**:
+   - Higher-bit quantization is slower but more accurate
+   - Reduce randomness with `--temp 0` for consistent results
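
As a rough aid for choosing a quantization, the sketch below estimates file size from bits per weight. The bits-per-weight figures follow from the block layouts of `Q4_0` and `Q8_0` (32 weights plus one 16-bit scale per block); the 9-billion parameter count is a nominal assumption taken from the model name, so treat the results as ballpark figures rather than the sizes of the actual files.

```python
# Effective bits per weight, including the per-block 16-bit scale.
BITS_PER_WEIGHT = {
    "Q4_0": 4.5,   # (32 * 4 + 16) / 32
    "Q8_0": 8.5,   # (32 * 8 + 16) / 32
    "F16": 16.0,
}

N_PARAMS = 9e9  # assumption: nominal "9b" parameter count


def estimate_size_gb(n_params: float, quant: str) -> float:
    """Approximate weight storage in gigabytes (10^9 bytes)."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9


for quant in BITS_PER_WEIGHT:
    print(f"{quant}: ~{estimate_size_gb(N_PARAMS, quant):.1f} GB")
```

Runtime memory use is higher than the file size because the context (set with `-c`) needs its own buffers.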
+## FAQ
+**Q: What quantization levels are available?**
+A: Common options include q4_0, q4_k_m, q5_0, q5_k_m, q8_0
+**Q: How much performance loss occurs with q4_k_m?**
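+A: As a rule of thumb, a small accuracy reduction (often quoted as 2-5%) in exchange for a file roughly a third to a quarter the size of F16; the actual loss depends on the model and the task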
+**Q: How to enable GPU support?**
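+A: For NVIDIA GPUs, build with `cmake -B build -DGGML_CUDA=ON` (older Makefile-based versions used `make LLAMA_CUBLAS=1`), then offload layers with `-ngl`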
+## Useful Resources
+1. [llama.cpp GitHub](https://github.com/ggerganov/llama.cpp)
+2. [GGUF Format Specs](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md)
+3. [Hugging Face Model Hub](https://huggingface.co/models)