Swahili Gemma 1B - GGUF

Quantized GGUF versions of Swahili Gemma 1B, a fine-tuned Gemma 3 1B instruction model specialized for English-to-Swahili translation and Swahili conversational AI. The model accepts input in both English and Swahili but outputs responses exclusively in Swahili.

πŸ“Š Translation Performance

Model Comparison

Model              Parameters  BLEU   chrF++  Efficiency*
Gemma 3 4B         4B          10.9   44.1     2.7
Swahili Gemma 1B   1B          27.6   56.8    27.6
Gemma 3 27B        27B         29.4   60.0     1.1
GPT-5 Mini         ~8B         31.8   62.4     4.0
Gemini 2.0 Flash   Large       35.6   64.6     N/A

*Efficiency = BLEU Score / Parameters (in billions)
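The efficiency column follows directly from that formula; as a quick sanity check in Python (figures taken from the table above, with Gemini 2.0 Flash omitted because its parameter count is undisclosed):

# Efficiency = BLEU / parameters in billions, per the footnote above
models = {
    "Gemma 3 4B": (10.9, 4),
    "Swahili Gemma 1B": (27.6, 1),
    "Gemma 3 27B": (29.4, 27),
    "GPT-5 Mini": (31.8, 8),  # ~8B is the table's own estimate
}
for name, (bleu, params_b) in models.items():
    print(f"{name}: {bleu / params_b:.1f} BLEU per billion parameters")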

Key Performance Insights

🎯 Efficiency Leader: Highest BLEU-to-parameter ratio in the comparison (27.6 BLEU per billion parameters)
πŸš€ Size Advantage: Outperforms Gemma 3 4B (4x larger) by 153% on BLEU (27.6 vs 10.9)
πŸ’Ž Competitive Quality: Reaches 94% of Gemma 3 27B's BLEU score (27.6 vs 29.4) with 27x fewer parameters
⚑ Practical Deployment: Runs efficiently on consumer hardware while maintaining quality

Evaluation Details

  • Dataset: FLORES-200 Englishβ†’Swahili (1,012 translation pairs)
  • Metrics: BLEU (bilingual evaluation understudy) and chrF++ (character F-score)
  • Evaluation: Zero-shot translation performance (see the scoring sketch below)
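
One way to compute these metrics is with the sacrebleu library; this is a minimal sketch assuming you already have model outputs and FLORES-200 references as parallel lists of strings (the exact harness used for the numbers above is not specified):

import sacrebleu

# Illustrative placeholders for system outputs and references
hypotheses = ["Habari, hali yako ikoje leo?"]
references = [["Hujambo, habari yako leo?"]]  # one reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references, word_order=2)  # word_order=2 gives chrF++
print(f"BLEU: {bleu.score:.1f}  chrF++: {chrf.score:.1f}")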

πŸš€ Quick Start

# Install the Hugging Face Hub client (shell)
pip install huggingface_hub

# Download the recommended Q4_K_M quantization (Python)
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="CraneAILabs/swahili-gemma-1b-GGUF",
    local_dir="swahili-gemma-1b-GGUF",
    allow_patterns=["Q4_K_M/*"]  # download only the Q4_K_M folder
)
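
If you only need the single GGUF file rather than the whole folder, hf_hub_download works as well; the file path here is taken from the llama.cpp example below:

from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="CraneAILabs/swahili-gemma-1b-GGUF",
    filename="Q4_K_M/swahili-gemma-1b-q4_k_m.gguf",
)
print(path)  # local path to the downloaded model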

πŸ“Š Available Quantizations

Quantization  Folder   File Size  Quality    Use Case
F32           F32/     ~3.8GB     Highest    Research & benchmarking
F16           F16/     ~1.9GB     Highest    Maximum-quality inference
Q8_0          Q8_0/    ~1.0GB     Very High  Production with ample resources
Q5_K_M        Q5_K_M/  ~812MB     High       Balanced quality/size
Q4_K_M        Q4_K_M/  ~769MB     Good       Recommended for most users
Q4_K_S        Q4_K_S/  ~745MB     Good       Resource-constrained environments
Q3_K_M        Q3_K_M/  ~689MB     Fair       Mobile/edge deployment
Q2_K          Q2_K/    ~658MB     Lower      Minimal resource usage

πŸ’» Usage with llama.cpp

Basic Translation

# English to Swahili translation
./llama-cli \
  --model swahili-gemma-1b-GGUF/Q4_K_M/swahili-gemma-1b-q4_k_m.gguf \
  --prompt "Translate to Swahili: Hello, how are you today?" \
  --temp 0.3 \
  --top-p 0.95 \
  --top-k 64 \
  --repeat-penalty 1.1 \
  -n 128
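
llama.cpp also ships llama-server, which exposes an OpenAI-compatible HTTP API. A minimal client sketch, assuming the server was started with this GGUF on its default port 8080:

import requests

# Assumes: ./llama-server --model swahili-gemma-1b-GGUF/Q4_K_M/swahili-gemma-1b-q4_k_m.gguf --ctx-size 2048
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Translate to Swahili: Hello, how are you today?"}],
        "temperature": 0.3,
        "top_p": 0.95,
        "max_tokens": 128,
    },
)
print(resp.json()["choices"][0]["message"]["content"])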

πŸ”§ Usage with Ollama

# Create model from GGUF
ollama create swahili-gemma-1b -f Modelfile

# Use for translation
ollama run swahili-gemma-1b "Translate to Swahili: Good morning!"

# Use for conversation  
ollama run swahili-gemma-1b "Hujambo! Je, unaweza kunisaidia?"

Modelfile Example

Note that Gemma 3 uses the <start_of_turn> / <end_of_turn> chat format (there is no dedicated system role, so a system prompt is passed as a user turn):

FROM swahili-gemma-1b-GGUF/Q4_K_M/swahili-gemma-1b-q4_k_m.gguf

TEMPLATE """{{ if .System }}<start_of_turn>user
{{ .System }}<end_of_turn>
{{ end }}{{ if .Prompt }}<start_of_turn>user
{{ .Prompt }}<end_of_turn>
{{ end }}<start_of_turn>model
{{ .Response }}<end_of_turn>
"""

PARAMETER stop "<start_of_turn>"
PARAMETER stop "<end_of_turn>"
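
Once created, the model can also be called from Python through the ollama client package (a sketch, assuming pip install ollama and a running Ollama daemon):

import ollama

# Responses come back in Swahili regardless of input language
response = ollama.chat(
    model="swahili-gemma-1b",
    messages=[{"role": "user", "content": "Translate to Swahili: Good morning!"}],
)
print(response["message"]["content"])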

🐍 Usage with Python (llama-cpp-python)

from llama_cpp import Llama

# Initialize model
llm = Llama(
    model_path="swahili-gemma-1b-GGUF/Q4_K_M/swahili-gemma-1b-q4_k_m.gguf",
    n_ctx=2048,
    n_threads=8,
    verbose=False
)

# Generate translation
response = llm(
    "Translate to Swahili: Hello, how are you today?",
    max_tokens=128,
    temperature=0.3,
    top_p=0.95,
    top_k=64,
    repeat_penalty=1.1
)

print(response['choices'][0]['text'])
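
Because the model is instruction-tuned, the chat API is also available; it applies the chat template embedded in the GGUF rather than sending raw text (reusing the llm instance from above):

# Chat-style call with the same sampling settings
chat = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Translate to Swahili: Hello, how are you today?"}],
    max_tokens=128,
    temperature=0.3,
    top_p=0.95,
    top_k=64,
    repeat_penalty=1.1,
)
print(chat["choices"][0]["message"]["content"])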

🌍 Language Capabilities

  • Input Languages: English + Swahili
  • Output Language: Swahili only
  • Primary Focus: English-to-Swahili translation and Swahili conversation

πŸ“Š Performance Metrics

Translation Quality (BLEU Scores)

Model                  BLEU Score  chrF++
πŸ₯‡ Swahili Gemma 1B    23.64       52.26
πŸ₯ˆ ChatGPT-4o-latest   [TBD]       [TBD]
πŸ₯‰ Other Models        [TBD]       [TBD]

Evaluated on 1,012 English-to-Swahili translation samples.

🎯 Capabilities

  • Translation: English-to-Swahili translation (see the prompting sketch after this list)
  • Conversational AI: Natural dialogue in Swahili
  • Summarization: Text summarization in Swahili
  • Writing: Creative and informational writing in Swahili
  • Question Answering: General knowledge responses in Swahili
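
All of these tasks are driven purely by prompting. A small helper sketch reusing the llama-cpp-python setup from above; the prompt prefixes other than "Translate to Swahili:" are hypothetical, since the card does not prescribe exact formats:

# Hypothetical task prefixes; only "Translate to Swahili:" comes from the card's own examples
TASK_PROMPTS = {
    "translate": "Translate to Swahili: {text}",
    "summarize": "Fupisha maandishi yafuatayo: {text}",  # "Summarize the following text"
    "qa": "{text}",  # plain questions yield Swahili answers
}

def run_task(llm, task, text):
    prompt = TASK_PROMPTS[task].format(text=text)
    out = llm(prompt, max_tokens=256, temperature=0.3, top_p=0.95)
    return out["choices"][0]["text"].strip()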

πŸ’‘ Recommended Parameters

# Optimal settings for translation tasks
--temp 0.3
--top-p 0.95
--top-k 64
--repeat-penalty 1.1
--ctx-size 2048

πŸ”— Related Models

  • Swahili Gemma 1B (original full-precision weights): https://huggingface.co/CraneAILabs/swahili-gemma-1b

πŸ› οΈ Technical Details

  • Base Model: google/gemma-3-1b-it
  • Architecture: Gemma 3
  • Context Length: 4,096 tokens
  • Quantization: GGUF format with multiple precision levels
  • Compatible: llama.cpp, Ollama, Jan, LM Studio, and other GGUF engines

🎨 Use Cases

  • Offline Translation: Run Swahili translation without internet
  • Local AI Assistant: Swahili conversational AI on your machine
  • Educational Tools: Language learning applications
  • Content Creation: Generate Swahili content locally
  • Research: Swahili language model experiments

⚠️ Limitations

  • Language Output: Responds only in Swahili
  • Quantization Trade-offs: Lower bit quantizations may reduce quality
  • Context Limit: 4K tokens for optimal performance
  • Specialized Tasks: May need fine-tuning for specific domains

πŸ“„ License

This model is released under the Gemma Terms of Use. Please review the terms before use.

πŸ™ Acknowledgments

  • Google: For the Gemma 3 base model, support, and guidance
  • Community: For Swahili language resources and datasets
  • Gilbert Korir (Msingi AI, Nairobi, Kenya)
  • Alfred Malengo Kondoro (Hanyang University, Seoul, South Korea)

Citation

If you use these GGUF quantizations in your research or applications, please cite:

@misc{crane_ai_labs_2025,
    author    = {Bakunga Bronson and Kato Steven Mubiru and Lwanga Caleb and Gimei Alex and Kavuma Lameck and Roland Ganafa and Sibomana Glorry and Atuhaire Collins and JohnRoy Nangeso and Tukamushaba Catherine},
    title     = {Swahili Gemma: A Fine-tuned Gemma 3 1B Model for Swahili conversational AI},
    year      = {2025},
    url       = {https://huggingface.co/CraneAILabs/swahili-gemma-1b},
    organization = {Crane AI Labs}
}

Built with ❀️ by Crane AI Labs

Swahili Gemma - Your helpful Swahili AI companion, optimized for local deployment
