---
language:
- sw
- en
license: apache-2.0
base_model:
- google/gemma-2-2b
library_name: transformers
---
# PAWA: A Swahili Small Language Model (SLM) for Various Tasks
---
## Overview
**PAWA** is a Swahili-specialized language model designed for tasks that require nuanced understanding and interaction in Swahili and English. It leverages supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) for improved performance and consistency. Below are the model specifications, installation steps, usage examples, and intended applications.
---
### Model Details
- **Model Name**: Pawa-Gemma-Swahili-2B
- **Model Type**: PAWA
- **Architecture**:
  - 2B-parameter Gemma-2 base model
  - Enhanced with Swahili SFT and DPO datasets
- **Languages Supported**:
  - Swahili
  - English
  - Custom tokenizer for multi-language flexibility
- **Primary Use Cases**:
  - Contextually rich Swahili-focused tasks
  - General assistance and chat-based interactions
- **License**: Custom; contact the author for terms of use
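The card lists `transformers` as the library, so the model can also be used without Unsloth. Below is a minimal loading sketch with the plain `transformers` API (repository id taken from the *Model Loading* section below; `accelerate` is assumed for `device_map="auto"`):

```python
# Minimal sketch: loading PAWA with plain transformers (no Unsloth).
# The repo id is the one shown later in this card; adjust if it differs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sartifyllc/Pawa-kaggle-gemma-2b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # fall back to torch.float16 on GPUs without bfloat16
    device_map="auto",           # requires the accelerate package
)
```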
---
### Installation and Setup
Ensure the necessary libraries are installed and up-to-date:
```bash
pip uninstall transformers -y && pip install --upgrade --no-cache-dir "git+https://github.com/huggingface/transformers.git"
pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install datasets
```
---
### Model Loading
You can load the model using the following code snippet:
```python
from unsloth import FastLanguageModel
import torch

model_name = "sartifyllc/Pawa-kaggle-gemma-2b"
max_seq_length = 2048   # maximum context length
dtype = None            # None lets Unsloth auto-detect (bfloat16 or float16)
load_in_4bit = False    # set to True for 4-bit quantized loading

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)
```
---
### Chat Template Configuration
For a seamless conversational experience, configure the tokenizer with the appropriate chat template:
```python
from unsloth.chat_templates import get_chat_template
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
tokenizer = get_chat_template(
    tokenizer,
    chat_template="chatml",  # supports templates such as zephyr, chatml, mistral, etc.
    mapping={"role": "from", "content": "value", "user": "human", "assistant": "gpt"},  # ShareGPT-style keys
    map_eos_token=True,      # maps <|im_end|> to </s>
)
```
---
### Usage Example
Generate a short story in Swahili:
```python
from transformers import TextStreamer

messages = [{"from": "human", "value": "Tengeneza hadithi fupi"}]  # "Write a short story"

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

# Stream the generated tokens to stdout as they are produced
text_streamer = TextStreamer(tokenizer)
_ = model.generate(input_ids=inputs, streamer=text_streamer, max_new_tokens=128, use_cache=True)
```
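To capture the full completion as a string instead of streaming tokens, a minimal non-streaming variant (reusing `model`, `tokenizer`, and `inputs` from above) could look like this:

```python
# Non-streaming sketch: generate, then decode only the newly generated tokens.
outputs = model.generate(input_ids=inputs, max_new_tokens=128, use_cache=True)
new_tokens = outputs[0][inputs.shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```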
---
### Training and Fine-Tuning Details
- **Base Model**: Gemma-2-2B
- **Continued Pre-training**: 3B Swahili tokens.
- **Fine-tuning**: Enhanced with Swahili SFT datasets for improved contextual understanding.
- **Optimization**: DPO for more consistent, preference-aligned responses.
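The full training recipe is not published in this card. Purely as an illustrative sketch, a LoRA-based SFT stage on the Gemma-2-2B base could be set up with Unsloth roughly as follows (every hyperparameter below is an assumption, not PAWA's actual configuration):

```python
# Hypothetical LoRA SFT setup with Unsloth -- illustrative only, not PAWA's real config.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="google/gemma-2-2b",  # base model listed in this card
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,               # train against 4-bit base weights to save memory
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,                            # LoRA rank (assumed)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing=True,
    random_state=3407,
)
# The wrapped model can then be trained on a Swahili SFT dataset (e.g. with trl's SFTTrainer),
# followed by a DPO stage on preference pairs.
```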
---
### Intended Use Cases
- **General Assistance**: Provides structured answers for general-purpose use.
- **Interactive Q&A**: Designed for general-purpose chat environments.
- **RAG (Retrieval-Augmented Generation)**: Can serve as the generation component in Swahili-focused retrieval-augmented pipelines (see the sketch below).
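A minimal sketch of how retrieved context might be fed through the chat template above (the retriever and the passages are placeholders, not part of this card):

```python
# Hypothetical RAG-style prompt: prepend retrieved passages to the user question.
# `retrieved_passages` would come from your own retriever; placeholders are used here.
retrieved_passages = ["<passage 1>", "<passage 2>"]
question = "Nini maana ya akili bandia?"  # "What is artificial intelligence?"

context = "\n".join(retrieved_passages)
prompt = (
    "Tumia muktadha ufuatao kujibu swali.\n\n"   # "Use the following context to answer the question."
    f"Muktadha:\n{context}\n\nSwali: {question}"
)

messages = [{"from": "human", "value": prompt}]
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to("cuda")
outputs = model.generate(input_ids=inputs, max_new_tokens=256, use_cache=True)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```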
---
### Limitations
- **Biases**: The model may exhibit biases inherent in its fine-tuning datasets.
- **Generalization**: May struggle with tasks outside the trained domain.
- **Hardware Requirements**:
  - Best performance requires a GPU with sufficient memory (e.g., Tesla V100 or T4).
  - Supports 4-bit quantization for reduced memory usage (see the loading sketch below).
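A minimal sketch of 4-bit loading with Unsloth (same repository id as above; assumes `bitsandbytes` is installed):

```python
# Sketch: load the model with 4-bit quantized weights to reduce GPU memory usage.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="sartifyllc/Pawa-kaggle-gemma-2b",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,   # quantize weights to 4-bit at load time (bitsandbytes)
)
FastLanguageModel.for_inference(model)  # enable the faster inference path
```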
---
Feel free to reach out for further guidance or collaboration opportunities regarding PAWA!