---
language:
- sw
- en
license: apache-2.0
base_model:
- google/gemma-2-2b
library_name: transformers
---

# PAWA: Swahili SLM for Various Tasks

---

## Overview

**PAWA** is a Swahili-specialized language model built for tasks that require nuanced understanding and interaction in Swahili and English. It combines supervised fine-tuning (SFT) with Direct Preference Optimization (DPO) for improved performance and consistency. The sections below cover the model specifications, installation steps, usage examples, and intended applications.

---
### Model Details

- **Model Name**: Pawa-Gemma-Swahili-2B
- **Model Type**: Causal, decoder-only language model (Gemma-2 architecture)  
- **Architecture**:  
  - 2B Parameter Gemma-2 Base Model  
  - Enhanced with Swahili SFT and DPO datasets.  
- **Languages Supported**:  
  - Swahili  
  - English  
- **Tokenizer**: Custom tokenizer for multilingual flexibility.  
- **Primary Use Cases**:  
  - Contextually rich Swahili-focused tasks.  
  - General assistance and chat-based interactions.  
- **License**: Apache 2.0 (per the model card metadata); contact the author with questions about terms of use.  

---
### Installation and Setup
Ensure the required libraries are installed and up to date (the leading `!` is for notebook cells; omit it in a regular shell):

```bash
!pip uninstall transformers -y && pip install --upgrade --no-cache-dir "git+https://github.com/huggingface/transformers.git"
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install datasets
```
---
### Model Loading
You can load the model using the following code snippet:

```python
from unsloth import FastLanguageModel
import torch

model_name = "sartifyllc/Pawa-kaggle-gemma-2b"
max_seq_length = 2048  # maximum context length
dtype = None           # auto-detect: float16 on T4/V100, bfloat16 on Ampere and newer
load_in_4bit = False   # set to True to load 4-bit quantized weights

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)
```
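
If you prefer not to use Unsloth, the checkpoint can also be loaded with plain Transformers. This is a minimal sketch and assumes the repository hosts standard merged weights; adjust `torch_dtype` to what your GPU supports:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sartifyllc/Pawa-kaggle-gemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # fall back to torch.float16 on older GPUs
    device_map="auto",           # place weights on the available GPU(s)
)
```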

---
### Chat Template Configuration
For a seamless conversational experience, configure the tokenizer with the appropriate chat template:
```python
from unsloth.chat_templates import get_chat_template
FastLanguageModel.for_inference(model) # Enable native 2x faster inference

tokenizer = get_chat_template(
    tokenizer,
    chat_template="chatml",  # Supports templates like zephyr, chatml, mistral, etc.
    mapping={"role": "from", "content": "value", "user": "human", "assistant": "gpt"},  # ShareGPT style
    map_eos_token=True,  # Maps <|im_end|> to </s>
)
```
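
To sanity-check the configuration, you can render a sample conversation as plain text before tokenizing. The snippet below mirrors the ShareGPT-style message keys used in this card and simply prints the prompt with the ChatML markers:

```python
# Render a one-turn conversation without tokenizing to inspect the ChatML formatting.
sample = [{"from": "human", "value": "Habari yako?"}]  # "How are you?"
print(tokenizer.apply_chat_template(sample, tokenize=False, add_generation_prompt=True))
```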
---
### Usage Example
Generate a short story in Swahili:

```python
messages = [{"from": "human", "value": "Tengeneza hadithi fupi"}]  # "Write a short story"
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(input_ids=inputs, streamer=text_streamer, max_new_tokens=128, use_cache=True)
```
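
If you want the full response as a single string rather than a token stream, one option is to capture the generated ids and decode only the new tokens:

```python
# Non-streaming variant: generate, then decode only the newly generated tokens.
output_ids = model.generate(input_ids=inputs, max_new_tokens=128, use_cache=True)
new_tokens = output_ids[0, inputs.shape[-1]:]  # drop the prompt tokens
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```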
---
### Training and Fine-Tuning Details

- **Base Model**: Gemma-2-2B  
- **Continued Pre-Training**: 3B Swahili tokens  
- **Fine-Tuning**: Swahili SFT datasets for improved contextual understanding.  
- **Optimization**: DPO for more consistent, preference-aligned responses (a hedged training sketch follows this list).  
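
As a rough illustration of the DPO stage, the recipe below uses Hugging Face TRL. It is only a sketch under assumed settings: the dataset file `swahili_prefs.jsonl`, its `prompt`/`chosen`/`rejected` columns, and all hyperparameters are illustrative rather than the actual PAWA training configuration, and argument names can vary slightly between TRL releases.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "google/gemma-2-2b"  # base model named in this card
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Preference pairs with "prompt", "chosen", "rejected" columns (assumed format).
prefs = load_dataset("json", data_files="swahili_prefs.jsonl", split="train")

args = DPOConfig(
    output_dir="pawa-dpo",
    per_device_train_batch_size=2,
    num_train_epochs=1,
    beta=0.1,  # strength of the preference penalty against the implicit reference model
)

trainer = DPOTrainer(model=model, args=args, train_dataset=prefs, processing_class=tokenizer)
trainer.train()
```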

---

### Intended Use Cases

- **General Assistance**:  
  Produces structured, instruction-following answers for general-purpose use.  

- **Interactive Q&A**:  
  Suited for chat-based question answering in Swahili and English.  

- **RAG (Retrieval-Augmented Generation)**:  
  Can answer questions grounded in retrieved Swahili or English documents; see the sketch after this list.
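
A hedged illustration of that RAG pattern, reusing the model and tokenizer loaded above. Retrieval itself is out of scope here, so the `retrieved` passages and the Swahili prompt wording are placeholder examples:

```python
# Hypothetical RAG-style prompting: paste retrieved passages into the user turn,
# then generate as usual. Replace `retrieved` with your retriever's output.
retrieved = [
    "Zanzibar ni kisiwa kilichopo pwani ya Tanzania.",  # "Zanzibar is an island off the Tanzanian coast."
    "Mji mkuu wake ni Zanzibar City.",                  # "Its capital is Zanzibar City."
]
question = "Zanzibar iko wapi?"  # "Where is Zanzibar?"

context = "\n".join(retrieved)
messages = [{"from": "human", "value": f"Muktadha:\n{context}\n\nSwali: {question}"}]

inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to("cuda")
answer_ids = model.generate(input_ids=inputs, max_new_tokens=128, use_cache=True)
print(tokenizer.decode(answer_ids[0, inputs.shape[-1]:], skip_special_tokens=True))
```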

---
### Limitations

- **Biases**:  
  The model may exhibit biases inherent in its fine-tuning datasets.

- **Generalization**:  
  May struggle with tasks outside the trained domain.

- **Hardware Requirements**:  
  - Best performance requires a CUDA GPU (e.g., Tesla T4 or V100).  
  - Supports 4-bit quantization to reduce memory usage; see the snippet below.
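
A minimal 4-bit loading sketch using the same Unsloth loader shown earlier; it assumes `bitsandbytes` is installed alongside Unsloth:

```python
from unsloth import FastLanguageModel

# 4-bit loading: much smaller memory footprint at a small cost in accuracy.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="sartifyllc/Pawa-kaggle-gemma-2b",
    max_seq_length=2048,
    load_in_4bit=True,  # requires bitsandbytes
)
FastLanguageModel.for_inference(model)  # enable Unsloth's faster inference path
```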


Feel free to reach out for further guidance or collaboration opportunities regarding PAWA!