---
license: gemma
language:
- tr
pipeline_tag: text-generation
base_model: google/gemma-2-9b
tags:
- Turkish
- gemma2
- DPO
- SFT
- conversational
- instruction
---

<img src="./Turkish_Gemma.png"/>

# Turkish-Gemma-9b-v0.1

Turkish-Gemma-9b-v0.1 is based on Gemma-2-9b and was developed through a combination of continual pre-training, supervised fine-tuning (SFT), direct preference optimization (DPO), and model merging.

The Turkish-Gemma-9b-v0.1 is designed for Turkish text generation tasks, providing coherent, contextually relevant continuations and answers. Due to the diverse nature of the training data—which includes large-scale pre-training corpora, instruction-tuning data, and human preference data—the model may exhibit biases. Users should be aware of these and deploy the model responsibly.
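
The exact data and hyperparameters behind each training stage are not published in this card. As a rough illustration of what the DPO stage can look like, the sketch below uses TRL's `DPOTrainer` on a hypothetical Turkish preference dataset; the dataset name, hyperparameters, and `trl` API details are assumptions, not the authors' actual setup.

```python
# Illustrative only: a minimal DPO stage with TRL. The preference dataset name,
# hyperparameters, and trl version/API below are assumptions; the actual
# training setup for Turkish-Gemma-9b-v0.1 is not published in this card.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "google/gemma-2-9b"  # in practice, the SFT checkpoint would be the starting point
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)

# Hypothetical preference data with "prompt", "chosen", "rejected" columns.
prefs = load_dataset("your-org/turkish-preferences", split="train")

args = DPOConfig(output_dir="gemma2-9b-tr-dpo", beta=0.1, per_device_train_batch_size=1)
trainer = DPOTrainer(model=model, args=args, train_dataset=prefs, processing_class=tokenizer)
trainer.train()
```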

You can try the model in an online demo here: https://cosmos.yildiz.edu.tr/cosmosgemma

To evaluate model performance, we compiled a dataset of 1,450 carefully designed questions across diverse categories. Each question was reviewed and rated by 18 human annotators, allowing for a reliable comparison across multiple models.

The table below summarizes the evaluation results:

### 🏆 Model Comparison: Win Rates

| Model Name                                   | Win Rate        |
|---------------------------------------------|-----------------|
| Qwen/Qwen3-30B-A3B                           | 62.39%          |
| gpt-4o-mini                                  | 62.12%          |
| google/gemma-3-12b-it                        | 61.61%          |
| google/gemma-2-27b-it                        | 57.91%          |
| **ytu-ce-cosmos/Turkish-Gemma-9b-v0.1**      | **57.30%**      |
| google/gemma-2-9b-it                         | 54.13%          |
| ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1      | 36.89%          |


### Voting Methodology

For each question, the judges were shown two answers produced by different models and selected the one they preferred. For example, for the question below, the judge selected the answer on the right:
![Alt text](https://i.imgur.com/AcR9ymM.png)
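
A win rate in this setup is simply the share of pairwise comparisons a model wins. A minimal sketch of that computation is below; the `(model_a, model_b, winner)` tuple format is a hypothetical illustration, not the authors' actual annotation schema.

```python
from collections import defaultdict

# Hypothetical pairwise judgments as (model_a, model_b, winner) tuples;
# the real annotation data from the 18 judges is not published here.
judgments = [
    ("ytu-ce-cosmos/Turkish-Gemma-9b-v0.1", "google/gemma-2-9b-it",
     "ytu-ce-cosmos/Turkish-Gemma-9b-v0.1"),
    ("ytu-ce-cosmos/Turkish-Gemma-9b-v0.1", "gpt-4o-mini", "gpt-4o-mini"),
]

wins, total = defaultdict(int), defaultdict(int)
for model_a, model_b, winner in judgments:
    total[model_a] += 1
    total[model_b] += 1
    wins[winner] += 1

for model, n in total.items():
    print(f"{model}: {100 * wins[model] / n:.2f}% win rate")
```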

### 📊 Turkish Evaluation Benchmark Results (via `malhajar17/lm-evaluation-harness_turkish`)

| Model Name                                   | Average | MMLU  | TruthfulQA   | ARC   | HellaSwag | GSM8K | Winogrande |
|---------------------------------------------|---------|-------|--------------|-------|-----------|-------|------------|
| Qwen/Qwen2.5-72B-Instruct                    | 67.69   | 77.28 | 59.86        | 61.52 | 61.98     | 83.6  | 61.92      |
| google/gemma-3-27b-it                        | 67.36   | 70.2  | 57.06        | 66.98 | 66.58     | 77.52 | 65.8       |
| google/gemma-2-27b-it                        | 65.57   | 66.49 | 57.45        | 63.65 | 63.86     | 76.54 | 65.4       |
| meta-llama/Llama-3.1-70B-Instruct            | 63.92   | 74.00 | 51.41        | 59.64 | 64.31     | 66.13 | 66.90      |
| Qwen/Qwen2.5-32B-Instruct                    | 63.74   | 70.93 | 57.87        | 57.00 | 57.04     | 77.83 | 61.77      |
| **ytu-ce-cosmos/Turkish-Gemma-9b-v0.1**      | **63.31** | **63.85** | **54.21**    | **59.64** | **64.19** | **73.42** | **64.53** |
| google/gemma-3-12b-it                        | 62.94   | 63.92 | 57.16        | 60.67 | 62.00     | 72.06 | 61.77      |
| Qwen/Qwen2.5-14B-Instruct                    | 60.34   | 65.28 | 59.00        | 50.00 | 52.22     | 76.77 | 58.77      |
| google/gemma-2-9b-it                         | 59.14   | 61.07 | 55.77        | 56.31 | 56.48     | 63.10 | 62.09      |
| ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1      | 55.03   | 51.97 | 57.56        | 51.02 | 52.96     | 59.87 | 57.77      |
| Qwen/Qwen2.5-7B-Instruct                     | 53.42   | 56.31 | 55.99        | 42.06 | 44.71     | 64.16 | 59.66      |
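
To reproduce numbers like these, the harness can be driven from Python. The sketch below uses the `simple_evaluate` entry point; the Turkish task names and the exact API of the `malhajar17/lm-evaluation-harness_turkish` fork are assumptions, so check the fork's README for the correct identifiers.

```python
# Illustrative only: evaluating the model with the lm-evaluation-harness API.
# The Turkish task names below are assumptions; the fork may use different
# identifiers, and its API may differ from upstream lm_eval.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=ytu-ce-cosmos/Turkish-Gemma-9b-v0.1,dtype=bfloat16",
    tasks=["mmlu_tr", "truthfulqa_tr", "arc_tr", "hellaswag_tr", "gsm8k_tr", "winogrande_tr"],
    batch_size=8,
)
print(results["results"])
```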



#### Transformers pipeline

```python
import transformers
import torch
model_id = "ytu-ce-cosmos/Turkish-Gemma-9b-v0.1"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)
# Prompt (Turkish): "A function named RD returns the multiplicative inverse of the
# number given to it, e.g. RD(3) = 1/3. How many values of X make RD(X) = X true?"
messages = [
    {"role": "user", "content": "İsmi RD olan bir fonksiyon ona verilen sayının çarpmaya göre tersini döndürmektedir. Örneğin RD(3)=1/3. Buna göre RD(X)=X ifadesini doğru yapan kaç X değeri vardır?"}
]

# Stop generation at either the EOS token or Gemma's <end_of_turn> token.
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<end_of_turn>")
]

outputs = pipeline(
    messages,
    max_new_tokens=512,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(outputs[0]["generated_text"][-1])
# RD(X) = X ifadesi, bir sayının çarpmaya göre tersinin kendisiyle eşit olması anlamına gelir. Yani, X ile 1/X aynı olmalıdır. Bu durum yalnızca X'in karesi 1 olduğunda gerçekleşir:

# X² = 1

# Bu denklemin çözümleri:

# X = 1 ve X = -1

# Dolayısıyla, RD(X) = X eşitliğini sağlayan *iki* X değeri vardır: *1* ve *-1*.
# (In English: RD(X) = X means X = 1/X, i.e. X² = 1, so there are two solutions: 1 and -1.)
```

#### Transformers AutoModelForCausalLM

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "ytu-ce-cosmos/Turkish-Gemma-9b-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "İsmi RD olan bir fonksiyon ona verilen sayının çarpmaya göre tersini döndürmektedir. Örneğin RD(3)=1/3. Buna göre RD(X)=X ifadesini doğru yapan kaç X değeri vardır?"}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Stop generation at either the EOS token or Gemma's <end_of_turn> token.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<end_of_turn>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=512,
    eos_token_id=terminators,
    do_sample=False,
)
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
# RD(X) = X ifadesi, bir sayının çarpmaya göre tersinin kendisiyle eşit olması anlamına gelir. Yani, X ile 1/X aynı olmalıdır. Bu durum yalnızca X'in karesi 1 olduğunda gerçekleşir:

# X² = 1

# Bu denklemin çözümleri:

# X = 1 ve X = -1

# Dolayısıyla, RD(X) = X eşitliğini sağlayan *iki* X değeri vardır: *1* ve *-1*.

```


# Acknowledgments
- Thanks to the generous support from the Hugging Face team, it is possible to download models from their S3 storage 🤗
- Computing resources used in this work were provided by the National Center for High Performance Computing of Turkey (UHeM) under grant numbers 1016912023 and 1018512024.

### Contact
COSMOS AI Research Group, Yildiz Technical University Computer Engineering Department <br>
https://cosmos.yildiz.edu.tr/ <br>
[email protected]
