Turkish-Gemma-9b-v0.1

Turkish-Gemma-9b-v0.1 is based on Gemma-2-9b and was developed through a combination of continual pre-training, supervised fine-tuning (SFT), direct preference optimization (DPO), and model merging.

Turkish-Gemma-9b-v0.1 is designed for Turkish text generation tasks and provides coherent, contextually relevant continuations and answers. Because its training data is diverse (large-scale pre-training corpora, instruction-tuning data, and human preference data), the model may exhibit biases. Users should be aware of these biases and deploy the model responsibly.

A demo of the model will be available here (coming soon): https://cosmos.yildiz.edu.tr/cosmosllm

To evaluate model performance, we compiled a dataset of 1,450 carefully designed questions across diverse categories. Each question was reviewed and rated by 18 human annotators, allowing for a reliable comparison across multiple models.

The table below summarizes the evaluation results:

🏆 Model Comparison: Win Rates

Model Name	Win Rate
Qwen/Qwen3-30B-A3B	62.39%
gpt-4o-mini	62.12%
google/gemma-3-12b-it	61.61%
google/gemma-2-27b-it	57.91%
ytu-ce-cosmos/Turkish-Gemma-9b-v0.1	57.30%
google/gemma-2-9b-it	54.13%
ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1	36.89%

Voting Methodology

Human judges were shown a question together with two answers from different models and selected the answer they preferred.

[Figure: an example question with two candidate answers, where the judge selected the answer on the right.]
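As a rough illustration of how pairwise votes of this kind can be turned into win rates like those in the table above, here is a minimal sketch. The tallying rule (wins divided by comparisons the model took part in, no ties) is an assumption, since the card does not state the exact formula, and the model names and votes are made up for the example.

```python
from collections import defaultdict


def win_rates(votes):
    """Compute per-model win rates (in percent) from pairwise votes.

    `votes` is an iterable of (model_a, model_b, winner) tuples, where
    `winner` must be one of the two compared models. Ties are not modelled;
    that is an assumption of this sketch, not something the card specifies.
    """
    wins = defaultdict(int)
    comparisons = defaultdict(int)
    for model_a, model_b, winner in votes:
        if winner not in (model_a, model_b):
            raise ValueError(f"winner {winner!r} was not part of the comparison")
        comparisons[model_a] += 1
        comparisons[model_b] += 1
        wins[winner] += 1
    return {name: 100.0 * wins[name] / comparisons[name] for name in comparisons}


# Hypothetical votes, for illustration only.
example_votes = [
    ("model-x", "model-y", "model-y"),
    ("model-x", "model-z", "model-x"),
    ("model-y", "model-z", "model-y"),
    ("model-x", "model-y", "model-x"),
]

rates = win_rates(example_votes)
for name, rate in sorted(rates.items(), key=lambda item: -item[1]):
    print(f"{name}: {rate:.2f}%")
```

Under this rule a model's win rate depends on which opponents it was paired with, so rates from different pairing schemes are not directly comparable.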

📊 Turkish Evaluation Benchmark Results (via `malhajar17/lm-evaluation-harness_turkish`)

Model Name	Average	MMLU	Truthful_QA	ARC	Hellaswag	Gsm8K	Winogrande
Qwen/Qwen2.5-72B-Instruct	67.69	77.28	59.86	61.52	61.98	83.60	61.92
google/gemma-3-27b-it	67.36	70.20	57.06	66.98	66.58	77.52	65.80
google/gemma-2-27b-it	65.57	66.49	57.45	63.65	63.86	76.54	65.40
meta-llama/Llama-3.1-70B-Instruct	63.92	74.00	51.41	59.64	64.31	66.13	66.90
Qwen/Qwen2.5-32B-Instruct	63.74	70.93	57.87	57.00	57.04	77.83	61.77
ytu-ce-cosmos/Turkish-Gemma-9b-v0.1	63.31	63.85	54.21	59.64	64.19	73.42	64.53
google/gemma-3-12b-it	62.94	63.92	57.16	60.67	62.00	72.06	61.77
Qwen/Qwen2.5-14B-Instruct	60.34	65.28	59.00	50.00	52.22	76.77	58.77
google/gemma-2-9b-it	59.14	61.07	55.77	56.31	56.48	63.10	62.09
ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1	55.03	51.97	57.56	51.02	52.96	59.87	57.77
Qwen/Qwen2.5-7B-Instruct	53.42	56.31	55.99	42.06	44.71	64.16	59.66
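The Average column appears to be the unweighted mean of the six task scores. The card does not say so explicitly, so treat this as an assumption. The short check below recomputes it for the Turkish-Gemma-9b-v0.1 row using the values from the table.

```python
# Task scores for ytu-ce-cosmos/Turkish-Gemma-9b-v0.1, copied from the table above.
scores = {
    "MMLU": 63.85,
    "Truthful_QA": 54.21,
    "ARC": 59.64,
    "Hellaswag": 64.19,
    "Gsm8K": 73.42,
    "Winogrande": 64.53,
}

# Assumption: the Average column is a plain arithmetic mean over the six tasks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # 63.31, matching the Average column for this row
```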

Transformers pipeline

import transformers
import torch
model_id = "ytu-ce-cosmos/Turkish-Gemma-9b-v0.1"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)
messages = [
    {"role": "user", "content": "İsmi RD olan bir fonksiyon ona verilen sayının çarpmaya göre tersini döndürmektedir. Örneğin RD(3)=1/3. Buna göre RD(X)=X ifadesini doğru yapan kaç X değeri vardır?"}
]

terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<end_of_turn>")
]

outputs = pipeline(
    messages,
    max_new_tokens=512,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
# The last message in the returned conversation is the assistant's reply.
print(outputs[0]["generated_text"][-1]["content"])
# RD(X) = X ifadesi, bir sayının çarpmaya göre tersinin kendisiyle eşit olması anlamına gelir. Yani, X ile 1/X aynı olmalıdır. Bu durum yalnızca X'in karesi 1 olduğunda gerçekleşir:

# X² = 1

# Bu denklemin çözümleri:

# X = 1 ve X = -1

# Dolayısıyla, RD(X) = X eşitliğini sağlayan *iki* X değeri vardır: *1* ve *-1*.

Transformers AutoModelForCausalLM

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "ytu-ce-cosmos/Turkish-Gemma-9b-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "İsmi RD olan bir fonksiyon ona verilen sayının çarpmaya göre tersini döndürmektedir. Örneğin RD(3)=1/3. Buna göre RD(X)=X ifadesini doğru yapan kaç X değeri vardır?"}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Stop on either the tokenizer's EOS token or Gemma's end-of-turn marker.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<end_of_turn>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=512,
    eos_token_id=terminators,
    do_sample=False,
)
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
# RD(X) = X ifadesi, bir sayının çarpmaya göre tersinin kendisiyle eşit olması anlamına gelir. Yani, X ile 1/X aynı olmalıdır. Bu durum yalnızca X'in karesi 1 olduğunda gerçekleşir:

# X² = 1

# Bu denklemin çözümleri:

# X = 1 ve X = -1

# Dolayısıyla, RD(X) = X eşitliğini sağlayan *iki* X değeri vardır: *1* ve *-1*.
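Both examples pass `<end_of_turn>` as a terminator because Gemma-style chat prompts close every turn with that marker. The sketch below approximates the prompt string that `apply_chat_template` builds for Gemma-2 models, without downloading anything. `render_gemma_prompt` is a hypothetical helper written for this illustration, and it assumes that this fine-tuned model keeps the base Gemma-2 template. The tokenizer's own template remains the authoritative source.

```python
def render_gemma_prompt(messages, add_generation_prompt=True):
    """Approximate the Gemma-2 chat format as a plain string.

    Hypothetical helper for illustration only; in real code, rely on
    tokenizer.apply_chat_template instead.
    """
    parts = ["<bos>"]
    for message in messages:
        # Gemma names the assistant role "model".
        role = "model" if message["role"] == "assistant" else message["role"]
        content = message["content"].strip()
        parts.append(f"<start_of_turn>{role}\n{content}<end_of_turn>\n")
    if add_generation_prompt:
        # Open a model turn so that generation continues as the assistant.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


prompt = render_gemma_prompt([{"role": "user", "content": "Merhaba!"}])
print(prompt)
```

Since the model is expected to end its reply with `<end_of_turn>`, listing that token's ID in `eos_token_id` stops generation at the end of the assistant turn.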

Acknowledgments

Thanks to the generous support of the Hugging Face team, the models can be downloaded from their S3 storage 🤗
Computing resources used in this work were provided by the National Center for High Performance Computing of Turkey (UHeM) under grant numbers 1016912023 and 1018512024.

Contact

COSMOS AI Research Group, Yildiz Technical University Computer Engineering Department
https://cosmos.yildiz.edu.tr/
[email protected]


license: gemma2
