Model Overview

RefinedNeuro/RN_TR_R2 is a Turkish-language reasoning model fine-tuned from ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1 with the GRPO method on Q&A data distilled from the Qwen3 model. It excels at open-ended reasoning tasks in Turkish, particularly STEM, history, and cultural questions.


Evaluation


Benchmark: RN_TR_R2_Benchmark_Results (Open-ended Turkish culture & reasoning questions)

| Model Name | Score (%) | Description |
|---|---|---|
| RefinedNeuro/RN_TR_R2 | 82.4 | Upgraded from RN_TR_R1 via GRPO on distilled Qwen3 data |
| Qwen3-8B | 76.8 | Off-the-shelf Qwen3-8B |
| RefinedNeuro/RN_TR_R1 | 64.8 | Baseline Turkish-fine-tuned model |
| ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1 | 63.2 | Turkish-adapted Llama-8B via direct preference optimization |
| Meta-Llama-3.1-8B-Instruct | 63.2 | Meta's Llama-3.1-8B with instruct tuning |

RN_TR_R2 achieves a 17.6 pp improvement over the RN_TR_R1 baseline.


Intended Use

  • Primary use cases: Question answering, step-by-step reasoning, educational tools in Turkish (math, physics, chemistry, biology, geometry, trigonometry, statistics, history, culture).
  • Not recommended: Generating creative fiction or tasks requiring memorized facts outside its training scope.
  • Known limitations: While no systematic failure modes are identified, RN_TR_R2 may hallucinate like all LLMs.

Training Data

  • Base model: ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1

  • Fine-tuning dataset: RefinedNeuro/Qwen3-Reasoning-Distill-Q-A-Dataset

    • Domains: Mathematics, Physics, Chemistry, Biology, Geometry, Trigonometry, Statistics
    • Grade levels: 6th–12th grade
    • Prompt variations (e.g., long/detailed vs. short/easy; answer must be a single number)
    • Size: 13,000 examples (~9 million tokens)
    • License: Creative Commons
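To make the dataset description concrete, a record with prompt variations might look like the following. This is a hypothetical illustration only; the field names and values are assumptions, not the actual schema of RefinedNeuro/Qwen3-Reasoning-Distill-Q-A-Dataset.

```python
# Hypothetical record shape for a distilled Q&A example
# (field names are illustrative assumptions, not the dataset's real schema).
record = {
    "question": "Bir üçgenin iç açıları toplamı kaç derecedir?",
    "variation": "short/easy; answer must be a single number",
    "answer": "180",
    "domain": "Geometry",
    "grade": 9,
}

# The single-number constraint from the prompt variation can be checked directly.
assert record["answer"].isdigit()
print(record["domain"], record["answer"])
```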

Training Configuration

  • Method: GRPO (Group Relative Policy Optimization) via unsloth
  • Hardware: 2× RTX 3090 GPUs for 1 week (~336 GPU-hours)
  • Batch size: 1
  • Context length: 4,096 tokens during training (max_seq_length=8192 at inference)
  • Optimizer & scheduler: cosine LR schedule, peak LR = 5×10⁻⁶
  • Library: unsloth==2025.3.18
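For reference, the cosine learning-rate decay named above can be sketched in a few lines. This is a minimal sketch assuming decay from the peak LR to zero with no warmup; the scheduler actually used during training may differ in its warmup and floor behavior.

```python
import math

PEAK_LR = 5e-6  # peak learning rate from the training configuration

def cosine_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR) -> float:
    """Cosine decay from peak_lr at step 0 to ~0 at total_steps (no warmup assumed)."""
    progress = min(step / total_steps, 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

print(cosine_lr(0, 1000))     # peak: 5e-6
print(cosine_lr(500, 1000))   # halfway: 2.5e-6
```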

Usage Example

# Install first: pip install unsloth==2025.3.18
from unsloth import FastLanguageModel
from transformers import TextStreamer

# Load the model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="RefinedNeuro/RN_TR_R2",
    max_seq_length=8192,
    load_in_4bit=True,
)

FastLanguageModel.for_inference(model)

SYSTEM_PROMPT = """
Sana sorulan sorulara cevap verirken, adım adım düşün ve gerekçelendir.
Önce <think></think> tagları arasında adım adım düşün.
Sonra düşünme adımlarını özetle.
En son \\boxed{} içerisinde sorunun cevabını ver.
"""

def answer_question(question: str):
    messages = [
        {'role': 'system', 'content': SYSTEM_PROMPT},
        {'role': 'user',   'content': question},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    ).to("cuda")

    streamer = TextStreamer(tokenizer)
    print("-" * 60)
    _ = model.generate(
        input_ids=inputs,
        streamer=streamer,
        max_new_tokens=2048,
        temperature=0.6,
        top_p=0.1,
    )

# Examples
answer_question("Türkiye'nin başkenti neresidir?")
answer_question("6 ve 2'nin çarpımı nedir?")
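Since the system prompt instructs the model to put its final answer inside \boxed{}, the answer can be pulled out of the decoded generation with a small regex helper. This is a sketch; `extract_boxed_answer` is not part of the model's tooling, and it assumes the boxed content contains no nested braces.

```python
import re
from typing import Optional

def extract_boxed_answer(text: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in the text, or None if absent."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", text)
    return matches[-1] if matches else None

sample_output = "... düşünme adımları ... \\boxed{11}"
print(extract_boxed_answer(sample_output))  # -> 11
```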

Sample Q&A

> **Q:** f(x) = 3x⁴ - 2x³ + 5x - 7 fonksiyonunun x = 1 noktasındaki türevini bulunuz.
>
> **A:** <think>

1. Adım: f(x) = 3x⁴ - 2x³ + 5x - 7 fonksiyonunun türevini bulmak için, önce f(x)'nin türevini bulmamız gerekmektedir.

2. Adım: Türevi bulmak için fonksiyonu parçalara ayırarak analiz edeceğiz.

f(x) = 3x⁴ - 2x³ + 5x - 7

3. Adım: Türevi bulmak için her terimin türevini ayrı ayrı buluyoruz.

- 3x⁴: Türevi 12x³'dür.
- -2x³: Türevi -6x²'dir.
- 5x: Türevi 5'dir.
- -7: Türevi 0'dır (çünkü sabit bir sayıdır).

4. Adım: Türevi bulmak için elde edilen türevleri birleştiriyoruz.

f'(x) = 12x³ - 6x² + 5

5. Adım: Şimdi x = 1 yerine koyarak f'(x)'in değerini hesaplayacağız.

f'(1) = 12(1)³ - 6(1)² + 5

f'(1) = 12(1) - 6(1) + 5

f'(1) = 12 - 6 + 5

f'(1) = 11

6. Adım: x = 1 noktasındaki türevi 11 olarak bulduk.

\boxed{11}

</think>

\boxed{11}
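The model's arithmetic can be verified independently with a few lines of pure Python. The coefficient-dictionary representation below is an illustration for checking this one example, not part of the model card.

```python
def poly_derivative(coeffs: dict) -> dict:
    """Differentiate a polynomial given as {exponent: coefficient}."""
    return {e - 1: e * c for e, c in coeffs.items() if e != 0}

def evaluate(coeffs: dict, x: float) -> float:
    """Evaluate a {exponent: coefficient} polynomial at x."""
    return sum(c * x**e for e, c in coeffs.items())

f = {4: 3, 3: -2, 1: 5, 0: -7}   # f(x) = 3x^4 - 2x^3 + 5x - 7
fp = poly_derivative(f)           # f'(x) = 12x^3 - 6x^2 + 5
print(fp)                         # {3: 12, 2: -6, 0: 5}
print(evaluate(fp, 1))            # 11, matching the boxed answer
```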

Citation

If you use this model, please cite:

@misc{farsak2025rn_tr_r2,
  title        = {RefinedNeuro/RN\_TR\_R2: A Turkish Reasoning Model},
  author       = {Mehmet Can Farsak and Serhat Atayeter and RefinedNeuro},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/RefinedNeuro/RN_TR_R2}},
  license      = {Apache-2.0}
}

License

Apache 2.0, a permissive open-source license.


Authors

  • Authors: Mehmet Can Farsak, Serhat Atayeter, RefinedNeuro