Model Card for VolkanSimsir/LLaMA-3-8B-GRPO-math-tr

Model Details

  • Base Model: ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1

  • Training Time: 3 hours on an NVIDIA A40 GPU

  • Model Size: 8.03B parameters (BF16, safetensors)

LoRA config (see the sketch below):

  • lora_r: 32
  • lora_alpha: 32
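
A minimal sketch of how this LoRA configuration could be expressed with the peft library. Only r and alpha are reported above; the target modules and dropout are illustrative assumptions.

from peft import LoraConfig

lora_config = LoraConfig(
    r=32,                # lora_r reported above
    lora_alpha=32,       # lora_alpha reported above
    lora_dropout=0.05,   # assumed, not reported in this card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed, not reported
    task_type="CAUSAL_LM",
)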

Model Description

This model was trained with GRPO (Group Relative Policy Optimization) on the Turkish GSM8K dataset to improve its mathematical reasoning capabilities. As with similar large language models, its responses may still contain errors or biases, so outputs should be evaluated carefully, and additional verification steps are recommended in applications where accuracy is critical.
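
For reference, this style of GRPO training can be run with the trl library's GRPOTrainer. The sketch below is illustrative only: the dataset (the English GSM8K is shown as a stand-in for the Turkish version), the reward function, and the hyperparameters are assumptions, not the exact setup used for this model.

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Placeholder dataset: a Turkish GSM8K variant was used in practice.
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda x: {"prompt": x["question"]})

def correctness_reward(completions, **kwargs):
    # Toy reward: favors completions that contain an explicit final-answer marker.
    return [1.0 if "####" in completion else 0.0 for completion in completions]

training_args = GRPOConfig(
    output_dir="llama-3-8b-grpo-math-tr",
    num_generations=8,
    max_completion_length=512,
)

trainer = GRPOTrainer(
    model="ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1",
    reward_funcs=correctness_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()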

Example Usage

Install

!pip install -U transformers bitsandbytes accelerate

Load the model with 4-bit quantization

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "VolkanSimsir/LLaMA-3-8B-GRPO-math-tr"
tokenizer = AutoTokenizer.from_pretrained(model_id)

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype="float16",   # compute in fp16 while weights stay in 4-bit
    bnb_4bit_quant_type="nf4",          # NormalFloat4 quantization
    bnb_4bit_use_double_quant=True      # also quantize the quantization constants
)
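
# Note: this 4-bit NF4 setup keeps the 8B model's weights at roughly 5-6 GB of VRAM.
# For full-precision inference, omit quantization_config and pass
# torch_dtype="bfloat16" to from_pretrained instead.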


model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto"
)


question = """ Randy'nin çiftliğinde 60 mango ağacı var.
Ayrıca mango ağaçlarının yarısından 5 tane daha az Hindistan cevizi ağacı var.
Randy'nin çiftliğinde toplam kaç ağaç var?
"""

inputs = tokenizer(question, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
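
Since the base model is instruction-tuned, formatting the question with the tokenizer's chat template (if one is provided) may give better results. The snippet below is a sketch of that alternative, not part of the original example.

messages = [{"role": "user", "content": question}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))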

Evaluation

Citation Information

@software{VolkanSimsir,
  author = {VolkanSimsir},
  title = {VolkanSimsir/LLaMA-3-8B-GRPO-math-tr},
  year = 2025,
  url = {https://huggingface.co/VolkanSimsir/LLaMA-3-8B-GRPO-math-tr}
}

Contact

  • Volkan Simsir
