Model Card for VolkanSimsir/LLaMA-3-8B-GRPO-math-tr

Model Details

  • Base Model: ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1

  • Training Time: 3 hours on an NVIDIA A40 GPU

  • Model Size: 8.03B parameters (BF16, safetensors)

LoRA config (see the sketch below):

  • lora_r: 32
  • lora_alpha: 32
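
A minimal sketch of how this LoRA configuration could be expressed with the peft library. Only r and alpha are reported above; the target modules and dropout are illustrative assumptions.

from peft import LoraConfig

lora_config = LoraConfig(
    r=32,                # lora_r reported above
    lora_alpha=32,       # lora_alpha reported above
    lora_dropout=0.05,   # assumed, not reported in this card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed, not reported
    task_type="CAUSAL_LM",
)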

Model Description

This model was trained with GRPO (Group Relative Policy Optimization) on the Turkish GSM8K dataset to improve its mathematical reasoning capabilities. As with similar large language models, its responses may still contain errors or biases, so outputs should be evaluated carefully, and additional verification steps are recommended in applications where accuracy is critical.
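
For reference, this style of GRPO training can be run with the trl library's GRPOTrainer. The sketch below is illustrative only: the dataset (the English GSM8K is shown as a stand-in for the Turkish version), the reward function, and the hyperparameters are assumptions, not the exact setup used for this model.

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Placeholder dataset: a Turkish GSM8K variant was used in practice.
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda x: {"prompt": x["question"]})

def correctness_reward(completions, **kwargs):
    # Toy reward: favors completions that contain an explicit final-answer marker.
    return [1.0 if "####" in completion else 0.0 for completion in completions]

training_args = GRPOConfig(
    output_dir="llama-3-8b-grpo-math-tr",
    num_generations=8,
    max_completion_length=512,
)

trainer = GRPOTrainer(
    model="ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1",
    reward_funcs=correctness_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()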

Example Usage

Install

!pip install -U transformers bitsandbytes accelerate

Load the model with 4-bit quantization

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "VolkanSimsir/LLaMA-3-8B-GRPO-math-tr"
tokenizer = AutoTokenizer.from_pretrained(model_id)

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype="float16",   # compute in fp16 while weights stay in 4-bit
    bnb_4bit_quant_type="nf4",          # NormalFloat4 quantization
    bnb_4bit_use_double_quant=True      # also quantize the quantization constants
)
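
# Note: this 4-bit NF4 setup keeps the 8B model's weights at roughly 5-6 GB of VRAM.
# For full-precision inference, omit quantization_config and pass
# torch_dtype="bfloat16" to from_pretrained instead.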


model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto"
)


question = """ Randy'nin çiftliğinde 60 mango ağacı var.
Ayrıca mango ağaçlarının yarısından 5 tane daha az Hindistan cevizi ağacı var.
Randy'nin çiftliğinde toplam kaç ağaç var?
"""

inputs = tokenizer(question, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
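
Since the base model is instruction-tuned, formatting the question with the tokenizer's chat template (if one is provided) may give better results. The snippet below is a sketch of that alternative, not part of the original example.

messages = [{"role": "user", "content": question}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))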

Evaluation

Citation Information

@software{VolkanSimsir,
  author = {VolkanSimsir},
  title = {VolkanSimsir/LLaMA-3-8B-GRPO-math-tr},
  year = 2025,
  url = {https://huggingface.co/VolkanSimsir/LLaMA-3-8B-GRPO-math-tr}
}

Contact

  • Volkan Simsir
