Base Model: ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1
Training Time: 3 hours on an A40
LoRA config:
This model was trained with GRPO (Group Relative Policy Optimization) on the Turkish GSM8K dataset to strengthen its mathematical reasoning. As with similar large language models, its responses may still contain errors or biases; outputs should be evaluated carefully, especially in applications where accuracy is critical, and additional verification steps are recommended.
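GRPO's core idea is to score several sampled completions per prompt and normalize each completion's reward against its own group, avoiding a separate value model. The snippet below is a minimal sketch of that group-relative advantage computation (an illustration of the idea, not the actual training code used for this model):

```python
def group_relative_advantages(rewards, eps=1e-8):
    """Normalize a group of rewards to zero mean and unit std.

    In GRPO, each reward belongs to one of several completions
    sampled for the same prompt; the normalized value is the
    advantage used to weight that completion's log-probabilities.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Example: four sampled answers to one math problem,
# rewarded 1.0 if the final answer is correct, else 0.0.
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct answers get a positive advantage and incorrect ones a negative advantage, so the policy is pushed toward the better completions within each group.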
Install
!pip install -U transformers bitsandbytes
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
model_id = "VolkanSimsir/LLaMA-3-8B-GRPO-math-tr"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# 4-bit NF4 quantization keeps the 8B model's memory footprint low
quantization_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_compute_dtype="float16",
bnb_4bit_quant_type="nf4",
bnb_4bit_use_double_quant=True
)
model = AutoModelForCausalLM.from_pretrained(
model_id,
quantization_config=quantization_config,
device_map="auto"
)
# Sample GSM8K-style question in Turkish. Translation: "Randy has 60 mango
# trees on his farm. He also has 5 fewer than half as many coconut trees
# as mango trees. How many trees are there in all on Randy's farm?"
question = """Randy'nin çiftliğinde 60 mango ağacı var.
Ayrıca mango ağaçlarının yarısından 5 tane daha az Hindistan cevizi ağacı var.
Randy'nin çiftliğinde toplam kaç ağaç var?
"""
inputs = tokenizer(question, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
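For reference, the expected answer to the sample question can be checked with a quick calculation, which is useful for sanity-checking the model's generated solution:

```python
# Arithmetic check of the sample question's expected answer.
mango = 60
coconut = mango // 2 - 5   # 5 fewer than half the mango trees -> 25
total = mango + coconut    # expected final answer
print(total)
```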
Citation
@software{VolkanSimsir,
author = {VolkanSimsir},
title = {VolkanSimsir/LLaMA-3-8B-GRPO-math-tr},
year = 2025,
url = {https://huggingface.co/VolkanSimsir/LLaMA-3-8B-GRPO-math-tr}
}