This model is a fine-tuned version of Qwen/Qwen2-0.5B-Instruct. It has been trained using TRL with ORPO (Odds Ratio Preference Optimization).
Model Details
- Base Model: Qwen/Qwen2-0.5B-Instruct
- Training Method: ORPO (Odds Ratio Preference Optimization)
- Training Dataset: trl-lib/ultrafeedback_binarized
- Training Time: 1 hour 32 minutes
- Hardware: Single GPU
Training Metrics
- Training Loss: 6.386
- Train Samples per Second: 11.152
- Train Steps per Second: 0.697
- Final Epoch: 1.0
Quick Start (running on CPU)
import logging

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def load_model():
    """Load the Abuja-01 model and tokenizer from the Hugging Face Hub."""
    logger.info("Loading model...")
    model = AutoModelForCausalLM.from_pretrained("iben/Abuja-01", trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained("iben/Abuja-01", trust_remote_code=True)
    return model, tokenizer
View the full code on GitHub Gist
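Once loaded, generation follows the usual Transformers pattern. Below is a minimal sketch assuming the model's chat template is used; the prompt text is purely illustrative:

model, tokenizer = load_model()

# Build a chat-formatted prompt and tokenize it.
messages = [{"role": "user", "content": "Explain ORPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

# Runs on CPU by default; no .to(device) call is needed for the default setup.
with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=128)

# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))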
Training Details
This model was trained using ORPO (Odds Ratio Preference Optimization), a preference-optimization method that does not require a separate reference model. The training configuration included the values below; a configuration sketch follows the list.
- Learning Rate: 1e-5
- Batch Size: 4
- Gradient Accumulation Steps: 4
- Training Epochs: 1
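For reference, here is a minimal sketch of how a comparable run can be set up with TRL's ORPOTrainer using the hyperparameters listed above. The output_dir and any arguments not listed on this card are illustrative assumptions, not the exact script used for this model:

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

# Base model and tokenizer from the card.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B-Instruct")

# Preference dataset used for training.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

# Values mirror the card: lr 1e-5, per-device batch 4, grad accumulation 4, 1 epoch.
training_args = ORPOConfig(
    output_dir="Abuja-01",  # illustrative output path, not the original one
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
)

trainer = ORPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()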
Framework Versions
- TRL: 0.13.0
- Transformers: 4.48.1
- PyTorch: 2.5.1+cu121
- Datasets: 3.2.0
- Tokenizers: 0.21.0
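To reproduce this environment, one option is to pin the versions above; the exact PyTorch wheel depends on your CUDA setup:

pip install trl==0.13.0 transformers==4.48.1 datasets==3.2.0 tokenizers==0.21.0
pip install torch==2.5.1  # the card lists the cu121 build; choose the wheel matching your platform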