# Qwen2.5-3B-UFO
This model is based on Qwen2.5-3B-Instruct and trained with PPO (Proximal Policy Optimization) on the MetaMathQA dataset for mathematical reasoning.
GitHub: https://github.com/lichengliu03/unary-feedback
## Model Info
- Base model: Qwen/Qwen2.5-3B-Instruct
- Training method: PPO (full-parameter fine-tuning, not LoRA)
- Training data: MATH_MetaMathQA
- Training steps: 200
- Framework: VERL
- Tensor parallelism: distributed training across 2 GPUs
- Model size: ~6 GB (consistent with ~3B parameters at 2 bytes each in FP16)
## Training Config
- Micro Batch Size: 1 per GPU
- PPO Mini Batch Size: 8
- Actor Learning Rate: auto
- Critic Learning Rate: auto
- KL Penalty: 0.001
- Clip Ratio: 0.2-0.28 (see the loss sketch after this list)
- Temperature: 1.0 (train), 0.5 (eval)
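For context, the KL penalty and clip ratio above are the two main knobs of the standard PPO objective: the policy ratio is clipped to [1 - ε, 1 + ε], and a KL term keeps the policy close to the reference model. Below is a minimal PyTorch sketch of that loss; the tensor names and shapes are illustrative assumptions, not the VERL implementation.

```python
import torch

def ppo_policy_loss(logprobs, old_logprobs, ref_logprobs, advantages,
                    clip_ratio=0.2, kl_coef=0.001):
    """Clipped PPO surrogate loss with a KL penalty toward the reference policy.

    All inputs are per-token tensors of the same shape. Illustrative sketch
    only; not the VERL training code.
    """
    # Probability ratio between the current policy and the rollout policy
    ratio = torch.exp(logprobs - old_logprobs)

    # Clipped surrogate objective: take the pessimistic (min) branch
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_ratio, 1.0 + clip_ratio) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()

    # Approximate KL penalty keeping the policy near the reference model
    kl = (logprobs - ref_logprobs).mean()
    return policy_loss + kl_coef * kl
```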
## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("LichengLiu03/qwen2.5-3b-ppo-metamath-full")
model = AutoModelForCausalLM.from_pretrained(
    "LichengLiu03/qwen2.5-3b-ppo-metamath-full",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Example math problem; move inputs to the model's device
prompt = "Solve this math problem: If a circle has a radius of 5cm, what is its area?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate answer
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
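Since the base model is instruction-tuned, wrapping the question in the tokenizer's chat template usually produces better-formatted answers. A sketch reusing the generation settings above (the message content is just an example):

```python
messages = [
    {"role": "user", "content": "If a circle has a radius of 5 cm, what is its area?"},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        input_ids,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0, input_ids.shape[-1]:], skip_special_tokens=True))
```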
## Features
This model is optimized for mathematical reasoning with PPO. Compared to the base model, it improves:
- ✅ Math problem understanding
- ✅ Logical reasoning accuracy
- ✅ Clarity of solution steps
- ✅ Calculation accuracy
## Technical Details
- Tensor-parallel training: 2 GPUs, distributed
- Memory optimization: gradient checkpointing and mixed precision
- Reward modeling: based on MetaMathQA answer correctness and reasoning quality (see the sketch below)
- Policy optimization: PPO for stable training
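This card does not spell out the exact reward function. As a rough illustration only, a correctness-based reward for MetaMathQA-style outputs might extract the final answer and compare it to the reference; the regex and scoring below are assumptions, not the training code.

```python
import re

def correctness_reward(response: str, gold_answer: str) -> float:
    """Toy correctness reward: 1.0 if the extracted final answer matches
    the reference answer, else 0.0. Illustrative only."""
    # MetaMathQA solutions typically end with "The answer is: <value>"
    match = re.search(r"[Tt]he answer is:?\s*([^\n]+)", response)
    if match is None:
        return 0.0
    prediction = match.group(1).strip().rstrip(".")
    return 1.0 if prediction == gold_answer.strip() else 0.0
```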
## Limitations
- Mainly optimized for mathematical reasoning
- May not perform as well on general tasks
- Recommended for math, logic, and reasoning tasks
## License
This model is licensed under Apache 2.0.