Mathmate-7B-DELLA-ORPO

Mathmate-7B-DELLA-ORPO is a finetuned version of Haleshot/Mathmate-7B-DELLA, trained with ORPO (Odds Ratio Preference Optimization). It was tuned on a math preference dataset with the aim of improving its mathematical reasoning.

Model Details

Base Model: Haleshot/Mathmate-7B-DELLA
Finetuning Method: ORPO (Odds Ratio Preference Optimization)
Training Dataset: argilla/distilabel-math-preference-dpo

Finetuning

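This model was finetuned using ORPO, a preference-optimization method that folds alignment into supervised finetuning. It adds an odds-ratio penalty on the rejected response to the usual language-modeling loss on the chosen response, so it needs neither a separate supervised finetuning stage nor the frozen reference model that DPO (Direct Preference Optimization) relies on. Like DPO, it trains on pairs of chosen and rejected responses. The process was adapted from the tutorial "Fine-tune Llama 3 with ORPO" by Maxime Labonne, with some custom modifications to the code.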
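The objective described above can be sketched in a few lines of plain Python. This is a minimal illustration of the ORPO loss for a single chosen/rejected pair, not the training code used for this model. It assumes each response is summarized by its average per-token log-probability, and the function names and the weight `lam=0.1` are illustrative choices.

```python
import math

def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logp); requires avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))

def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float, lam: float = 0.1) -> float:
    """ORPO loss for one pair: NLL on the chosen response plus a weighted odds-ratio penalty.

    lam is an illustrative weight (an assumption), not a value taken from this model's training run.
    """
    # Supervised term: negative log-likelihood of the chosen response
    nll = -chosen_avg_logp
    # Log odds ratio between the chosen and the rejected response
    log_odds_ratio = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    # -log(sigmoid(x)) written as log(1 + exp(-x))
    odds_ratio_penalty = math.log1p(math.exp(-log_odds_ratio))
    return nll + lam * odds_ratio_penalty

# The penalty shrinks as the rejected response becomes less likely than the chosen one
print(orpo_loss(-0.5, -2.0))
print(orpo_loss(-0.5, -0.6))
```

No reference model appears anywhere in the computation; both terms depend only on the policy being trained.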

Dataset

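The model was finetuned on the argilla/distilabel-math-preference-dpo dataset. Each record pairs a mathematical prompt with a preferred (chosen) response and a less-preferred (rejected) response. The dataset was built with Argilla's distilabel library, so the preference labels are best treated as AI feedback unless the dataset card states otherwise. Training on these pairs lets the model learn what distinguishes a better mathematical explanation or solution from a worse one.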
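As a rough sketch of how such records could be prepared for preference training, the helper below maps one record to the prompt/chosen/rejected layout that preference trainers commonly expect. The column names `instruction`, `chosen_response`, and `rejected_response` are assumptions about the dataset schema and should be checked against the dataset card. The sample record is hand-written for illustration and is not taken from the dataset.

```python
def to_preference_example(record: dict) -> dict:
    """Map one raw record to a prompt/chosen/rejected dictionary.

    The input keys are assumed column names, not confirmed by this model card.
    """
    return {
        "prompt": record["instruction"].strip(),
        "chosen": record["chosen_response"].strip(),
        "rejected": record["rejected_response"].strip(),
    }

# Hypothetical record written by hand (not real dataset content)
sample = {
    "instruction": "Solve for x: 2x + 5 = 13 ",
    "chosen_response": "Subtract 5 from both sides to get 2x = 8, then divide by 2: x = 4.",
    "rejected_response": "x = 9",
}

example = to_preference_example(sample)
print(example["prompt"])
```

With the `datasets` library, the same function could be applied to every row through `Dataset.map`.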

Usage

Here's an example of how to use the model:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "Haleshot/Mathmate-7B-DELLA-ORPO"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")

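def generate_response(prompt, max_new_tokens=512):
    # Tokenize the prompt and move the tensors to the model's device
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # max_new_tokens bounds only the generated text; max_length would also count the prompt tokens
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, num_return_sequences=1, do_sample=True, temperature=0.7)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)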

# Example usage
prompt = "Solve the following equation: 2x + 5 = 13"
response = generate_response(prompt)
print(response)

Limitations

While this model has been finetuned on mathematical problems, it may still make mistakes or provide incorrect solutions. Always verify the model's output, especially for critical applications or complex mathematical problems.
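One simple way to verify an answer is to substitute it back into the original problem. The sketch below does this for the linear equation used in the usage example. Both helper names are hypothetical. It assumes the final number in the response is the proposed solution, which will not hold for every output.

```python
import re
from fractions import Fraction

def extract_last_number(text: str) -> Fraction:
    """Return the last integer or decimal in the text as an exact Fraction."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text)
    if not matches:
        raise ValueError("no number found in text")
    return Fraction(matches[-1])

def satisfies_linear_equation(x: Fraction, a: int, b: int, c: int) -> bool:
    """Check by substitution whether a*x + b == c holds exactly."""
    return a * x + b == c

# Hand-written stand-in for a model response (not actual model output)
response = "2x + 5 = 13, so 2x = 8 and x = 4"
candidate = extract_last_number(response)
print(satisfies_linear_equation(candidate, a=2, b=5, c=13))
```

Exact rational arithmetic avoids false negatives from floating-point rounding. For anything beyond simple equations, a computer algebra system or manual review is the safer check.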

References

Maxime Labonne. (2024). Fine-tune Llama 3 with ORPO
Argilla. distilabel-math-preference-dpo dataset
Haleshot. Mathmate-7B-DELLA

Citation

If you use this model in your research, please cite:

@misc{mathmate-7b-della-orpo,
  author = {Haleshot},
  title = {Mathmate-7B-DELLA-ORPO},
  year = {2024},
  publisher = {HuggingFace},
  journal = {HuggingFace Hub},
  howpublished = {\url{https://huggingface.co/Haleshot/Mathmate-7B-DELLA-ORPO}},
}

Acknowledgements

Special thanks to Maxime Labonne for the ORPO finetuning tutorial, and to the Argilla team for providing the dataset used in this finetuning process.
