Model Card for Model ID

This is llama-3-8b ORPO finetuning for the italian language over a concatenation of two datasets:

The other two differences with diegobit/llama-3-8b-Instruct-bnb-4bit-ita-orpo are:

  • the starting model, not instruct, astronomer/Llama-3-8B-Special-Tokens-Adjusted instead of unsloth/llama-3-8b-Instruct-bnb-4bit
  • no loading in 4bits
  • given the increased need of GPU memory, the sequence max length used for finetuning is 4096

Model Details

Model Description

  • Developed by: Diego Giorgini
  • Funded by: AI Technologies SRL - www.aitechnologies.it
  • Language(s) (NLP): Italian
  • License: llama3
  • Finetuned from model: astronomer/Llama-3-8B-Special-Tokens-Adjusted

Training Details

Environment

unsloth: 2024.5
torch: 2.2

Training Data

  • mii-community/ultrafeedback-preferences-translated-ita is a selection of 55k rows of the ultrafeedback dataset, translated into italian with argotranslate.
  • efederici/alpaca-vs-alpaca-orpo-dpo: The Alpaca vs. Alpaca dataset is a curated blend of the Alpaca dataset and the Alpaca GPT-4 dataset, both available on HuggingFace Datasets. It uses the standard GPT dataset as the 'rejected' answer, steering the model towards the GPT-4 answer, which is considered as the 'chosen' one.

Training Procedure

Preprocessing [optional]

  • No preprocessing has been performed, except for formatting with the llama3 chat_template from unsloth:

    tokenizer = get_chat_template(tokenizer, chat_template = "llama-3")

Training Hyperparameters

  • Training regime: bf16

  • Model loading parameters:

max_seq_length = 4096
dtype = None
load_in_4bit = False
  • PEFT parameters:
r = 64  
lora_alpha = 64  
lora_dropout = 0  
bias = "none"  
random_state = 3407  
use_rslora = False  
loftq_config = None
  • ORPOConfig parameters:
max_length = 4096  
max_prompt_length = max_seq_length//2  
max_completion_length = max_seq_length//2  
warmup_ratio = 0.1  
weight_decay = 0.01  
per_device_train_batch_size = 1  
gradient_accumulation_steps = 16  
learning_rate=8e-6  
beta = 0.1  
optim = "paged_adamw_8bit"  
lr_scheduler_type = "linear"  
num_train_epochs = 1

Speeds, Sizes, Times

19h on an A100-40GB

Model Card Contact

[email protected]

Downloads last month
28
Safetensors
Model size
4.65B params
Tensor type
BF16
·
F32
·
U8
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Datasets used to train diegobit/llama-3-8b-ita-4k-orpo-v3