---
base_model: meta-llama/Meta-Llama-3-8B-Instruct
tags:
- alignment-handbook
- generated_from_trainer
datasets:
- princeton-nlp/llama3-ultrafeedback-armorm
model-index:
- name: Sunshine279/gammaPO-llama-3-8b-instruct
results: []
license: mit
---
## Model Details
### Model Description
We fine-tuned [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on [princeton-nlp/llama3-ultrafeedback-armorm](https://huggingface.co/datasets/princeton-nlp/llama3-ultrafeedback-armorm) with the gamma-SimPO objective; a sketch of the objective follows the details below.
- Developed by: Jie Sun, Junkang Wu, Jiancan Wu, Zhibo Zhu, Xingyu Lu, Jun Zhou, Lintao Ma, Xiang Wang
- Model type: Causal Language Model
- License: MIT
- Finetuned from model: meta-llama/Meta-Llama-3-8B-Instruct
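
For intuition, here is a minimal sketch of the SimPO-style preference loss that gammaPO builds on: length-normalized log-probabilities serve as implicit rewards, separated by a target margin gamma, which gammaPO adapts per instance (see the paper for the adaptive rule). The function name and the `beta`/`gamma` values below are illustrative assumptions, not the released training settings.

```python
import torch
import torch.nn.functional as F

def simpo_style_loss(policy_chosen_logps, policy_rejected_logps,
                     chosen_lengths, rejected_lengths,
                     beta=2.5, gamma=1.0):
    """Sketch of a SimPO-style objective with a reward margin gamma.

    gammaPO adapts gamma per instance; this shows the fixed-margin
    SimPO form it builds on. beta and gamma are illustrative values.
    """
    # Length-normalized implicit rewards -- no reference model needed.
    chosen_rewards = beta * policy_chosen_logps / chosen_lengths
    rejected_rewards = beta * policy_rejected_logps / rejected_lengths
    # Bradley-Terry-style logistic loss with target margin gamma.
    return -F.logsigmoid(chosen_rewards - rejected_rewards - gamma).mean()
```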
### Model Sources
- Repository: https://github.com/sunjie279/gammaPO
- Paper: https://arxiv.org/pdf/2506.03690
### How to Get Started with the Model
```python
import torch
from transformers import pipeline

model_id = "Sunshine279/gammaPO-llama-3-8b-instruct"
generator = pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",
)
# Llama 3 ends assistant turns with <|eot_id|>, so stop on it as well as
# the tokenizer's default EOS token.
outputs = generator(
    [{"role": "user", "content": "What's the difference between llamas and alpacas?"}],
    do_sample=False,
    eos_token_id=[
        generator.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
        generator.tokenizer.eos_token_id,
    ],
    max_new_tokens=200,
)
# With chat-style input, generated_text holds the whole conversation;
# the assistant reply is the last message.
print(outputs[0]["generated_text"][-1]["content"])
```
## Training Details
### Training hyperparameters
The following hyperparameters were used during training; a sketch of how they map onto a Hugging Face `TrainingArguments` config follows the list:
- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
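
For reference, a minimal sketch of these settings as a Hugging Face `TrainingArguments` config. The `output_dir` and the `bf16` flag are assumptions, not taken from the training run; the effective batch size works out to 2 per device × 4 GPUs × 16 accumulation steps = 128, matching the total above.

```python
from transformers import TrainingArguments

# Sketch only: maps the listed hyperparameters onto TrainingArguments.
training_args = TrainingArguments(
    output_dir="gammaPO-llama-3-8b-instruct",  # assumed, not from the run
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=16,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    # Adam betas (0.9, 0.999) and epsilon 1e-8 are the defaults, so they
    # need not be set explicitly.
    bf16=True,  # assumed from the bfloat16 inference setup above
)
```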
### Training Results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 1.1544 | 0.8550 | 400 | 1.1389 | -20.8453 | -29.4063 | 0.8679 | 8.5610 | -2.9406 | -2.0845 | -1.7197 | -1.7101 |
### Framework Versions
- Transformers 4.42.4
- Pytorch 2.3.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1