---
base_model: google/gemma-2-9b-it
tags:
- alignment-handbook
- generated_from_trainer
datasets:
- princeton-nlp/gemma2-ultrafeedback-armorm
model-index:
- name: Sunshine279/gammaPO-gemma-2-9b-it
results: []
license: gemma
---
## Model Details
### Model Description
We fine-tuned [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it) on [princeton-nlp/gemma2-ultrafeedback-armorm](https://huggingface.co/datasets/princeton-nlp/gemma2-ultrafeedback-armorm) with the gamma-SimPO objective.
- Developed by: Jie Sun, Junkang Wu, Jiancan Wu, Zhibo Zhu, Xingyu Lu, Jun Zhou, Lintao Ma, Xiang Wang
- Model type: Causal Language Model
- License: gemma
- Finetuned from model: google/gemma-2-9b-it
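For orientation, gamma-SimPO builds on the SimPO objective, which contrasts length-normalized log-likelihoods of the chosen and rejected responses against a target reward margin $\gamma$ (per the gammaPO repository name, adapting this margin is the paper's contribution; see the linked preprint for the exact formulation):

```latex
% SimPO objective with target reward margin \gamma; gammaPO adapts the
% margin rather than keeping it a fixed hyperparameter.
\mathcal{L}_{\mathrm{SimPO}}(\pi_\theta)
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l)\sim\mathcal{D}}
    \left[ \log \sigma\!\left(
      \frac{\beta}{|y_w|}\log \pi_\theta(y_w \mid x)
      \;-\; \frac{\beta}{|y_l|}\log \pi_\theta(y_l \mid x)
      \;-\; \gamma \right) \right]
```

Here $y_w$ and $y_l$ are the chosen and rejected responses, $|y|$ their token lengths, and $\beta$ the reward scaling constant.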
### Model Sources
- Repository: https://github.com/sunjie279/gammaPO
- Paper: https://arxiv.org/pdf/2506.03690
### How to Get Started with the Model
```python
import torch
from transformers import pipeline

model_id = "Sunshine279/gammaPO-gemma-2-9b-it"

generator = pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",
)
outputs = generator(
    [{"role": "user", "content": "What's the difference between llamas and alpacas?"}],
    do_sample=False,
    # Stop on either Gemma's <end_of_turn> token or the regular EOS token.
    eos_token_id=[
        generator.tokenizer.convert_tokens_to_ids("<end_of_turn>"),
        generator.tokenizer.eos_token_id,
    ],
    max_new_tokens=200,
)
print(outputs[0]["generated_text"])
```
## Training details
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-07
- train_batch_size: 1
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
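The reported total batch sizes follow from the per-device sizes, the device count, and gradient accumulation; a quick sanity check of that arithmetic (plain Python, no trainer dependencies assumed):

```python
# Sanity check: total_train_batch_size is the product of the per-device
# batch size, the number of GPUs, and the gradient accumulation steps.
train_batch_size = 1             # per device
num_devices = 8
gradient_accumulation_steps = 16

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)    # 128

# Evaluation accumulates no gradients, so only the device count multiplies in.
eval_batch_size = 2              # per device
total_eval_batch_size = eval_batch_size * num_devices
print(total_eval_batch_size)     # 16
```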
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 2.5863 | 0.8594 | 400 | 2.5622 | -18.1350 | -23.0307 | 0.7828 | 4.8958 | -2.3031 | -1.8135 | -15.8316 | -15.8114 |
### Framework versions
- Transformers 4.42.4
- Pytorch 2.3.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1