This model is a fine-tuned version of lewtun/gemma-7b-sft-full-deita-10k-v0 on the argilla/dpo-mix-7k dataset. It achieves the following results on the evaluation set (the final logged evaluation, at step 100):
- Loss: 1.0775
- Rewards/chosen: -6.7226
- Rewards/rejected: -12.3440
- Rewards/accuracies: 0.7396
- Rewards/margins: 5.6214
- Logps/rejected: -482.4095
- Logps/chosen: -470.2108
- Logits/rejected: 99.2330
- Logits/chosen: 105.2899

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
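## Usage

The card does not include a usage snippet, so the following is a minimal sketch of loading a Gemma-based chat fine-tune like this one with 🤗 Transformers. The repository ID is a placeholder, since the fine-tuned model's own name does not appear in this card.

```python
# Minimal sketch: loading a Gemma-7B-based DPO fine-tune with 🤗 Transformers.
# "your-org/your-gemma-dpo-model" is a placeholder, not the actual repo ID.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-gemma-dpo-model"  # placeholder: substitute the real repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Gemma-7B checkpoints are commonly served in bf16
    device_map="auto",
)

# Gemma chat models ship a chat template on the tokenizer.
messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```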
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
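The hyperparameter values themselves are not preserved in this card. Purely as an illustration, below is a sketch of a comparable DPO setup using TRL's `DPOTrainer` (API as of trl 0.7.x); every value in it is an assumption, not a setting taken from this model's actual run.

```python
# Illustrative only: a typical TRL DPOTrainer configuration for a Gemma-7B DPO run.
# None of these values are this model's recorded hyperparameters.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "lewtun/gemma-7b-sft-full-deita-10k-v0"  # the SFT checkpoint named in this card
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)

# Toy stand-in for argilla/dpo-mix-7k after chat-template preprocessing:
# DPOTrainer expects plain-text "prompt"/"chosen"/"rejected" columns.
train_dataset = Dataset.from_dict({
    "prompt": ["What is DPO?"],
    "chosen": ["Direct Preference Optimization trains on preference pairs."],
    "rejected": ["No idea."],
})

training_args = TrainingArguments(
    output_dir="gemma-7b-dpo",       # assumed
    num_train_epochs=2,              # assumed
    learning_rate=5e-7,              # assumed; DPO typically uses very small LRs
    per_device_train_batch_size=2,   # assumed
    gradient_accumulation_steps=8,   # assumed
    lr_scheduler_type="cosine",      # assumed
    warmup_ratio=0.1,                # assumed
    bf16=True,                       # assumed
    logging_steps=10,
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    beta=0.1,                        # assumed DPO temperature
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    max_length=1024,                 # assumed
    max_prompt_length=512,           # assumed
)
trainer.train()
```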
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.2569        | 1.9   | 100  | 1.0775          | -6.7226        | -12.3440         | 0.7396             | 5.6214          | -482.4095      | -470.2108    | 99.2330         | 105.2899      |
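For context on the reward columns: these names match the metrics logged by TRL's `DPOTrainer`, which reports the implicit DPO reward `beta * (policy_logps - reference_logps)` for the chosen and rejected completions; Rewards/margins is their difference, and Rewards/accuracies is the fraction of pairs where the chosen completion scores higher. A minimal sketch, assuming beta = 0.1 (the actual beta is not recorded here):

```python
# Sketch: how the Rewards/* columns relate to the Logps/* columns.
import torch

beta = 0.1  # assumed DPO temperature, not necessarily this run's value

def dpo_reward_metrics(policy_chosen_logps: torch.Tensor,
                       policy_rejected_logps: torch.Tensor,
                       ref_chosen_logps: torch.Tensor,
                       ref_rejected_logps: torch.Tensor):
    """Implicit DPO reward: beta * (policy log-prob - reference log-prob)."""
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = rewards_chosen - rewards_rejected        # Rewards/margins
    accuracies = (margins > 0).float().mean()          # Rewards/accuracies
    return rewards_chosen.mean(), rewards_rejected.mean(), margins.mean(), accuracies

# Toy batch of summed sequence log-probs (cf. the Logps/* columns above); the
# reference log-probs are invented so the output reproduces the table's rewards
# under the assumed beta.
chosen, rejected = torch.tensor([-470.2108]), torch.tensor([-482.4095])
ref_chosen, ref_rejected = torch.tensor([-403.0]), torch.tensor([-359.0])
print(dpo_reward_metrics(chosen, rejected, ref_chosen, ref_rejected))
```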