---
license: apache-2.0
base_model: TheBloke/OpenHermes-2-Mistral-7B-GPTQ
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: openhermes-mistral-dpo-gptq
    results: []
---

# openhermes-mistral-dpo-gptq

This model is a DPO fine-tuned version of [TheBloke/OpenHermes-2-Mistral-7B-GPTQ](https://huggingface.co/TheBloke/OpenHermes-2-Mistral-7B-GPTQ); the training dataset is not specified in this card. It achieves the following results on the evaluation set:

- Loss: 0.1934
- Rewards/chosen: 1.5646
- Rewards/rejected: -0.8402
- Rewards/accuracies: 1.0
- Rewards/margins: 2.4048
- Logps/rejected: -45.4271
- Logps/chosen: -277.5632
- Logits/rejected: -1.3185
- Logits/chosen: -2.0273
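No usage snippet ships with this card, so the following is a minimal loading sketch. It assumes the checkpoint is a PEFT (LoRA) adapter trained on top of the GPTQ base, which is the usual setup for DPO on a quantized model; the adapter repo id is a placeholder, and loading the GPTQ base requires the auto-gptq (or optimum) backend.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "TheBloke/OpenHermes-2-Mistral-7B-GPTQ"
adapter_id = "your-username/openhermes-mistral-dpo-gptq"  # placeholder: replace with the actual adapter repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
# Loading a GPTQ checkpoint through transformers requires auto-gptq (or optimum) to be installed.
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
# Attach the DPO-trained adapter on top of the quantized base.
model = PeftModel.from_pretrained(model, adapter_id)

prompt = "Explain direct preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```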

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2
- training_steps: 50
- mixed_precision_training: Native AMP
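The card does not include the training script, but the hyperparameters above map onto a TRL `DPOTrainer` setup roughly like the sketch below. The toy dataset, LoRA config, and `beta` value are assumptions; none of them are stated in the card.

```python
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "TheBloke/OpenHermes-2-Mistral-7B-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# Placeholder preference data; DPOTrainer expects prompt/chosen/rejected columns.
train_dataset = Dataset.from_dict({
    "prompt": ["What is DPO?"],
    "chosen": ["DPO optimizes a policy directly from preference pairs."],
    "rejected": ["I don't know."],
})

# LoRA config is an assumption: adapters are the usual way to fine-tune a GPTQ base.
peft_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
)

args = TrainingArguments(
    output_dir="openhermes-mistral-dpo-gptq",
    per_device_train_batch_size=1,   # train_batch_size: 1
    per_device_eval_batch_size=8,    # eval_batch_size: 8
    learning_rate=2e-4,              # learning_rate: 0.0002
    lr_scheduler_type="linear",
    warmup_steps=2,                  # lr_scheduler_warmup_steps: 2
    max_steps=50,                    # training_steps: 50
    fp16=True,                       # mixed_precision_training: Native AMP
    seed=42,
)

trainer = DPOTrainer(
    model,
    ref_model=None,            # with a PEFT config, TRL uses the frozen base as the reference model
    args=args,
    beta=0.1,                  # assumption: the card does not report beta
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```

Note that at batch size 1, the fractional epoch values in the results table below (0.01 per 10 steps) imply a training set of roughly 1,000 preference pairs, of which the 50-step run sees only 50.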

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6717        | 0.01  | 10   | 0.5892          | 0.1318         | -0.0930          | 0.75               | 0.2248          | -37.9552       | -291.8918    | -1.2049         | -2.0262       |
| 0.5298        | 0.02  | 20   | 0.3756          | 0.6965         | -0.3112          | 1.0                | 1.0078          | -40.1375       | -286.2441    | -1.2959         | -2.0576       |
| 0.3325        | 0.03  | 30   | 0.2663          | 1.1580         | -0.4907          | 1.0                | 1.6486          | -41.9316       | -281.6295    | -1.3242         | -2.0582       |
| 0.2179        | 0.04  | 40   | 0.2153          | 1.4040         | -0.7133          | 1.0                | 2.1173          | -44.1586       | -279.1697    | -1.3211         | -2.0374       |
| 0.1683        | 0.06  | 50   | 0.1934          | 1.5646         | -0.8402          | 1.0                | 2.4048          | -45.4271       | -277.5632    | -1.3185         | -2.0273       |
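As a reading aid for the reward columns: in DPO the rewards are implicit, derived from the log-probability ratio between the policy and the reference model (this is TRL's standard logging; \\(\beta\\) is the DPO temperature, which the card does not report):

$$ r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)} $$

Rewards/margins is \\(r_\theta(x, y_c) - r_\theta(x, y_r)\\) for chosen \\(y_c\\) and rejected \\(y_r\\), and Rewards/accuracies is the fraction of pairs where the chosen reward exceeds the rejected one, so 1.0 means every evaluation pair was ranked correctly.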

### Framework versions

- Transformers 4.35.2
- Pytorch 2.0.1+cu117
- Datasets 2.16.1
- Tokenizers 0.15.0
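For reproducibility, the pins above translate to an install along these lines; trl, peft, and auto-gptq are implied by the tags and the quantized base, but their versions are not pinned in the card:

```bash
pip install transformers==4.35.2 datasets==2.16.1 tokenizers==0.15.0
pip install torch==2.0.1 --index-url https://download.pytorch.org/whl/cu117
pip install trl peft auto-gptq  # versions not pinned in the card
```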