qwen_unl_entropy / README.md
---
library_name: transformers
license: other
base_model: trl-lib/qwen1.5-0.5b-sft
tags:
  - alignment-handbook
  - trl
  - simpo
  - generated_from_trainer
datasets:
  - yakazimir/ultrafeedback_binarized
model-index:
  - name: qwen_unl_entropy
    results: []
---

# qwen_unl_entropy

This model is a fine-tuned version of [trl-lib/qwen1.5-0.5b-sft](https://huggingface.co/trl-lib/qwen1.5-0.5b-sft) on the [yakazimir/ultrafeedback_binarized](https://huggingface.co/datasets/yakazimir/ultrafeedback_binarized) dataset. It achieves the following results on the evaluation set:

- Loss: 1.6475
- Rewards/chosen: -1.3030
- Rewards/rejected: -1.4992
- Rewards/accuracies: 0.5712
- Rewards/margins: 0.1962
- Logps/rejected: -1.4992
- Logps/chosen: -1.3030
- Logits/rejected: 0.0833
- Logits/chosen: 0.0165

(Rewards/margins is the chosen-minus-rejected reward gap: -1.3030 - (-1.4992) = 0.1962.)

## Model description

More information needed

## Intended uses & limitations

More information needed
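
In the absence of authored usage notes, here is a minimal text-generation sketch using the standard transformers API. The Hub repo id (`yakazimir/qwen_unl_entropy`, inferred from this card's author and model name) and the use of a chat template are assumptions, not verified instructions from the authors:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id assumed from the card's author/name; adjust if the model lives elsewhere.
model_id = "yakazimir/qwen_unl_entropy"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The base model is a chat-style SFT checkpoint, so apply the chat template.
messages = [{"role": "user", "content": "What is preference optimization?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```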

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 16
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3.0
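
The effective batch size of 32 follows from 2 samples per device × 16 gradient accumulation steps. The `simpo` tag, together with the Rewards/* columns matching the length-normalized Logps/* values, suggests a SimPO-style reference-free objective. As a hedged illustration only (the `unl_entropy` variant in the model name may modify the loss, and `beta`/`gamma` below are placeholder assumptions, not values from this run), the SimPO preference loss can be sketched as:

```python
import torch
import torch.nn.functional as F

def simpo_loss(
    chosen_logps: torch.Tensor,    # summed token log-probs of chosen responses
    rejected_logps: torch.Tensor,  # summed token log-probs of rejected responses
    chosen_lengths: torch.Tensor,  # token counts of chosen responses
    rejected_lengths: torch.Tensor,  # token counts of rejected responses
    beta: float = 2.0,             # illustrative value, not reported on this card
    gamma: float = 0.5,            # target reward margin, also illustrative
) -> torch.Tensor:
    # SimPO uses length-normalized (average) log-probabilities as implicit
    # rewards, with no reference model.
    chosen_rewards = chosen_logps / chosen_lengths
    rejected_rewards = rejected_logps / rejected_lengths
    # Bradley-Terry-style objective with a target margin gamma.
    logits = beta * (chosen_rewards - rejected_rewards) - gamma
    return -F.logsigmoid(logits).mean()
```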

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 1.6549        | 0.2141 | 400  | 1.6939          | -1.3375        | -1.4631          | 0.5564             | 0.1256          | -1.4631        | -1.3375      | 0.3664          | 0.2799        |
| 1.6692        | 0.4282 | 800  | 1.6718          | -1.3151        | -1.4532          | 0.5579             | 0.1381          | -1.4532        | -1.3151      | 0.3708          | 0.2889        |
| 1.6206        | 0.6422 | 1200 | 1.6640          | -1.3083        | -1.4522          | 0.5564             | 0.1438          | -1.4522        | -1.3083      | 0.3523          | 0.2714        |
| 1.6566        | 0.8563 | 1600 | 1.6600          | -1.3096        | -1.4585          | 0.5593             | 0.1488          | -1.4585        | -1.3096      | 0.3578          | 0.2764        |
| 1.7104        | 1.0704 | 2000 | 1.6553          | -1.3006        | -1.4569          | 0.5660             | 0.1563          | -1.4569        | -1.3006      | 0.2528          | 0.1781        |
| 1.6123        | 1.2845 | 2400 | 1.6521          | -1.3029        | -1.4743          | 0.5668             | 0.1713          | -1.4743        | -1.3029      | 0.1650          | 0.0956        |
| 1.6688        | 1.4986 | 2800 | 1.6486          | -1.3000        | -1.4729          | 0.5690             | 0.1729          | -1.4729        | -1.3000      | 0.1751          | 0.1050        |
| 1.6012        | 1.7127 | 3200 | 1.6495          | -1.3009        | -1.4722          | 0.5668             | 0.1713          | -1.4722        | -1.3009      | 0.2139          | 0.1401        |
| 1.5646        | 1.9267 | 3600 | 1.6478          | -1.2987        | -1.4778          | 0.5705             | 0.1791          | -1.4778        | -1.2987      | 0.1771          | 0.1052        |
| 1.5351        | 2.1408 | 4000 | 1.6470          | -1.3020        | -1.4952          | 0.5712             | 0.1932          | -1.4952        | -1.3020      | 0.1238          | 0.0547        |
| 1.5307        | 2.3549 | 4400 | 1.6469          | -1.3054        | -1.5043          | 0.5712             | 0.1988          | -1.5043        | -1.3054      | 0.0587          | -0.0064       |
| 1.5433        | 2.5690 | 4800 | 1.6472          | -1.3037        | -1.5017          | 0.5727             | 0.1980          | -1.5017        | -1.3037      | 0.1609          | 0.0880        |
| 1.5671        | 2.7831 | 5200 | 1.6473          | -1.3030        | -1.4994          | 0.5720             | 0.1964          | -1.4994        | -1.3030      | 0.0927          | 0.0252        |
| 1.5482        | 2.9972 | 5600 | 1.6475          | -1.3030        | -1.4992          | 0.5712             | 0.1962          | -1.4992        | -1.3030      | 0.0833          | 0.0165        |

### Framework versions

- Transformers 4.44.2
- Pytorch 2.2.2+cu121
- Datasets 2.18.0
- Tokenizers 0.19.1