# zephyr-7b-ultra-p-0.04
This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.5016
- Rewards/chosen: -0.5725
- Rewards/rejected: -2.2075
- Rewards/accuracies: 0.7188
- Rewards/margins: 1.6350
- Logps/rejected: -269.5538
- Logps/chosen: -235.7328
- Logits/rejected: -2.5465
- Logits/chosen: -2.6177
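These metric names follow the logging convention of TRL's `DPOTrainer` (an assumption; the card does not state the training method). Under that reading, the reward margin is simply the gap between the mean chosen and rejected rewards, which checks out against the figures above:

```latex
\text{rewards/margins} = \text{rewards/chosen} - \text{rewards/rejected} = -0.5725 - (-2.2075) = 1.6350
```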
## Model description
More information needed
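While the card leaves this section blank, the evaluation metrics above (paired rewards over chosen and rejected completions, plus per-completion log-probabilities) are characteristic of direct preference optimization. For reference, the standard DPO objective over preference pairs $(x, y_w, y_l)$, with policy $\pi_\theta$, frozen reference $\pi_{\mathrm{ref}}$, and temperature $\beta$, reads as follows (an inference from the metric names, not something the card confirms):

```latex
\mathcal{L}_{\mathrm{DPO}}(\theta)
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
```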
## Intended uses & limitations
More information needed
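In the absence of documented usage guidance, here is a minimal, hypothetical inference sketch with `transformers`. The repo id is taken from the model tree below; the chat template is assumed (not confirmed) to carry over from the Zephyr SFT base, and all generation settings are illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id taken from the model tree below; everything else here is illustrative.
model_id = "tongliuphysics/zephyr-7b-ultra-p-0.04"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 7B weights; bf16 keeps memory manageable
    device_map="auto",
)

# Zephyr models ship a chat template; assumed (not confirmed) to apply here too.
messages = [{"role": "user", "content": "Summarize direct preference optimization in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```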
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1.0
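The total train batch size reported above follows directly from the per-device settings; a one-line sanity check (pure arithmetic over the listed values, no extra assumptions):

```python
# total = per-device batch size x number of GPUs x gradient accumulation steps
train_batch_size = 1
num_devices = 8
gradient_accumulation_steps = 8

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
assert total_train_batch_size == 64  # matches total_train_batch_size above
```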
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.569 | 0.1030 | 100 | 0.5464 | -0.2783 | -0.9172 | 0.6953 | 0.6389 | -256.6510 | -232.7902 | -2.5836 | -2.6494 |
| 0.5376 | 0.2060 | 200 | 0.5246 | -0.5656 | -1.4359 | 0.7031 | 0.8703 | -261.8382 | -235.6634 | -2.4669 | -2.5324 |
| 0.53 | 0.3090 | 300 | 0.5364 | -0.6183 | -1.3601 | 0.6875 | 0.7418 | -261.0804 | -236.1908 | -2.4854 | -2.5577 |
| 0.5092 | 0.4120 | 400 | 0.5188 | -0.6586 | -2.1894 | 0.7266 | 1.5308 | -269.3731 | -236.5937 | -2.5850 | -2.6537 |
| 0.5039 | 0.5150 | 500 | 0.5133 | -0.4455 | -1.8944 | 0.7109 | 1.4490 | -266.4232 | -234.4622 | -2.5959 | -2.6648 |
| 0.5018 | 0.6180 | 600 | 0.5124 | -0.3893 | -1.9586 | 0.7266 | 1.5693 | -267.0651 | -233.9008 | -2.5504 | -2.6196 |
| 0.5162 | 0.7210 | 700 | 0.5112 | -0.4435 | -1.9493 | 0.7188 | 1.5058 | -266.9722 | -234.4430 | -2.5634 | -2.6316 |
| 0.5264 | 0.8240 | 800 | 0.5078 | -0.5335 | -2.2073 | 0.7344 | 1.6738 | -269.5521 | -235.3425 | -2.5636 | -2.6340 |
| 0.4775 | 0.9270 | 900 | 0.5023 | -0.5305 | -2.1520 | 0.7266 | 1.6216 | -268.9995 | -235.3122 | -2.5423 | -2.6135 |
### Framework versions
- Transformers 4.45.1
- Pytorch 2.4.1+cu121
- Datasets 3.0.0
- Tokenizers 0.20.0
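To reproduce this environment, the listed versions can be pinned directly in a `requirements.txt` (a sketch; the `+cu121` PyTorch build additionally requires installing from the matching CUDA wheel index):

```
transformers==4.45.1
torch==2.4.1
datasets==3.0.0
tokenizers==0.20.0
```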
## Model tree for tongliuphysics/zephyr-7b-ultra-p-0.04
- Base model: mistralai/Mistral-7B-v0.1
- Fine-tuned from: alignment-handbook/zephyr-7b-sft-full