# gpt2_ACoT
This model is a fine-tuned version of [facebook/opt-125m](https://huggingface.co/facebook/opt-125m) on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 1.9497
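As a quick usage reference, the checkpoint can be loaded with the standard `transformers` generation API. This is a minimal sketch assuming the model is hosted as `ccore/gpt2_ACoT` and ships its own tokenizer; the prompt is purely illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id; adjust if the checkpoint lives elsewhere.
model_id = "ccore/gpt2_ACoT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Greedy generation from an illustrative prompt.
inputs = tokenizer("Question: what is 2 + 2?\n", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```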
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- num_epochs: 10
- mixed_precision_training: Native AMP
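For reference, the list above corresponds roughly to the following `TrainingArguments` configuration. This is a hedged sketch, not the original training script: the output directory and the evaluation/logging strategies are assumptions (per-epoch evaluation is inferred from the results table below).

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt2_ACoT",          # placeholder output directory
    learning_rate=1e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,   # effective train batch size: 2 * 8 = 16
    num_train_epochs=10,
    lr_scheduler_type="cosine",
    optim="adamw_torch",             # AdamW with default betas=(0.9, 0.999), eps=1e-8
    seed=42,
    fp16=True,                       # native AMP mixed precision
    eval_strategy="epoch",           # assumption: evaluation once per epoch
    logging_strategy="epoch",        # assumption: loss logged once per epoch
)
```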
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 21.1265       | 1.0   | 276  | 2.5543          |
| 18.2347       | 2.0   | 552  | 2.2861          |
| 16.8533       | 3.0   | 828  | 2.1435          |
| 15.6224       | 4.0   | 1104 | 2.0605          |
| 14.7096       | 5.0   | 1380 | 2.0051          |
| 14.1231       | 6.0   | 1656 | 1.9733          |
| 13.5673       | 7.0   | 1932 | 1.9566          |
| 13.1727       | 8.0   | 2208 | 1.9493          |
| 13.0597       | 9.0   | 2484 | 1.9492          |
| 12.9195       | 10.0  | 2760 | 1.9497          |
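Assuming the validation loss is a mean token-level cross-entropy in nats, the final value corresponds to a perplexity of roughly exp(1.9497) ≈ 7.0:

```python
import math

final_eval_loss = 1.9497
# Perplexity = exp(loss), assuming mean cross-entropy in nats.
perplexity = math.exp(final_eval_loss)
print(f"perplexity ≈ {perplexity:.2f}")  # ≈ 7.03
```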
### Framework versions
- Transformers 4.52.4
- Pytorch 2.6.0+cu124
- Datasets 3.6.0
- Tokenizers 0.21.2