music-generation

This model is a fine-tuned version of gpt2 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5312

Model description

More information needed

Intended uses & limitations

More information needed
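The card does not document how to run the model, so the following is only a usage sketch with the standard Transformers text-generation pipeline. The repo id comes from this card; the prompt is a placeholder assumption, since the expected input format (e.g. a tokenized music notation) is not specified.

```python
# Hypothetical usage sketch -- the prompt format is an assumption,
# as the card does not document the model's input representation.
from transformers import pipeline

generator = pipeline("text-generation", model="metoonhathung/music-generation")
out = generator("X:1\n", max_new_tokens=64, do_sample=True, temperature=0.9)
print(out[0]["generated_text"])
```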

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 256
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 20
  • mixed_precision_training: Native AMP
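As a sketch, the hyperparameters above map onto the Hugging Face `TrainingArguments` fields shown below (the actual training script is not published, so the field names are an assumption based on the Trainer API). Note that the reported total_train_batch_size of 256 is simply train_batch_size × gradient_accumulation_steps:

```python
# Hypothetical reconstruction of the training configuration as a
# TrainingArguments-style dict (the original script is not published).
training_args = dict(
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    gradient_accumulation_steps=16,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    num_train_epochs=20,
    fp16=True,  # Native AMP mixed-precision training
)

# Effective (total) train batch size = per-device batch * accumulation steps
effective_batch = (training_args["per_device_train_batch_size"]
                   * training_args["gradient_accumulation_steps"])
print(effective_batch)  # 256
```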

Training results

| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| 3.7238        | 0.9217  | 100  | 2.8460          |
| 2.4643        | 1.8387  | 200  | 1.8829          |
| 1.8339        | 2.7558  | 300  | 1.4234          |
| 1.5013        | 3.6728  | 400  | 1.2203          |
| 1.3125        | 4.5899  | 500  | 1.0966          |
| 1.1899        | 5.5069  | 600  | 1.0028          |
| 1.0982        | 6.4240  | 700  | 0.9353          |
| 1.0302        | 7.3410  | 800  | 0.8779          |
| 0.9766        | 8.2581  | 900  | 0.8276          |
| 0.9243        | 9.1751  | 1000 | 0.7757          |
| 0.8825        | 10.0922 | 1100 | 0.7345          |
| 0.845         | 11.0092 | 1200 | 0.7000          |
| 0.8083        | 11.9309 | 1300 | 0.6624          |
| 0.7784        | 12.8479 | 1400 | 0.6328          |
| 0.7502        | 13.7650 | 1500 | 0.6052          |
| 0.7281        | 14.6820 | 1600 | 0.5816          |
| 0.7072        | 15.5991 | 1700 | 0.5622          |
| 0.6903        | 16.5161 | 1800 | 0.5486          |
| 0.6796        | 17.4332 | 1900 | 0.5386          |
| 0.6705        | 18.3502 | 2000 | 0.5335          |
| 0.6646        | 19.2673 | 2100 | 0.5312          |
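Assuming the validation loss is the standard token-level cross-entropy in nats (the Trainer default for causal language models), the final value corresponds to a perplexity of roughly exp(0.5312) ≈ 1.70:

```python
import math

# Final validation loss from the last row of the table above.
final_val_loss = 0.5312

# Perplexity is the exponential of the mean cross-entropy loss (in nats).
perplexity = math.exp(final_val_loss)
print(round(perplexity, 3))  # 1.701
```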

Framework versions

  • Transformers 4.55.0
  • Pytorch 2.6.0+cu124
  • Datasets 4.0.0
  • Tokenizers 0.21.4
Model size: 86.3M parameters (F32, Safetensors)
