Built with Axolotl

bcf502a9-30b0-411f-911e-b137fff629d3

This model is a fine-tuned version of EleutherAI/pythia-410m-deduped on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 4.2967
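
For reference, the final evaluation loss of 4.2967 corresponds to a perplexity of roughly exp(4.2967) ≈ 73.5, assuming the reported loss is mean token-level cross-entropy in nats (the transformers default):

```python
import math

# Perplexity from the final eval loss, assuming mean token-level
# cross-entropy in nats (the transformers default).
print(math.exp(4.2967))  # ≈ 73.5
```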

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after this list):

  • learning_rate: 0.000214
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 140
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: 8-bit AdamW via bitsandbytes (adamw_bnb_8bit) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 50
  • training_steps: 500
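
The original training script is not included in this card, but a minimal sketch of how these settings map onto transformers' TrainingArguments might look like the following; output_dir and anything not listed above are assumptions, and the actual Axolotl config may differ:

```python
from transformers import TrainingArguments

# Sketch only: reconstructs the hyperparameters listed in this card.
training_args = TrainingArguments(
    output_dir="outputs",           # assumed; not stated in the card
    learning_rate=2.14e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=140,
    gradient_accumulation_steps=2,  # effective train batch size: 4 * 2 = 8
    optim="adamw_bnb_8bit",         # 8-bit AdamW via bitsandbytes
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    max_steps=500,
)
```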

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| No log        | 0.0002 | 1    | 6.0451          |
| 12.0004       | 0.0082 | 50   | 5.0641          |
| 11.026        | 0.0164 | 100  | 4.6035          |
| 11.1297       | 0.0246 | 150  | 4.6697          |
| 10.8286       | 0.0328 | 200  | 4.5791          |
| 11.4491       | 0.0410 | 250  | 4.4946          |
| 11.2258       | 0.0492 | 300  | 4.3788          |
| 10.5566       | 0.0573 | 350  | 4.4341          |
| 10.7347       | 0.0655 | 400  | 4.3284          |
| 10.5176       | 0.0737 | 450  | 4.3048          |
| 10.7225       | 0.0819 | 500  | 4.2967          |

Framework versions

  • PEFT 0.13.2
  • Transformers 4.46.0
  • Pytorch 2.5.0+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1
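
Since this repository is a PEFT adapter on top of EleutherAI/pythia-410m-deduped, a minimal loading sketch with the versions above might look like this; the generation settings are illustrative, not from the card:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then apply this card's PEFT adapter on top.
base = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/pythia-410m-deduped", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(
    base, "lesso14/bcf502a9-30b0-411f-911e-b137fff629d3"
)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-410m-deduped")

inputs = tokenizer("Hello, world", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)  # settings are illustrative
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```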