myBit-Llama2-jp-127M-2B4TLike

This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.7946

Model description

More information needed

Intended uses & limitations

More information needed
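The card does not document usage, so the following is only a minimal loading sketch. The repository id is a placeholder, and trust_remote_code=True is an assumption (the custom Bit-Llama architecture may ship its own modeling code); neither is confirmed by this card.

```python
# Hedged sketch: loading this checkpoint for text generation with transformers.
# The repository id is a placeholder and trust_remote_code=True is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "<namespace>/myBit-Llama2-jp-127M-2B4TLike"  # placeholder repository id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

# "jp" in the model name suggests a Japanese-language model, so a Japanese prompt is used here.
inputs = tokenizer("こんにちは、", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```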

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged configuration sketch follows the list):

  • learning_rate: 0.0024
  • train_batch_size: 12
  • eval_batch_size: 12
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 96
  • optimizer: adamw_torch (AdamW) with betas=(0.9, 0.95), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 750
  • num_epochs: 1
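
The sketch below shows how these hyperparameters map onto transformers.TrainingArguments. The output directory is a placeholder, and the model/dataset setup is omitted; this is an illustration of the reported configuration, not the exact training script.

```python
# Hedged sketch: the hyperparameters above expressed as TrainingArguments.
# output_dir is a placeholder; model and dataset setup are not shown here.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="myBit-Llama2-jp-127M-2B4TLike",  # placeholder output path
    learning_rate=2.4e-3,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=12,
    seed=42,
    gradient_accumulation_steps=8,   # effective total train batch size: 12 * 8 = 96
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=750,
    num_train_epochs=1,
)
```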

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 5.5592        | 0.0555 | 500  | 4.6971          |
| 3.9123        | 0.1111 | 1000 | 4.2204          |
| 3.5329        | 0.1666 | 1500 | 3.8957          |
| 3.3627        | 0.2222 | 2000 | 3.7229          |
| 3.2594        | 0.2777 | 2500 | 3.5721          |
| 3.1944        | 0.3333 | 3000 | 3.3995          |
| 3.1391        | 0.3888 | 3500 | 3.2851          |
| 3.1046        | 0.4443 | 4000 | 3.1923          |
| 3.0648        | 0.4999 | 4500 | 3.1243          |
| 3.0325        | 0.5554 | 5000 | 3.0711          |
| 2.9877        | 0.6110 | 5500 | 3.0184          |
| 2.9752        | 0.6665 | 6000 | 2.9568          |
| 2.9901        | 0.7221 | 6500 | 2.9294          |
| 2.9769        | 0.7776 | 7000 | 2.9101          |
| 2.9432        | 0.8331 | 7500 | 2.8855          |
| 2.9199        | 0.8887 | 8000 | 2.8612          |
| 2.8869        | 0.9442 | 8500 | 2.8358          |
| 2.8574        | 0.9998 | 9000 | 2.7946          |
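
Assuming the reported losses are mean token-level cross-entropy in nats (the usual Trainer convention), the final validation loss of 2.7946 corresponds to a perplexity of exp(2.7946) ≈ 16.4.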

Framework versions

  • Transformers 4.47.1
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.1
  • Tokenizers 0.21.1