myBit-Llama2-jp-127M-2B4TLike

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following result on the evaluation set (a rough perplexity conversion is sketched below):

  • Loss: 7.9064
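
Assuming this loss is the mean token-level cross-entropy in nats (the usual convention for the Trainer's evaluation loss), the corresponding perplexity would be exp(7.9064) ≈ 2.7 × 10³. A minimal computation:

```python
import math

# Assumption: the reported eval loss is mean cross-entropy in nats per token.
eval_loss = 7.9064
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.0f}")  # ≈ 2714
```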

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an illustrative TrainingArguments sketch follows the list):

  • learning_rate: 0.0024
  • train_batch_size: 24
  • eval_batch_size: 24
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 96
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.95) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 750
  • num_epochs: 1
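
For reference, the same configuration could be expressed as a `transformers.TrainingArguments` object. This is a sketch only: `output_dir` and any settings not listed above (logging, evaluation cadence, precision, etc.) are assumptions, not recorded values from the original run.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./myBit-Llama2-jp-127M-2B4TLike",  # assumed path
    learning_rate=0.0024,
    per_device_train_batch_size=24,
    per_device_eval_batch_size=24,
    seed=42,
    gradient_accumulation_steps=4,  # 24 x 4 = total train batch size 96
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=750,
    num_train_epochs=1,
)
```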

Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|---------------|--------|-------|-----------------|
| 5.5863        | 0.0485 | 500   | 4.1196          |
| 3.6525        | 0.0971 | 1000  | 3.4908          |
| 3.3021        | 0.1456 | 1500  | 3.2915          |
| 3.1654        | 0.1942 | 2000  | 3.1933          |
| 3.0956        | 0.2427 | 2500  | 3.1410          |
| 3.049         | 0.2913 | 3000  | 3.1199          |
| 2.9894        | 0.3398 | 3500  | 3.1277          |
| 2.9502        | 0.3884 | 4000  | 3.0834          |
| 2.9436        | 0.4369 | 4500  | 3.0831          |
| 2.922         | 0.4855 | 5000  | 3.1344          |
| 2.9031        | 0.5340 | 5500  | 3.1086          |
| 2.8728        | 0.5826 | 6000  | 3.1801          |
| 2.8516        | 0.6311 | 6500  | 3.2534          |
| 2.822         | 0.6796 | 7000  | 3.3787          |
| 2.7995        | 0.7282 | 7500  | 3.5255          |
| 2.7703        | 0.7767 | 8000  | 3.8455          |
| 2.7304        | 0.8253 | 8500  | 4.3736          |
| 2.6836        | 0.8738 | 9000  | 5.1791          |
| 2.6269        | 0.9224 | 9500  | 6.1786          |
| 2.5171        | 0.9709 | 10000 | 7.9064          |

Framework versions

  • Transformers 4.47.1
  • PyTorch 2.6.0+cu124
  • Datasets 3.5.1
  • Tokenizers 0.21.1