gemma-3-4b-it-arkey_emails-qlora

This model is a fine-tuned version of google/gemma-3-4b-it on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.5329

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 64
  • total_train_batch_size: 128
  • optimizer: adamw_torch (AdamW) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 50
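As a quick sanity check, the reported total train batch size and the shape of the linear schedule follow directly from the values above. A stdlib-only sketch (the no-warmup assumption is mine, since no warmup steps are listed):

```python
# Values copied from the hyperparameter list above.
train_batch_size = 2
gradient_accumulation_steps = 64

# Effective (total) train batch size per optimizer step:
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 128, matching the reported value

# Linear scheduler: the learning rate decays from its initial value to 0
# over the scheduled number of optimizer steps (assuming no warmup).
def linear_lr(step: int, total_steps: int, base_lr: float = 5e-5) -> float:
    return base_lr * max(0.0, 1.0 - step / total_steps)
```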

Training results

Training Loss | Epoch | Step | Validation Loss
No log | 1.0 | 5 | 3.9168
No log | 2.0 | 10 | 3.5567
No log | 3.0 | 15 | 3.2439
No log | 4.0 | 20 | 3.0028
No log | 5.0 | 25 | 2.8257
No log | 6.0 | 30 | 2.6974
No log | 7.0 | 35 | 2.6045
No log | 8.0 | 40 | 2.5319
No log | 9.0 | 45 | 2.4673
No log | 10.0 | 50 | 2.4065
No log | 11.0 | 55 | 2.3477
No log | 12.0 | 60 | 2.2918
No log | 13.0 | 65 | 2.2379
No log | 14.0 | 70 | 2.1864
No log | 15.0 | 75 | 2.1368
No log | 16.0 | 80 | 2.0882
No log | 17.0 | 85 | 2.0429
No log | 18.0 | 90 | 1.9999
No log | 19.0 | 95 | 1.9580
No log | 20.0 | 100 | 1.9175
No log | 21.0 | 105 | 1.8778
No log | 22.0 | 110 | 1.8372
No log | 23.0 | 115 | 1.7964
No log | 24.0 | 120 | 1.7642
No log | 25.0 | 125 | 1.7354
No log | 26.0 | 130 | 1.7083
No log | 27.0 | 135 | 1.6831
No log | 28.0 | 140 | 1.6598
No log | 29.0 | 145 | 1.6386
No log | 30.0 | 150 | 1.6197
No log | 31.0 | 155 | 1.6029
No log | 32.0 | 160 | 1.5880
No log | 33.0 | 165 | 1.5750
No log | 34.0 | 170 | 1.5638
No log | 35.0 | 175 | 1.5543
No log | 36.0 | 180 | 1.5466
No log | 37.0 | 185 | 1.5407
No log | 38.0 | 190 | 1.5365
No log | 39.0 | 195 | 1.5339
No log | 40.0 | 200 | 1.5329
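The table shows 5 optimizer steps per epoch at the effective batch size of 128, which bounds the size of the training set; the "No log" entries mean no training-loss logging step fell inside each interval. The log also stops at epoch 40 although 50 epochs were scheduled, consistent with early stopping, though the card does not say. A small sketch of what can be read off the table:

```python
# 5 optimizer steps per epoch at an effective batch size of 128
steps_per_epoch = 5
effective_batch_size = 128

# Upper bound on the number of training examples (the last batch of an
# epoch may be partially filled):
max_train_examples = steps_per_epoch * effective_batch_size
print(max_train_examples)  # 640

# Validation loss from selected rows of the table above:
val_loss = {1: 3.9168, 10: 2.4065, 30: 1.6197, 40: 1.5329}

# Improvement is front-loaded: the first ten epochs shave off far more
# loss than the last ten, i.e. the run is close to convergence at the end.
drop_first_10 = val_loss[1] - val_loss[10]   # ~1.51
drop_last_10 = val_loss[30] - val_loss[40]   # ~0.087
```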

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.1.2
  • Datasets 3.5.1
  • Tokenizers 0.21.1
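Because this repository contains a PEFT (QLoRA) adapter rather than full model weights, inference means loading the base model and attaching the adapter on top. A minimal sketch, assuming the repository ids shown on this page and the standard PEFT workflow; generation and quantization settings are left out, and the auto class is an assumption (the multimodal Gemma 3 checkpoints may require a different one):

```python
BASE_MODEL = "google/gemma-3-4b-it"
ADAPTER = "xaviergillard/gemma-3-4b-it-arkey_emails-qlora"

def load_finetuned_model():
    # Imports deferred so the sketch can be parsed without peft and
    # transformers installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, device_map="auto")
    # Attach the LoRA adapter weights on top of the frozen base model.
    model = PeftModel.from_pretrained(base, ADAPTER)
    return tokenizer, model
```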

Model tree for xaviergillard/gemma-3-4b-it-arkey_emails-qlora

  • Adapter of google/gemma-3-4b-it