---
license: gemma
base_model: google/gemma-2-9b-it
tags:
  - generated_from_trainer
model-index:
  - name: gemma-2-9b-it-lora-commonsense
    results: []
---

# gemma-2-9b-it-lora-commonsense

This model is a fine-tuned version of [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.8229
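
The card does not include usage instructions. Below is a minimal inference sketch, assuming this repository hosts a PEFT LoRA adapter for `google/gemma-2-9b-it`; the repository id `yspkm/gemma-2-9b-it-lora-commonsense` is illustrative and may not match the actual path.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "google/gemma-2-9b-it"
adapter_id = "yspkm/gemma-2-9b-it-lora-commonsense"  # illustrative repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# Attach the LoRA adapter on top of the frozen base weights.
model = PeftModel.from_pretrained(model, adapter_id)

messages = [
    {"role": "user", "content": "If I drop a glass on a tile floor, what will most likely happen?"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```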

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them to `TrainingArguments` follows the list):

- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 3
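
For reference, a minimal sketch of how these values could map onto `transformers.TrainingArguments`. The `output_dir` is illustrative; the effective batch size of 128 is the per-device batch size of 8 times 16 gradient-accumulation steps on a single device, and the listed Adam betas/epsilon are the optimizer defaults.

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
# Adam betas=(0.9, 0.999) and epsilon=1e-08 are the defaults.
training_args = TrainingArguments(
    output_dir="gemma-2-9b-it-lora-commonsense",  # illustrative
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 effective batch size
    lr_scheduler_type="cosine",
    warmup_steps=100,
    num_train_epochs=3,
)
```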

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.0188        | 0.1503 | 200  | 0.9641          |
| 0.9971        | 0.3007 | 400  | 0.9404          |
| 0.9827        | 0.4510 | 600  | 0.9288          |
| 0.9748        | 0.6013 | 800  | 0.9194          |
| 0.971         | 0.7516 | 1000 | 0.9055          |
| 0.957         | 0.9020 | 1200 | 0.8970          |
| 0.9005        | 1.0523 | 1400 | 0.8874          |
| 0.8876        | 1.2026 | 1600 | 0.8748          |
| 0.8782        | 1.3529 | 1800 | 0.8640          |
| 0.8896        | 1.5033 | 2000 | 0.8489          |
| 0.8814        | 1.6536 | 2200 | 0.8417          |
| 0.8666        | 1.8039 | 2400 | 0.8325          |
| 0.8674        | 1.9542 | 2600 | 0.8307          |
| 0.8116        | 2.1046 | 2800 | 0.8366          |
| 0.8032        | 2.2549 | 3000 | 0.8291          |
| 0.8103        | 2.4052 | 3200 | 0.8265          |
| 0.8165        | 2.5556 | 3400 | 0.8245          |
| 0.8085        | 2.7059 | 3600 | 0.8242          |
| 0.8121        | 2.8562 | 3800 | 0.8229          |
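
As a rough sanity check, assuming the reported loss is mean token-level cross-entropy in nats, the final validation loss corresponds to a per-token perplexity of exp(0.8229) ≈ 2.28:

```python
import math

# Mean cross-entropy loss (nats/token) -> per-token perplexity.
final_val_loss = 0.8229
print(math.exp(final_val_loss))  # ~2.277
```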

### Framework versions

- Transformers 4.42.3
- PyTorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1