AdaDecode
This model is a fine-tuned version of meta-llama/Llama-3.1-8B-Instruct on the meng-lab/Llama-3.1-8B-Instruct-xsum dataset. It achieves the following results on the evaluation set (the final evaluation step in the training results table below):

- Loss: 6.7117
- Loss Layer 4 Head: 1.7377
- Loss Layer 8 Head: 1.4957
- Loss Layer 12 Head: 1.4384
- Loss Layer 16 Head: 0.9421
- Loss Layer 20 Head: 0.5804
- Loss Layer 24 Head: 0.3724
- Loss Layer 28 Head: 0.1958
Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
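The card does not include a usage snippet. Below is a minimal loading sketch with the transformers library, assuming the checkpoint is hosted on the Hugging Face Hub; the repo id is a placeholder inferred from the dataset name, not confirmed by the card, so substitute the actual model id.

```python
# Minimal sketch: load the fine-tuned checkpoint and summarize a document.
# Assumption: the repo id below is hypothetical; the card only names the
# base model (meta-llama/Llama-3.1-8B-Instruct) and the training dataset.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meng-lab/Llama-3.1-8B-Instruct-xsum"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick the dtype stored in the checkpoint
    device_map="auto",    # requires the `accelerate` package
)

prompt = "Summarize the following article in one sentence:\n..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)

# Decode only the newly generated tokens, not the prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```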
The following results were recorded at each evaluation step during training:
| Training Loss | Epoch   | Step | Validation Loss | Loss Layer 4 Head | Loss Layer 8 Head | Loss Layer 12 Head | Loss Layer 16 Head | Loss Layer 20 Head | Loss Layer 24 Head | Loss Layer 28 Head |
|---------------|---------|------|-----------------|-------------------|-------------------|--------------------|--------------------|--------------------|--------------------|--------------------|
| 9.417         | 9.5522  | 200  | 10.6034         | 2.1800            | 2.1484            | 1.8370             | 1.5560             | 0.8850             | 0.7908             | 1.1904             |
| 7.0666        | 19.1045 | 400  | 8.3242          | 2.0259            | 1.8363            | 1.7901             | 1.0876             | 0.8469             | 0.4822             | 0.2917             |
| 6.5999        | 28.6567 | 600  | 7.8689          | 1.9122            | 1.7362            | 1.7044             | 1.0472             | 0.6722             | 0.4620             | 0.3698             |
| 5.8586        | 38.2090 | 800  | 7.5812          | 2.0916            | 1.5734            | 1.6211             | 1.0056             | 0.6192             | 0.4660             | 0.2400             |
| 5.4725        | 47.7612 | 1000 | 7.0153          | 1.8457            | 1.5162            | 1.4691             | 0.9794             | 0.6236             | 0.3980             | 0.2260             |
| 5.3026        | 57.3134 | 1200 | 7.0204          | 1.9164            | 1.5058            | 1.5172             | 0.9522             | 0.5897             | 0.3804             | 0.2035             |
| 4.9989        | 66.8657 | 1400 | 6.7446          | 1.7458            | 1.5005            | 1.4430             | 0.9468             | 0.5843             | 0.3757             | 0.1990             |
| 4.9163        | 76.4179 | 1600 | 6.7228          | 1.7406            | 1.4972            | 1.4401             | 0.9436             | 0.5816             | 0.3734             | 0.1968             |
| 4.9194        | 85.9701 | 1800 | 6.7132          | 1.7381            | 1.4960            | 1.4385             | 0.9424             | 0.5807             | 0.3726             | 0.1959             |
| 4.9063        | 95.5224 | 2000 | 6.7117          | 1.7377            | 1.4957            | 1.4384             | 0.9421             | 0.5804             | 0.3724             | 0.1958             |
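The per-column losses indicate that training attaches an auxiliary language-modeling head at every fourth decoder layer (layers 4 through 28) and optimizes them jointly. Below is a minimal sketch of such a multi-head objective, assuming a summed cross-entropy over all heads; the exit-layer indices match the table's columns, but the head architecture, loss weighting, and training loop here are illustrative assumptions, not the authors' exact implementation.

```python
# Sketch of a multi-exit training objective: one linear LM head per
# intermediate layer, trained with summed cross-entropy.
# Assumptions: layer set taken from the table columns; toy dimensions
# stand in for Llama-3.1-8B (hidden 4096, vocab 128256).
import torch
import torch.nn as nn
import torch.nn.functional as F

HIDDEN_SIZE = 64    # toy size for the sketch
VOCAB_SIZE = 1000   # toy size for the sketch
EXIT_LAYERS = [4, 8, 12, 16, 20, 24, 28]  # matches the loss columns above

class AuxiliaryHeads(nn.Module):
    """One linear LM head per early-exit layer."""
    def __init__(self):
        super().__init__()
        self.heads = nn.ModuleDict({
            str(layer): nn.Linear(HIDDEN_SIZE, VOCAB_SIZE, bias=False)
            for layer in EXIT_LAYERS
        })

    def forward(self, hidden_states, labels):
        # hidden_states: dict {layer_index: (batch, seq, hidden)} from the backbone
        # labels: (batch, seq) next-token targets, already shifted
        losses = {}
        for layer in EXIT_LAYERS:
            logits = self.heads[str(layer)](hidden_states[layer])
            losses[layer] = F.cross_entropy(
                logits.reshape(-1, VOCAB_SIZE), labels.reshape(-1)
            )
        # Summed objective, analogous to the "Training Loss" column, with
        # each term corresponding to one "Loss Layer N Head" column.
        total = sum(losses.values())
        return total, losses

# Toy usage: random hidden states stand in for the backbone's per-layer outputs.
batch, seq = 2, 8
hidden = {l: torch.randn(batch, seq, HIDDEN_SIZE) for l in EXIT_LAYERS}
labels = torch.randint(0, VOCAB_SIZE, (batch, seq))
total, per_layer = AuxiliaryHeads()(hidden, labels)
print(total.item(), {l: round(v.item(), 3) for l, v in per_layer.items()})
```

Note how the table is consistent with this structure: heads at deeper layers (24, 28) converge to much lower losses than shallow ones (4, 8), since their inputs are closer to the final representation.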
Base model: meta-llama/Llama-3.1-8B