AdaDecode
This model is a fine-tuned version of meta-llama/Llama-3.1-8B-Instruct on the meng-lab/Llama-3.1-8B-Instruct-xsum dataset. It achieves the following results on the evaluation set (the final evaluation step in the training results table below):

- Loss: 6.7117
- Loss Layer 4 Head: 1.7377
- Loss Layer 8 Head: 1.4957
- Loss Layer 12 Head: 1.4384
- Loss Layer 16 Head: 0.9421
- Loss Layer 20 Head: 0.5804
- Loss Layer 24 Head: 0.3724
- Loss Layer 28 Head: 0.1958
Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
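The card does not include a usage snippet. Below is a minimal loading sketch with the transformers library, assuming the checkpoint is hosted on the Hugging Face Hub; the repo id is a placeholder inferred from the dataset name, not confirmed by the card, so substitute the actual model id.

```python
# Minimal sketch: load the fine-tuned checkpoint and summarize a document.
# Assumption: the repo id below is hypothetical; the card only names the
# base model (meta-llama/Llama-3.1-8B-Instruct) and the training dataset.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meng-lab/Llama-3.1-8B-Instruct-xsum"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick the dtype stored in the checkpoint
    device_map="auto",    # requires the `accelerate` package
)

prompt = "Summarize the following article in one sentence:\n..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)

# Decode only the newly generated tokens, not the prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```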
The following results were recorded at each evaluation step during training:
| Training Loss | Epoch   | Step | Validation Loss | Loss Layer 4 Head | Loss Layer 8 Head | Loss Layer 12 Head | Loss Layer 16 Head | Loss Layer 20 Head | Loss Layer 24 Head | Loss Layer 28 Head |
|---------------|---------|------|-----------------|-------------------|-------------------|--------------------|--------------------|--------------------|--------------------|--------------------|
| 9.417         | 9.5522  | 200  | 10.6034         | 2.1800            | 2.1484            | 1.8370             | 1.5560             | 0.8850             | 0.7908             | 1.1904             |
| 7.0666        | 19.1045 | 400  | 8.3242          | 2.0259            | 1.8363            | 1.7901             | 1.0876             | 0.8469             | 0.4822             | 0.2917             |
| 6.5999        | 28.6567 | 600  | 7.8689          | 1.9122            | 1.7362            | 1.7044             | 1.0472             | 0.6722             | 0.4620             | 0.3698             |
| 5.8586        | 38.2090 | 800  | 7.5812          | 2.0916            | 1.5734            | 1.6211             | 1.0056             | 0.6192             | 0.4660             | 0.2400             |
| 5.4725        | 47.7612 | 1000 | 7.0153          | 1.8457            | 1.5162            | 1.4691             | 0.9794             | 0.6236             | 0.3980             | 0.2260             |
| 5.3026        | 57.3134 | 1200 | 7.0204          | 1.9164            | 1.5058            | 1.5172             | 0.9522             | 0.5897             | 0.3804             | 0.2035             |
| 4.9989        | 66.8657 | 1400 | 6.7446          | 1.7458            | 1.5005            | 1.4430             | 0.9468             | 0.5843             | 0.3757             | 0.1990             |
| 4.9163        | 76.4179 | 1600 | 6.7228          | 1.7406            | 1.4972            | 1.4401             | 0.9436             | 0.5816             | 0.3734             | 0.1968             |
| 4.9194        | 85.9701 | 1800 | 6.7132          | 1.7381            | 1.4960            | 1.4385             | 0.9424             | 0.5807             | 0.3726             | 0.1959             |
| 4.9063        | 95.5224 | 2000 | 6.7117          | 1.7377            | 1.4957            | 1.4384             | 0.9421             | 0.5804             | 0.3724             | 0.1958             |
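The per-column losses indicate that training attaches an auxiliary language-modeling head at every fourth decoder layer (layers 4 through 28) and optimizes them jointly. Below is a minimal sketch of such a multi-head objective, assuming a summed cross-entropy over all heads; the exit-layer indices match the table's columns, but the head architecture, loss weighting, and training loop here are illustrative assumptions, not the authors' exact implementation.

```python
# Sketch of a multi-exit training objective: one linear LM head per
# intermediate layer, trained with summed cross-entropy.
# Assumptions: layer set taken from the table columns; toy dimensions
# stand in for Llama-3.1-8B (hidden 4096, vocab 128256).
import torch
import torch.nn as nn
import torch.nn.functional as F

HIDDEN_SIZE = 64    # toy size for the sketch
VOCAB_SIZE = 1000   # toy size for the sketch
EXIT_LAYERS = [4, 8, 12, 16, 20, 24, 28]  # matches the loss columns above

class AuxiliaryHeads(nn.Module):
    """One linear LM head per early-exit layer."""
    def __init__(self):
        super().__init__()
        self.heads = nn.ModuleDict({
            str(layer): nn.Linear(HIDDEN_SIZE, VOCAB_SIZE, bias=False)
            for layer in EXIT_LAYERS
        })

    def forward(self, hidden_states, labels):
        # hidden_states: dict {layer_index: (batch, seq, hidden)} from the backbone
        # labels: (batch, seq) next-token targets, already shifted
        losses = {}
        for layer in EXIT_LAYERS:
            logits = self.heads[str(layer)](hidden_states[layer])
            losses[layer] = F.cross_entropy(
                logits.reshape(-1, VOCAB_SIZE), labels.reshape(-1)
            )
        # Summed objective, analogous to the "Training Loss" column, with
        # each term corresponding to one "Loss Layer N Head" column.
        total = sum(losses.values())
        return total, losses

# Toy usage: random hidden states stand in for the backbone's per-layer outputs.
batch, seq = 2, 8
hidden = {l: torch.randn(batch, seq, HIDDEN_SIZE) for l in EXIT_LAYERS}
labels = torch.randint(0, VOCAB_SIZE, (batch, seq))
total, per_layer = AuxiliaryHeads()(hidden, labels)
print(total.item(), {l: round(v.item(), 3) for l, v in per_layer.items()})
```

Note how the table is consistent with this structure: heads at deeper layers (24, 28) converge to much lower losses than shallow ones (4, 8), since their inputs are closer to the final representation.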
Base model: meta-llama/Llama-3.1-8B